Wednesday, 25 February 2015

Split an XMLNode at specific HTML tags in R



I have scraped some data from the web using R and the XML package. I have already extracted the div container I want to use, but now I want to split the resulting object into separate parts wherever an <hr> tag occurs.


The data looks like this:



...
Something first
<hr color="#E8E8E8"/>
</div>
<div>
Something something
<hr color="#E8E8E8"/>
New something


Calling class() on the object returns the following class names: "XMLNode" "RXMLAbstractNode" "XMLAbstractNode" "oldClass"


How can I easily split an object like the one above every time an <hr> tag occurs? For the extract above, I want three separate objects.


Happy for any suggestions.
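The sketch below shows one possible approach, built on stated assumptions rather than offered as a definitive answer. It serializes the node back to markup text and splits that string wherever an <hr> tag appears. Because the excerpt's <hr> tags sit inside different div elements, a plain string split is used instead of walking one node's children. The HTML string `html` is a hypothetical stand-in for the scraped container, rebuilt from the excerpt. The names `node`, `markup`, `pieces` and `texts` are illustrative only. The functions used (htmlTreeParse, xmlRoot, htmlParse, xmlValue) are from the XML package. print() on an XMLNode writes its markup, and capture.output collects that output.

```r
library(XML)

# Hypothetical stand-in for the scraped container; replace with your own node.
html <- paste0(
  '<div>Something first<hr color="#E8E8E8"/></div>',
  '<div>Something something<hr color="#E8E8E8"/>New something</div>'
)
node <- xmlRoot(htmlTreeParse(html, asText = TRUE))  # R-level XMLNode tree

# Serialize the node back to markup text (print() writes the tags).
markup <- paste(capture.output(print(node)), collapse = "\n")

# Split wherever an <hr> tag occurs, with or without attributes or "/".
pieces <- strsplit(markup, "<hr[^>]*>")[[1]]

# Optionally re-parse each piece leniently and keep only its text content.
texts <- sapply(pieces, function(p) {
  doc <- htmlParse(p, asText = TRUE)
  trimws(xmlValue(xmlRoot(doc)))
}, USE.NAMES = FALSE)
```

Because the split works on plain text, tags that straddle an <hr> (such as the closing </div> in the excerpt) end up in different pieces. The lenient HTML parser tolerates this when each piece is re-parsed. If every <hr> were a direct child of a single node, grouping xmlChildren(node) with cumsum() over xmlName(...) == "hr" would be an alternative, but that does not cover the nested case above.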

