I have been trying to parse some data from this URL for a school research project: http://data.rbge.org.uk/service/factsheets/Edinburgh_Rhododendron_Monographs.xhtml
The creator of the dataset gave me permission to use it and sent me an .xml file of the website's content. I have been trying to parse this file in R, since that is where I am doing my analysis. The main problem is that the .xml file wraps the entire description of each species in a single tag, so I cannot simply pull each value out of its own tag. However, the descriptions follow a uniform layout in which a word or phrase precedes each value (e.g. 'Altitude' appears immediately before the altitude at which the species grows). Is there a way to search the .xml file for these labels and extract the text that immediately follows each one? I would prefer to use R, but I am open to another language if it is better suited to the task. I am fairly inexperienced, so any help is appreciated. Thank you.
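For illustration, here is a minimal sketch of the label-then-value idea in Python, since I am open to other languages. Everything about the data in it is an assumption: the tag name `description`, the sample text, the label list, and the rule that a value ends at the next full stop are all made up, because the real file's layout is not shown here. The helper `extract_field` is hypothetical as well. The same regular-expression approach should carry over to R with base functions such as `regexpr()` and `regmatches()`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data: the real file's tag names and wording will differ.
SAMPLE_XML = """
<factsheets>
  <description>Species A. Shrub, 1-2 m. Habitat open slopes. Altitude 3000-4000 m. Distribution N India.</description>
  <description>Species B. Tree, to 6 m. Habitat forests. Altitude 2400 m.</description>
</factsheets>
"""

# Assumed labels that precede the values of interest.
LABELS = ["Habitat", "Altitude", "Distribution"]


def extract_field(text, label, terminator="."):
    """Return the text following `label` up to the next terminator, or None."""
    pattern = (
        r"\b" + re.escape(label)      # the label as a whole word
        + r"\s*:?\s*"                 # optional colon and surrounding spaces
        + r"(.+?)"                    # the value, matched as short as possible
        + r"\s*(?:" + re.escape(terminator) + r"|$)"  # stop at terminator or end
    )
    match = re.search(pattern, text)
    return match.group(1) if match else None


root = ET.fromstring(SAMPLE_XML)

records = []
for element in root.iter("description"):
    # itertext() also collects text nested inside any child tags.
    text = "".join(element.itertext())
    records.append({label: extract_field(text, label) for label in LABELS})

for record in records:
    print(record)
```

To run this on a real file, `ET.parse("your_file.xml").getroot()` would replace `ET.fromstring(SAMPLE_XML)`, with the file name and the tag name adjusted to match the data. Stopping at the first full stop is a simplification: a value that contains a decimal point or an abbreviation would be cut short, so the terminator may need to be the next known label instead.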