Hi, I'm seriously stuck trying to filter my XML document. Here is an example of its contents:
<sentence id="1" document_id="Perseus:text:1999.02.0029" >
  <primary>millermo</primary>
  <word id="1" />
  <word id="2" />
  <word id="3" />
  <word id="4" />
</sentence>
<sentence id="2" document_id="Perseus:text:1999.02.0029" >
  <primary>millermo</primary>
  <word id="1" />
  <word id="2" />
  <word id="3" />
  <word id="4" />
  <word id="5" />
  <word id="6" />
  <word id="7" />
  <word id="8" />
</sentence>
There are many sentences (over 3,000), but all I want to do is write some code (preferably in Java or Python) that goes through my XML file and removes every sentence that has more than 5 word ids. In other words, I want to be left with only the sentence tags that have 5 or fewer word ids. Thanks. (Just to note: my XML isn't great, and I get mixed up between nodes, tags, elements, and ids.)
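On the terminology: in this file, `<sentence>`, `<primary>` and `<word>` are elements (each written with a start tag and an end tag, or a single self-closing tag like `<word id="1" />`), while `id` and `document_id` are attributes of those elements. So "remove sentences with more than 5 word ids" means "remove every `<sentence>` element that contains more than five `<word>` child elements." Below is a minimal Python sketch of that using the standard library's `xml.etree.ElementTree`. It assumes the real file wraps all the sentences in a single root element. The sample as posted has several top-level `<sentence>` elements and no root, which is not a well-formed XML document on its own, so the hypothetical `parse_fragment` helper wraps such a fragment in an invented `<corpus>` root. The helper names and file paths are illustrative only.

```python
import xml.etree.ElementTree as ET


def filter_sentences(root, max_words=5):
    """Remove every <sentence> that has more than max_words <word> children.

    Sentences are looked for under every element in the tree, so this works
    whether they sit directly under the root or are nested deeper.
    """
    # Snapshot all elements first: removing children while walking the
    # live tree could skip elements.
    for parent in list(root.iter()):
        # list(...) again, so removing from parent doesn't disturb the loop.
        for sentence in list(parent.findall("sentence")):
            # findall("word") counts only direct <word> children.
            if len(sentence.findall("word")) > max_words:
                parent.remove(sentence)
    return root


def parse_fragment(text):
    """Hypothetical helper: parse a fragment with several top-level
    <sentence> elements by wrapping it in an invented <corpus> root."""
    return ET.fromstring("<corpus>" + text + "</corpus>")


def filter_file(in_path, out_path, max_words=5):
    """Read an XML file, drop the long sentences, and write the result."""
    tree = ET.parse(in_path)
    filter_sentences(tree.getroot(), max_words)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
```

For the real file you would call something like `filter_file("input.xml", "output.xml")`, where both file names are placeholders. With the two-sentence sample above, only sentence `id="1"` (4 words) survives, because sentence `id="2"` has 8. At over 3,000 sentences the whole tree fits comfortably in memory, so loading it with `ET.parse` is simpler than a streaming parser.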