Text mining and XML: How to incorporate document structure?



I want to use RapidMiner to perform text mining on XML documents. However, I'm wondering how I can incorporate XML tags and attributes into the process.


The problem is that when I convert XML to, for example, CSV, the distinction between attributes, tags, and the documents' textual content is lost. I want to mine the data in a way that takes the document structure into account, rather than treating it as a plain bunch of text. Any suggestions?
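One possible workaround is to preprocess the XML before exporting it to CSV, so that every piece of content carries a marker of where it came from. This is a sketch under assumptions, not a RapidMiner feature: it uses only Python's standard library (`xml.etree.ElementTree`, `csv`), and the function names `structured_tokens` and `documents_to_csv` and the token format (`tag:path`, `path@attr=value`, `path:word`) are hypothetical choices made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def structured_tokens(elem, path=""):
    """Yield structure-aware tokens for an element and its descendants.

    Hypothetical token format:
      tag:doc/title        -> the element itself occurs at this path
      doc@lang=en          -> an attribute (spaces in values become "_")
      doc/title:xml        -> a lower-cased word of text inside doc/title
    """
    current = f"{path}/{elem.tag}" if path else elem.tag
    yield f"tag:{current}"
    for name, value in sorted(elem.attrib.items()):
        # Replace spaces so each attribute stays a single whitespace-free token.
        yield f"{current}@{name}={value.replace(' ', '_')}"
    for word in (elem.text or "").split():
        yield f"{current}:{word.lower()}"
    for child in elem:
        yield from structured_tokens(child, current)
        # Text after a child's closing tag (its "tail") belongs to this element.
        for word in (child.tail or "").split():
            yield f"{current}:{word.lower()}"


def documents_to_csv(docs):
    """Return CSV text with one row per XML document: an id and its tokens."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "tokens"])
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        writer.writerow([doc_id, " ".join(structured_tokens(root))])
    return buf.getvalue()


if __name__ == "__main__":
    sample = {
        "d1": '<doc lang="en"><title>XML mining</title>'
              '<body>Structure matters</body></doc>',
    }
    print(documents_to_csv(sample))
```

Because the path is baked into each token, a tool that simply splits the `tokens` column on whitespace would treat "xml" in a title and "xml" in a body as different features, and attributes would no longer be mixed in with ordinary words. Whether path-prefixed words or separate per-element columns work better for a given mining task is a design choice left open here.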

