Returning all values from XML in Java



I'm currently developing a program that parses a very large XML file into a DOM so that I can traverse it and output all the information in an Excel-like format. The XML file I'm using is quite big and messy. Here's a small snippet of it:



<metabolite>
  <version>3.5</version>
  <creation_date>2005-11-16 08:48:42 -0700</creation_date>
  <update_date>2013-07-24 11:48:59 -0600</update_date>
  <accession>HMDB00001</accession>
  <secondary_accessions>
    <accession>HMDB04935</accession>
    <accession>HMDB06703</accession>
    <accession>HMDB06704</accession>
  </secondary_accessions>
  <name>1-Methylhistidine</name>
  <description>One-methylhistidine (1-MHis) is derived mainly from the anserine of dietary flesh sources, especially poultry. The enzyme, carnosinase, splits anserine into b-alanine and 1-MHis. High levels of 1-MHis tend to inhibit the enzyme carnosinase and increase anserine levels. Conversely, genetic variants with deficient carnosinase activity in plasma show increased 1-MHis excretions when they consume a high meat diet. Reduced serum carnosinase activity is also found in patients with Parkinson's disease and multiple sclerosis and patients following a cerebrovascular accident. Vitamin E deficiency can lead to 1-methylhistidinuria from increased oxidative effects in skeletal muscle.</description>
  <synonyms>
    <synonym>1 Methylhistidine</synonym>
    <synonym>1-Methyl histidine</synonym>
    <synonym>1-Methyl-Histidine</synonym>
    <synonym>1-Methyl-L-histidine</synonym>
    <synonym>1-MHis</synonym>
    <synonym>1-N-Methyl-L-histidine</synonym>
    <synonym>L-1-Methylhistidine</synonym>
    <synonym>N1-Methyl-L-histidine</synonym>
    <synonym>Pi-methylhistidine</synonym>
  </synonyms>


Basically, everything is contained within the <metabolite> tag, and there are 30 or so nodes that stem from it, some of which have further child nodes and multiple values. At the moment my goal is to retrieve all the information and print it out, so that later I can format it in a way that is suitable for Excel. The document parses fine. The main problem I have is returning the values of all the subnodes. This is my code so far:



private void getValues(Node child) {
    if (!child.hasChildNodes()) {
        System.out.println(child.getNodeValue());
        return;
    }
    else {
        NodeList list = child.getChildNodes();
        System.out.println(list.getLength());
        for (int i = 0; i < list.getLength(); i++) {
            Node subnode = list.item(i);
            if (subnode.getNodeType() == Node.TEXT_NODE) {
                System.out.println(subnode.getNodeValue());
                return;
            }
            else if (subnode.getNodeType() == Node.CDATA_SECTION_NODE) {
                System.out.println(subnode.getNodeValue());
                return;
            }
            else if (subnode.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
                Node recursive = subnode.getNextSibling();
                getValues(recursive);
            }
        }
    }
}


After parsing the XML in Java, I build a NodeList containing all the child nodes that stem from the <metabolite> node (i.e. the root node), and then I try to traverse the tree so that the information from the subnodes is returned in order. However, I'm finding that my program thinks some nodes have child nodes when they don't. For example, <version> doesn't have any subnodes and contains just the value 3.5, but for some reason .hasChildNodes() returns true for it, and I get #text as well as the value. Am I doing something wrong in the way I traverse the tree, or is it a parsing problem?
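
For reference, here is a minimal sketch of how the DOM represents an element such as the one holding 3.5. In the W3C DOM, an element's text is stored in a child Text node whose node name is #text, so hasChildNodes() returns true even when the element has no child elements. The whitespace between tags also becomes Text nodes. The class name DomValues, the helpers parse and collectValues, and the inline XML string are hypothetical names chosen for illustration; the real file would be parsed from disk rather than from a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomValues {

    // Hypothetical helper: parses an XML string into a DOM Document.
    // Checked exceptions are wrapped to keep the sketch short.
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: depth-first walk that records "elementName=value"
    // for every non-blank text or CDATA node, and recurses into child elements.
    public static void collectValues(Node node, List<String> out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                // Whitespace-only text nodes (indentation, newlines) are skipped.
                String value = child.getNodeValue().trim();
                if (!value.isEmpty()) {
                    out.add(node.getNodeName() + "=" + value);
                }
            } else if (type == Node.ELEMENT_NODE) {
                collectValues(child, out);
            }
        }
    }

    public static void main(String[] args) {
        // Shortened, made-up input modelled on the snippet in the question.
        String xml = "<metabolite>\n"
                + "  <version>3.5</version>\n"
                + "  <secondary_accessions>\n"
                + "    <accession>HMDB04935</accession>\n"
                + "    <accession>HMDB06703</accession>\n"
                + "  </secondary_accessions>\n"
                + "</metabolite>";
        Document doc = parse(xml);

        Node version = doc.getElementsByTagName("version").item(0);
        System.out.println(version.hasChildNodes());               // true
        System.out.println(version.getFirstChild().getNodeName()); // #text

        List<String> values = new ArrayList<>();
        collectValues(doc.getDocumentElement(), values);
        values.forEach(System.out::println);
    }
}
```

The walk keeps going after it meets a text node instead of returning, and it recurses on ELEMENT_NODE children, so sibling elements such as the repeated accession entries are all visited in document order.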


Sorry for the long post!

