I'm trying to iterate through an XML file with awkward formatting. I generated it with pdftohtml; the XML output is odd, but it's more usable than the HTML output.
Here's an example:
<text height="11" font="3">Lastname1, Firstname1</text>
<text height="11" font="3">111111-1</text>
<text height="6" font="2">random text</text>
<text height="6" font="2">random text</text>
<text height="11" font="3">Lastname2, Firstname2</text>
<text height="11" font="3">222222-2</text>
<text height="6" font="2">random text</text>
<text height="6" font="2">random text</text>
<text height="11" font="3">Lastname3, Firstname3</text>
<text height="11" font="3">name3long</text>
<text height="11" font="3">333333-3</text>
<text height="6" font="2">random text</text>
<text height="6" font="2">random text</text>
<text height="11" font="3">Lastname4, Firstname4</text>
<text height="11" font="3">444444-4</text>
<text height="11" font="3">Lastname5, Firstname5</text>
<text height="11" font="3">555555-5</text>
<text height="11" font="3">Lastname6, Firstname6</text>
<text height="11" font="3">name6long</text>
<text height="11" font="3">666666-3</text>

To break it down: a name block starts with the name, which has the attributes height="11" and font="3", and ends with the ID, which has the same attributes and is always 8 characters long. Some blocks have an extra line between the name and the ID (for example, name3long), and the "random text" lines between blocks are not part of any block.
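The structure described above can be handled in a single forward pass, without recursion. The sketch below is one possible approach, not a definitive implementation. It assumes each element exposes `getAttribute()` and `textContent`, as the DOM elements returned by `getElementsByTagName` do. It also checks only height and font, as in the sample. The posted code additionally tests `left="62"`, which could be added to `isNameBlockLine` if the real file needs it. `isNameBlockLine`, `isIdLine`, `getNameBlocks` and `mockText` are hypothetical names. `mockText` only stands in for DOM nodes so the sketch runs outside a browser.

```javascript
// Sketch: find the start/end index of each name block in one forward pass.
// Assumes each element exposes getAttribute() and textContent, as DOM elements do.

// A line belongs to a name block if it has height="11" and font="3".
function isNameBlockLine(el) {
  return el.getAttribute("height") === "11" && el.getAttribute("font") === "3";
}

// The ID line has the same attributes and exactly 8 characters of text.
function isIdLine(el) {
  return isNameBlockLine(el) && el.textContent.trim().length === 8;
}

// Returns [{ start, end }, ...] with indexes into `lines` (any array-like).
function getNameBlocks(lines) {
  const blocks = [];
  let start = -1; // -1 means we are not currently inside a block
  for (let i = 0; i < lines.length; i++) {
    const el = lines[i];
    if (!isNameBlockLine(el)) continue; // skip the "random text" lines
    if (start === -1) start = i;        // first matching line opens a block
    if (isIdLine(el)) {                 // the 8-character ID closes it
      blocks.push({ start, end: i });
      start = -1;
    }
  }
  return blocks;
}

// Hypothetical stand-in for a DOM <text> element so this runs outside a browser.
function mockText(height, font, text) {
  const attrs = { height, font };
  return {
    getAttribute: (name) => (name in attrs ? attrs[name] : null),
    textContent: text,
  };
}

// The sample from the question.
const sample = [
  ["11", "3", "Lastname1, Firstname1"], ["11", "3", "111111-1"],
  ["6", "2", "random text"], ["6", "2", "random text"],
  ["11", "3", "Lastname2, Firstname2"], ["11", "3", "222222-2"],
  ["6", "2", "random text"], ["6", "2", "random text"],
  ["11", "3", "Lastname3, Firstname3"], ["11", "3", "name3long"],
  ["11", "3", "333333-3"],
  ["6", "2", "random text"], ["6", "2", "random text"],
  ["11", "3", "Lastname4, Firstname4"], ["11", "3", "444444-4"],
  ["11", "3", "Lastname5, Firstname5"], ["11", "3", "555555-5"],
  ["11", "3", "Lastname6, Firstname6"], ["11", "3", "name6long"],
  ["11", "3", "666666-3"],
].map(([h, f, t]) => mockText(h, f, t));

console.log(getNameBlocks(sample).map((b) => `${b.start}-${b.end}`).join(", "));
// prints: 0-1, 4-5, 8-10, 13-14, 15-16, 17-19
```

In a browser, the real call would be `getNameBlocks(xml.getElementsByTagName("text"))`. One caveat is that a name that is itself exactly 8 characters long (for example "Doe, Jim") would be taken for an ID. If every ID follows the sample's pattern, which is an assumption, a stricter test such as `/^\d{6}-\d$/` would avoid that.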
I'm trying to get the line numbers (indexes) where each name block starts and where it ends. I thought recursion would solve this, but it doesn't give me the output I want.
Here's a sample of the code I'm using:
var txt = xml.getElementsByTagName('text');

function block(b){
    var line = txt[b];
    if(line.innerHTML.length == 8){
        return b;
    }
    else{
        block(b+1);
    }
}

function getNameBlock(){
    // Notes: Name and Employee ID have attributes height: 11, left: 62, and font: 3
    // Employee ID always has length 8
    //
    // Start value should be assigned when we hit the attributes height: 11, left: 62, font: 3
    // End value should be assigned when we hit the attributes above as well as length 8
    // Console output will be the start and end values
    for(var i=0;i<txt.length;i++){
        var line = txt[i];
        var start;
        var end;
        if(line.getAttribute('height') == '11' && line.getAttribute('left') == '62' && line.getAttribute('font') == 3){
            start = i;
            end = block(start)
            console.log("Start: "+start+" End: "+end);
        }
    }
}

The output isn't what I want. It gives me:
Start: 0 End: undefined
Start: 1 End: 1
Start: 4 End: undefined
Start: 5 End: 5
etc.

Am I just overcomplicating things with recursion?
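Reading the posted code, two things appear to produce that output. First, the `else` branch calls `block(b+1)` but does not return its result. The index found deeper in the recursion is therefore thrown away, and only a call that lands directly on the ID returns a number. That is why starting at 0 gives `End: undefined`. Second, the loop keeps testing every line after a block is found. The ID line has the same height and font, so it is reported as a new start (`Start: 1 End: 1`).

The sketch below keeps the recursion but returns the recursive result and skips past each block's end. It is a minimal sketch under several assumptions, not the only fix. `txt` is passed in as a parameter rather than read from a global. A bounds check is added so a name with no ID after it doesn't read past the end of the list. `textContent` replaces `innerHTML`. The `left` test is dropped only because the sample lines have no `left` attribute.

```javascript
// Sketch of the posted recursive approach with the two problems fixed.
// Assumes `txt` is an array-like of elements with getAttribute()/textContent,
// e.g. xml.getElementsByTagName("text") in a browser.

// Returns the index of the first line at or after `b` whose text is 8 characters
// long (the ID), or -1 if there is none.
function block(txt, b) {
  if (b >= txt.length) return -1;                // ran past the last line
  if (txt[b].textContent.length === 8) return b; // found the ID
  return block(txt, b + 1);                      // pass the deeper result back up
}

function getNameBlock(txt) {
  const blocks = [];
  for (let i = 0; i < txt.length; i++) {
    const line = txt[i];
    // The question's code also tests left="62"; it is omitted here only
    // because the sample lines have no `left` attribute.
    if (line.getAttribute("height") === "11" && line.getAttribute("font") === "3") {
      const start = i;
      const end = block(txt, start);
      if (end === -1) break; // a name with no ID after it
      console.log("Start: " + start + " End: " + end);
      blocks.push({ start, end });
      i = end; // continue after the ID so it is not reported as a new start
    }
  }
  return blocks;
}
```

On the question's sample, this should log `Start: 0 End: 1`, `Start: 4 End: 5`, `Start: 8 End: 10`, and so on. `textContent` is used because `innerHTML` returns serialized markup, so a character such as `&` would be counted as `&amp;`. The recursion in `block` is really a loop written as tail calls. Most JavaScript engines do not eliminate tail calls, so a plain `while` loop would do the same job with no extra stack frames.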