XPath within OOXML



Here's an XPath / OOXML question for you gurus.


I have an MS Office docx containing text highlighted in different colours. I have to convert it to XML and then use XPath to identify where all those highlights are, regardless of colour, so that they can be filtered in an XML rule.


Here's how an example looks in MS Word:


[Screenshot of the example document in Word; image not available]


I understand how I can select the highlight node itself, if it exists, using //w:p/w:r/w:rPr/w:highlight[@w:val='yellow'], but not how I can select the actual w:t node containing the text when a highlight node exists in the same <w:r> block.


Example: I need to select the text within <w:t> whenever a <w:highlight> exists within the same parent <w:r> node, and do this for all cases within the document.


So in this example I need to select the text "This one goes because it is highlighted yellow" because it has a w:highlight node with a w:val of yellow related to it.



<w:r w:rsidRPr="003815B4">
<w:rPr>
<w:highlight w:val="yellow"/>
</w:rPr>
<w:t>This one goes because it is highlighted yellow</w:t>
</w:r>


Any help or pointers would be greatly appreciated :-)


Here's an XML example of the docx (with the OOXML headers removed for readability):



<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<doc>
<w:body>
<w:p w:rsidR="00B93038" w:rsidRDefault="003815B4">
<w:r>
<w:t>This line stays because it is not highlighted in any colour</w:t>
</w:r>
</w:p>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4">
<w:r w:rsidRPr="003815B4">
<w:rPr>
<w:highlight w:val="yellow"/>
</w:rPr>
<w:t>This one goes because it is highlighted yellow</w:t>
</w:r>
</w:p>
<w:tbl>
<w:tblPr>
<w:tblStyle w:val="TableGrid"/>
<w:tblW w:w="0" w:type="auto"/>
<w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>
</w:tblPr>
<w:tblGrid>
<w:gridCol w:w="4621"/>
<w:gridCol w:w="4621"/>
</w:tblGrid>
<w:tr w:rsidR="003815B4" w:rsidTr="003815B4">
<w:tc>
<w:tcPr>
<w:tcW w:w="4621" w:type="dxa"/>
</w:tcPr>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4">
<w:r>
<w:t>And so on</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:tcPr>
<w:tcW w:w="4621" w:type="dxa"/>
</w:tcPr>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4">
<w:r w:rsidRPr="003815B4">
<w:rPr>
<w:highlight w:val="cyan"/>
</w:rPr>
<w:t>Blue highlight</w:t>
</w:r>
</w:p>
</w:tc>
</w:tr>
<w:tr w:rsidR="003815B4" w:rsidTr="003815B4">
<w:tc>
<w:tcPr>
<w:tcW w:w="4621" w:type="dxa"/>
</w:tcPr>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4">
<w:r>
<w:t>Red</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:tcPr>
<w:tcW w:w="4621" w:type="dxa"/>
</w:tcPr>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4">
<w:r>
<w:t xml:space="preserve">Mixed </w:t>
</w:r>
<w:r w:rsidRPr="003815B4">
<w:rPr>
<w:highlight w:val="red"/>
</w:rPr>
<w:t>text</w:t>
</w:r>
<w:r>
<w:t xml:space="preserve"> with </w:t>
</w:r>
<w:r w:rsidRPr="003815B4">
<w:rPr>
<w:highlight w:val="green"/>
</w:rPr>
<w:t>some highlighted</w:t>
</w:r>
<w:r>
<w:t xml:space="preserve"> and some not</w:t>
</w:r>
</w:p>
</w:tc>
</w:tr>
<w:tr w:rsidR="003815B4" w:rsidTr="003815B4">
<w:tc>
<w:tcPr>
<w:tcW w:w="4621" w:type="dxa"/>
</w:tcPr>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4"/>
</w:tc>
<w:tc>
<w:tcPr>
<w:tcW w:w="4621" w:type="dxa"/>
</w:tcPr>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4"/>
</w:tc>
</w:tr>
</w:tbl>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4"/>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4">
<w:r>
<w:t>Another highlight</w:t>
</w:r>
</w:p>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4">
<w:r>
<w:t>Some text</w:t>
</w:r>
</w:p>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4"/>
<w:p w:rsidR="003815B4" w:rsidRDefault="003815B4">
<w:r>
<w:t>End</w:t>
</w:r>
<w:bookmarkStart w:id="0" w:name="_GoBack"/>
<w:bookmarkEnd w:id="0"/>
</w:p>
<w:sectPr w:rsidR="003815B4">
<w:pgSz w:w="11906" w:h="16838"/>
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
<w:cols w:space="708"/>
<w:docGrid w:linePitch="360"/>
</w:sectPr>
</w:body>
</doc>
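
For reference, one way the selection described above might be expressed is to put the highlight test in a predicate on the run and then step down to the text: //w:r[w:rPr/w:highlight]/w:t. With no @w:val test in the predicate, it should match every colour. The Python sketch below mirrors that logic using only the standard library. It is a sketch under assumptions, not a definitive answer: the sample is a trimmed copy of the document above, the xmlns:w declaration has been added back (it was removed with the headers, and a parser needs it), and the helper name highlighted_text_nodes is made up for illustration. ElementTree only supports a subset of XPath, so the predicate is written as an explicit check in a loop.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace; the prefix "w" must be bound for parsing to work.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Trimmed copy of the sample document, with the namespace declaration restored.
SAMPLE = f"""<doc xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t>This line stays because it is not highlighted in any colour</w:t></w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:rPr><w:highlight w:val="yellow"/></w:rPr>
        <w:t>This one goes because it is highlighted yellow</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:rPr><w:highlight w:val="cyan"/></w:rPr>
        <w:t>Blue highlight</w:t>
      </w:r>
    </w:p>
  </w:body>
</doc>"""


def highlighted_text_nodes(root):
    """Return the w:t elements of every run that has a w:highlight, any colour.

    Hypothetical helper: equivalent in intent to //w:r[w:rPr/w:highlight]/w:t.
    """
    hits = []
    for run in root.iter(f"{{{W_NS}}}r"):  # every w:r at any depth, tables included
        if run.find("w:rPr/w:highlight", NS) is not None:
            hits.extend(run.findall("w:t", NS))
    return hits


root = ET.fromstring(SAMPLE)
texts = [t.text for t in highlighted_text_nodes(root)]
print(texts)
```

In an engine with full XPath 1.0 support, the single expression should be enough on its own, provided the w prefix is registered with the namespace above. If a document contains runs whose highlight has been explicitly cleared, an extra test on @w:val in the predicate may be needed; that case does not appear in the sample, so it is not handled here.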
