Nokogiri search for Word content controls



I want to find all Microsoft Word content control nodes in a document.xml. Because I am going to preprocess these controls, I need to find them recursively from a given XML node, BUT the search should stop descending at the first occurrence: a content control embedded in another content control should not be returned.


Sample data (the content control node is the <w:sdt> one):


Assume these are the nodes I can see from a node X:



<w:tc>
  <w:tcPr>
    ...
  </w:tcPr>
  <!-- A content control node I want to get -->
  <w:sdt>
    <w:sdtPr>
      ...
    </w:sdtPr>
    <w:sdtEndPr/>
    <w:sdtContent>
      ...
      <!-- Embedded in a content control. I don't want it now -->
      <w:sdt>
        ...
      </w:sdt>
    </w:sdtContent>
  </w:sdt>
  ...
  <!-- Another content control! It isn't embedded in a <w:sdt>, so I want this one -->
  <w:sdt>
  </w:sdt>
</w:tc>
...
<!-- Another one! Not at the same depth, but not embedded, so I want it! -->
<w:sdt>
  ...
</w:sdt>


I want 3 of the 4 nodes given above. If I just do a nodeX.search('.//sdt'), I will grab all 4, right? How can I exclude the embedded one?

