Getting an image from a Word document using a SAX parser



I need to get the rId of the first image that appears in a Word document. The documents I am dealing with do not need to be parsed entirely, so I decided to use a SAX parser. Previously I was using a DOM parser, and I am new to SAX. Can someone tell me how to get the rId for the image using SAX? This is a sample XML file:



<pic:pic xmlns:pic="http://ift.tt/1m23oHB">
<pic:nvPicPr>
<pic:cNvPr id="1" name="Harry-Potter-and-the-Prisoner-of-Azkaban-movie-poster.jpg"/>
<pic:cNvPicPr/>
</pic:nvPicPr>
<pic:blipFill>
<a:blip r:embed="rId5">
<a:extLst>
<a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
<a14:useLocalDpi xmlns:a14="http://ift.tt/1l4Hf9L" val="0"/>
</a:ext>
</a:extLst>
</a:blip>
<a:stretch>
<a:fillRect/>
</a:stretch>
</pic:blipFill>
<pic:spPr>


Here is what I have decided to try. I keep two boolean variables, isPic and isblipFill. When I encounter a <pic:pic> element, I set isPic = true. As I read its children one by one, when I encounter <pic:blipFill> I set isblipFill = true, and after that, when I read <a:blip>, I read its r:embed attribute (which holds the rId for the image). An element is checked for being blipFill only if isPic = true, and checked for being blip only if isblipFill = true.
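A minimal sketch of the flag-based idea, assuming Python's standard xml.sax module (the question does not depend on a particular language, and the same structure maps onto any SAX API). The names FirstImageHandler, first_image_rid, _Found and SAMPLE are made up for illustration; in_pic and in_blip_fill play the roles of isPic and isblipFill. Two things go beyond the description above and are assumptions: the flags are reset in endElement so that a stray <a:blip> outside a picture is ignored, and a private exception is raised to stop parsing as soon as the first rId is found. Elements are matched on their prefixed names (pic:pic, a:blip), which only works if the document uses those prefixes; namespace-aware matching would be more robust.

```python
import xml.sax


class _Found(Exception):
    """Raised inside the handler to abort parsing once the rId is known."""


class FirstImageHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_pic = False        # corresponds to isPic
        self.in_blip_fill = False  # corresponds to isblipFill
        self.rid = None

    def startElement(self, name, attrs):
        # Namespace processing is off by default, so names keep their prefix.
        if name == "pic:pic":
            self.in_pic = True
        elif self.in_pic and name == "pic:blipFill":
            self.in_blip_fill = True
        elif self.in_blip_fill and name == "a:blip":
            self.rid = attrs.get("r:embed")
            if self.rid is not None:
                raise _Found()  # stop early: the rest is not needed

    def endElement(self, name):
        # Reset the flags when leaving the elements (an addition to the
        # approach described in the post).
        if name == "pic:blipFill":
            self.in_blip_fill = False
        elif name == "pic:pic":
            self.in_pic = False


def first_image_rid(xml_bytes):
    """Return the r:embed value of the first picture, or None if absent."""
    handler = FirstImageHandler()
    try:
        xml.sax.parseString(xml_bytes, handler)
    except _Found:
        pass
    return handler.rid


# Cut-down, well-formed version of the sample above (hypothetical file name).
SAMPLE = b"""<pic:pic
    xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <pic:nvPicPr><pic:cNvPr id="1" name="poster.jpg"/><pic:cNvPicPr/></pic:nvPicPr>
  <pic:blipFill>
    <a:blip r:embed="rId5"/>
    <a:stretch><a:fillRect/></a:stretch>
  </pic:blipFill>
</pic:pic>"""

print(first_image_rid(SAMPLE))
```

Raising an exception from the handler is a common way to leave a SAX parse early, since the parser itself has no "stop" call; without it the whole document would still be read even though only the first image matters.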


This is how I check the hierarchy while parsing the XML. I know that this task is much easier with XPath, but for the sake of performance I am trying to use SAX. Is this the right way to do it?

