Getting an image from a Word document using a SAX parser

I need to get the rId of the first image that appears in a Word document. The documents I am dealing with do not need to be parsed in their entirety, so I decided to use a SAX parser. Previously I was using a DOM parser, and I am new to SAX. Can someone tell me how to get the rId for the image using SAX? This is a sample XML file:


<pic:pic xmlns:pic="http://ift.tt/1m23oHB">
                                <pic:nvPicPr>
                                    <pic:cNvPr id="1" name="Harry-Potter-and-the-Prisoner-of-Azkaban-movie-poster.jpg"/>
                                    <pic:cNvPicPr/>
                                </pic:nvPicPr>
                                <pic:blipFill>
                                    <a:blip r:embed="rId5">
                                        <a:extLst>
                                            <a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
                                                <a14:useLocalDpi xmlns:a14="http://ift.tt/1l4Hf9L" val="0"/>
                                            </a:ext>
                                        </a:extLst>
                                    </a:blip>
                                    <a:stretch>
                                        <a:fillRect/>
                                    </a:stretch>
                                </pic:blipFill>
                                <pic:spPr>

I have decided to try this: keep two boolean variables, isPic and isblipFill. When I encounter a <pic:pic> element I set isPic = true, and as I read its children one by one, when I encounter <pic:blipFill> I set isblipFill = true. After that, when I read <a:blip>, I read its r:embed attribute (which holds the rId of the image). An element is checked for being blipFill only if isPic = true, and for being blip only if isblipFill = true.

This is how I check the hierarchy during XML parsing. I know that this task is much easier using XPath, but for the sake of performance I am trying to use SAX. Is this the right way to do it?
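
For illustration, the flag-based idea described above could be sketched roughly as follows. This is a minimal sketch in Python's xml.sax, not a definitive implementation: the class and function names (FirstImageHandler, FoundFirstImage, first_image_rid) are hypothetical, the namespace URIs in the sample are placeholders, and matching on the literal prefixes pic:, a: and r: assumes the document uses exactly those prefixes. Raising an exception from the handler is one common way to stop a SAX parse early, so the rest of the document is never read.

```python
import xml.sax


class FoundFirstImage(Exception):
    """Hypothetical signal used to abort parsing once the rId is known."""


class FirstImageHandler(xml.sax.ContentHandler):
    """Tracks the pic:pic > pic:blipFill > a:blip hierarchy with two flags."""

    def __init__(self):
        super().__init__()
        self.in_pic = False        # corresponds to isPic in the question
        self.in_blip_fill = False  # corresponds to isblipFill in the question
        self.rid = None

    def startElement(self, name, attrs):
        # Namespace processing is off by default, so names keep their prefix.
        if name == "pic:pic":
            self.in_pic = True
        elif self.in_pic and name == "pic:blipFill":
            self.in_blip_fill = True
        elif self.in_blip_fill and name == "a:blip":
            self.rid = attrs.get("r:embed")
            raise FoundFirstImage()  # stop here; skip the rest of the document

    def endElement(self, name):
        # Reset the flags when leaving an element so later siblings are not
        # mistaken for children of the picture.
        if name == "pic:blipFill":
            self.in_blip_fill = False
        elif name == "pic:pic":
            self.in_pic = False
            self.in_blip_fill = False


def first_image_rid(xml_bytes):
    """Return the r:embed value of the first picture, or None if there is none."""
    handler = FirstImageHandler()
    try:
        xml.sax.parseString(xml_bytes, handler)
    except FoundFirstImage:
        pass
    return handler.rid


# Placeholder namespace URIs, for illustration only.
SAMPLE = b"""<root xmlns:pic="urn:example:pic" xmlns:a="urn:example:a" xmlns:r="urn:example:r">
  <pic:pic>
    <pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
  </pic:pic>
  <pic:pic>
    <pic:blipFill><a:blip r:embed="rId6"/></pic:blipFill>
  </pic:pic>
</root>"""

print(first_image_rid(SAMPLE))
```

Resetting the flags in endElement matters: without it, an <a:blip> that appears later outside any picture would still be accepted. A more robust variant would enable namespace processing and compare namespace URIs rather than prefixes.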
