XML : Python XPath : Is it possible to have an optional XPath query?

I parse an XML document in the following way:

  import re
  from lxml.html.soupparser import fromstring

  inString = """
  <doc>
    <q></q>
    <p1>
      <p2 dd="ert" ji="pp">
        <p3>1</p3>
        <p3>2</p3>
        <p3>32</p3>
        <p3>3</p3>
      </p2>
      <p2 dd="ert" ji="pp">
        <p3>4</p3>
        <p3>5</p3>
        <p3>ABC</p3>
        <p3>6</p3>
      </p2>
    </p1>
    <r></r>
    <p1>
      <p2 dd="ert" ji="pp">
        <p3>7</p3>
        <p3>8</p3>
        <p3>ABC</p3>
        <p3>9</p3>
      </p2>
      <p2 dd="ert" ji="pp">
        <p3>10</p3>
        <p3>11</p3>
        <p3>XYZ</p3>
        <p3>12</p3>
      </p2>
    </p1>
  </doc>
  """
  root = fromstring(inString)

  #nodes = root.xpath("./doc//p1/p2/p3[contains(text(),'ABC') or contains(text(),'XYZ')]/preceding-sibling::p3")

  ns = {"re": "http://exslt.org/regular-expressions"}
  nodes = root.xpath(".//p3[re:match(.,'XYZ') or re:match(.,'ABC')]/preceding-sibling::p3", namespaces=ns)

which gives me

  4 5 7 8 10 11    

So it completely skips the first <p2>. My ideal output is

  1 2 32 3 4 5 7 8 10 11    

So, if I can't find a <p3>ABC</p3> or <p3>XYZ</p3> in a <p2>, I still want all the <p3> elements of that <p2>. Is that possible?
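
One possible way to get that behaviour is to apply the fallback per <p2> in Python instead of in a single XPath expression. The following is only a minimal sketch under some assumptions: it uses the standard library's `xml.etree.ElementTree` rather than the soupparser from the question, it treats "matches ABC or XYZ" as a plain substring test, and the names `MARKERS`, `is_marker` and `select_p3` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same document as in the question.
XML = """
<doc>
  <q></q>
  <p1>
    <p2 dd="ert" ji="pp"><p3>1</p3><p3>2</p3><p3>32</p3><p3>3</p3></p2>
    <p2 dd="ert" ji="pp"><p3>4</p3><p3>5</p3><p3>ABC</p3><p3>6</p3></p2>
  </p1>
  <r></r>
  <p1>
    <p2 dd="ert" ji="pp"><p3>7</p3><p3>8</p3><p3>ABC</p3><p3>9</p3></p2>
    <p2 dd="ert" ji="pp"><p3>10</p3><p3>11</p3><p3>XYZ</p3><p3>12</p3></p2>
  </p1>
</doc>
"""

# Hypothetical name: the strings that mark where to stop inside a <p2>.
MARKERS = ("ABC", "XYZ")


def is_marker(p3):
    """True if the <p3> text contains one of the marker strings."""
    text = p3.text or ""
    return any(marker in text for marker in MARKERS)


def select_p3(root):
    """Collect <p3> elements per <p2>, with a fallback when no marker exists."""
    selected = []
    for p2 in root.iter("p2"):
        children = p2.findall("p3")
        marker_positions = [i for i, p3 in enumerate(children) if is_marker(p3)]
        if marker_positions:
            # Equivalent to preceding-sibling::p3 of every marker:
            # everything before the last marker in this <p2>.
            selected.extend(children[:marker_positions[-1]])
        else:
            # No marker in this <p2>: keep all of its <p3> children.
            selected.extend(children)
    return selected


root = ET.fromstring(XML.strip())
result = [p3.text for p3 in select_p3(root)]
print(" ".join(result))  # prints: 1 2 32 3 4 5 7 8 10 11
```

The same per-<p2> loop should carry over to lxml elements. With lxml, the selection can probably also be written as one XPath 1.0 union, `.//p2/p3[contains(.,'ABC') or contains(.,'XYZ')]/preceding-sibling::p3 | .//p2[not(p3[contains(.,'ABC') or contains(.,'XYZ')])]/p3`, but that variant is not exercised in the sketch above.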
