Suppose I have many HTML snippets like this:
<div style="clear:both" id="novelintro" itemprop="description">you are foolish!<font color=red size=4>I am superman!</font></div>
I want to use XPath to extract the text: you are foolish! I am superman!
However, if I use
xpath('//div[@id="novelintro"]/text()').extract()
I only get "you are foolish!"
And if I use
xpath('//div[@id="novelintro"]/font/text()').extract()
I only get "I am superman!"
Is there a single XPath expression that extracts the whole sentence, "you are foolish! I am superman!"?
To make matters worse, the snippet above uses a <font> tag, but my other snippets use many different tags. For example:
I want to extract "hi girl I love you!" from the following snippet:
<div style="clear:both" id="novelintro" itemprop="description">hi girl<legend >I love you!</legend></div>
and "If I marry your mother then I am your father!" from the following snippet:
<div style="clear:both" id="novelintro" itemprop="description">If I<legend > marry your mother<div>then I am your father!</div></legend></div>
Is there a single XPath expression that works for all of these HTML snippets?
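One possible approach, sketched here with lxml rather than a full Scrapy project: the expression //div[@id="novelintro"]//text() selects every text node under the div at any depth, whatever tags wrap it, and the pieces can then be joined in Python. The helper name extract_intro and the SNIPPETS list are made up for illustration, and the sketch assumes lxml is installed.

```python
from lxml import html

# The three snippets from the question, copied verbatim.
SNIPPETS = [
    '<div style="clear:both" id="novelintro" itemprop="description">'
    'you are foolish!<font color=red size=4>I am superman!</font></div>',
    '<div style="clear:both" id="novelintro" itemprop="description">'
    'hi girl<legend >I love you!</legend></div>',
    '<div style="clear:both" id="novelintro" itemprop="description">'
    'If I<legend > marry your mother<div>then I am your father!</div>'
    '</legend></div>',
]

# "//text()" after the div step matches text nodes at any depth,
# so it does not matter which child tags are used.
ALL_TEXT = '//div[@id="novelintro"]//text()'


def extract_intro(markup):
    """Hypothetical helper: return all text under the div as one string."""
    tree = html.document_fromstring(markup)
    pieces = tree.xpath(ALL_TEXT)  # list of strings, in document order
    # Join the pieces with a space, then collapse repeated whitespace.
    return " ".join(" ".join(pieces).split())


for snippet in SNIPPETS:
    print(extract_intro(snippet))
```

In Scrapy the same expression can be passed to xpath(...) and the extracted list joined the same way, for example " ".join(response.xpath('//div[@id="novelintro"]//text()').extract()), where response is assumed to be the usual Scrapy response object. Wrapping the path in string(...) also returns all the text in one call, but it concatenates the nodes with no separator, so the space between the two sentences would be lost.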