How to use XPath to extract text from more than one tag in an HTML snippet



Suppose I have many HTML snippets like this:


<div style="clear:both" id="novelintro" itemprop="description">you are foolish!<font color=red size=4>I am superman!</font></div>


I want to use XPath to extract the text: you are foolish! I am superman!


However, if I use


xpath('//div[@id="novelintro"]/text()').extract()


I can only get "you are foolish!"


And if I use:


xpath('//div[@id="novelintro"]/font/text()').extract()


I can only get "I am superman!"


So, is there a single XPath expression that extracts the whole sentence, "you are foolish! I am superman!"?


Worse, the HTML above uses a <font> tag, but my other snippets use many other tags. For example:


To extract "hi girl I love you!" from the following snippet:


<div style="clear:both" id="novelintro" itemprop="description">hi girl<legend >I love you!</legend></div>


To extract "If I marry your mother then I am your father!" from the following snippet:


<div style="clear:both" id="novelintro" itemprop="description">If I<legend > marry your mother<div>then I am your father!</div></legend></div>


Is there a single XPath expression that works for all of these HTML snippets?
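One candidate worth trying is the descendant text axis, `//div[@id="novelintro"]//text()`, which selects every text node at any depth below the div regardless of which tags wrap it. The sketch below tests that idea with the third-party lxml package, which is an assumption and not something the question uses. The names `TEXT_XPATH` and `extract_text` are hypothetical. In a Scrapy selector the same expression should work as `xpath('//div[@id="novelintro"]//text()').extract()`, which returns a list of pieces to join.

```python
# Sketch only: assumes the third-party lxml package is installed.
import lxml.html

# One expression for every snippet: all text nodes at any depth below the div.
TEXT_XPATH = '//div[@id="novelintro"]//text()'


def extract_text(html):
    """Return all text inside the novelintro div, joined with single spaces."""
    root = lxml.html.fromstring(html)
    pieces = root.xpath(TEXT_XPATH)
    # Strip each text node and drop empty ones before joining.
    return " ".join(p.strip() for p in pieces if p.strip())


snippet = (
    '<div style="clear:both" id="novelintro" itemprop="description">'
    "If I<legend > marry your mother<div>then I am your father!</div></legend></div>"
)
print(extract_text(snippet))
```

The join is done in Python because the snippets have no whitespace between adjacent text nodes. A pure XPath alternative such as `string(//div[@id="novelintro"])` also returns a single value, but it concatenates the pieces without inserting a separator.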

