XPath: select node text and child nodes



I am using Python Scrapy to scrape some data from a website.


The website content is something like this:



<html>
<div class="details">
<div class="a"> not needed</div>
content 1
<p>content 2</p>
<div>content 2</div>
<p>content 2</p>
<div>content 2</div>
<p>content 2</p>
<div class="b"> this is also not needed</div>
</div>
</html>


I need to get the full HTML of the div with class 'details', excluding the divs with class 'a' and 'b'.


So my output should look like this:



<div class="details">
content 1
<p>content 2</p>
<div>content 2</div>
<p>content 2</p>
<div>content 2</div>
<p>content 2</p>
</div>


How can I write the correct XPath for that? Or should I select the divs with class 'details', 'a' and 'b' and use string operations to remove the divs with class 'a' and 'b'?


Note that here "content 1" is the text of the div with class 'details' itself; it is not wrapped in a child element.

