Remove unnecessary tag content from HTML source using Scrapy

I am extracting the HTML source of a web page with Scrapy and saving the output in .xml format. The page source contains the following:


<html> 
    <head>
       <script type="text/javascript">var startTime = new Date().getTime();        </script><script type="text/javascript">var startTime = new Date().getTime();</script> <script type="text/javascript"> document.cookie = "jsEnabled=true";..........

...........<div style="margin: 0px">Required content</div>
</head>
</html>

From this, I need to remove all .... tags and keep the required content together with its enclosing tags. How can I do that with Scrapy?
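One possible approach is sketched below. It assumes the tags to drop are `<script>` elements, since those are the only unwanted ones visible in the snippet; that is a guess, and the `drop` argument can be changed to other tag names. The names `TagStripper` and `strip_tags_with_content` are hypothetical helpers, not part of Scrapy. The sketch uses only Python's standard `html.parser` module and works on the raw HTML string, so in a Scrapy spider it would presumably be called on `response.text` inside the `parse` callback.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit HTML while dropping chosen elements and everything inside them.

    Hypothetical helper for illustration. Comments and doctype declarations
    are not re-emitted by this sketch.
    """

    def __init__(self, drop=("script",)):
        super().__init__(convert_charrefs=False)
        self.drop = set(drop)   # tag names to remove (assumed: script)
        self.depth = 0          # > 0 while inside a dropped element
        self.parts = []         # pieces of the cleaned output

    def handle_starttag(self, tag, attrs):
        if tag in self.drop:
            self.depth += 1
        elif self.depth == 0:
            # Keep the start tag exactly as written, attributes included.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a dropped region.
        if tag not in self.drop and self.depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.drop:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.parts.append(f"&#{name};")


def strip_tags_with_content(html, drop=("script",)):
    """Return html with every element named in drop removed, contents included."""
    parser = TagStripper(drop)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Inside a Scrapy spider this might look like (assumption, not tested here):
#     def parse(self, response):
#         cleaned = strip_tags_with_content(response.text)
```

If installing nothing extra is not a concern, the `w3lib` package that Scrapy depends on offers `w3lib.html.remove_tags_with_content(text, which_ones=("script",))`, which may be a shorter route to the same result.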
