I am extracting the HTML source of a web page using Scrapy and saving the output in .xml format. The page source has the following content:
<html>
<head>
<script type="text/javascript">var startTime = new Date().getTime(); </script><script type="text/javascript">var startTime = new Date().getTime();</script> <script type="text/javascript"> document.cookie = "jsEnabled=true";..........
...........<div style="margin: 0px">Required content</div>
</head>
</html>
From this, I need to remove all the .... tags and keep the required content together with its enclosing tags. How can I do that with Scrapy?