I am trying to parse the HTML source of www.amazon.com with xml.dom.minidom, as follows:
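    import os
    import shutil
    from xml.dom.minidom import parseString

    def start_parser(self, analysis_id, url):
        dom = None
        path = self.create_analysis_folder(analysis_id)
        self.get_generated_html(url)
        for root, dirs, files in os.walk(path):
            for file in files:
                if file.endswith('.html'):
                    # join with root (not path) so files in subfolders are found;
                    # the with-block closes the file after reading
                    with open(os.path.join(root, file)) as f:
                        dom = parseString(f.read())
                    # note: this deletes the current working directory
                    shutil.rmtree(os.getcwd())
                    return dom  # 'break' would only leave the inner loop
        return dom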
The method does some basic folder manipulation and then passes the HTML source code to parseString. On execution I get the following error:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 20, column 20
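Expat, the parser behind minidom, accepts only well-formed XML. The line and column in the message refer to the HTML string passed to parseString, not to the Python script, and the column is counted from 0. Real-world HTML is usually not well-formed XML: unquoted attribute values, bare `&` characters and unclosed tags such as `<br>` are legal HTML but are rejected by Expat. The following is a minimal sketch, assuming Python 3. It reproduces this class of error on a made-up snippet and prints the offending line with a caret under the reported column. The snippet and the `show_error_context` helper are hypothetical.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Hypothetical snippet: the unquoted attribute value (class=nav) is valid
# HTML but not well-formed XML, so Expat rejects it.
BROKEN_HTML = """<html>
<body>
<div class=nav>Hello</div>
</body>
</html>"""


def show_error_context(source, err):
    """Return the line Expat complained about, with a caret under the column."""
    line = source.splitlines()[err.lineno - 1]  # err.lineno counts from 1
    return line + "\n" + " " * err.offset + "^"  # err.offset counts from 0


caught = None
try:
    parseString(BROKEN_HTML)
except ExpatError as err:
    caught = err  # keep a reference; `err` is unbound after the except block
    print(err)
    print(show_error_context(BROKEN_HTML, err))
```

Applying the same helper to the real file contents, with line 20 and column 20 from the message above, should show which construct in the page Expat stopped at.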
Can someone please explain what this error means and how to get rid of it?
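One way around the error is to stop treating the page as XML and use an HTML parser, which tolerates these constructs. The following is a minimal sketch, assuming Python 3, that uses the standard library's html.parser (in Python 2 the module is named HTMLParser). The `LinkCollector` class and the snippet are hypothetical. They only show that input parseString would reject is processed without errors.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags; tolerates HTML that is not valid XML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical snippet: the bare "&", the unclosed <br> and the unquoted
# href value would each make minidom's parseString raise ExpatError.
PAGE = '<p>Deals & offers<br><a href=deals.html>Today</a> <a href="/help">Help</a>'

collector = LinkCollector()
collector.feed(PAGE)
collector.close()
print(collector.links)  # ['deals.html', '/help']
```

Third-party parsers such as BeautifulSoup or lxml.html also accept malformed HTML and build a navigable tree, which is closer to what minidom returns. html.parser is used here only because it ships with Python.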