Python: Parsing XML with lxml



I am trying to use a library called dblp-python, which parses the DBLP data (in XML format). When I try to print all publications of an author, the script behaves inconsistently: sometimes it prints them without any errors, and sometimes it raises an error. If I run the same code again after an error, it prints the publications without any problem. The code I use is:



import dblp

# Look up the author, then print the index and title of each publication.
a = dblp.search('Michael L. Littman')
for i in range(len(a[0].publications)):
    print i
    print a[0].publications[i].title


The error I get when I execute the above code is:



> Traceback (most recent call last):
>   File "<pyshell#217>", line 3, in <module>
>     print a[0].publications[i].title
>   File "build\bdist.win32\egg\dblp\__init__.py", line 19, in __getattr__
>     self.load_data()
>   File "build\bdist.win32\egg\dblp\__init__.py", line 110, in load_data
>     root = etree.fromstring(xml)
>   File "lxml.etree.pyx", line 3092, in lxml.etree.fromstring (src\lxml\lxml.etree.c:70691)
>   File "parser.pxi", line 1828, in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:106689)
>   File "parser.pxi", line 1716, in lxml.etree._parseDoc (src\lxml\lxml.etree.c:105478)
>   File "parser.pxi", line 1086, in lxml.etree._BaseParser._parseDoc (src\lxml\lxml.etree.c:100105)
>   File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:94543)
>   File "parser.pxi", line 690, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:96003)
>   File "parser.pxi", line 620, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:95050)
> XMLSyntaxError: Space required after the Public Identifier, line 2, column 47
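
Since the exception is raised inside etree.fromstring(xml), one way to narrow this down might be to parse the text separately and, when parsing fails, look at the beginning of what was actually received. The following is only a minimal sketch under assumptions: parse_or_explain is a hypothetical helper that is not part of dblp-python, both sample payloads are made up for illustration, and the guess that the failing response is sometimes not the expected XML (for example an HTML-style page) is unconfirmed.

```python
try:
    from lxml import etree  # the parser named in the traceback
except ImportError:
    # Fallback so the sketch still runs where lxml is not installed.
    import xml.etree.ElementTree as etree


def parse_or_explain(payload, preview_chars=200):
    """Hypothetical helper: parse `payload` as XML and explain failures.

    Returns (root, None) on success, or (None, message) on a parse error,
    where the message includes the first characters of the payload.
    """
    try:
        return etree.fromstring(payload), None
    except etree.ParseError as exc:  # lxml's XMLSyntaxError subclasses this
        preview = payload[:preview_chars]
        return None, "%s; payload starts with: %r" % (exc, preview)


# Made-up payloads (assumptions): one well-formed XML record, and one
# HTML-style page whose DOCTYPE has a public identifier but no system
# identifier, which is not valid in XML.
good = b"<record><title>Example</title></record>"
bad = b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html></html>'

root, error = parse_or_explain(good)
bad_root, bad_error = parse_or_explain(bad)
```

Here error stays None for the well-formed record, while bad_error carries both the parser's message and the start of the rejected text. If the preview from a failing run turns out to be something other than the expected XML, that would point at the response rather than at lxml.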


The code of the library can be seen HERE. I raised this problem with the author but got no response. I hope someone here can help me at least understand what the error might be. Thank you.

