Sunday, 5 April 2015

Developing a data warehouse from parsed XML



My goal is to build a data warehouse. For this, I downloaded an XML file from DBLP (a computer science bibliography website) and parsed it with a SAX XML parser.
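For illustration, the parsing step could look roughly like the sketch below, which uses Python's standard `xml.sax` module. The handler name `TagCollector`, the helper `list_tags`, and the tiny `SAMPLE` document are assumptions made up for this example; they are not taken from the real DBLP file, which is far larger and would be parsed from disk rather than from an in-memory string.

```python
import io
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records every opening tag with its attribute names."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        # Keep only the element name and attribute names; text content
        # and closing tags are ignored on purpose.
        parts = [name] + sorted(attrs.getNames())
        self.tags.append("<" + " ".join(parts) + ">")


def list_tags(xml_bytes):
    """Parse an XML document given as bytes and return the opening tags seen."""
    handler = TagCollector()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.tags


# Made-up sample in the general shape of a bibliography record (an assumption).
SAMPLE = b"""<dblp>
<article mdate="2002-01-03" key="journals/sample/1">
<author>A. Author</author>
<title>A sample title</title>
<year>1999</year>
</article>
</dblp>"""

if __name__ == "__main__":
    for tag in list_tags(SAMPLE):
        print(tag)
```

Because SAX streams the document event by event, a handler like this never holds the whole file in memory, which matters for a large download.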


After parsing, I have the following entities (closing tags are intentionally not printed):



<dblp>
  <www mdate key>
    <author>
    <title>
    <url>
    <year>
  <inproceedings mdate key>
    <author>
    <title>
    <month>
    <year>
    <pages>
    <booktitle>
    <url>
    <note>
    <cdrom>
  <article mdate key>
    <author>
    <author>
    <title>
    <journal>
    <volume>
    <month>
    <year>
  <book mdate key>
    <author>
    <title>
    <year>
    <publisher>
    <isbn>
    <url>
  <incollection mdate key publtype>
    <author>
    <author>
    <author>
    <title>
    <year>
    <booktitle>
    <ee>
    <crossref>
    <url>
  <proceedings mdate key>
    <editor>
    <editor>
    <editor>
    <title>
    <booktitle>
    <volume>
    <series href>
    <year>
    <isbn>
    <publisher>
    <url>


To my knowledge, the next step is to develop a dimensional model (star schema) from these entities, but I don't understand exactly how to do this with this much data. (Note: this is my first data warehouse project.)
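To make the star-schema idea concrete, here is a minimal sketch of one possible shape, using Python's built-in `sqlite3` module. Everything in it is an assumption for illustration: the table names (`dim_author`, `dim_venue`, `dim_year`, `dim_pub_type`, `fact_authorship`), the choice of one fact row per author of a publication, and the two made-up records. It is not a recommended final design, only a way to see how parsed records could map onto dimensions and a fact table.

```python
import sqlite3

# Hypothetical star schema: four small dimensions around one fact table.
SCHEMA = """
CREATE TABLE dim_author   (author_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_venue    (venue_id  INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_year     (year_id   INTEGER PRIMARY KEY, year INTEGER UNIQUE);
CREATE TABLE dim_pub_type (type_id   INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE fact_authorship (
    pub_key   TEXT,
    author_id INTEGER REFERENCES dim_author(author_id),
    venue_id  INTEGER REFERENCES dim_venue(venue_id),
    year_id   INTEGER REFERENCES dim_year(year_id),
    type_id   INTEGER REFERENCES dim_pub_type(type_id)
);
"""


def dim_id(conn, table, id_col, value_col, value):
    """Insert the value into a dimension if it is new, then return its surrogate key."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({value_col}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {value_col} = ?", (value,)
    ).fetchone()
    return row[0]


def load_record(conn, record):
    """Write one fact row per author of the given publication record."""
    venue_id = dim_id(conn, "dim_venue", "venue_id", "name", record["venue"])
    year_id = dim_id(conn, "dim_year", "year_id", "year", record["year"])
    type_id = dim_id(conn, "dim_pub_type", "type_id", "name", record["type"])
    for author in record["authors"]:
        author_id = dim_id(conn, "dim_author", "author_id", "name", author)
        conn.execute(
            "INSERT INTO fact_authorship VALUES (?, ?, ?, ?, ?)",
            (record["key"], author_id, venue_id, year_id, type_id),
        )


def build_demo():
    """Create an in-memory database and load two made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_record(conn, {"key": "journals/sample/1", "type": "article",
                       "authors": ["A. Author", "B. Author"],
                       "venue": "Sample Journal", "year": 1999})
    load_record(conn, {"key": "conf/sample/2", "type": "inproceedings",
                       "authors": ["A. Author"],
                       "venue": "Sample Conference", "year": 2000})
    return conn


def papers_per_author(conn):
    """Example analytical query: number of distinct publications per author."""
    return conn.execute(
        """SELECT a.name, COUNT(DISTINCT f.pub_key)
           FROM fact_authorship f
           JOIN dim_author a ON a.author_id = f.author_id
           GROUP BY a.name ORDER BY a.name"""
    ).fetchall()


if __name__ == "__main__":
    print(papers_per_author(build_demo()))
```

In this sketch the element name (`article`, `inproceedings`, and so on) becomes a publication-type dimension, while `<journal>` or `<booktitle>` feeds the venue dimension. Whether that grain fits depends on the questions the warehouse is meant to answer.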


Is this data enough? What do I have to take care of?


What should I do next?

