Importing Inconsistent XML Data into a Database



Being the sports nerd that I am, I'm looking to take the daily XML files produced by the Major League Baseball website and import them into either an Access or MySQL database. The issue I'm running into is that almost every XML file they produce is slightly different from the last. For example, one game file may have a field named batter23 next to event22, while another file calls it batter24 and puts it next to pitcher25. I know that XML files can be inconsistent, but there has to be a way to get the data into a database consistently. Is there any way to standardize these XML files? Is there some code that will parse each file in a list, organize the data into a specific layout, and give the fields consistent names? Currently I import each XML file into an Excel sheet first and save it as a CSV, but the field names and column positions still differ from file to file.


My goal is to have all the files in one structure so I can quickly import them into a database each day without manually changing column positions or field names. I'm open to any and all options, but my experience in most languages is rookie level at best, so forgive my lack of knowledge.
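
One possible approach, sketched in Python with only the standard library: read each record's fields, strip the trailing number from every field name (so batter23 and batter24 both become batter), and write the values out in one fixed column order. The post does not show the real file layout, so everything structural here is an assumption: the record tag name `atbat`, the idea that the fields are XML attributes, the `COLUMNS` list, and the function names are all hypothetical and would need adjusting to the actual files.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical fixed column order; replace with the fields you actually need.
COLUMNS = ["batter", "pitcher", "event"]


def normalize_name(name):
    """Strip trailing digits so 'batter23' and 'batter24' both become 'batter'."""
    return re.sub(r"\d+$", "", name)


def rows_from_xml(xml_text, record_tag="atbat"):
    """Yield one dict per record element, keyed by normalized field name.

    Assumes (hypothetically) that each record is an element named
    `record_tag` and that its fields are stored as attributes.
    """
    root = ET.fromstring(xml_text)
    for record in root.iter(record_tag):
        row = {normalize_name(k): v for k, v in record.attrib.items()}
        # Keep only the known columns, in a fixed order; missing ones are blank.
        yield {col: row.get(col, "") for col in COLUMNS}


def xml_to_csv(xml_text, out, record_tag="atbat"):
    """Write the normalized rows to `out` as CSV with a consistent header."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows_from_xml(xml_text, record_tag):
        writer.writerow(row)


if __name__ == "__main__":
    # Two made-up files with differently numbered field names.
    file_a = '<game><atbat batter23="100" event22="Single"/></game>'
    file_b = '<game><atbat batter24="200" pitcher25="300"/></game>'
    out = io.StringIO()
    xml_to_csv(file_a, out)
    print(out.getvalue())
    print(list(rows_from_xml(file_b)))
```

Because every output file has the same header and column order, the resulting CSVs could be loaded into Access or MySQL with the same import step each day. If two fields in one record reduce to the same name after the digits are stripped, the later one overwrites the earlier, so that case would need its own handling.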

