Architecture suggestions for ingesting disparate data feeds?



Problem description: I have to ingest data feeds from multiple sources, run ETL on them, and load them into an Oracle data warehouse. The feeds arrive as .csv files, and each file type has its own format, attributes, and values.


Let's say for the sake of argument that I have 3 different types of files:




  • File1 - I get a new file of this type every day: user_id,first_name,last_name

  • File2 - I get a new file of this type every week: order_id, order_date, order_amount

  • File3 - I get a new file of this type every month: part_id, part_name, part_description




What's the simplest way to ingest this data and feed it to the ETL step? How can I implement a single solution that works for all of these file types, and also for a currently unknown type, File4?
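To illustrate what I mean by one solution for every file type, here is a rough sketch in Python, not a finished design. It assumes each file starts with a header row naming its columns (my description above does not guarantee that), and the function name `read_feed` and the sample rows are made up for illustration.

```python
import csv
import io


def read_feed(text):
    """Yield one dict per data row, keyed by whatever the header row names.

    Assumption: the first line of every feed is a header row.
    Nothing here is specific to File1, File2, or File3.
    """
    # skipinitialspace drops the blank after each comma ("a, b" -> "a", "b")
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        yield dict(row)


# Made-up sample content in the File1 and File2 layouts
file1 = "user_id,first_name,last_name\n1,Ada,Lovelace\n"
file2 = "order_id, order_date, order_amount\n10, 2020-01-05, 99.50\n"

rows1 = list(read_feed(file1))
rows2 = list(read_feed(file2))
```

The same function would read a File4 without changes, since the column names come from the file itself. What it does not solve is the per-type mapping into warehouse tables, which is the part I am really asking about.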


I was thinking of XSLT and XPath as a possible solution: convert the .csv files to XML, then write an XSLT template for each file type. Am I on the right track here? What other technologies or tools could help me?
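The CSV-to-XML step could look roughly like the sketch below. It is only an illustration under some assumptions: every file has a header row, the column names are valid XML element names, and the tag names `feed` and `record`, the function `csv_to_xml`, and the sample row are all invented here. Python's standard library has no XSLT processor, so applying the per-file-type template would need a separate tool and is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text, root_tag="feed", row_tag="record"):
    """Turn CSV text into an XML string: one element per row,
    one child element per column, named after the header.

    Assumptions: a header row exists and its column names are
    valid XML element names. Tag names are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    root = ET.Element(root_tag)
    for row in reader:
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample content in the File3 layout
sample = "part_id, part_name, part_description\n7, bolt, M6 hex bolt\n"
xml_text = csv_to_xml(sample)
```

With this shape, each file type would differ only in its XSLT template, and the conversion code would stay the same for a new File4.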


Any suggestions greatly appreciated.

