I am trying to use spark-xml like below to read all files in a folder:
    val df = sparkSession.read
      .format("com.databricks.spark.xml")
      .schema(customSchema)
      .option("rootTag", "Transactions")
      .option("rowTag", "Transaction")
      .load("/Users/spark/Desktop/sample")
Inside the sample folder there are X XML files.
Based on the customSchema I provide, each file becomes 1..n rows depending on the number of Transaction tags it contains. What I also want is to include the XML file name as an extra column on each record.
I searched the options in the spark-xml GitHub documentation, but didn't find anything that does this.
Any suggestions? Or could I achieve this with a different approach?
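
In case it helps, here is roughly what I'm hoping for. This is only a sketch: it assumes Spark's built-in input_file_name() function also works for rows produced by the spark-xml source, which I haven't verified, and the source_file column name is just my own placeholder.

    import org.apache.spark.sql.functions.input_file_name

    // input_file_name() returns the path of the file the current row
    // was read from; the assumption is that spark-xml propagates it
    // like other file-based sources do.
    val dfWithFile = sparkSession.read
      .format("com.databricks.spark.xml")
      .schema(customSchema)
      .option("rootTag", "Transactions")
      .option("rowTag", "Transaction")
      .load("/Users/spark/Desktop/sample")
      .withColumn("source_file", input_file_name())

If something along these lines worked, every Transaction row would carry the full path of the XML file it came from, and I could strip the directory part afterwards if I only need the file name.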
Thanks,