XML : Spark Xml read including filename

I am trying to use spark-xml like below to read all files in a folder:

  val df = sparkSession
    .read
    .format("com.databricks.spark.xml")
    .schema(customSchema)
    .option("rootTag", "Transactions")
    .option("rowTag", "Transaction")
    .load("/Users/spark/Desktop/sample")

Inside the sample folder there are a number of XML files.

Based on the customSchema I provided, each file becomes 1..n rows, depending on the number of Transaction tags. What I also want is to include the XML file name as an extra column on each record.

I searched the options in the spark-xml GitHub documentation, but found nothing that does this.

Please give suggestions or maybe I could achieve the goal using a different method?

Thanks,
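One possible approach (not from the original post): since spark-xml produces an ordinary DataFrame, Spark's built-in input_file_name() function can append the path of the file each row was read from. A minimal sketch, assuming the same sparkSession, customSchema, and folder path as above; the column name source_file is my own choice:

```scala
import org.apache.spark.sql.functions.input_file_name

// Read the XML files as in the question, then tag each row
// with the path of the file it came from.
val dfWithFile = sparkSession
  .read
  .format("com.databricks.spark.xml")
  .schema(customSchema)
  .option("rootTag", "Transactions")
  .option("rowTag", "Transaction")
  .load("/Users/spark/Desktop/sample")
  // input_file_name() returns the full path (e.g. "file:///.../foo.xml");
  // trim it with regexp_extract or split if only the base name is wanted.
  .withColumn("source_file", input_file_name())
```

This avoids reading the files one by one just to record their names; every row carries the originating file's path alongside the columns from customSchema.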
