XML: Pig script to parse XML with multiple child tags

I am trying to write a Pig script to parse the XML file below. I am using Pig 0.11.0.

Input XML file:

EAN1

A1

REAN1

825307895

REAN2

825307890

2016-01-13T22:12:04.344Z

EAN1

A1

REAN2_1

825307895

REAN2_2

825307890

2016-01-13T22:12:04.344Z

Expected output:

EAN1 REAN1 825307895

EAN1 REAN2 825307890

I am not able to parse the XML file successfully.

Below is the Pig script I am using to parse the XML file:

REGISTER /opt/cloudera/parcels/CDH/lib/pig/piggybank.jar;
A = LOAD '/home/etl1/test.fil' using org.apache.pig.piggybank.storage.XMLLoader('recommendation') as (x:chararray);
B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,'\s*(.)\s(.)(.)'));
dump B;

The output I am receiving is in the format below:

(EAN1,A1, REAN1 825307895REAN2825307890 2016-01-13T22:12:04.344Z)
(EAN2,A1, REAN2_1 825307895REAN2_28253078902016-01-13T22:12:04.344Z)

Thanks,
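
The expected output above has one row per child entry, each paired with the parent's identifier. That differs from the one tuple per record the script currently returns. Below is a minimal Python sketch of that flattening step. It is an illustration only: the tag names (`code`, `type`, `item`, `name`, `value`, `timestamp`) and the helper `flatten_record` are hypothetical, because the real tag names are not visible in the sample input. Only the outer `recommendation` tag comes from the script.

```python
import xml.etree.ElementTree as ET

# Hypothetical record. Only <recommendation> is taken from the Pig script;
# every other tag name is a placeholder, since the real names are unknown.
RECORD = """
<recommendation>
  <code>EAN1</code>
  <type>A1</type>
  <item><name>REAN1</name><value>825307895</value></item>
  <item><name>REAN2</name><value>825307890</value></item>
  <timestamp>2016-01-13T22:12:04.344Z</timestamp>
</recommendation>
"""


def flatten_record(xml_text):
    """Return one (parent_code, child_name, child_value) tuple per child."""
    root = ET.fromstring(xml_text.strip())
    parent_code = root.findtext("code")
    rows = []
    # Each repeated child element becomes its own output row.
    for item in root.findall("item"):
        rows.append((parent_code, item.findtext("name"), item.findtext("value")))
    return rows


if __name__ == "__main__":
    for row in flatten_record(RECORD):
        print(" ".join(row))
```

The point of the sketch is that the repeated child elements are iterated one by one instead of being captured by a fixed number of regex groups. In Pig, comparable logic could live in a UDF that returns a bag of tuples, which FLATTEN then expands into one row per child.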
