I am trying to write a Pig script to parse the XML file below. I am using Pig 0.11.0.
Input XML file:
EAN1
A1
REAN1
825307895
REAN2
825307890
2016-01-13T22:12:04.344Z
EAN1
A1
REAN2_1
825307895
REAN2_2
825307890
2016-01-13T22:12:04.344Z
Expected output:
EAN1 REAN1 825307895
EAN1 REAN2 825307890
I am not able to parse the XML file successfully.
Below is the Pig script I am using to parse the XML file.
REGISTER /opt/cloudera/parcels/CDH/lib/pig/piggybank.jar;
A = LOAD '/home/etl1/test.fil' using org.apache.pig.piggybank.storage.XMLLoader('recommendation') as (x:chararray);
B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,'\s*(.)\s(.)(.)'));
dump B;
The output I am receiving is in the format below:
(EAN1,A1, REAN1 825307895REAN2825307890 2016-01-13T22:12:04.344Z)
(EAN2,A1, REAN2_1 825307895REAN2_28253078902016-01-13T22:12:04.344Z)
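For illustration only, here is a minimal Python sketch of the row-splitting logic the expected output seems to call for: take one `recommendation` record as a string and emit one row per nested pair instead of one concatenated field. The element names `ean`, `type`, `item`, `rean`, `id` and `ts` are hypothetical placeholders, since the real tag names are not visible in the input above; only `recommendation` comes from the script. The function name `explode_record` is also made up for this sketch.

```python
import re

# Hypothetical tag names: the real element names are not shown in the post.
# Only <recommendation> is taken from the XMLLoader call in the script.
EAN_RE = re.compile(r"<ean>(.*?)</ean>", re.S)
ITEM_RE = re.compile(
    r"<item>\s*<rean>(.*?)</rean>\s*<id>(.*?)</id>\s*</item>", re.S
)


def explode_record(record):
    """Return one (ean, rean, id) tuple per nested item in a record string."""
    ean_match = EAN_RE.search(record)
    if ean_match is None:
        # No parent identifier found, so there is nothing to pair items with.
        return []
    ean = ean_match.group(1).strip()
    return [
        (ean, rean.strip(), ident.strip())
        for rean, ident in ITEM_RE.findall(record)
    ]


# A made-up record using the placeholder tags and the values from the post.
sample = (
    "<recommendation>"
    "<ean>EAN1</ean><type>A1</type>"
    "<item><rean>REAN1</rean><id>825307895</id></item>"
    "<item><rean>REAN2</rean><id>825307890</id></item>"
    "<ts>2016-01-13T22:12:04.344Z</ts>"
    "</recommendation>"
)

for row in explode_record(sample):
    print("\t".join(row))
```

The key point is that the nested pairs are matched repeatedly with `findall`, whereas a single `REGEX_EXTRACT_ALL` pattern captures each group only once per record. Logic like this could presumably be wrapped as a Jython UDF that returns a bag and then flattened in Pig, but that wiring is an assumption and is not shown here.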
Thanks,