Change text inside specific XML tags (remove non-numeric characters) with an arbitrary tool



OK, I searched a lot, but it became a little frustrating because I found no solution that really works. Sorry if this is a stupid question, but I'm stuck right now. The task is to remove every non-numeric character from the content of the following CustomerIdentity tag in an XML file:



<ns2:TaxAtSource institutionID="#SG">
<ns2:CantonID>SG</ns2:CantonID>
<ns2:CustomerIdentity>CHE123.456 </ns2:CustomerIdentity>
</ns2:TaxAtSource>


I tried sed, which would be elegant, but because the non-numeric characters can appear anywhere between the CustomerIdentity tags, the regex gets a bit hairy. I also tried XSLT, but the ns2 namespace prefix causes trouble when selecting the tag (unreferenced namespace). So if anyone has a working solution that processes the XML file so that it looks as follows (with everything else unchanged):



<ns2:TaxAtSource institutionID="#SG">
<ns2:CantonID>SG</ns2:CantonID>
<ns2:CustomerIdentity>123456</ns2:CustomerIdentity>
</ns2:TaxAtSource>


it would be very much appreciated. A colleague suggested using AWK or Ruby, but I think that boils down to regex too.
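A regex can work here if it only has to locate the element and a small function does the cleanup. The sketch below uses Python (not one of the tools named above) and assumes, as in the example, that the prefix is literally ns2 and that the element has no attributes and no nested tags. The key point is that re.sub accepts a function as its replacement: the pattern only captures the tag content, and the function deletes every character that is not 0-9. The names clean_customer_ids and strip_non_digits are hypothetical.

```python
import re

# Matches one <ns2:CustomerIdentity>...</ns2:CustomerIdentity> element.
# Assumption: literal "ns2" prefix, no attributes, no nested tags or CDATA.
# DOTALL lets the content span line breaks.
CUSTOMER_ID = re.compile(
    r"(<ns2:CustomerIdentity>)(.*?)(</ns2:CustomerIdentity>)", re.DOTALL
)
# Anything that is not an ASCII digit.
NON_DIGIT = re.compile(r"[^0-9]")


def strip_non_digits(match: re.Match) -> str:
    """Keep the opening and closing tags; drop every non-digit in between."""
    open_tag, content, close_tag = match.groups()
    return open_tag + NON_DIGIT.sub("", content) + close_tag


def clean_customer_ids(xml_text: str) -> str:
    """Return xml_text with only digits left inside each CustomerIdentity tag."""
    return CUSTOMER_ID.sub(strip_non_digits, xml_text)


if __name__ == "__main__":
    sample = (
        '<ns2:TaxAtSource institutionID="#SG">\n'
        "<ns2:CantonID>SG</ns2:CantonID>\n"
        "<ns2:CustomerIdentity>CHE123.456 </ns2:CustomerIdentity>\n"
        "</ns2:TaxAtSource>"
    )
    print(clean_customer_ids(sample))
```

To process a file, read it into a string, pass it through clean_customer_ids and write the result back out. Since this works on raw text, everything outside the matched elements is left untouched; a different prefix or an attribute on the element would need an adjusted pattern.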


Thanks a lot for any help. Chris
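Regarding the XSLT attempt: an "unreferenced namespace" problem usually means the ns2 prefix is not bound to a namespace URI where the element is being selected. XML-aware tools match elements by namespace URI plus local name, not by the prefix text. As a sketch of that approach using only Python's standard library, xml.dom.minidom can select elements with getElementsByTagNameNS and keeps the original ns2: prefix when writing the document back out. (xml.etree.ElementTree, by contrast, renames prefixes to ns0 and refuses to register a prefix named ns2.) The namespace URI below is a placeholder because the fragment does not show the xmlns:ns2 declaration, and the function name is hypothetical.

```python
from xml.dom import minidom

# Placeholder URI: the fragment in the question does not show the
# xmlns:ns2 declaration. Replace this with the URI from the real file.
NS2_URI = "http://example.com/hypothetical-ns2"


def clean_customer_ids_dom(xml_text: str, ns_uri: str = NS2_URI) -> str:
    """Keep only the digits 0-9 in every CustomerIdentity element in ns_uri."""
    doc = minidom.parseString(xml_text)
    # Match on namespace URI + local name, so the prefix itself does not matter.
    for elem in doc.getElementsByTagNameNS(ns_uri, "CustomerIdentity"):
        text = "".join(
            node.data
            for node in elem.childNodes
            if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
        )
        digits = "".join(ch for ch in text if ch in "0123456789")
        # Replace all existing children with a single text node.
        for child in list(elem.childNodes):
            elem.removeChild(child)
        elem.appendChild(doc.createTextNode(digits))
    # toxml() keeps the original ns2: prefixes and prepends an XML declaration.
    return doc.toxml()


if __name__ == "__main__":
    sample = (
        f'<ns2:TaxAtSource xmlns:ns2="{NS2_URI}" institutionID="#SG">\n'
        "<ns2:CantonID>SG</ns2:CantonID>\n"
        "<ns2:CustomerIdentity>CHE123.456 </ns2:CustomerIdentity>\n"
        "</ns2:TaxAtSource>"
    )
    print(clean_customer_ids_dom(sample))
```

For a file on disk, minidom.parse(path) can be used in place of parseString. Because the document is re-serialized, whitespace between elements is kept, but cosmetic details such as attribute quote style or the XML declaration may change.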

