Tuesday, 18 October 2016

XML : regex to return all text between tags if text contains specific characters (Notepad++)

I am working on a large XML file made up of units with the following structure:

  <TrU>
  <CrD>16122013, 11:54:13
  <CrU>IK
  <ChD>16122013, 11:54:13
  <ChU>IK
  <Seg L=EN-GB>some text in English
  <Seg L=RU-RU>some text in Russian
  </TrU>

I need a regular expression that finds such complete structures only if any of the following characters occurs between the tags <TrU> and </TrU>:

íèé

The expression to find such structures without the specific character criterion is: <TrU>.*?</TrU>

I modified it into: <TrU>.*?[íèé].*?</TrU>

but it behaves greedily and matches several neighbouring units at once, usually only one of which contains one of the desired characters.
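
A likely cause is that the first .*? is lazy but still free to run past a closing </TrU> while it searches for one of the characters. One common workaround is a negative lookahead that stops the first part of the pattern from crossing a closing tag. The sketch below reproduces the problem and the workaround in Python on made-up sample data; the sample text and the helper name find_units are purely illustrative, and it assumes that . is allowed to match newlines (re.DOTALL here, the ". matches newline" option in Notepad++).

```python
import re

# Hypothetical sample: three units, only the second contains one of í, è, é.
SAMPLE = (
    "<TrU>\n<Seg L=EN-GB>first\n<Seg L=RU-RU>text one\n</TrU>\n"
    "<TrU>\n<Seg L=EN-GB>café\n<Seg L=RU-RU>text two\n</TrU>\n"
    "<TrU>\n<Seg L=EN-GB>third\n<Seg L=RU-RU>text three\n</TrU>\n"
)

# Pattern from the question: the first lazy .*? may still cross a </TrU>.
NAIVE = re.compile(r"<TrU>.*?[íèé].*?</TrU>", re.DOTALL)

# Workaround (an assumption, not a confirmed answer): each character consumed
# before the target character must not be the start of a closing </TrU>.
TEMPERED = re.compile(r"<TrU>(?:(?!</TrU>).)*?[íèé].*?</TrU>", re.DOTALL)


def find_units(pattern, text):
    """Return every full match of pattern in text as a list of strings."""
    return [m.group(0) for m in pattern.finditer(text)]


naive_matches = find_units(NAIVE, SAMPLE)
tempered_matches = find_units(TEMPERED, SAMPLE)

for unit in tempered_matches:
    print(unit)
```

On this sample the original pattern returns a single match that swallows the first two units, while the lookahead version returns only the unit containing "café". The same expression, <TrU>(?:(?!</TrU>).)*?[íèé].*?</TrU>, should be usable in Notepad++'s Find dialog in regular expression mode, assuming its regex engine supports negative lookahead; it has not been checked against the real file.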
