Sunday, 8 February 2015

gawk on Windows - retrieving all matches of a string within an XML file + 100 bytes before and after each matched string + filename



On Windows, I am currently using gawk to find the first occurrence of a string + 100 bytes for all XML files within a directory:


gawk "/[some string]/ { match($0, /[some string]/); print substr($0, RSTART, RLENGTH + 100) FILENAME; }" C:\XML*.xml > C:\Results.txt


What I would like to do now is output all the matches (not just the first) for each XML file to C:\Results.txt, and also include the 100 characters before each match and the 100 characters after it.


Is it possible to easily change this to get the desired results?
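
One way the change described above could look is sketched below. It is only a minimal sketch, not a tested solution for these particular files: the script name allmatches.awk and the variables pat and ctx are hypothetical names chosen for illustration, and it assumes that a match and its context never need to cross a line break. It loops with match() over the part of the line not yet consumed, so every occurrence is reported rather than only the first.

```awk
# allmatches.awk (hypothetical name): print every match of a pattern
# together with up to ctx characters before and after it.
# Assumptions, not from the original command:
#   pat - the regular expression, passed with -v pat=...
#   ctx - number of context characters, passed with -v ctx=... (default 100)
#   a match and its context stay within a single input line
BEGIN { if (ctx == "") ctx = 100 }

{
    rest = $0      # part of the line not yet searched
    offset = 0     # number of characters of $0 already consumed

    while (rest != "" && match(rest, pat)) {
        start = offset + RSTART        # match position within the full line
        from = start - ctx             # where the leading context begins
        if (from < 1) from = 1         # clamp at the start of the line

        # leading context + match + trailing context
        len = (start - from) + RLENGTH + ctx
        print FILENAME ": " substr($0, from, len)

        # Move past this match; advance at least one character so that
        # an empty match cannot cause an endless loop.
        step = RSTART + (RLENGTH > 0 ? RLENGTH : 1) - 1
        offset += step
        rest = substr(rest, step + 1)
    }
}
```

Keeping the program in a file avoids the quoting problems of cmd.exe. With the same placeholder pattern and paths as above, it could be run as:

gawk -v "pat=[some string]" -v ctx=100 -f allmatches.awk C:\XML*.xml > C:\Results.txt

Because the search restarts on the remainder of the line, a pattern anchored with ^ may behave differently than expected. gawk counts characters rather than bytes in multibyte locales; the -b option makes it treat input as bytes if exactly 100 bytes are wanted.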


I understand that gawk might not be the best tool for the job, but this is just a one-time task, and if it is slow I can let it run overnight.

