I have an "XML-like" file that contains a lot of configuration data. I say "XML-like" because it is really three XML documents concatenated together, each followed by the delimiter "]]>]]>".
For example:
<?xml version="1.0" encoding="UTF-8"?>
<hello><world>"Earth"</world></hello>]]>]]><?xml version="1.0" encoding="UTF-8"?>
<data><lemur><type>"Ring-tailed"</type></lemur></data>]]>]]><?xml version="1.0" encoding="UTF-8"?>
<data><lemur><type>"Mouse"</type></lemur></data>]]>]]>
I am trying to write a script that calls xmllint to indent all of the XML tags in the file. However, xmllint (like many other XML formatting programs) seems to require that the input contain only one XML document: it must start with a single <?xml version="1.0" encoding="UTF-8"?> declaration and contain only one root element.
So I tried writing an awk script that splits the data into separate chunks and pipes each one to xmllint, but I get an error that I can't get past. The script and its output are below.
$ awk '
BEGIN {
    RS = "]]>]]>"
    xmlFormatCommand = "xmllint --format -"
}
{
    print $0 | xmlFormatCommand
}
' SmallTest.xml
-:3: parser error : XML declaration allowed only at the start of the document
<?xml version="1.0" encoding="UTF-8"?>
^
-:4: parser error : Extra content at the end of the document
<data><lemur><type>"Ring-tailed"</type></lemur></data>
^
If I do it in two separate steps, one where awk writes three temporary files and one where xmllint formats those files, it works. For example:
awk 'BEGIN {RS = "]]>]]>"} {print $0 > "Section_" NR ".txt" }' SmallTest.xml
That results in three files: Section_1.txt, Section_2.txt, and Section_3.txt. The contents of Section_2.txt are:
$ cat Section_2.txt
<?xml version="1.0" encoding="UTF-8"?>
<data><lemur><type>"Ring-tailed"</type></lemur></data>
I can format that file with xmllint:
$ cat Section_2.txt | xmllint --format -
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <lemur>
    <type>"Ring-tailed"</type>
  </lemur>
</data>
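The two-step workaround can also be wrapped in one small script. This is a sketch, assuming gawk or mawk (a multi-character RS is an extension, not POSIX awk) and that no stale Section_N.txt files are left over from an earlier run; the function name split_and_format is hypothetical. It skips the whitespace-only record after the final delimiter, builds each file name in a variable before redirecting, and closes each file after writing it.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): split_and_format FILE
# Automates the two-step workaround: split FILE into Section_N.txt files,
# then run "xmllint --format" on each one in numeric order.
# Assumes gawk or mawk for the multi-character RS.
split_and_format() {
    awk '
    BEGIN { RS = "]]>]]>" }
    NF == 0 { next }             # skip the whitespace after the last delimiter
    {
        n++
        out = "Section_" n ".txt"    # build the name first, then redirect
        print $0 > out
        close(out)               # release the file descriptor right away
    }
    ' "$1"

    # Format Section_1.txt, Section_2.txt, ... until one is missing.
    n=1
    while [ -f "Section_$n.txt" ]; do
        xmllint --format "Section_$n.txt"
        n=$((n + 1))
    done
}
```

The while loop walks the files by number rather than with a glob such as Section_*.txt, which would sort Section_10.txt before Section_2.txt.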
So I don't understand why I can't just pipe each record to xmllint directly from the awk script.
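A likely cause, consistent with the "-:3" line number in the error: in awk, `print ... | cmd` starts `cmd` once and keeps that pipe open until it is closed with `close(cmd)` or awk exits. Every record is therefore written to the same xmllint process, which sees the second XML declaration on line 3 of one combined input. The sketch below calls `close()` after each record so that each chunk gets its own xmllint process. It assumes gawk or mawk (multi-character RS is not POSIX), strips leading whitespace from each chunk, and skips the whitespace-only record after the final delimiter; the function name format_sections and its optional COMMAND argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are illustrative):
#   format_sections FILE [COMMAND]
# Splits FILE on "]]>]]>" and pipes each chunk to its OWN COMMAND process.
# COMMAND defaults to "xmllint --format -".
# Assumes gawk or mawk: a multi-character RS is an extension to POSIX awk.
format_sections() {
    awk -v fmt="${2:-xmllint --format -}" '
    BEGIN { RS = "]]>]]>" }
    # Skip whitespace-only records, such as the newline after the final
    # delimiter; with the default FS such a record has NF == 0.
    NF == 0 { next }
    {
        # Drop leading whitespace so each chunk starts with "<?xml".
        sub(/^[ \t\r\n]+/, "")
        print $0 | fmt
        # close() flushes the pipe and waits for this command to exit, so
        # the next "print ... | fmt" starts a brand-new process.
        close(fmt)
    }
    ' "$1"
}

# Usage (assumes xmllint is installed):
#   format_sections SmallTest.xml
```

Passing a different COMMAND, such as `cat`, is a quick way to check that the records are split where you expect before involving xmllint.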
I appreciate any help you can provide.
-Jon