Sunday, 1 March 2015

XSLT to normalise whitespace but leave inner HTML



I'm trying to use XSLT to transform XML into a plain-text file for loading into a database. One of the elements I need, however, might contain HTML-formatted text. I need to preserve its markup, but not its newlines and extra whitespace. I also don't want the XML namespace in the output.


The real file is larger and more complicated, but the following example should cover the problem.


XML:



<outer xmlns="urn:site-org:v3/m2" >
<inner>
<text>
<p>This is text with markup</p>
<p>This is text with <i>more</i> markup</p>
</text>
</inner>
<inner>
<text>
Need text with no markup also
</text>
</inner>
</outer>


Desired output:



<p>This is text with markup</p><p>This is text with <i>more</i> markup</p>
Need text with no markup also


With an output format of text, normalize-space() cleans up all the newlines and whitespace, but also removes the tags.
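
A minimal sketch of this kind of attempt is below. It is not the exact stylesheet: the prefix m (bound to the namespace from the example) and the single root template are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="urn:site-org:v3/m2">

  <!-- Text output: only string values are written, never element tags -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The prefix m is an assumption; it maps to the default namespace of the input -->
    <xsl:for-each select="m:outer/m:inner/m:text">
      <!-- normalize-space() works on the string value of the element,
           i.e. the concatenated descendant text, so p and i are lost -->
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the example XML, this should write each text value on a single line with the whitespace collapsed, but without the p and i tags.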


I've tried using xml output and xsl:copy-of, but this leaves the line breaks and the namespace in place, and it escapes characters in some of my other output (& -> &amp;), which is undesirable.
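
For comparison, a sketch of the xml-output variant, again with the prefix m and the template layout as assumptions rather than the actual stylesheet:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="urn:site-org:v3/m2">

  <!-- XML output: tags survive, but text is escaped on serialization (& becomes &amp;) -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <xsl:for-each select="m:outer/m:inner/m:text">
      <!-- copy-of copies the child nodes verbatim, including the
           whitespace-only text nodes and the namespace of each element -->
      <xsl:copy-of select="node()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Here the copied p elements carry an xmlns="urn:site-org:v3/m2" declaration, and the newlines between them are copied through unchanged.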


Thanks in advance for any ideas!

