XSL normalize-space() is too greedy around embedded tags



I thought this would be simple. Here's my input. I don't have any control over its layout.



<?xml version="1.0" encoding="UTF-8"?>
<topic>
<title>The Torments of Hell</title>
<body>
<p>Life is a <xref href="dungeon.xml">dungeon
</xref> and
an <xref href="abyss.xml">abyss</xref>.
</p>
</body>
</topic>


The output I'm trying to get:



...
Life is a <ref>[[dungeon|dungeon.xml]]</ref> and an <ref>[[abyss|abyss.xml]]</ref>.
...


So the WYSIWYG (the output of a different tool, over which I have no control, and which converts ref tags to footnotes with citations) would look like this:


Life is a dungeon1 and an abyss2.


Here's the XSL I started with:



<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
    <xsl:template match="topic">
        <xsl:text>&#xa;=</xsl:text>
        <xsl:value-of select="title"/>
        <xsl:text>=</xsl:text>
        <xsl:apply-templates select="body/p"/>
    </xsl:template>
    <xsl:template match="p">
        <xsl:text>&#xa;&#xa;</xsl:text>
        <xsl:apply-templates select="node()"/>
    </xsl:template>
    <xsl:template match="xref">
        <xsl:text disable-output-escaping="yes">&lt;ref&gt;</xsl:text>
        <xsl:text>[[</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>|</xsl:text>
        <xsl:value-of select="@href"/>
        <xsl:text>]]</xsl:text>
        <xsl:text disable-output-escaping="yes">&lt;/ref&gt;</xsl:text>
    </xsl:template>
</xsl:stylesheet>


And here's the output I got:



...
Life is a <ref>[[dungeon|dungeon.xml]]</ref> and
an <ref>[[abyss|abyss.xml]]</ref>.
...


No problem, I'll just use normalize-space to get rid of the line breaks:



<xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
</xsl:template>


Now my output looks like this:



...
Life is a<ref>[[dungeon|dungeon.xml]]</ref>and an<ref>[[abyss|abyss.xml]]</ref>.
...


And my WYSIWYG looks like this:


Life is adungeon1and anabyss2.


The linefeed is gone, but so are the spaces both before and after the ref tags; these I would like to have kept. I could just hack it and add a space before and after my ref tags, but then I get this ugliness:


Life is a dungeon1 and an abyss2 .


Notice the space between abyss and the period. I tried the solutions here and here, but those eliminate only extra spaces; they don't help with linefeeds.


I spent all day trying to do this with XSL, with no luck. Then I spent 45 minutes writing a JavaScript script that does exactly what I want. The practical, immediate problem is solved, but it seems strange to me that this would be so difficult with XSL. It seems so simple. Is there a way to do this with XSL, or do I need to preprocess the XML before I apply the stylesheet?
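For reference, here is a rough sketch of one direction that might work in XSLT 2.0 itself. The idea is to normalize each text node inside p, but put back a single space at an edge where the original node had whitespace and a sibling node sits next to it. Several things here are assumptions rather than part of the original stylesheet: the variable names norm, lead and trail are only illustrative, the output method is assumed to be text (so the ref tags are written as literal text instead of using disable-output-escaping), and the xref template calls normalize-space(.) to drop the line break inside the first link. It has not been run against the downstream footnote tool.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <!-- Assumption: plain-text output, so <ref> can be emitted as literal text -->
    <xsl:output method="text"/>

    <xsl:template match="topic">
        <xsl:text>&#xa;=</xsl:text>
        <xsl:value-of select="title"/>
        <xsl:text>=</xsl:text>
        <xsl:apply-templates select="body/p"/>
    </xsl:template>

    <xsl:template match="p">
        <xsl:text>&#xa;&#xa;</xsl:text>
        <xsl:apply-templates select="node()"/>
    </xsl:template>

    <xsl:template match="xref">
        <xsl:text>&lt;ref&gt;[[</xsl:text>
        <!-- Assumption: the link text should also lose its trailing line break -->
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>|</xsl:text>
        <xsl:value-of select="@href"/>
        <xsl:text>]]&lt;/ref&gt;</xsl:text>
    </xsl:template>

    <!-- Text directly inside p: collapse whitespace, but keep one space at an
         edge that had whitespace AND has a neighbouring sibling node -->
    <xsl:template match="p/text()">
        <xsl:variable name="norm" select="normalize-space(.)"/>
        <xsl:variable name="lead"
            select="matches(., '^\s') and exists(preceding-sibling::node())"/>
        <xsl:variable name="trail"
            select="matches(., '\s$') and exists(following-sibling::node())"/>
        <xsl:choose>
            <!-- Whitespace-only node between two siblings: a single space -->
            <xsl:when test="$norm = ''">
                <xsl:if test="$lead and $trail"><xsl:text> </xsl:text></xsl:if>
            </xsl:when>
            <xsl:otherwise>
                <xsl:if test="$lead"><xsl:text> </xsl:text></xsl:if>
                <xsl:value-of select="$norm"/>
                <xsl:if test="$trail"><xsl:text> </xsl:text></xsl:if>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

On the sample input above, the paragraph should come out as "Life is a <ref>[[dungeon|dungeon.xml]]</ref> and an <ref>[[abyss|abyss.xml]]</ref>." with no space before the period, because the final text node has no following sibling and so gets no trailing space. The matches() function needs an XSLT 2.0 processor; in XSLT 1.0 the edge checks would have to be written differently.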

