I need to replace every variant of the ellipsis with one specific (typographically correct) version. The ellipsis can occur in just about any text node. It can appear as ". . ." or "..." or "…" (the &hellip; HTML entity), with spaces, characters, or the end of a tag before or after it.
The best form for layout is periods separated by thin spaces (U+2009): ". . ."
The following solution finds all the ellipses in the test file (and fixes them correctly) EXCEPT when there are multiple ellipses in one text node (the last para tag). So close, yet so terrifyingly far away.
Because it is hard to see whether ordinary spaces were actually changed to thin spaces, I added a bracketed marker ([[FIXED TEXT]] or [[FIXED SYM]]) after each replacement. Obviously this would be removed in a final solution.
XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
  <!-- Identity template for all other elements and attributes. -->
  <xsl:template match="@*|node()" name="default" mode="#all">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="#current"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="text()[matches(.,'\.\s?\.\s?\.')]">
    <xsl:analyze-string select="." regex="(\w?\.?)\s?\.\s?\.\s?\.\s?(\w?)">
      <xsl:matching-substring>
        <xsl:if test="regex-group(1)"><xsl:value-of select="regex-group(1)"/><xsl:text> </xsl:text></xsl:if>
        <xsl:text>. . .</xsl:text>
        <xsl:text>[[FIXED TEXT]]</xsl:text>
        <xsl:if test="regex-group(2)"><xsl:text> </xsl:text><xsl:value-of select="regex-group(2)"/></xsl:if>
      </xsl:matching-substring>
      <xsl:non-matching-substring><xsl:value-of select="current()"/></xsl:non-matching-substring>
    </xsl:analyze-string>
    <!-- <xsl:value-of select="replace(.,'(\w?\.?)\s?\.\s?\.\s?\.\s?(\w?)', '$1 . . . $2')"/>-->
  </xsl:template>
  <xsl:template match="text()[matches(.,'…')]">
    <xsl:analyze-string select="." regex="(\w?\.?)\s?…\s?(\w?)">
      <xsl:matching-substring>
        <xsl:if test="regex-group(1)"><xsl:value-of select="regex-group(1)"/><xsl:text> </xsl:text></xsl:if>
        <xsl:text>. . .</xsl:text>
        <xsl:text>[[FIXED SYM]]</xsl:text>
        <xsl:if test="regex-group(2)"><xsl:text> </xsl:text><xsl:value-of select="regex-group(2)"/></xsl:if>
      </xsl:matching-substring>
      <xsl:non-matching-substring><xsl:value-of select="current()"/></xsl:non-matching-substring>
    </xsl:analyze-string>
    <!-- <xsl:value-of select="replace(.,'(\w?\.?)\s?…\s?(\w?)', '$1 . . . $2')"/>-->
  </xsl:template>
</xsl:stylesheet>
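One likely cause of the multiple-ellipsis failure: a text node that contains both the period form and the "…" symbol matches both text() templates, and since the two patterns have the same default priority, the processor applies only one of them (or reports a conflict). The sketch below is one possible way around that, not a tested final solution. It folds both cases into a single template by putting the two ellipsis forms in one alternation, so every occurrence in a node goes through the same analyze-string pass. It assumes thin spaces are written explicitly as &#x2009; (so they are visible in the stylesheet) and the symbol as &#x2026;; the debugging markers are left out.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Identity template: copy anything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- One template for text nodes containing either ellipsis form. -->
  <xsl:template match="text()[matches(., '&#x2026;|\.\s?\.\s?\.')]">
    <xsl:analyze-string select="."
        regex="(\w?\.?)\s?(&#x2026;|\.\s?\.\s?\.)\s?(\w?)">
      <xsl:matching-substring>
        <!-- Group 1: optional preceding character and sentence period. -->
        <xsl:if test="regex-group(1)">
          <xsl:value-of select="regex-group(1)"/>
          <xsl:text>&#x2009;</xsl:text>
        </xsl:if>
        <!-- Normalized ellipsis: periods separated by thin spaces. -->
        <xsl:text>.&#x2009;.&#x2009;.</xsl:text>
        <!-- Group 3: optional following character (group 2 is the ellipsis). -->
        <xsl:if test="regex-group(3)">
          <xsl:text>&#x2009;</xsl:text>
          <xsl:value-of select="regex-group(3)"/>
        </xsl:if>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because XSLT 2.0 regular expressions have no non-capturing groups, the alternation takes up group 2 and the trailing character moves to group 3.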
XML Test File:
<?xml version="1.0" encoding="UTF-8"?>
<sec>
<label>1</label><title>Introduction . . . </title>
<p>Ellipsis <italic>Correct</italic> (periods and thin spaces): . . . text</p>
<p>Ellipsis (periods and spaces): . . . text</p>
<p>What about periods. . .with no spaces around?</p>
<p>. . . starts paragraph</p>
<p>text ends paragraph. . . .</p>
<p>This is typical text ending a sentence ending in a period. . . . New sentence</p>
<p>Ellipsis (just periods): ... text</p>
<p>No...spaces around ellipsis.</p>
<p>...No spaces start</p>
<p>ends paragraph....</p>
<p>para end....No spaces</p>
<p>Ellipsis (symbol): … text</p>
<p>Middle of text…with no space</p>
<p>Ellipsis followed by punctuation….</p>
<p>No spaces ending para with period.…</p>
<p>ending para with period and space. …</p>
<p>…Start of paragraph</p>
<p>… Start para with space</p>
<p>end of paragraph…</p>
<p>end of para with space …</p>
<p>Multiple things … within the same . . . paragraph?...to see if it works. ... And what about a ...? Question or ...! Exclamation point?</p>
</sec>