How do I prevent character escaping in PHP XSLT?



I have a large, complex XML document that needs to be transformed twice with XSLT to achieve the desired result. Thanks to michael's answer here, I got it to work perfectly when running the transformation locally (with xsltproc in Terminal), but I can't get it to work the same way now that I've moved it to PHP.


The important part of the source document looks like this:



<BiographicalNote>
&lt;p&gt;This text includes escaped HTML entities.&lt;/p&gt;
</BiographicalNote>


And the desired output:



<ParagraphStyleRange>
    <CharacterStyleRange>
        <Content>
            This text includes escaped HTML entities.
        </Content>
    </CharacterStyleRange>
</ParagraphStyleRange>


I need to do this transformation in two steps because I need to unescape the HTML entities and then process the result as XML to apply further transformations. The first step takes care of the bulk of the transformation, setting up the end document, but for sections containing HTML it only does this:



<xsl:template match="BiographicalNote">
    <AuthorBio>
        <xsl:value-of select="normalize-space(.)" disable-output-escaping="yes" />
    </AuthorBio>
</xsl:template>
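The first-pass template relies on disable-output-escaping, which XSLT 1.0 treats as a serialization instruction: it can only take effect when the result tree is written out as text. The sketch below (assuming PHP 7+ with the xsl extension, and using small inline stand-ins instead of the real ONIX file and ONIXtoICML.xsl) runs that template once through transformToXML() and once through transformToDoc() to show the difference.

```php
<?php
// Sketch only: assumes PHP 7+ with the xsl extension enabled.
// The inline XML below is a stand-in for the real ONIX file.
$source = new DOMDocument();
$source->loadXML(
    '<BiographicalNote>&lt;p&gt;This text includes escaped HTML entities.&lt;/p&gt;</BiographicalNote>'
);

// Stand-in for the first-pass template from ONIXtoICML.xsl.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="BiographicalNote">
    <AuthorBio>
      <xsl:value-of select="normalize-space(.)" disable-output-escaping="yes"/>
    </AuthorBio>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// Serialized result: disable-output-escaping is applied while the text is written out.
$asString = $proc->transformToXML($source);

// In-memory result tree: "<p>" is only characters inside a single text node.
$asDoc = $proc->transformToDoc($source);
$pCount = $asDoc->getElementsByTagName('p')->length;
$bioText = $asDoc->documentElement->textContent;

echo $asString;
echo "p elements in result tree: ", $pCount, "\n";
echo "AuthorBio text: ", $bioText, "\n";
```

The serialized string contains a real `<p>` element, while the DOMDocument returned by transformToDoc() has no `p` element at all, only text that happens to contain angle brackets.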


Then I have a second XSLT to get rid of the <p> tags and do various other transformations with the <em>, <b> and <span> tags:



<xsl:template match="AuthorBio">
    <ParagraphStyleRange>
        <xsl:apply-templates select="./node()"/>
        <Br/>
    </ParagraphStyleRange>
</xsl:template>

<xsl:template match="p/text()|text()"> <!-- Not all HTML paragraphs have actual <p> tags. -->
    <CharacterStyleRange>
        <Content><xsl:value-of select="."/></Content>
    </CharacterStyleRange>
</xsl:template>
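As a sanity check on the second stylesheet by itself, here is a sketch that applies these two templates to a hand-written intermediate document in which `<p>` is already a real element (an inline stand-in, assuming PHP 7+ with the xsl extension). Under that assumption the `p` element falls through to XSLT's built-in element rule, which recurses into its text node, and the output has the intended CharacterStyleRange/Content shape.

```php
<?php
// Sketch: second-pass templates applied to an intermediate where <p> is a real element.
// The intermediate document is a hand-written stand-in for the first pass's output.
$intermediate = new DOMDocument();
$intermediate->loadXML(
    '<AuthorBio><p>This text includes escaped HTML entities.</p></AuthorBio>'
);

// Stand-in for Inlines.xsl, containing only the two templates above.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="AuthorBio">
    <ParagraphStyleRange>
      <xsl:apply-templates select="./node()"/>
      <Br/>
    </ParagraphStyleRange>
  </xsl:template>

  <!-- Not all HTML paragraphs have actual <p> tags. -->
  <xsl:template match="p/text()|text()">
    <CharacterStyleRange>
      <Content><xsl:value-of select="."/></Content>
    </CharacterStyleRange>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// <p> has no template of its own, so the built-in rule processes its child text node.
$output = $proc->transformToXML($intermediate);
echo $output;
```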


As I mentioned above, this works perfectly when I transform it locally with xsltproc. But with PHP, this is what I get:



<ParagraphStyleRange>
    <CharacterStyleRange>
        <Content>
            &lt;p&gt;This text includes escaped HTML entities.&lt;/p&gt;
        </Content>
    </CharacterStyleRange>
</ParagraphStyleRange>


I've tried a few variations on the PHP code, but this is what has gotten me closest:



function processONIX() {
    // Load ONIX file
    $onix = new DOMDocument;
    $onix->load('HarbourONIX20141217.xml');

    // Load ONIXtoICML XSL file
    $icml = new DOMDocument;
    $icml->load('ONIXtoICML.xsl');

    // Configure the transformer
    $icmlproc = new XSLTProcessor;
    $icmlproc->importStylesheet($icml);

    // Apply ONIXtoICML
    return $icmlproc->transformToDoc($onix);
}

function makeICML() {
    $temp = processONIX();

    // Load Inlines XSL file
    $inline = new DOMDocument;
    $inline->load('Inlines.xsl');

    // Configure the transformer
    $inlineproc = new XSLTProcessor;
    $inlineproc->importStylesheet($inline);

    // Apply Inlines
    $inlineproc->transformToURI($temp, 'ONIX.icml');
}

makeICML();



  • This gives the result as shown above, with all the HTML bits escaped.

  • Using transformToXML in processONIX() gives me nothing.

  • Using transformToURI in processONIX() successfully saves the file, and the HTML tags are correctly unescaped: <AuthorBio><p>This text includes escaped HTML entities.</p></AuthorBio>. But when I try to load that file as a new DOMDocument in makeICML() (the same way as in processONIX()), it seems to load as an empty document. I have no idea what's happening here.
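One way to reproduce the xsltproc behaviour in PHP, sketched under the assumption that the unescaped biography is well-formed XML: serialize the first pass with transformToXML(), which returns a string rather than a DOMDocument, and parse that string with DOMDocument::loadXML() before handing it to the second processor. transformToURI() expects a DOM node, not a string, which may be why the transformToXML() attempt appeared to produce nothing. The inline strings stand in for the ONIX file, ONIXtoICML.xsl and Inlines.xsl, and `makeProcessor()` is a hypothetical helper, not part of the original code.

```php
<?php
// Sketch of a two-pass pipeline, assuming PHP 7+ with the xsl extension.
// The inline strings are stand-ins for the ONIX file, ONIXtoICML.xsl and Inlines.xsl.

// Hypothetical helper: compile a stylesheet given as a string.
function makeProcessor(string $xslSource): XSLTProcessor
{
    $xsl = new DOMDocument();
    $xsl->loadXML($xslSource);
    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    return $proc;
}

// First pass: unescape the biography with disable-output-escaping.
$pass1 = makeProcessor(<<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="BiographicalNote">
    <AuthorBio>
      <xsl:value-of select="normalize-space(.)" disable-output-escaping="yes"/>
    </AuthorBio>
  </xsl:template>
</xsl:stylesheet>
XSL
);

// Second pass: turn AuthorBio and its text into ICML-style elements.
$pass2 = makeProcessor(<<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="AuthorBio">
    <ParagraphStyleRange>
      <xsl:apply-templates select="./node()"/>
      <Br/>
    </ParagraphStyleRange>
  </xsl:template>
  <xsl:template match="p/text()|text()">
    <CharacterStyleRange>
      <Content><xsl:value-of select="."/></Content>
    </CharacterStyleRange>
  </xsl:template>
</xsl:stylesheet>
XSL
);

$onix = new DOMDocument();
$onix->loadXML(
    '<BiographicalNote>&lt;p&gt;This text includes escaped HTML entities.&lt;/p&gt;</BiographicalNote>'
);

// Current approach: the unserialized result tree goes straight into pass 2,
// so "<p>" is still plain text and comes out escaped again.
$before = $pass2->transformToXML($pass1->transformToDoc($onix));

// Suggested approach: serialize pass 1 (which applies disable-output-escaping),
// then re-parse the string so <p> becomes a real element.
$temp = new DOMDocument();
$temp->loadXML($pass1->transformToXML($onix));
$after = $pass2->transformToXML($temp);

echo $before, $after;
```

Applied to the original functions, this would mean having processONIX() return `$icmlproc->transformToXML($onix)` and having makeICML() call loadXML() on that string before transformToURI(). The main risk is HTML that is not well-formed XML once unescaped (unclosed tags, or named entities such as `&nbsp;` that XML does not define), which loadXML() will reject.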


The issue seems to be in the makeICML() function, but that seems strange since it's almost exactly the same as processONIX(), only with different variable and file names. I can't figure out at which point the characters are being escaped, so how can I prevent it?

