I've not worked with XML before, and I'm having trouble getting the text out of the following XML:
<w>
<shortening>n</shortening>
ūmi
<mor type="mor">
<mw>
[extra stuff]
</mw>
<menx>rest</menx>
<menx>sleep</menx>
<gra type="gra" relation="ROOT" head="0" index="1"/>
</mor>
</w>
The parser doesn't pick up the text ūmi inside the <w> element. I think this is because the text is preceded by the <shortening> tag. It shouldn't be a Unicode issue, because plenty of other non-ASCII characters in the file are read just fine (this is transliterated Hebrew).
Is there an easy way to fix this? Is this malformed XML?
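
For what it's worth, the snippet itself appears to be well-formed: XML allows text and child elements to be interleaved inside one element (mixed content). The post doesn't say which parser is in use, so the following is only a sketch assuming Python's standard xml.etree.ElementTree, where text that follows a child's closing tag is stored on that child's tail attribute rather than on the parent's text. The names SAMPLE and word_text are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Copy of the snippet from the question; "[extra stuff]" is kept as plain text.
SAMPLE = """<w>
<shortening>n</shortening>
ūmi
<mor type="mor">
<mw>
[extra stuff]
</mw>
<menx>rest</menx>
<menx>sleep</menx>
<gra type="gra" relation="ROOT" head="0" index="1"/>
</mor>
</w>"""


def word_text(w):
    """Hypothetical helper: collect the text that sits directly inside <w>.

    This is w.text (text before the first child) plus the .tail of every
    direct child (text after that child's closing tag).
    """
    parts = [w.text or ""]
    for child in w:
        parts.append(child.tail or "")
    return "".join(parts).strip()


# fromstring() raises ParseError on malformed input, so parsing cleanly
# suggests the snippet is well-formed.
w = ET.fromstring(SAMPLE)
shortening = w.find("shortening")

print(repr(w.text))           # only whitespace: the text before <shortening>
print(repr(shortening.tail))  # the text after </shortening>, including ūmi
print(word_text(w))
```

Other parsers expose the same text differently (a DOM API, for example, returns it as a separate text node), so the attribute names above apply only under the ElementTree assumption.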