I am extracting the content of PDF documents with Tika and sending it to Solr for indexing through an XML request in ColdFusion.
But I am running into several issues:
Issue 1:
An invalid XML character (Unicode: 0xb) was found in the element content of the document
I used the following solution to strip the invalid Unicode characters, and also tried many others:
p = createObject("java","java.util.regex.Pattern").compile("[^\\u0009\\u000A\\u000D\u0020-\\uD7FF\\uE000-\\uFFFD\\u10000-\\u10FFF]+");
p.matcher(myText).replaceAll("")
Now I am getting the following error:
A decimal representation must immediately follow the "&#" in a character reference.
Can anyone please help me resolve this?
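
One possible direction, sketched in Java because the snippet above already relies on java.util.regex.Pattern. The "&#" error usually means a raw "&" is still present in the text, so stripping invalid characters alone is probably not enough: the text likely also needs XML escaping before it goes into the request. The sketch also assumes the supplementary range was meant to be U+10000 to U+10FFFF; Java's \u escape takes exactly four hex digits, so that range is written with \x{...} here. The class and method names (XmlSanitizer, stripInvalidChars, escapeXml, clean) are hypothetical, not part of Tika, Solr, or ColdFusion.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes characters that XML 1.0 does not allow,
// then escapes the characters that have special meaning in XML.
final class XmlSanitizer {

    // Everything outside the XML 1.0 "Char" production is matched.
    // Supplementary code points need the x{...} form (Java 7+).
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
        "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Drops invalid characters such as U+000B (vertical tab).
    static String stripInvalidChars(String text) {
        return INVALID_XML_CHARS.matcher(text).replaceAll("");
    }

    // Escapes XML special characters; "&" must be replaced first so the
    // entities produced by the later replacements are not escaped again.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    // Strip first, then escape, so the result is safe as element content.
    static String clean(String text) {
        return escapeXml(stripInvalidChars(text));
    }
}
```

If the pattern is compiled directly from ColdFusion instead, note that CFML string literals do not appear to treat the backslash as an escape character, so the doubled backslashes needed in Java source would likely have to be single ones there; this is worth verifying on your ColdFusion version.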