Recently I came across a project and found a method that converts special characters to their corresponding HTML/XML character entities for display.
The method is simple: using a regular expression, it replaces every special character in the source string (under UTF-8 encoding) with its first code point (obtained with the codePointAt(0) method), prefixed with "&#" and suffixed with ";".
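A minimal Java sketch of this kind of conversion is below. It is not the project's actual code: the class name `EntityEncoder`, the method name `encode`, and the choice of which characters count as "special" (the markup characters `&`, `<`, `>`, `"`, `'` plus everything outside ASCII) are assumptions made for illustration. Note that a numeric character reference carries the Unicode code point, which is what `codePointAt(0)` returns for a Java string regardless of the encoding the text was read from.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityEncoder {
    // Assumption: "special" means the XML markup characters plus any non-ASCII character.
    // Java regular expressions match by code point, so a character outside the BMP
    // (stored as a surrogate pair) is matched as a single unit.
    private static final Pattern SPECIAL = Pattern.compile("[&<>\"']|[^\\x00-\\x7F]");

    static String encode(String input) {
        Matcher matcher = SPECIAL.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // First (and only) code point of the matched character.
            int codePoint = matcher.group().codePointAt(0);
            // The replacement contains no '$' or '\', so it needs no quoting.
            matcher.appendReplacement(out, "&#" + codePoint + ";");
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, `EntityEncoder.encode("a<b")` yields `a&#60;b`, and a string containing "é" (U+00E9) yields `&#233;` in its place.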
I ran some additional tests with this conversion, and the results all turned out to be correct.
I have found a lot of discussion about how to convert special characters to HTML/XML character entities in Java, some of which even involves third-party libraries. So my guess is: if the source string can be obtained in UTF-8, can the conversion be done simply by extracting the first code point of each special character?