Monday, 11 August 2014

Encoding::CompatibilityError: Input must be UTF-8 or US-ASCII, ISO-8859-1 given



I'm trying to use escape_utils via xmlhasher:


http://ift.tt/176lJJe http://ift.tt/1ym2qHs


I'm unzipping an XML file directly in memory with rubyzip. The file is apparently ISO-8859-1 rather than UTF-8, so I transcode it:



content = entry.get_input_stream.read.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
puts content.encoding # prints UTF-8
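
For comparison, here is a minimal sketch of how the transcoding step behaves in plain Ruby. It assumes the string returned by `read` is tagged as binary (ASCII-8BIT), which is not confirmed above, and it uses a made-up sample string in place of the real file contents. Under that assumption, `encode` with `undef: :replace` swaps every non-ASCII byte for `?`, whereas relabelling the bytes as ISO-8859-1 first lets them transcode cleanly.

```ruby
# Stand-in for entry.get_input_stream.read: the bytes of "café" in
# ISO-8859-1, tagged as binary. The binary tag is an assumption.
raw = "caf\xE9".b

# Transcoding straight from binary: byte 0xE9 has no defined mapping
# from ASCII-8BIT to UTF-8, so undef: :replace substitutes '?'.
lossy = raw.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')

# Relabel the bytes as ISO-8859-1 first (no bytes change), then transcode.
clean = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')

puts lossy           # caf?
puts clean           # café
puts clean.encoding  # UTF-8
```

Checking `entry.get_input_stream.read.encoding` before transcoding would show which of the two cases applies.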


The problem is that when I try to run:



test = XmlHasher.parse(content)


I get an error:



expected no Exception, got #<Encoding::CompatibilityError: Input must be UTF-8 or US-ASCII, ISO-8859-1 given> with backtrace:
# /Users/x/.rvm/gems/ruby-2.1.2/gems/xmlhasher-0.0.6/lib/xmlhasher/handler.rb:48:in `unescape_html'
# /Users/x/.rvm/gems/ruby-2.1.2/gems/xmlhasher-0.0.6/lib/xmlhasher/handler.rb:48:in `escape'
# /Users/x/.rvm/gems/ruby-2.1.2/gems/xmlhasher-0.0.6/lib/xmlhasher/handler.rb:21:in `attr'
# /Users/x/.rvm/gems/ruby-2.1.2/gems/xmlhasher-0.0.6/lib/xmlhasher/parser.rb:12:in `sax_parse'
# /Users/x/.rvm/gems/ruby-2.1.2/gems/xmlhasher-0.0.6/lib/xmlhasher/parser.rb:12:in `parse'


I don't get it: the string should now be UTF-8, but I still get this error.
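
One thing that may be worth checking, offered as an unverified guess: if the file's XML declaration still says `encoding="ISO-8859-1"`, the parser behind XmlHasher might tag the strings it hands to `unescape_html` according to that declaration rather than according to the Ruby string's encoding. Whether it really does so is an assumption. The sketch below uses a hypothetical sample document and only rewrites the declaration so that it matches the transcoded content.

```ruby
# Hypothetical sample standing in for the already-transcoded content.
content = %(<?xml version="1.0" encoding="ISO-8859-1"?>\n<root name="caf\u00E9"/>)

# Rewrite only the encoding value inside the leading XML declaration,
# keeping whichever quote character the document used.
fixed = content.sub(/\A(<\?xml[^>]*?encoding=)(["'])[^"']*\2/) do
  "#{$1}#{$2}UTF-8#{$2}"
end

puts fixed.lines.first
```

If the guess holds, passing `fixed` instead of `content` to `XmlHasher.parse` would be the next thing to try.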


What can I do to move forward? Any pointers are appreciated, thanks.

