Persist/read XML content in a ZIP while maintaining the correct encoding



A desktop application stores its data in an XML file. This XML file, together with other files, is stored in a ZIP archive to persist the application's state. I've always been at odds with encodings, but this time I really don't understand why it's not working. The problem is as follows:


When I persist the data in the XML file, everything seems fine. I log the output and all encodings are correct. I can also open the ZIP with other tools and inspect the XML, and the encoding is fine there as well. But as soon as I read it back into my Java application, the encoding gets messed up: German umlauts, for example, are no longer correct.


The following code is used to read the XML from the ZIP:



    private String readZipArchive( final Map<String, Image> imageMap, final Path path ) throws ZipException,
            IOException
    {
        String xmlData = null;

        try (ZipFile zipFile = new ZipFile( path.toFile(), StandardCharsets.UTF_8 ))
        {
            final Enumeration<? extends ZipEntry> zipEntryEnum = zipFile.entries();

            while ( zipEntryEnum.hasMoreElements() )
            {
                final ZipEntry zipEntry = zipEntryEnum.nextElement();

                logger.debug( "zipEntry: " + zipEntry + " comment: " + zipEntry.getComment() );

                switch ( FileType.valueOf( zipEntry.getComment() ) )
                {
                    case DATA:

                        xmlData = convertStreamToString( zipFile.getInputStream( zipEntry ) );

                        //Here the String is not UTF 8, why? German Umlauts are broken:
                        logger.dev( "Load State from File: \n" + xmlData );

                        break;

                    case PICTURE:
                        //OTHER Implementation, not important.
                        break;
                }
            }

            return xmlData;
        }
    }


    private static String convertStreamToString( final InputStream is )
    {
        try (Scanner s = new Scanner( is, "UTF-8" ))
        {
            s.useDelimiter( "\\A" );
            return s.hasNext() ? s.next() : "";
        }
    }


Can anyone see what my mistake is here, or explain how I can maintain the correct UTF-8 encoding?
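
To narrow this down, here is a minimal round-trip sketch, not the application's actual code. It writes a small XML string into a ZIP as explicit UTF-8 bytes and reads it back with an explicit UTF-8 decode. The class name ZipXmlRoundTrip, the methods writeXml and readXml, and the entry name data.xml are hypothetical and only serve as illustration. If this round trip works on the same machine, the reading code above is probably not the culprit, and the writing side (not shown here) or the encoding of the log output or console would be the next things to check.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipXmlRoundTrip {

    // Writes the XML as explicit UTF-8 bytes. String.getBytes() without a
    // charset argument would use the platform default instead.
    static void writeXml(Path zip, String entryName, String xml) throws IOException {
        try (ZipOutputStream out =
                 new ZipOutputStream(Files.newOutputStream(zip), StandardCharsets.UTF_8)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(xml.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    // Reads the raw entry bytes and decodes them explicitly as UTF-8.
    static String readXml(Path zip, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zip.toFile(), StandardCharsets.UTF_8)) {
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No such entry: " + entryName);
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // \u00fc is "u with diaeresis"; the escape keeps this source file
        // independent of the compiler's source encoding.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>M\u00fcller</name>";
        Path zip = Files.createTempFile("state", ".zip");
        try {
            writeXml(zip, "data.xml", xml);
            String loaded = readXml(zip, "data.xml");
            // Compare in memory rather than by eye: a console or log file
            // with a different encoding can show broken umlauts even when
            // the String itself is correct.
            System.out.println("round trip intact: " + loaded.equals(xml));
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

Comparing the strings in memory (or printing the code points in hex) instead of reading the log avoids being misled by an appender or terminal that does not render UTF-8.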

