MaryTTS - New Language Support: Importing an XML dump into MySQL throws PacketTooBigException



I have just started working with MaryTTS, an open-source, multilingual text-to-speech synthesis system.


I am currently trying to add support for a new language, which involves importing a large XML dump (around 670 MB) into a MySQL database. The problem arises at this step: running the script wkdb_cleaning_up.sh throws com.mysql.jdbc.PacketTooBigException.


I have set max_allowed_packet=1024M, but this has no effect.


Full stack trace:



Exception in thread "main" java.io.IOException: com.mysql.jdbc.PacketTooBigException: Packet for query is too large (1151137 > 1048576). You can change this value on the server by setting the max_allowed_packet' variable.
at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:92)
at marytts.tools.dbselection.DBHandler.loadPagesWithMWDumper(DBHandler.java:251)
at marytts.tools.dbselection.WikipediaMarkupCleaner.processWikipediaPages(WikipediaMarkupCleaner.java:1044)
at marytts.tools.dbselection.WikipediaProcessor.main(WikipediaProcessor.java:365)
Caused by: org.xml.sax.SAXException: com.mysql.jdbc.PacketTooBigException: Packet for query is too large (1151137 > 1048576). You can change this value on the server by setting the max_allowed_packet' variable.
at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:227)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
... 3 more
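To make the two numbers in the exception message easier to compare, here is a minimal sketch that extracts them from the message text. The class and method names (`PacketLimitCheck`, `parseSizes`) are hypothetical and are not part of MaryTTS or mwdumper; the message string is copied from the trace above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the exception message; not part of MaryTTS or mwdumper.
class PacketLimitCheck {
    // Matches the "(<packet size> > <limit>)" part of the message.
    private static final Pattern SIZES = Pattern.compile("\\((\\d+) > (\\d+)\\)");

    /** Returns {packetBytes, limitBytes}, or null if the message contains no size pair. */
    static long[] parseSizes(String message) {
        Matcher m = SIZES.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }

    public static void main(String[] args) {
        String message = "com.mysql.jdbc.PacketTooBigException: "
                + "Packet for query is too large (1151137 > 1048576).";
        long[] sizes = parseSizes(message);
        long oneMiB = 1024L * 1024L;
        System.out.println("packet bytes: " + sizes[0]);
        System.out.println("limit bytes:  " + sizes[1]);
        // Compare the reported limit with 1 MiB.
        System.out.println("limit is exactly 1 MiB: " + (sizes[1] == oneMiB));
    }
}
```

The reported limit of 1048576 bytes is 1 MiB, not the 1024M I configured, so the setting does not seem to have reached the running server. Running `SHOW VARIABLES LIKE 'max_allowed_packet';` in the mysql client should show the value that is actually in effect.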


Does anyone here have experience with MaryTTS?


Apologies for my poor English, by the way. Let me know if any details are missing from the question, and I will add them.

