Groovy XML: How to parse, modify and serialize the content of a GSP file. (org.xml.sax.SAXParseException, prefix not bound)



Before asking my question, here is some background on what I am actually trying to do:


I need to refactor a large number of GSP files in my Grails project. I first tried writing my own Groovy script for that, and realized it was far beyond my current skill level in any language. Then I found this article, which helped me a lot with parsing HTML content.


After a while, I put together a script that parses an HTML file, serializes it again and saves it to a new file. This is my script:



@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2')
import groovy.xml.*

// TagSoup: a lenient SAX parser for real-world HTML; namespace processing is turned off
def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
tagsoupParser.setFeature(tagsoupParser.namespacesFeature, false)

def slurper = new XmlSlurper(tagsoupParser)
def xmlFile = 'list.gsp'
def htmlParser = slurper.parse(new File(xmlFile))   // returns a GPathResult tree

/*

TODO: Manipulation code goes here

*/

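// Re-serialize the tree; XmlUtil.serialize pretty-prints the markup and adds an XML declaration
def outputBuilder = new StreamingMarkupBuilder()
String result = XmlUtil.serialize(outputBuilder.bind{ mkp.yield htmlParser })

// Strip the <?xml ... ?> declaration again
result = result.replaceAll(/<\?.+\?>/, '')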

def newFile = new File('neu.html')

newFile.text = result


Note that I do not want an XML prolog in my GSP files, so I remove it with a regex. (That is not my actual question, but if anybody knows a more "Groovy" way to do this, please let me know!)
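On the prolog side question: the regex `/<\?.+\?>/` is greedy, so on any line it removes everything from the first `<?` to the last `?>`, not just the declaration. A minimal sketch of a narrower alternative, assuming the declaration is the very first thing in the serialized text; `stripXmlDeclaration` is a hypothetical helper name, not a Groovy API:

```groovy
// Hypothetical helper: remove only a leading XML declaration such as
// <?xml version="1.0" encoding="UTF-8"?> plus the whitespace after it.
// Anything else in the document is left untouched.
String stripXmlDeclaration(String xml) {
    xml.replaceFirst(/^\s*<\?xml[^?]*\?>\s*/, '')
}

println stripXmlDeclaration('<?xml version="1.0" encoding="UTF-8"?>\n<html/>')
```

In the script above, the `replaceAll` line would then become `result = stripXmlDeclaration(result)`. Another option: `StreamingMarkupBuilder` only writes a declaration when `mkp.xmlDeclaration()` is called, so `outputBuilder.bind { mkp.yield htmlParser }.toString()` should produce no prolog at all, at the cost of the pretty-printing that `XmlUtil.serialize` does.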


Also, I set namespacesFeature to false, since namespaces are useless for my purpose.


Because that worked like a charm with HTML files, I thought I was ready to loop over my folders recursively, find every GSP file named list.gsp and refactor them automatically. But when I tested it on a single list.gsp, serialization failed because of the unbound prefix g on the element g:set:


The prefix "g" for element "g:set" is not bound.
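For the recursive step (finding every list.gsp below a folder), a minimal sketch using the GDK's `File.eachFileRecurse(FileType, Closure)`; the helper name `findListGsps` is hypothetical, and the folder you pass in is up to you:

```groovy
import groovy.io.FileType

// Hypothetical helper: walk a directory tree and collect every file
// named exactly 'list.gsp'. Returns an empty list if root is not a directory.
List<File> findListGsps(File root) {
    List<File> found = []
    if (!root.isDirectory()) {
        return found
    }
    root.eachFileRecurse(FileType.FILES) { File f ->
        if (f.name == 'list.gsp') {
            found << f
        }
    }
    found
}
```

Each returned `File` could then be fed to the parse/serialize steps of the script in place of the hard-coded `'list.gsp'`.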


Now, I understand that this is not what XML parsing and serializing are normally meant for. But in my case, I not only want to disable the namespaces feature, I also want the parser to ignore all GSP tags and treat them as regular opening and closing tags; in other words, to ignore the colon in any tag name.
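One way to get that effect is to hide the colon from the XML tooling altogether. A minimal sketch, assuming only the `g:` prefix occurs and that the string `gsp-g-` never appears in the files; both helper names are hypothetical, and other taglib prefixes would need the same treatment:

```groovy
// Hypothetical workaround: rename <g:foo ...> and </g:foo> to
// <gsp-g-foo ...> and </gsp-g-foo> before parsing, so no parser ever sees
// an unbound prefix, then undo the rename on the serialized text.
String hideGspPrefix(String gsp) {
    gsp.replaceAll(/(<\/?)g:/, '$1gsp-g-')
}

String restoreGspPrefix(String text) {
    text.replaceAll(/(<\/?)gsp-g-/, '$1g:')
}
```

The idea would be to call `hideGspPrefix(new File(xmlFile).text)`, parse that string with `slurper.parseText(...)`, and apply `restoreGspPrefix` to the serialized result. Whether TagSoup leaves these unknown elements exactly where they were is not guaranteed, so the round trip should be checked on a real file.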


The other thing I am concerned about is GSP page directives and scriptlets, such as <%@ page import="<class>" %>. Right now I only get the exception mentioned above, but this will probably need to be resolved as well.
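A similar trick could cover directives and scriptlets: swap every `<% ... %>` block for a plain-text token before parsing and put the original text back afterwards. A minimal sketch, assuming the token `@@GSP_BLOCK_n@@` never occurs in the GSP itself; the helper names are hypothetical. Plain text is used rather than an XML comment because `XmlSlurper` does not keep comments:

```groovy
// Hypothetical helper: replace each <% ... %> block (page directives,
// scriptlets, <%= ... %>) with a numbered token, remembering the original.
String protectScriptlets(String gsp, List<String> saved) {
    gsp.replaceAll(/(?s)<%.*?%>/) { String block ->
        saved << block
        "@@GSP_BLOCK_${saved.size() - 1}@@"
    }
}

// Put the remembered blocks back in place of their tokens.
String restoreScriptlets(String text, List<String> saved) {
    text.replaceAll(/@@GSP_BLOCK_(\d+)@@/) { String all, String idx ->
        saved[idx as int]
    }
}
```

Because a directive usually sits before `<html>`, the parser may relocate its token (TagSoup tends to wrap stray text in a body), so the position of restored blocks should be verified after serialization.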


Any help is highly appreciated.

