Before I ask my question, here is some background on what I am actually trying to do:
I need to refactor a large number of GSP files in my Grails project. After trying to write my own Groovy script for that, and realizing that it is way too much for my current skill level in any language, I found this article, which helped me a lot with parsing HTML content.
After a while I had put together my own script to parse an HTML file, serialize it again and save it to a new file. This is my script:
import groovy.xml.*
// Fetch TagSoup, a SAX parser that tolerates HTML that is not well-formed XML
@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2')
def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
// Namespaces are not needed here, so the feature is switched off
tagsoupParser.setFeature(tagsoupParser.namespacesFeature, false)
def slurper = new XmlSlurper(tagsoupParser)
def xmlFile = 'list.gsp'
def htmlParser = slurper.parse(xmlFile)
/*
TODO: Manipulation code goes here
*/
// Serialize the parsed tree back to markup
def outputBuilder = new StreamingMarkupBuilder()
String result = XmlUtil.serialize(outputBuilder.bind{ mkp.yield htmlParser })
// Remove the XML prolog that the serializer puts in front of the output
result = result.replaceAll(/<\?.+\?>/, '')
def newFile = new File('neu.html')
newFile.text = result
Note that I do not want an XML prolog in my GSP files; therefore, I remove it with a regex (that is not my question, but if anybody knows a more "groovy" way to do this, please let me know!)
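For comparison, here is a slightly tighter variant of that regex, as a minimal sketch only. The pattern in the script removes every `<? ... ?>` run it finds on a line, while this one removes just a single declaration at the very start of the text. The helper name `stripXmlProlog` is made up for this example and is not part of Groovy.

```groovy
// Hypothetical helper: strip only a leading XML declaration and leave any
// other <? ... ?> processing instructions in the document untouched.
String stripXmlProlog(String xml) {
    // ^ anchors the match at the start of the text; replaceFirst stops after one hit
    xml.replaceFirst(/^\s*<\?xml[^>]*\?>\s*/, '')
}
```

It would be called on the serialized text, for example `result = stripXmlProlog(result)`.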
Also, I set namespacesFeature to false, since namespaces are useless for my purpose.
Because that worked like a charm with HTML files, I thought I was ready to loop over my folder recursively, find all GSP files named list.gsp, and refactor them automatically. But when I tested it with one list.gsp, the serialization failed because the prefix g of the element g:set is not bound:
The prefix "g" for element "g:set" is not bound.
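For context, the recursive loop mentioned above could look roughly like the sketch below. It uses Groovy's `eachFileRecurse` with `FileType.FILES`; the helper name `findListGsps` is an assumption made for this example.

```groovy
import groovy.io.FileType

// Hypothetical helper: collect every file called 'list.gsp' below a root directory.
List<File> findListGsps(File root) {
    def matches = []
    // Visit regular files only, descending into all subdirectories
    root.eachFileRecurse(FileType.FILES) { File f ->
        if (f.name == 'list.gsp') {
            matches << f
        }
    }
    matches
}
```

Assuming the views live under `grails-app/views`, it could be used as `findListGsps(new File('grails-app/views')).each { println it }`.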
Now, I kind of understand that what I am trying to do is not the regular purpose of XML parsing and serializing. But in my case, I not only want to disable the namespaces feature, I also want the parser to ignore all GSP tags and treat them as regular opening and closing tags; in other words, to ignore the colon in any tag name.
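To illustrate what "ignoring the colon" could mean in practice, here is a rough sketch that hides the colon of prefixed tags before parsing and restores it after serializing. This is plain string processing, not a parser feature. The marker `__NS__` and both helper names are invented for this example, and I have not checked whether the parser keeps the masked names (including their letter case) unchanged.

```groovy
// Hypothetical helpers: mask and unmask the colon in prefixed tag names.
String maskPrefixes(String gsp) {
    // <g:set ...> becomes <g__NS__set ...>, </g:set> becomes </g__NS__set>
    gsp.replaceAll(/<(\/?)([A-Za-z]\w*):([A-Za-z][\w.-]*)/, '<$1$2__NS__$3')
}

String unmaskPrefixes(String xml) {
    // Turn the marker back into a colon, but only directly inside a tag name
    xml.replaceAll(/(<\/?[A-Za-z]\w*)__NS__/, '$1:')
}
```

The idea would be to call `maskPrefixes` on the file content before handing it to the slurper and `unmaskPrefixes` on the serialized result. Text that merely looks like a prefixed tag, for instance inside an attribute value, would be masked and restored in the same way.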
The other thing I am concerned about is JSP-style syntax, such as the page directive <%@ page import="<class>" %>. Right now I'm just getting the exception mentioned earlier, but this will probably need to be resolved as well.
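One conceivable way to keep such directives away from the parser is to set them aside before parsing and put them back in front of the serialized output. The sketch below assumes the directives sit at the very top of the file; it does nothing for scriptlets further down. The helper name `splitLeadingDirectives` is made up for this example.

```groovy
// Hypothetical helper: split a GSP text into [leading directives, remaining markup].
List<String> splitLeadingDirectives(String gsp) {
    // (?s) lets . match newlines; group 1 collects consecutive <%@ ... %> blocks
    // at the start of the text, group 2 is everything after them
    def m = gsp =~ /(?s)\A((?:\s*<%@.*?%>)*)(.*)\z/
    m.matches() ? [m.group(1), m.group(2)] : ['', gsp]
}
```

With `def (head, rest) = splitLeadingDirectives(text)`, only `rest` would go through the parser, and `head` would be written back in front of the result.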
Any help is highly appreciated.