How can I extract the node names from a fragmented XML document using Ruby?



I have an XML-like document which is pre-processed by a system outside my control. The format of the document is like this:



<template>
Hello, there <RECALL>first_name</RECALL>. Thanks for giving me your email.
<SETPROFILE><NAME>email</NAME><VALUE><star/></VALUE></SETPROFILE>. I have just sent you something.
</template>


However, I only receive what is between the <template> tags, and I receive it as a plain text string.


My problem: I would like to extract the tags without specifying their names ahead of time when parsing. I can do this with the Crack Ruby gem, but only if there is a single tag and it sits at the end of the string.


With Crack, I can parse a string like string = "<SETPROFILE><NAME>email</NAME><VALUE>go@go.com</VALUE></SETPROFILE>" and the output from Crack is the following:


{"SETPROFILE"=>{"NAME"=>"email", "VALUE"=>"go@go.com"}}


Then I can use a Ruby case statement for the possible values I care about.


Question: given that a) the string can contain multiple tags and b) the tags are not necessarily at the end of the string, how can I easily parse out the node names and values, similar to what I do with Crack?
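To make the goal concrete, here is a minimal sketch of the kind of result I am after, written with a plain regular expression instead of Crack. The names ELEMENT and extract_nodes are hypothetical, made up for illustration. It assumes tags carry no attributes, that an element never contains another element with the same name, and that an element holds either text or child elements but not both. A real XML parser would be more robust.

```ruby
# Matches <NAME>...</NAME> or a self-closing <NAME/>.
# Assumptions: no attributes, no same-name nesting.
ELEMENT = %r{<([A-Za-z_][\w.-]*)\s*(?:/>|>(.*?)</\1>)}m

# Returns one { "TAG" => value } hash per top-level element, in document order.
# The value is a nested hash when the element has child elements, the text
# content otherwise, and nil for a self-closing element.
def extract_nodes(text)
  text.scan(ELEMENT).map do |name, inner|
    value =
      if ELEMENT.match?(inner.to_s)
        # Child elements: merge them into one hash (a repeated name overwrites).
        extract_nodes(inner).reduce({}, :merge)
      else
        inner
      end
    { name => value }
  end
end

sample = "Hello, there <RECALL>first_name</RECALL>. Thanks for giving me your email.\n" \
         "<SETPROFILE><NAME>email</NAME><VALUE><star/></VALUE></SETPROFILE>. I have just sent you something."

# Tag names are discovered while scanning, not listed ahead of time.
extract_nodes(sample).each { |node| p node }
```

Each element of the returned array has the same shape as the Crack output above, so the tag name can be read with node.keys.first and passed on without knowing it in advance.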


ADDENDUM:


These tags also need to be removed from the text. I would like to continue to use the excellent suggestion from @TinMan found here: how do I remove substring from a string in ruby?


It works perfectly once I know the name of the tag. The set of possible tags is finite. I send each tag to the appropriate method once I know its name, but the name needs to be parsed out easily first.
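As a rough illustration of that flow, the sketch below removes every tag in one pass and hands each tag name to a case statement. It uses gsub with a block, which is my own substitution and not necessarily the technique from the linked answer. TAG, strip_nodes and handle are hypothetical names, the handler bodies are placeholders, and the regular expression again assumes no attributes and no same-name nesting.

```ruby
# Matches <NAME>...</NAME> or a self-closing <NAME/> (assumption: no attributes).
TAG = %r{<([A-Za-z_][\w.-]*)\s*(?:/>|>(.*?)</\1>)}m

# Removes every top-level tag from the text and returns the cleaned text
# together with the [name, raw_body] pairs that were removed.
def strip_nodes(text)
  found = []
  clean = text.gsub(TAG) do
    found << [Regexp.last_match(1), Regexp.last_match(2)]
    "" # replace the whole element with nothing
  end
  [clean, found]
end

# Placeholder dispatch: the real methods would go where the strings are.
def handle(name, body)
  case name
  when "RECALL"     then "recall #{body}"
  when "SETPROFILE" then "set profile from #{body}"
  else "unknown tag #{name}"
  end
end

clean, found = strip_nodes("Hi <RECALL>first_name</RECALL>. Bye <SETPROFILE><NAME>email</NAME></SETPROFILE>.")
puts clean
found.each { |name, body| puts handle(name, body) }
```

Because removing a tag outright leaves a gap in the sentence, the block passed to gsub could return a replacement string instead of "" where that suits the tag.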

