XML: How can I parse XML into a data frame in R?

I am using a SOAP web service (WS) that returns its responses in XML. I am fairly new to XML and still can't quite get my head around it. The server's responses seem very inconsistent to me, but I don't know XML well enough to verify that.

For example, for the first function I called, I got a response and ran 'xmlParse' on it, which gave the following result:

  > xmlParse(theResult$value())
  <?xml version="1.0" encoding="utf-8"?>
  <soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
      <GetPortfoliosResponse xmlns="http://rixtrema.net/">
        <GetPortfoliosResult>
          <string>Blah</string>
          <string>Blah Blah</string>
          <string>Blah Blah Blah</string>
        </GetPortfoliosResult>
        <outString/>
      </GetPortfoliosResponse>
    </soap:Body>
  </soap:Envelope>

To turn this into a structure I can work with, I use:

  > unlist(unname(theResult$Body$GetPortfoliosResponse$GetPortfoliosResult))
  [1] "Blah"
  [2] "Blah Blah"
  [3] "Blah Blah Blah"
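An alternative that does not depend on how the SOAP client nests its result object is to query the raw XML text with XPath. The sketch below assumes the XML package and uses a pasted copy of the envelope above in place of theResult$value(); the prefix name r is arbitrary. The catch is that the <string> elements inherit the default namespace http://rixtrema.net/ from GetPortfoliosResponse, so an unprefixed query such as //GetPortfoliosResult/string would match nothing.

```r
library(XML)

# Copy of the GetPortfoliosResponse envelope above, standing in for
# theResult$value() so the example runs on its own.
raw <- '<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <GetPortfoliosResponse xmlns="http://rixtrema.net/">
      <GetPortfoliosResult>
        <string>Blah</string>
        <string>Blah Blah</string>
        <string>Blah Blah Blah</string>
      </GetPortfoliosResult>
      <outString/>
    </GetPortfoliosResponse>
  </soap:Body>
</soap:Envelope>'

doc <- xmlParse(raw, asText = TRUE)

# <string> lives in the default namespace http://rixtrema.net/, so bind a
# prefix ("r", chosen arbitrarily) to that URI and use it in the XPath.
portfolios <- xpathSApply(doc, "//r:GetPortfoliosResult/r:string", xmlValue,
                          namespaces = c(r = "http://rixtrema.net/"))
print(portfolios)
```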

For another request, I get the following response:

  > xmlParse(theResult$value())
  <?xml version="1.0" encoding="utf-8"?>
  <soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
      <UniversalFieldGroupingResponse xmlns="http://rixtrema.net/">
        <UniversalFieldGroupingResult>
          <xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" id="NewDataSet">
            <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:MainDataTable="RESULT" msdata:Locale="en-US">
              <xs:complexType>
                <xs:choice minOccurs="0" maxOccurs="unbounded">
                  <xs:element name="RESULT" msdata:CaseSensitive="False" msdata:Locale="en-US">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="Value" type="xs:double" minOccurs="0"/>
                        <xs:element name="Name" type="xs:string" minOccurs="0"/>
                        <xs:element name="ID" type="xs:double" minOccurs="0"/>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:choice>
              </xs:complexType>
            </xs:element>
          </xs:schema>
          <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
            <DocumentElement xmlns="">
              <RESULT diffgr:id="RESULT1" msdata:rowOrder="0" diffgr:hasChanges="inserted">
                <Value>0</Value>
                <Name>USD Curncy</Name>
                <ID>1</ID>
              </RESULT>
              <RESULT diffgr:id="RESULT2" msdata:rowOrder="1" diffgr:hasChanges="inserted">
                <Value>0</Value>
                <Name>IBM</Name>
                <ID>2</ID>
              </RESULT>
              <RESULT diffgr:id="RESULT3" msdata:rowOrder="2" diffgr:hasChanges="inserted">
                <Value>0</Value>
                <Name>AAPL</Name>
                <ID>3</ID>
              </RESULT>
            </DocumentElement>
          </diffgr:diffgram>
        </UniversalFieldGroupingResult>
      </UniversalFieldGroupingResponse>
    </soap:Body>
  </soap:Envelope>

To get this one into a usable structure, I use the following:

  > xmlToDataFrame(getNodeSet(xmlParse(theResult$value()), "//RESULT"))
    Value       Name ID
  1     0 USD Curncy  1
  2     0        IBM  2
  3     0       AAPL  3
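This second response is a serialized .NET DataSet: an xs:schema describing the columns, followed by a diffgr:diffgram holding the rows. xmlToDataFrame does not know the column types on its own, but the schema in the same response declares them. The sketch below is a minimal illustration, assuming the XML package, a trimmed copy of the response (msdata and diffgr attributes dropped), and a schema that describes a single table; only a few XSD types are mapped and anything else stays character.

```r
library(XML)

# Trimmed copy of the UniversalFieldGroupingResponse above, standing in for
# theResult$value(). The msdata/diffgr attributes are dropped for brevity.
raw <- '<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <UniversalFieldGroupingResponse xmlns="http://rixtrema.net/">
      <UniversalFieldGroupingResult>
        <xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" id="NewDataSet">
          <xs:element name="NewDataSet">
            <xs:complexType>
              <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element name="RESULT">
                  <xs:complexType>
                    <xs:sequence>
                      <xs:element name="Value" type="xs:double" minOccurs="0"/>
                      <xs:element name="Name" type="xs:string" minOccurs="0"/>
                      <xs:element name="ID" type="xs:double" minOccurs="0"/>
                    </xs:sequence>
                  </xs:complexType>
                </xs:element>
              </xs:choice>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        <diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
          <DocumentElement xmlns="">
            <RESULT><Value>0</Value><Name>USD Curncy</Name><ID>1</ID></RESULT>
            <RESULT><Value>0</Value><Name>IBM</Name><ID>2</ID></RESULT>
            <RESULT><Value>0</Value><Name>AAPL</Name><ID>3</ID></RESULT>
          </DocumentElement>
        </diffgr:diffgram>
      </UniversalFieldGroupingResult>
    </UniversalFieldGroupingResponse>
  </soap:Body>
</soap:Envelope>'

doc <- xmlParse(raw, asText = TRUE)
xs_ns <- c(xs = "http://www.w3.org/2001/XMLSchema")

# Column names and XSD types declared in the embedded schema
# (assumes the DataSet holds a single table, i.e. one xs:sequence).
fields <- getNodeSet(doc, "//xs:sequence/xs:element", namespaces = xs_ns)
col_names <- sapply(fields, xmlGetAttr, "name")
col_types <- sapply(fields, xmlGetAttr, "type")

# The rows sit under DocumentElement, which resets the default namespace
# (xmlns=""), so no prefix is needed to reach them.
rows <- getNodeSet(doc, "//DocumentElement/*")
df <- xmlToDataFrame(rows, stringsAsFactors = FALSE)

# Convert each column according to its declared type; unmapped types
# (including xs:string) are left as character.
for (i in seq_along(col_names)) {
  nm <- col_names[i]
  if (!nm %in% names(df)) next  # minOccurs="0": the column may be absent
  x <- as.character(df[[nm]])
  df[[nm]] <- switch(col_types[i],
    "xs:double" = , "xs:decimal" = , "xs:long" = as.numeric(x),
    "xs:int" = as.integer(x),
    "xs:boolean" = x %in% c("true", "1"),
    x)
}
str(df)
```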

These two structures look very different to me, and that is causing quite a headache: it seems that for each response I have to write specific code to turn the XML into something I can use. It seems to me this is bad practice on the part of the people administering the API. Am I right? Or is it normal to have several different response structures, and to have to beat your head against the wall parsing each one? Or is there a better way to do this?

These are just two examples. As things stand, I have about five requests, each needing its own method for parsing the response.
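If the service only ever returns these two shapes, the per-request parsing can be collapsed into one entry point that checks which shape it received. The helper below, parse_response, is hypothetical (not part of the XML package or the service), assumes the XML package, and only handles a .NET DataSet diffgram or a flat list of <string> elements.

```r
library(XML)

# Hypothetical helper covering the two response shapes shown in the question.
# Assumption: a response is either a .NET DataSet (xs:schema + diffgr:diffgram)
# or a flat list of <string> elements; other shapes are not handled.
parse_response <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)

  # local-name() matches on the element name alone, so these queries work
  # regardless of the prefix or default namespace the service uses.
  rows <- getNodeSet(doc, "//*[local-name()='diffgram']/*/*")
  if (length(rows) > 0) {
    # DataSet shape: one data frame row per record (values come back as text).
    return(xmlToDataFrame(rows, stringsAsFactors = FALSE))
  }

  # Flat shape: the text of every <string> element as a character vector.
  xpathSApply(doc, "//*[local-name()='string']", xmlValue)
}

# Usage with minimal stand-ins for the two responses:
portfolios <- parse_response(
  '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
     <soap:Body><GetPortfoliosResponse xmlns="http://rixtrema.net/">
       <GetPortfoliosResult><string>Blah</string><string>Blah Blah</string></GetPortfoliosResult>
     </GetPortfoliosResponse></soap:Body></soap:Envelope>')

grouping <- parse_response(
  '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
     <soap:Body><diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
       <DocumentElement xmlns="">
         <RESULT><Value>0</Value><Name>IBM</Name><ID>2</ID></RESULT>
         <RESULT><Value>0</Value><Name>AAPL</Name><ID>3</ID></RESULT>
       </DocumentElement>
     </diffgr:diffgram></soap:Body></soap:Envelope>')
```

Matching on local-name() trades strictness for convenience: it would also pick up a diffgram or string element from an unrelated namespace. The DataSet columns come back as text; the xs:schema in the same response declares their real types if conversion is needed.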
