R: XML into dataframe, problems with format

I am not familiar with importing XML files into R. I have looked into the existing questions regarding this topic but could not find one that seems to fit. I am thankful for your comments!

My problem is the following: I have an XML file with the following structure (excerpt):

    <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
    <PublicationData>
        <Products numberOfProducts="2094">
            <Product terminationReason="" isSalePermission="false" soldoutDeadline="" exhaustionDeadline="" name="Résanol Trio " wNbr="6016" id="7034">
                <ProductInformation>
                    <ProductCategory primaryKey="7282"/>
                    <ProductCategory primaryKey="7282"/>
                    <FormulationCode primaryKey="6486"/>
                    <DangerSymbol primaryKey="6513"/>
                    <DangerSymbol primaryKey="6509"/>
                    <CodeS primaryKey="6145"/>
                    <CodeS primaryKey="6117"/>
                    <CodeS primaryKey="6039"/>
                    <CodeS primaryKey="6057"/>
                    <CodeS primaryKey="6066"/>
                    <CodeS primaryKey="6106"/>
                    <CodeS primaryKey="6076"/>
                    <CodeS primaryKey="6088"/>
                    <CodeR primaryKey="5977"/>
                    <CodeR primaryKey="5943"/>
                    <CodeR primaryKey="5945"/>
                    <CodeR primaryKey="5948"/>
                    <CodeR primaryKey="6020"/>
                    <PermissionHolderKey primaryKey="10115"/>
                    <Ingredient additionalTextPrimaryKey="" inGrammPerLitre="" inPercent="7.5">
                        <SubstanceType xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">ACTIVE_INGREDIENT</SubstanceType>
                        <Substance primaryKey="898"/>
                    </Ingredient>
                    <Ingredient additionalTextPrimaryKey="" inGrammPerLitre="" inPercent="40.0">
                        <SubstanceType xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">ACTIVE_INGREDIENT</SubstanceType>
                        <Substance primaryKey="338"/>
                    </Ingredient>
                    <Ingredient additionalTextPrimaryKey="" inGrammPerLitre="" inPercent="15.0">
                        <SubstanceType xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">ACTIVE_INGREDIENT</SubstanceType>
                        <Substance primaryKey="190"/>
                    </Ingredient>
                    <Indication expenditureTo="" expenditureForm="8.0" waitingPeriod="" dosageTo="" dosageFrom="0.5">
                        <Measure primaryKey="6518"/>
                        <ApplicationArea primaryKey="3"/>
                        <ApplicationComment primaryKey="868"/>
                        <Culture additionalTextPrimaryKey="" primaryKey="9953"/>
                        <Pest type="PEST_FULL_EFFECT" additionalTextPrimaryKey="" primaryKey="10506"/>
                        <Pest type="PEST_FULL_EFFECT" additionalTextPrimaryKey="6964" primaryKey="10508"/>
                        <Pest type="PEST_FULL_EFFECT" additionalTextPrimaryKey="" primaryKey="10507"/>
                        <Pest type="PEST_PARTIAL_EFFECT" additionalTextPrimaryKey="" primaryKey="10533"/>
                        <Obligation primaryKey="12317"/>
                        <Obligation primaryKey="11380"/>
                        <Obligation primaryKey="9156"/>
                        <Obligation primaryKey="9735"/>
                        <Obligation primaryKey="9906"/>
                    </Indication>
                </ProductInformation>
            </Product>

I am trying to extract the information in the "ProductInformation" node of each product and tried:

    library(XML)  # provides xmlParse, xpathApply, xmlValue and xmlAttrs

    xmlfile   <- xmlParse("filepath")
    relevant  <- xpathApply(xmlfile, "//*/Products/Product")
    relevant2 <- sapply(relevant, xmlValue)

This just gives me something like:

    [1] "ACTIVE_INGREDIENTACTIVE_INGREDIENT"
    [...] "ACTIVE_INGREDIENT"
    [2094] "ACTIVE_INGREDIENT"

Using xmlAttrs instead:

    relevant2 <- sapply(relevant, xmlAttrs)

I was also not able to extract the information this way.
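To make the situation reproducible, here is a minimal sketch on a cut-down, hypothetical excerpt of the structure above, assuming the XML package. The names `xml_text`, `doc`, `products`, `product_attrs` and `ingredients` are made up for illustration, and the sample product name is a placeholder. It suggests that `xmlValue` only collects text content (here the `SubstanceType` text), while the keys sit in attributes of nested nodes, which may be reachable with XPath queries relative to each `Product`. It also assumes every product has at least one `Ingredient` with exactly one `Substance` child; the real file may need extra handling.

```r
library(XML)

# Cut-down, hypothetical excerpt of the real file (one product, two ingredients)
xml_text <- '<PublicationData>
  <Products numberOfProducts="1">
    <Product name="Example" wNbr="6016" id="7034">
      <ProductInformation>
        <CodeS primaryKey="6145"/>
        <Ingredient inPercent="7.5">
          <SubstanceType>ACTIVE_INGREDIENT</SubstanceType>
          <Substance primaryKey="898"/>
        </Ingredient>
        <Ingredient inPercent="40.0">
          <SubstanceType>ACTIVE_INGREDIENT</SubstanceType>
          <Substance primaryKey="338"/>
        </Ingredient>
      </ProductInformation>
    </Product>
  </Products>
</PublicationData>'

doc      <- xmlParse(xml_text, asText = TRUE)
products <- getNodeSet(doc, "//Products/Product")

# xmlValue() returns text content only; attribute values are not included
sapply(products, xmlValue)

# xmlAttrs() returns the attributes of the Product node itself, not of its children
product_attrs <- t(sapply(products, xmlAttrs))

# Attributes of nested nodes: query relative to each Product (note the leading ".")
ingredients <- do.call(rbind, lapply(products, function(p) {
  ing <- getNodeSet(p, ".//Ingredient")
  data.frame(
    product_id = xmlGetAttr(p, "id"),
    inPercent  = sapply(ing, xmlGetAttr, "inPercent"),
    substance  = sapply(ing, function(n) {
      xpathSApply(n, "./Substance", xmlGetAttr, "primaryKey")
    }),
    stringsAsFactors = FALSE
  )
}))

print(product_attrs)
print(ingredients)
```

The same pattern (one `lapply` over products, one relative XPath per nested element type) could presumably be repeated for `Indication`, `Pest`, and the other child nodes, giving one long-format data frame per element type that shares the `product_id` column.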

The answer to my problem is probably obvious, but I cannot figure it out. Thanks for your help!
