Thursday, 3 December 2015

XML: parsing XML with the xml2 package drops a row

I am trying to parse some XML using the xml2 package in R.

I am sure that the way I am looping through the XML file is causing the issue, but I can't figure out why my result drops one row of data from each group of data in the XML. I also assume that some type of apply function would make the code more efficient, but I didn't know how to go that route.

Here is my code:

    library(magrittr)

    # initialize web query filters and put them into a data frame
    fieldName <- "Report%20Date"
    operatorType <- "BETWEEN"
    values <- c("11/30/2015", "12/01/2015")
    df <- data.frame(fieldName = fieldName,
                     operatorType = operatorType,
                     values = I(list(values)),
                     stringsAsFactors = FALSE)

    # convert the query filters into JSON
    request.body <- jsonlite::toJSON(df)

    # build the url
    target_url <- paste("http://mpr.datamart.ams.usda.gov/ws/report/v1/cattle/LM_CT100?",
                        "filter={%22filters%22%20:",
                        request.body,
                        "}",
                        sep = "")

    result <- httr::GET(target_url) %>%
      httr::content(as = "text") %>%
      xml2::read_xml()

    # get all of the reports
    rpts <- xml2::xml_find_all(result, './/record//report')

    # initialize an empty data frame
    LM_CT100 <- data.frame()

    for (report_looper in 1:length(rpts)) {
      # get all the <record>s
      recs <- xml2::xml_find_all(rpts[report_looper], ".//record")

      # initialize an empty data frame
      tmp_df <- data.frame()

      # loop through the rest of the records to extract the data
      for (record_looper in 2:length(recs)) {
        newRow <- recs[record_looper] %>% xml2::xml_attrs() %>% unlist() %>% data.frame() %>% t()
        tmp_df <- rbind(tmp_df, newRow)

        # remove the row names
        rownames(tmp_df) <- NULL
      } # close record_looper loop

      # bind the dataframes together
      LM_CT100 <- rbind(LM_CT100, tmp_df)
    } # close report_looper loop

The resulting data frame, LM_CT100, has only 38 rows, but the original XML has 40 rows.
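One likely explanation for the missing rows: the inner loop runs over 2:length(recs), so it never visits the first record of each report. With two reports, that would account for exactly two missing rows (40 versus 38). The sketch below reproduces the effect on a small inline document. The XML here is a hypothetical, simplified stand-in for the real response (the element nesting and the grade and head_count attributes are assumptions made only for illustration).

```r
library(xml2)

# Hypothetical, simplified stand-in for the web service response
doc <- read_xml('<results>
  <record>
    <report report_date="11/30/2015">
      <record grade="A" head_count="10"/>
      <record grade="B" head_count="20"/>
    </report>
    <report report_date="12/01/2015">
      <record grade="A" head_count="30"/>
      <record grade="B" head_count="40"/>
    </report>
  </record>
</results>')

# Same XPath expressions as in the question
rpts <- xml_find_all(doc, ".//record//report")
recs <- xml_find_all(rpts[[1]], ".//record")

n_available <- length(recs)                 # records present in the first report
n_visited   <- length(2:length(recs))       # records the inner loop reaches
n_fixed     <- length(seq_along(recs))      # records reached when starting at 1

print(c(available = n_available, visited = n_visited, fixed = n_fixed))
```

Iterating with seq_along(recs) instead of 2:length(recs) visits every record. It also avoids a separate pitfall of 1:length(x), which counts down (1, 0) when x is empty.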

In addition, I need to extract the "report_date" for each group of data and have that as a column in my data frame.
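A possible way to get both the report date column and an apply-style structure is sketched below. It assumes that report_date is an attribute of each report element and that every record carries the same set of attributes; neither has been checked against the real response, and the inline XML is again a hypothetical stand-in with made-up grade and head_count attributes.

```r
library(xml2)

# Hypothetical, simplified stand-in for the web service response
doc <- read_xml('<results>
  <record>
    <report report_date="11/30/2015">
      <record grade="A" head_count="10"/>
      <record grade="B" head_count="20"/>
    </report>
    <report report_date="12/01/2015">
      <record grade="A" head_count="30"/>
      <record grade="B" head_count="40"/>
    </report>
  </record>
</results>')

rpts <- xml_find_all(doc, ".//record//report")

# Build one data frame per report, then stack them
per_report <- lapply(rpts, function(rpt) {
  recs <- xml_find_all(rpt, ".//record")

  # One single-row data frame per record, built from its attributes
  rows <- lapply(recs, function(rec) {
    as.data.frame(as.list(xml_attrs(rec)), stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)

  # Assumption: the date is stored as an attribute on <report>
  out$report_date <- xml_attr(rpt, "report_date")
  out
})

LM_CT100 <- do.call(rbind, per_report)
print(LM_CT100)
```

Because each report is turned into its own data frame before stacking, no row is skipped and the date is attached once per group. If the records do not all share the same attributes, rbind() will fail and the rows would need to be aligned first.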

To make it easy to see the XML that is returned from the query, this is the link that will return the results in your browser.
