I am attempting to extract model data from NOAA using readHTMLTable. The table I am trying to get has multiple subtitles, where each subtitle consists of a single cell spanning all columns, as far as I can tell from the HTML. For some reason, this is causing readHTMLTable to omit the row immediately following the subtitle. Here's code that will reproduce the issue:
library(XML)
url <- "http://ift.tt/YNaTIS"
ncep.tables <- readHTMLTable(url, header = TRUE)
# Find the table that lists the real-time models
for (ncep.table in ncep.tables) {
  if ("grib filter" %in% names(ncep.table) && "gds-alt" %in% names(ncep.table)) {
    rt.tbl <- ncep.table
  }
}
# Here's where the problem is:
cat(paste(rt.tbl[["Data Set"]][15:20], collapse = "\n"))
# On the website, there is a model called "AQM Daily Maximum"
# between "Regional Models" and "AQM Hourly Surface Ozone",
# but it is missing from the extracted table.
So, if you go to http://ift.tt/YNaTIS and look at the central table (the one with "Data Set" in the top right cell), you'll see a subtitle called "Regional Models." The AQM Daily Maximum model immediately below the subtitle is skipped during the extraction in the code above.
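For anyone trying to reproduce this without hitting the live page, here is a minimal sketch of one possible workaround: walk the table rows by hand with XPath and keep only the rows that have one cell per column. The inline HTML is a made-up stand-in for the NOAA table (the "freq" column and the cell values are assumptions, not copied from the site), and it assumes each subtitle really is a single cell spanning all columns, which I have only inferred from the page source. It does not explain why readHTMLTable drops the row; it just sidesteps that function.

```r
library(XML)

# Hypothetical stand-in for the NOAA table: a header row, a subtitle row
# made of one cell spanning all columns, then two data rows.
html <- '<table>
<tr><th>Data Set</th><th>freq</th><th>grib filter</th></tr>
<tr><td colspan="3">Regional Models</td></tr>
<tr><td>AQM Daily Maximum</td><td>06Z, 12Z</td><td>grib filter</td></tr>
<tr><td>AQM Hourly Surface Ozone</td><td>06Z, 12Z</td><td>grib filter</td></tr>
</table>'

doc <- htmlParse(html, asText = TRUE)
rows <- getNodeSet(doc, "//table//tr")

# Collect the text of every cell (td or th) in each row.
cells <- lapply(rows, function(row) {
  as.character(unlist(xpathApply(row, "./td | ./th", xmlValue)))
})

# The first row is the header; it also tells us the column count.
header <- cells[[1]]
n.col <- length(header)

# Data rows have one cell per column; subtitle rows have a single cell.
body <- cells[-1]
is.data <- vapply(body, function(x) length(x) == n.col, logical(1))

tbl <- as.data.frame(do.call(rbind, body[is.data]), stringsAsFactors = FALSE)
names(tbl) <- header

print(tbl[["Data Set"]])
```

In this toy table the row right after the subtitle is kept, because rows are filtered on cell count rather than parsed by readHTMLTable. On the real page the XPath would need to target the specific table instead of "//table//tr".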
I maintain the rNOMADS package in R, so getting this to work would save me a lot of maintenance time and keep the package accurate and up to date for my users. Thank you for your help!