XML: read multiple web pages in R

I am new to R and am trying to scrape multiple web pages into a single data table. I have written some code that automatically builds the page URLs as a character vector.

    library(XML)
    library(RCurl)

    page_numbers <- 1:10
    urls <- paste("http://observation.org/user/view/84878?q=&akt=0&g=0&from=2016-01-01&to=2016-12-31&prov=0&z=0&sp=0&gb=0&cdna=0&f=0&m=K&zeker=O&month=0&rows=100&only_hidden=0&zoektext=0&tag=0&plum=0&page",
                  page_numbers,
                  sep = "=")

Now I would like to use a for loop (or lapply?) to scrape each URL and bind the results into one data table. The code below scrapes a single page and puts it in a data table, but I don't know how to extend it with a for loop (or lapply) so that it covers all the URLs.

    Arjandata <- readHTMLTable("http://observation.org/user/view/84878?q=&akt=0&g=0&from=2016-01-01&to=2016-12-31&prov=0&z=0&sp=0&gb=0&cdna=0&f=0&m=K&zeker=O&month=0&rows=100&only_hidden=0&zoektext=0&tag=0&plum=0&page=1",
                               as.data.frame = TRUE,
                               which = 3,
                               skip.rows = 2,
                               stringsAsFactors = FALSE)

    # remove the first row
    Arjandata <- Arjandata[-1, ]

    # remove empty columns
    Arjandata$V1 = NULL
    Arjandata$V3 = NULL
    Arjandata$V9 = NULL
    Arjandata$V10 = NULL
    Arjandata$V13 = NULL
    Arjandata$V14 = NULL
    Arjandata$V15 = NULL
    Arjandata$V16 = NULL
    Arjandata$V17 = NULL
    Arjandata$V18 = NULL
    Arjandata$V19 = NULL
    Arjandata$V20 = NULL
    Arjandata$V21 = NULL
    Arjandata$V22 = NULL
    Arjandata$V23 = NULL
    Arjandata$V24 = NULL
    Arjandata$V25 = NULL

    # set column names
    colnames(Arjandata) <- c("date", "time", "number", "appearance", "activity", "species", "area", "PR")
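For reference, one way the lapply route might look, as a minimal sketch rather than a tested solution: wrap the single-page steps in a function, apply it to every element of `urls`, and stack the resulting data frames with `do.call(rbind, ...)`. The function names `scrape_page` and `scrape_all` are hypothetical, and the sketch assumes every page returns the same table layout as page 1 (table 3, with columns V1 to V25).

```r
# Scrape one results page; same steps as the single-page code.
# Assumption: each page has the same table layout as page 1.
scrape_page <- function(url) {
  tab <- XML::readHTMLTable(url,
                            as.data.frame = TRUE,
                            which = 3,
                            skip.rows = 2,
                            stringsAsFactors = FALSE)

  # remove the first row
  tab <- tab[-1, ]

  # remove the empty columns (V1, V3, V9, V10, V13 to V25)
  empty_cols <- paste0("V", c(1, 3, 9, 10, 13:25))
  tab <- tab[, !(names(tab) %in% empty_cols), drop = FALSE]

  # set column names
  colnames(tab) <- c("date", "time", "number", "appearance",
                     "activity", "species", "area", "PR")
  tab
}

# Apply a scraper function to every URL and bind the results by row.
scrape_all <- function(urls, scraper = scrape_page) {
  pages <- lapply(urls, scraper)  # list with one data frame per URL
  do.call(rbind, pages)           # stack them into a single data frame
}

# Usage with the `urls` vector built earlier (needs network access):
# Arjandata <- scrape_all(urls)
```

The `scraper` argument is there only so the row-binding step can be tried with a stand-in function before hitting the site; a for loop that appends each page to a list and calls `do.call(rbind, ...)` at the end would give the same result.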

Any help is greatly appreciated!
