I am new to R and I am trying to scrape multiple web pages into one data table. I have written some code that automatically builds the different URLs as character strings.
library(XML)
library(RCurl)

page_numbers <- 1:10
urls <- paste("http://observation.org/user/view/84878?q=&akt=0&g=0&from=2016-01-01&to=2016-12-31&prov=0&z=0&sp=0&gb=0&cdna=0&f=0&m=K&zeker=O&month=0&rows=100&only_hidden=0&zoektext=0&tag=0&plum=0&page",
              page_numbers, sep = "=")

Now I would like to use a for loop (or lapply?) to scrape the different URLs and bind the results into one data table. The code I wrote to scrape a single page into a data table is below, but I don't know how to use a for loop (or lapply) to cover all the URLs.
Arjandata <- readHTMLTable("http://observation.org/user/view/84878?q=&akt=0&g=0&from=2016-01-01&to=2016-12-31&prov=0&z=0&sp=0&gb=0&cdna=0&f=0&m=K&zeker=O&month=0&rows=100&only_hidden=0&zoektext=0&tag=0&plum=0&page=1",
                           as.data.frame = TRUE, which = 3, skip.rows = 2,
                           stringsAsFactors = FALSE)

# remove the first row
Arjandata <- Arjandata[-1, ]

# remove empty columns
Arjandata$V1 <- NULL
Arjandata$V3 <- NULL
Arjandata$V9 <- NULL
Arjandata$V10 <- NULL
Arjandata$V13 <- NULL
Arjandata$V14 <- NULL
Arjandata$V15 <- NULL
Arjandata$V16 <- NULL
Arjandata$V17 <- NULL
Arjandata$V18 <- NULL
Arjandata$V19 <- NULL
Arjandata$V20 <- NULL
Arjandata$V21 <- NULL
Arjandata$V22 <- NULL
Arjandata$V23 <- NULL
Arjandata$V24 <- NULL
Arjandata$V25 <- NULL

# set column names
colnames(Arjandata) <- c("date", "time", "number", "appearance", "activity", "species", "area", "PR")

Any help is greatly appreciated!
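One way to do this, as a minimal sketch (assuming every page exposes the same table layout as page 1): wrap the single-page code in a small helper, run it over urls with lapply(), and stack the resulting list of data frames with do.call(rbind, ...). The function name scrape_page is made up for this sketch, and the kept columns (V2, V4, V5, V6, V7, V8, V11, V12) are simply the ones left over after the deletions above.

library(XML)

# Scrape one page and return a cleaned data frame.
# (scrape_page is a hypothetical helper name, not part of the original code.)
scrape_page <- function(url) {
  tab <- readHTMLTable(url, as.data.frame = TRUE, which = 3,
                       skip.rows = 2, stringsAsFactors = FALSE)
  tab <- tab[-1, ]                                                     # remove the first row
  tab <- tab[, c("V2", "V4", "V5", "V6", "V7", "V8", "V11", "V12")]    # keep only the non-empty columns
  colnames(tab) <- c("date", "time", "number", "appearance",
                     "activity", "species", "area", "PR")
  tab
}

# one data frame per URL, then bind them row-wise into a single data frame
pages <- lapply(urls, scrape_page)
Arjandata <- do.call(rbind, pages)

A plain for loop works the same way: fill a list element by element (results[[i]] <- scrape_page(urls[i]) for i in seq_along(urls)) and finish with do.call(rbind, results); growing the data frame with rbind inside the loop also works but is slower.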