Thursday, 21 January 2016

XML: Scraping information in R

I need to obtain some data from a web page, and I'm trying to extract it using R.

Because the information is spread over several pages, I first wrote this code to build the page URLs:

  require(XML)
  contador <- c(1:200)
  for (i in contador) {
    myURL <- paste("http://www.europa-mop.com/excavadoras-usadas/2-1/anuncios-excavadoras.html?p=", i, sep = "")
  }
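Note that the loop overwrites `myURL` on every pass, so only the URL for page 200 is left once it finishes. A minimal sketch of keeping one URL per page instead, relying on `paste0()` being vectorized; the names `base_url`, `paginas` and `urls` are only illustrative:

```r
# Base address of the listing; the page number is appended to it.
base_url <- "http://www.europa-mop.com/excavadoras-usadas/2-1/anuncios-excavadoras.html?p="

# Page numbers to visit (same range as contador).
paginas <- 1:200

# paste0() is vectorized, so a single call yields one URL per page number.
urls <- paste0(base_url, paginas)

length(urls)  # number of URLs built
urls[1]       # URL of the first page
```

Each element of `urls` could then be downloaded and parsed in turn, for example inside `lapply()`.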

Next, I download and parse the page with the following code:

  web_url <- getURL(myURL)
  web_url <- readLines(tc <- textConnection(web_url)); close(tc)
  webtree <- htmlTreeParse(web_url, error = function(...) {})
  body <- webtree$children$html$children$body
  body

However, when I run the following command I get an error:

  precio <- xpathSApply(body, "//li[@class='label label-secondary text-bold']", xmlValue)

  Input is not proper UTF-8, indicate encoding !
  Bytes: 0xC2 0x3C 0x2F 0x64
  Sequence ']]>' not allowed in content
  Sequence ']]>' not allowed in content
  internal error: detected an error in element content
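One thing that may be worth trying is to parse the text with `htmlParse()`, which returns an internal document that the XPath functions can query, and to state the encoding explicitly. The sketch below is only an illustration under assumptions: `html_text` is a made-up stand-in for one downloaded page, and "UTF-8" is a guess at the site's encoding, so the real markup and charset may differ.

```r
library(XML)

# Hypothetical stand-in for one downloaded page; the real markup may differ.
html_text <- paste0(
  "<html><body><ul>",
  "<li class='label label-secondary text-bold'>12.500 EUR</li>",
  "<li class='other'>skip</li>",
  "</ul></body></html>"
)

# htmlParse() builds an internal document (unlike the default htmlTreeParse()),
# and the encoding argument tells the parser how to decode the input.
# "UTF-8" is an assumption here, not something confirmed for the site.
doc <- htmlParse(html_text, asText = TRUE, encoding = "UTF-8")

# Run the same XPath query against the parsed document.
precio <- xpathSApply(doc, "//li[@class='label label-secondary text-bold']", xmlValue)
precio
```

With a real page, `html_text` would be the string returned by `getURL()`, which comes from the RCurl package and so needs `require(RCurl)` as well.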

I've tried several alternatives, but I still can't scrape the information.

Thanks for your comments!
