Saturday, 28 February 2015

URL gets truncated with httr::GET vs xmlParse



I am trying to request an XML document with two different methods (xmlParse and httr::GET) and expect the response to be the same. The response I get with xmlParse is what I expect, but with httr::GET the request URL gets truncated at some point.


An example:



require(httr)
require(XML)
require(rvest)

term <- "alopecia areata"
request <- paste0("http://ift.tt/1GA6rOf",term)

#requesting URL with XML
xml_response <- xmlParse(request)

xml_response %>%
xml_nodes(xpath = "//Result/Term") %>%
xml_text


This returns, as it should



[1] "alopecia areata"


Now for httr



httr_response <- GET(request)
httr_content <- content(httr_response)

httr_content %>%
xml_nodes(xpath = "//Result/Term") %>%
xml_text


This returns



[1] "alopecia"


What's interesting: the URL stored in the request part of httr_response is correct and still contains the full term. Only the URL on the response itself is wrong.



> httr_response$request$opts$url

[1] "http://ift.tt/1vJ3ee8 areata"

> httr_response$url

[1] "http://ift.tt/1GA6utv"


So at some point my query term got truncated. If the whole request is put into a browser by hand, it behaves as expected.
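
One thing that may be worth ruling out is the unencoded space in the term, since the truncation happens exactly there and a browser would typically escape it automatically. Below is a minimal sketch, assuming the space is the culprit: it percent-encodes the term with base R's URLencode() before building the URL. The base URL here is a hypothetical placeholder, not the real endpoint from the example above.

```r
# Hypothetical placeholder endpoint (assumption); substitute the real base URL.
base_url <- "http://example.org/search?term="
term <- "alopecia areata"

# URLencode() comes from the utils package, which is attached by default.
# reserved = TRUE also escapes reserved characters such as & and =.
encoded_term <- URLencode(term, reserved = TRUE)

# Build the request from the encoded term instead of the raw one.
request_encoded <- paste0(base_url, encoded_term)

print(encoded_term)
print(request_encoded)

# The encoded URL could then be passed on, e.g. httr::GET(request_encoded).
```

If this is indeed the cause, the term returned from the httr response should then match the one from xmlParse; if it still differs, the encoding is not the problem.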


Any suggestions on how to resolve this would be greatly appreciated.

