httr - parsing xml not as text, but still specifying encoding

I'm trying to scrape a UTF-8 encoded website using the httr package, but the package's content function apparently only lets you specify the encoding when you parse the response as text. Unfortunately, I cannot parse it as text, because I want to run XPath queries on the result afterwards. Here's an example:


library(XML)
library(httr)

page <- GET("http://ift.tt/1eo9TeY")
test <- content(page, as = "parsed")
# Get a list of names, many of which contain non-standard characters
xpathSApply(test, "//img", xmlGetAttr, "alt") 

# This gives the correct encoding, but outputs a character vector, 
# on which I cannot use xpath queries
test <- content(page, as = "text", encoding = "utf-8")
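
One possible way around this, sketched under assumptions rather than confirmed against this particular site: take the text that content() returns with the encoding set, then hand that string to htmlParse from the XML package with asText = TRUE and an explicit encoding, which yields a document that XPath queries can run on. The helper name parse_utf8_html and the inline HTML string are hypothetical stand-ins so the example runs without a network request.

```r
library(XML)

# Hypothetical helper: parse an HTML string that is known to be UTF-8
# into an internal document that supports XPath queries.
parse_utf8_html <- function(html_text) {
  htmlParse(html_text, asText = TRUE, encoding = "UTF-8")
}

# Stand-in for the string returned by
# content(page, as = "text", encoding = "utf-8")
html_text <- paste0(
  "<html><body>",
  "<img src='a.png' alt='Jos\u00e9'/>",
  "<img src='b.png' alt='Zo\u00eb'/>",
  "</body></html>"
)

doc <- parse_utf8_html(html_text)

# Same XPath query as in the question, now on the re-parsed document
alts <- xpathSApply(doc, "//img", xmlGetAttr, "alt")
```

With a live response, html_text would instead be the character vector from the content(page, as = "text", encoding = "utf-8") call above.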
