R crashing when using foreach doParallel and XML



I am writing a function that scrapes a huge XML file. Since there are about one million nodes, I would like to parallelize the work with the foreach package. The code and relevant comments are below.



library(XML)

xmlfile <- xmlParse(file = "./DATI/xmldata.xml") # read the file

# explore
if (exists("dbfromxml")) rm(dbfromxml)

root <- xmlRoot(xmlfile)
persons <- xmlChildren(root) # list of child nodes, one per person
rm(root)
nrecords <- xmlSize(persons)

# set up the parallel framework

library(foreach)
library(doParallel)
cores <- getOption("mc.cores", detectCores())
cl <- makeCluster(cores, outfile = "ciao.txt") # worker output is written to ciao.txt
registerDoParallel(cl)
# trial run on the first 10 records only
dbfromxml <- foreach(i = 1:10, .combine = rbind, .packages = "XML") %dopar% {
  personsxml <- persons[[i]]
  processaXMLPersona(personsxml) # this function works properly outside a %dopar% environment
}
stopCluster(cl)


The problem arises when I set up the doParallel / makeCluster infrastructure and load the XML package on the workers with .packages = "XML": the R cluster crashes. The following errors are reported: Error in unserialize(socklist[[n]]) : error reading from connection; Error in serialize(data, node$con) : error writing to connection
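
A possible explanation, which the post does not confirm: nodes returned by xmlParse and xmlChildren are external pointers into C-level memory owned by the master R process, so they may no longer be valid once they have been copied to the worker processes. A minimal sketch of one workaround, assuming that is the cause, is to convert each node to plain text with saveXML on the master and re-parse it inside each worker. The in-memory XML and the function processa_stub are hypothetical stand-ins for the real file and for processaXMLPersona.

```r
library(XML)
library(foreach)
library(doParallel)

# Hypothetical in-memory document standing in for ./DATI/xmldata.xml
xmltext <- "<people><person><name>Ann</name></person><person><name>Bob</name></person></people>"
doc <- xmlParse(xmltext, asText = TRUE)
persons <- xmlChildren(xmlRoot(doc))

# On the master: turn every node into an ordinary character string,
# which can be serialized and sent to the workers safely
persons_txt <- vapply(persons, function(n) saveXML(n), character(1), USE.NAMES = FALSE)

# Hypothetical stand-in for processaXMLPersona: one data frame row per person
processa_stub <- function(node) {
  data.frame(name = xmlValue(node[["name"]]), stringsAsFactors = FALSE)
}

cl <- makeCluster(2)
registerDoParallel(cl)
dbfromxml <- foreach(txt = persons_txt, .combine = rbind, .packages = "XML") %dopar% {
  # On the worker: rebuild a node from the text before processing it
  node <- xmlRoot(xmlParse(txt, asText = TRUE))
  processa_stub(node)
}
stopCluster(cl)
```

With about one million nodes, sending the strings in chunks rather than one at a time would probably reduce communication overhead, but that is untested here.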

