Saturday, 21 November 2015

XML : basic questions about processing big XML files

I have some basic questions about processing big XML files (approx. 25 MB) from a cron job. I have written several different scripts to do this, and all of them should work, but when I test them in a browser, some "errors" occur.

e.g.:

.. the browser keeps loading for a while and then suddenly stops (white screen), but when I refresh my DB I can see data still coming in, so the process is still running.

.. after a while I can see via DB refresh that no more data is coming in and that the script has stopped after some 900-1400 processed items (processing one item involves an API call, with client and server on the same host, and an image upload). Sometimes there is nothing in the log at that point, sometimes there is:

(104)Connection reset by peer: mod_fcgid: error reading data from FastCGI server
(104)Connection reset by peer: mod_fcgid: ap_pass_brigade failed in handle_request_ipc function

I have already asked my hoster to set the fcgid and memory/timeout values to the max, but still none of my scripts runs until every XML node is processed. When a script "crashes" and I run it again, it runs fine for another ~1000 nodes, then it crashes again. So if I set up a cron job that runs the script 5-7 times (with an offset of ca. 10-15 min), all the nodes should end up in my DB.. but that can't be the answer :/

These are my settings:

  FcgidIdleTimeout 60
  FcgidIdleScanInterval 30
  FcgidConnectTimeout 30
  FcgidMaxProcesses 600
  FcgidBusyTimeout 1800
  FcgidBusyScanInterval 90
  FcgidZombieScanInterval 3
  FcgidErrorScanInterval 12
  FcgidProcessLifeTime 120
  FcgidPassHeader Authorization
  FcgidSpawnScoreUpLimit 50
  FcgidSpawnScore 1
  FcgidTerminationScore 2
  FcgidMaxRequestLen 1073741824
  FcgidMaxRequestsPerProcess 100000
  FcgidMinProcessesPerClass 0
  FcgidMaxProcessesPerClass 8
  FcgidIOTimeout 1800

I have already tried different ways:

- read the whole XML
- pre-chunk it into small XML files and stream them one after another
- I have used the API for the data import
- I have put the data into the DB myself with PDO

..but whatever I try, the server seems to crash or time out.
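
For illustration, the streaming variant combined with a resume point (so that a re-run continues where the last one stopped, as described above) might look roughly like the following PHP sketch. It is not one of the scripts from this post. The `<item>` element name, the `id` and `title` fields, the `items` and `progress` tables, the function name `importItems`, and the SQLite-style `INSERT OR REPLACE` are all assumptions made for the example; the API call and image upload are left out.

```php
<?php
// Sketch only: stream <item> elements with XMLReader, insert them via PDO,
// and store how many items were committed so a later run can resume.
// All names (item, id, title, items, progress) are hypothetical.

function importItems(string $xmlFile, PDO $pdo, int $batchSize = 100): int
{
    $pdo->exec('CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, title TEXT)');
    $pdo->exec('CREATE TABLE IF NOT EXISTS progress (file TEXT PRIMARY KEY, done INTEGER)');

    // How many items did earlier runs already commit for this file?
    $stmt = $pdo->prepare('SELECT done FROM progress WHERE file = ?');
    $stmt->execute([$xmlFile]);
    $done = (int) $stmt->fetchColumn();

    // INSERT OR REPLACE is SQLite syntax; other databases need their own upsert.
    $insert = $pdo->prepare('INSERT OR REPLACE INTO items (id, title) VALUES (?, ?)');
    $save   = $pdo->prepare('INSERT OR REPLACE INTO progress (file, done) VALUES (?, ?)');

    $reader = new XMLReader();
    if (!$reader->open($xmlFile)) {
        throw new RuntimeException("Cannot open $xmlFile");
    }

    // Advance to the first <item>; only one node is held in memory at a time.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
            break;
        }
    }

    $seen = 0;
    $pdo->beginTransaction();
    while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $seen++;
        if ($seen > $done) {
            // Parse just this one item, not the whole document.
            $node = new SimpleXMLElement($reader->readOuterXml());
            $insert->execute([(string) $node->id, (string) $node->title]);

            if ($seen % $batchSize === 0) {
                // Commit the batch together with the new resume point.
                $save->execute([$xmlFile, $seen]);
                $pdo->commit();
                $pdo->beginTransaction();
            }
        }
        if (!$reader->next('item')) {
            break; // no further <item> elements
        }
    }
    $save->execute([$xmlFile, max($seen, $done)]);
    $pdo->commit();
    $reader->close();

    // Number of items imported by this run.
    return max(0, $seen - $done);
}
```

With this layout, calling `importItems()` again after an interrupted run skips the items that were already committed and continues with the rest, and the return value could feed a status email.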

So my first question is: is there a difference between testing my script in a browser and having it triggered via crontab? (I would like to get status emails about how many nodes got processed, but when the script doesn't run till the end, this won't work.)

And my second question: what else can I try? :/
