Wget download all pages from an XML sitemap



I thought this would be a relatively easy problem to find a solution to, but for some reason the answers I've tried don't work.


I'm simply trying to use wget to download/mirror all of the links in my XML sitemap with the following command:


wget --quiet http://ift.tt/1MpqinF --output-document - | egrep -o "http://ift.tt/1AcTUhF" | wget --spider -i - --wait 0


But for some reason I just see a bunch of output like this:


Spider mode enabled. Check if remote file exists.
--2015-02-16 12:49:33--  http://ift.tt/1MpqkvC
Reusing existing connection to mytestdomain.com:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.


I'm not a CLI pro, so I have no idea why it isn't actually downloading each page into a static .html file.


So my question is: how can I modify the command above so that it downloads all of the links in the XML sitemap as static .html files?
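
For what it's worth, here's the kind of variation I've been guessing at after skimming the wget man page. The sitemap path and the egrep pattern are placeholders I made up, and I'm not at all sure the flags are right, but it hopefully shows what I'm aiming for:

# fetch the sitemap, pull the page URLs out of it, then download each one
# (sitemap path and URL regex below are guesses, not my real values)
wget --quiet "http://mytestdomain.com/sitemap.xml" --output-document - \
  | egrep -o 'https?://[^<"]+' \
  | wget --input-file=- --wait=0 --adjust-extension

My guess is that dropping --spider is what makes wget actually save the files, and that --adjust-extension should give each saved page an .html extension, but I haven't been able to confirm that.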


Thanks

