Saturday, 6 December 2014

What's being used to block access to this sitemap?



I've been trying to figure out how Twitter blocks access to their sitemaps and haven't been able to replicate it on my website. I was hoping that I could get a bit of help here from people more knowledgeable than I ;)


Here's the location of their sitemap index file: http://ift.tt/1yqruli


If you view that file directly, it shows up blank. But search for it in Google and view the cached version of it: http://ift.tt/1yqrulk


Obviously Google is able to see the sitemap. I'm guessing that Twitter doesn't know every single Google IP, so I assumed they did it with the User-Agent header. However, when I install User Agent Switcher and switch to Googlebot, I still can't view the sitemap.
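
For anyone who wants to repeat the User-Agent test outside the browser, here is a minimal sketch in Python. The helper names (build_request, fetch) are made up for illustration, and the User-Agent string is the commonly published Googlebot one, so treat both as assumptions rather than anything Twitter-specific.

```python
import urllib.error
import urllib.request

# Commonly published Googlebot User-Agent string (an assumption for this test).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return (HTTP status, body length in bytes) for the URL."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report them the same way.
        return err.code, len(err.read())
```

Calling fetch() on the sitemap URL once with the default User-Agent and once with a normal browser string makes it easy to compare the two responses. If both come back empty, that would be consistent with what I saw in the browser, i.e. the check is probably not based on the User-Agent alone.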


Does anyone know how exactly this is being done?

