How to extract web pages and XMLs from index.pld_2.gz?

I am very confused because I cannot work out how I should do this. I have downloaded index.pld_2.gz (PLD) from http://ift.tt/1rr2Utk, and now I want to extract it so that I can implement the PageRank algorithm and clustering for my project. I am using Python 3.3.3, but I do not understand how to do these tasks properly. I also found http://ift.tt/1y1XHgl, but I could not figure out how to use it. Should I do this from the command line (cmd) or some other way?
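A `.gz` file does not have to be extracted from the command line first; Python's standard `gzip` module can read it directly. Below is a minimal sketch, assuming (hypothetically) that index.pld_2.gz is UTF-8 text with one tab-separated record per line. Inspect the first few records and adjust the parsing if the real layout differs. The function name `read_index` is illustrative only.

```python
import gzip


def read_index(path, limit=None):
    """Yield the records of a gzip-compressed text file, split on tabs.

    Assumption (not confirmed by the source): the file is UTF-8 text with
    one tab-separated record per line.
    """
    # "rt" opens the compressed file in text mode (supported by gzip.open
    # since Python 3.3), so lines are decoded as they are read.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for i, line in enumerate(f):
            # Stop early when only a preview of the file is wanted.
            if limit is not None and i >= limit:
                break
            # Drop the trailing newline, then split the record on tabs.
            yield line.rstrip("\n").split("\t")
```

For example, `for record in read_index("index.pld_2.gz", limit=5): print(record)` prints the first five records so you can see what the file actually contains. Because the file is read line by line, even a large index does not have to fit in memory, which matters once the parsed records are fed into a PageRank or clustering step.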


