Using WGet to archive - Earth Empires Forum

119

Member
145

May 17th 2011, 6:21:14

I'm trying to archive a website/blog and all of its contents. I tried several different methods, but none of them turned out quite as I was hoping, and the wget man page is quite long and extensive.

Each time I try to wget the site, it just downloads the index page. The site's pages are set up like so:

http://www.site.com
http://www.site.com/page/1
http://www.site.com/page/2
.
. // pages keep going for a while
.

Each page has a post that should also be downloaded. I don't need to download the links to external sites, just the pages on that domain.

So far I have tried

wget -m http://www.site.com
wget -r -l inf -k -E -p -nc http://www.site.com

Anyone know the proper options that I should use to get this working right?

Edited By: 119 on May 17th 2011, 6:27:32

Jiman

Member
1199

May 17th 2011, 6:23:13

Good idea Bobby.

Crippler ICD

Member
3752

May 17th 2011, 7:56:59

I think you need to wget the index into a string, then parse the string for the content that you want.
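
A rough sketch of that fetch-then-parse idea, assuming the pages use plain double-quoted href attributes. The function name extract_links is made up for illustration, and www.site.com is just the placeholder from the first post. It reads HTML on stdin and prints the links that stay on the same site:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every same-site link found in HTML read from stdin.
extract_links() {
  local base="$1"   # site root without trailing slash, e.g. http://www.site.com
  grep -o 'href="[^"]*"' |                 # pull out each href="..." attribute
    sed -e 's/^href="//' -e 's/"$//' |     # strip the attribute wrapper
    while IFS= read -r link; do
      case "$link" in
        //*) ;;                                           # protocol-relative: treated as external, skipped
        "$base"|"$base"/*) printf '%s\n' "$link" ;;       # absolute link on the same site
        /*) printf '%s\n' "$base$link" ;;                 # root-relative link
      esac
    done | sort -u                         # drop duplicates
}

# Usage (needs network access):
#   wget -qO- http://www.site.com | extract_links http://www.site.com
```

Each printed URL could then be fetched in turn. This only handles the simple cases (no single-quoted or unquoted attributes, no page-relative links), so it is a starting point rather than a full crawler.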

Crippler
FoCuS
<--MSN
58653353
CripplerTD

[14:26] <enshula> i cant believe im going to say this
[14:26] <enshula> crippler is giving us correct netting advice

Azz Kikr

Wiki Mod
1520

May 17th 2011, 18:39:35

http://www.dheinemann.com/2011/archiving-with-wget/
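
For anyone landing here later, a minimal sketch of a mirror command built only from documented wget options. It is not taken from the link above, and www.site.com is the placeholder from the first post. The robots=off line is an assumption: wget honors robots.txt and nofollow meta tags during recursive retrieval, which is one possible reason a crawl stops at the index page, but nothing in this thread confirms that was the cause here.

```shell
#!/usr/bin/env bash
# Sketch only: www.site.com is the placeholder domain from the first post.
SITE_URL="${SITE_URL:-http://www.site.com/}"

WGET_OPTS=(
  --mirror              # shorthand for -r -N -l inf --no-remove-listing
  --convert-links       # -k: rewrite links so the local copy works offline
  --adjust-extension    # -E: save HTML pages with an .html suffix
  --page-requisites     # -p: also fetch images, CSS and scripts
  --no-parent           # -np: never climb above the start directory
  --wait=1              # pause one second between requests
  -e robots=off         # assumption: robots.txt or nofollow is stopping recursion
)

# Print the command by default; set RUN=1 to actually start the download.
if [ "${RUN:-0}" = "1" ]; then
  wget "${WGET_OPTS[@]}" "$SITE_URL"
else
  echo wget "${WGET_OPTS[@]}" "$SITE_URL"
fi
```

Recursive wget stays on the starting host unless -H is given, so external links are left alone. Ignoring robots.txt should be reserved for sites you own or have permission to copy.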

119

Member
145

May 20th 2011, 14:56:20

Thanks for the suggestions. I wasn't able to figure it out, but HTTrack seems to be working well for me now.