wget help

Post by xibalba on Sun Mar 18, 2012 4:38 pm


http://pics.livejournal.com/blue_rouge/ ... 7bd?page=2
... (page 3, page 4, page 5,...)
http://pics.livejournal.com/blue_rouge/ ... bd?page=27

I'm new to BASH scripting, and I'm trying to figure out how I'll rip all the scans in these directories.

As you can see from the URLs above, the files aren't actually named as JPEGs, though they are JPEGs. So I'm wondering how exactly I'd tell wget to get all the scans, and only the scans, across all 27 pages on LiveJournal.

Re: wget help

Post by xenopeek on Sun Mar 18, 2012 5:50 pm

In this case, using something like http://www.httrack.com/ will probably be easier if you do this often.

Anyway, this is pretty similar to something I just did, so here is a somewhat altered script that will download the 427 images into a folder called Images. The images will be numbered sequentially, from 001 to 427. I can't explain it, but for the second image (http://pics.livejournal.com/blue_rouge/pic/0002xq7r/g11) the webserver always gives the first image instead. You'll have to fetch that second image manually (which strangely does work). Also, some images are no longer on the webserver, like the ninth image (http://pics.livejournal.com/blue_rouge/pic/000344wh/g11). You will see errors for those in the output as the script runs.
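
The sequential numbering comes from printf with a zero-padded width. A minimal sketch of just that step, where image_name is a made-up helper for illustration and not part of the script below:

```shell
#!/bin/bash
# image_name is a hypothetical helper (an assumption for this example).
# It maps a counter to a three-digit, zero-padded path under Images/,
# using the same printf format as the script.
image_name() {
   printf "Images/%03d" "$1"
}

image_name 1; echo      # prints Images/001
image_name 27; echo     # prints Images/027
image_name 427; echo    # prints Images/427
```

Zero-padding keeps the files in download order when a file manager sorts them by name.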

To run the script, create a new text file, open it in Gedit, put the following content into it, and save it. Then right-click the file > Properties > Permissions > Allow executing file as program. Then double-click it and choose "Run" (not "Run in Terminal"). It will open its own terminal window, which remains open until you close it (so you can check for any errors).
Code:

#!/bin/bash

# if the script was not launched from a terminal, restart it from a terminal
if [[ ! -t 0 ]] && [[ -x /usr/bin/x-terminal-emulator ]]; then
   /usr/bin/x-terminal-emulator -e "bash -c \"$0 $*; read -s -p 'Press enter to continue...'\""
   exit
fi

# only use newlines for splitting strings into arrays
IFS=$'\n'

# starting point: URL is a placeholder, set it to the page of the first picture before running
URL="FIRST_PICTURE_PAGE_URL"
COUNT=1
tempfile=$(mktemp)

# clean: empty the Images folder if it exists, otherwise create it
if [[ -d Images ]]; then
   rm --force --recursive Images/*
else
   mkdir Images
fi

# fetch images
while true; do
   # download the current picture page
   wget --quiet --output-document="$tempfile" "$URL"
   status=$?
   if [[ $status != 0 ]]; then
      echo "$(basename "$0"): wget $URL ($COUNT) failed with exit status $status"
      rm "$tempfile"
      exit 1
   fi
   # extract the image URL from the page and download it under a numbered name
   IMGURL=$(grep " alt='untitled picture'>" "$tempfile" | sed "s/^                <a href='//" | sed "s/' alt='untitled picture'>$//")
   IMGFILE=$(printf "Images/%03d" $COUNT)
   wget --quiet --output-document="$IMGFILE" "$IMGURL"
   status=$?
   if [[ $status != 0 ]]; then
      echo "$(basename "$0"): wget $IMGURL from $URL ($COUNT) failed with exit status $status"
   else
      echo "$IMGURL ($COUNT)"
   fi
   # stop when the page has no "next picture" link
   if [[ $(grep ">next picture</a>" "$tempfile" | wc -l) == 0 ]]; then
      echo "done"
      break
   fi
   # follow the "next picture" link
   URL=$(grep ">next picture</a>" "$tempfile" | tr "<" "\n" | grep ">next picture" | sed 's/^a href="//' | sed 's/">next picture//')
   COUNT=$(( COUNT + 1 ))
done
rm "$tempfile"
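
To see how the script finds the next page, the grep/tr/sed pipeline can be tried on a single line of HTML. This is only a sketch: the sample line and the example.com URLs are made up for illustration (the real LiveJournal markup may differ), and next_url is a hypothetical wrapper around the same pipeline.

```shell
#!/bin/bash
# next_url: hypothetical wrapper (an assumption for this example) around the
# script's pipeline. It reads HTML on stdin and prints the href of the
# "next picture" link, or nothing if the page has no such link.
next_url() {
   grep ">next picture</a>" \
      | tr "<" "\n" \
      | grep ">next picture" \
      | sed 's/^a href="//' \
      | sed 's/">next picture//'
}

# Made-up sample line; example.com stands in for the real site.
sample='<p><a href="http://example.com/pic/0001">previous picture</a> | <a href="http://example.com/pic/0003">next picture</a></p>'

printf '%s\n' "$sample" | next_url    # prints http://example.com/pic/0003
```

The tr step puts each tag on its own line, so the second grep keeps only the "next picture" anchor and the two sed calls strip everything around the URL. On a page without that link the function prints nothing, which matches the condition the loop checks before it stops.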