wget help

Post by xibalba on Sun Mar 18, 2012 4:38 pm

http://pics.livejournal.com/blue_rouge/

http://pics.livejournal.com/blue_rouge/gallery/0000b7bd
http://pics.livejournal.com/blue_rouge/ ... 7bd?page=2
... (page 3, page 4, page 5,...)
http://pics.livejournal.com/blue_rouge/ ... bd?page=27

I'm new to Bash scripting, and I'm trying to figure out how to rip all the scans in these galleries.

http://pics.livejournal.com/blue_rouge/pic/00031py9
^ As you can see, the files don't have a .jpg extension, even though they are JPEGs. So I'm wondering how exactly I'd tell wget to get all the scans, and only the scans, across all 27 gallery pages.

Re: wget help

Post by xenopeek on Sun Mar 18, 2012 5:50 pm

In this case, using something like http://www.httrack.com/ will probably be easier if you do this often.

Anyway, this is pretty similar to something I just did, so here is a somewhat altered script that will download the 427 images into a folder called Images. The images will be numbered sequentially, from 001 to 427. I can't explain it, but for the second image (http://pics.livejournal.com/blue_rouge/pic/0002xq7r/g11) the webserver always gives the first image instead. You'll have to fetch that second image manually (which strangely does work). Also, some images are not on the webserver any more, like the ninth image (http://pics.livejournal.com/blue_rouge/pic/000344wh/g11). You will see errors for those in the output as the script runs.
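
As a rough illustration of the numbering, here is a minimal sketch of how printf's zero-padding produces names like 001 and 427. The function name image_name and the optional suffix argument are assumptions made for this example; the script itself saves the files without any extension.

```shell
#!/bin/bash

# Hypothetical helper: build the output path for the Nth image.
# %03d pads the counter with zeros to three digits.
# The second argument is an optional suffix such as .jpg.
image_name() {
   printf "Images/%03d%s" "$1" "$2"
}

image_name 1          # prints Images/001
echo
image_name 42 .jpg    # prints Images/042.jpg
echo
```

Since the scans are JPEGs served without an extension, adding a .jpg suffix like this may make the files easier to open in an image viewer, but that is a matter of taste.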

To run the script, create a new text file, open it in Gedit, paste the following content into it, and save it. Then right-click the file > Properties > Permissions > Allow executing file as program. Then double-click it and choose "Run" (not "Run in Terminal"). It will open its own terminal window, which remains open until you close it (so you can check for any errors).
Code:
#!/bin/bash

# if the script was not launched from a terminal, restart it from a terminal
if [[ ! -t 0 ]] && [[ -x /usr/bin/x-terminal-emulator ]]; then
   /usr/bin/x-terminal-emulator -e "bash -c \"$0 $*; read -s -p 'Press enter to continue...'\""
   exit
fi

# only use newlines for splitting strings into arrays
IFS=$'\n'

# clean
if [[ -d Images ]]; then
   rm --force --recursive Images/*
else
   mkdir Images   
fi

# fetch images
# use mktemp; the Debian-specific tempfile command is deprecated
tempfile=$(mktemp)
URL="http://pics.livejournal.com/blue_rouge/pic/0002wtq6/g11"
COUNT=1
while true; do
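   wget --quiet --output-document="$tempfile" "$URL"
   status=$?   # save wget's exit status before later commands overwrite $?
   if [[ $status != 0 ]]; then
      echo "$(basename "$0"): wget $URL ($COUNT) failed with exit status $status"
      rm --force "$tempfile"
      exit 1
   fi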
   IMGURL=$(grep " alt='untitled picture'>" $tempfile | sed "s/^                <a href='//" | sed "s/' alt='untitled picture'>$//")
   IMGFILE=$(printf "Images/%03d" $COUNT)
   wget --quiet --output-document="$IMGFILE" "$IMGURL"
   status=$?   # save wget's exit status before later commands overwrite $?
   if [[ $status != 0 ]]; then
      echo "$(basename "$0"): wget $IMGURL from $URL ($COUNT) failed with exit status $status"
   else
      echo "$IMGURL ($COUNT)"
   fi
   if ! grep --quiet ">next picture</a>" "$tempfile"; then
      echo "done"
      break
   fi
   URL=$(grep ">next picture</a>" $tempfile | tr "<" "\n" | grep ">next picture" | sed 's/^a href="//' | sed 's/">next picture//')
   COUNT=$(( COUNT + 1 ))
done
rm --force "$tempfile"
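
For anyone adapting the script to another site, the part most likely to need changing is the link extraction. Below is a minimal sketch of that one step in isolation, using the same grep/tr/sed pipeline as the script. The sample HTML line, the example.com URLs, and the function name extract_next_url are made up for illustration; the real LiveJournal markup may differ.

```shell
#!/bin/bash

# Hypothetical helper: read HTML on stdin and print the URL of the
# "next picture" link, or nothing if the page has no such link.
extract_next_url() {
   grep ">next picture</a>" \
      | tr "<" "\n" \
      | grep ">next picture" \
      | sed 's/^a href="//' \
      | sed 's/">next picture//'
}

# Made-up sample line standing in for a downloaded page.
sample='<p><a href="http://example.com/pic/0001/g11">previous picture</a> | <a href="http://example.com/pic/0003/g11">next picture</a></p>'

printf '%s\n' "$sample" | extract_next_url
```

The tr step puts every tag on its own line, so the second grep picks out only the anchor for the next picture and the two sed calls strip everything around the URL. This kind of text matching is fragile: if the site changes its markup, the patterns need to be updated.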