how to scrape URLs from a website....

Postby cgutzmer on Fri Jan 01, 2010 11:08 pm

Is there any kind of software that will scrape a website for any and all URLs on the site? I've been searching for a while now and not really coming up with much. Any ideas appreciated.
Thanks!
Chris
cgutzmer
Level 1
Posts: 21
Joined: Tue Dec 29, 2009 7:10 pm

Re: how to scrape URLs from a website....

Postby cgutzmer on Sat Jan 02, 2010 11:52 am

Drat - no thoughts? Back to cut and paste, I suppose. Keeps me out of trouble anyway!
Chris
cgutzmer
Level 1
Posts: 21
Joined: Tue Dec 29, 2009 7:10 pm

Re: how to scrape URLs from a website....

Postby cgutzmer on Fri Jan 08, 2010 7:34 am

Hey all,
I need to ask this again. Let me add a little more info so you understand what I am doing. I own a small business selling card models, and I have a horrible time with pirates stealing my work. Not really my work, but the work of all the designers whose models I sell; I legally represent them. I need to go to a website and work through the entire site to scrape all the URLs of the illegal files hosted on sites like Hotfile, Rapidshare, etc. I spent about 20 hours working on one site over the last week, and bam, overnight the links are replenished.

I really hope someone can help me find a good solution :(
Thanks
Chris
cgutzmer
Level 1
Posts: 21
Joined: Tue Dec 29, 2009 7:10 pm

Re: how to scrape URLs from a website....

Postby emorrp1 on Sat Jan 09, 2010 6:32 am

Well, I'm of the opinion that if you make the legal way to get something the most convenient, the vast majority of people will use it rather than the illegal method - online TV, for instance. Nevertheless, I found the exercise amusing, so add the following line to ~/.bashrc:

function scrape { wget "$1" -qO - | sed 's/"/\n"\n/g' | sed '/http/!d'; }

From then on you'll be able to use e.g. "scrape www.linuxmint.com" to get a list of the addresses found on that page (not necessarily valid ones, though).
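Since what you actually want is just the links pointing at particular file hosts, the same idea can be narrowed down a bit. This is only a sketch under assumptions: the scrape_hosts name and the example host list are made up here for illustration, they are not part of the one-liner above.

# Sketch only: like scrape, but keeps just the links that mention the hosts you name.
function scrape_hosts {
    # hypothetical usage: scrape_hosts http://example.com hotfile rapidshare
    local url="$1"; shift
    # join the remaining arguments into an alternation like "hotfile|rapidshare"
    local pattern
    pattern=$(IFS='|'; echo "$*")
    # fetch the page, put every quoted string on its own line,
    # keep only lines that look like links to one of the named hosts, drop duplicates
    wget "$url" -qO - | sed 's/"/\n"\n/g' | grep -E 'https?://' | grep -E "$pattern" | sort -u
}

With that in ~/.bashrc as well, something like "scrape_hosts http://example.com hotfile rapidshare" would print only the addresses that mention those two hosts.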
If you have a question that has been answered and solved, then please edit your original post and put a [SOLVED] at the end of your subject header
Hint - use a google search including the search term site:forums.linuxmint.com
emorrp1
Level 8
Posts: 2322
Joined: Thu Feb 26, 2009 8:58 pm

Re: how to scrape URLs from a website....

Postby emorrp1 on Sun Jan 10, 2010 3:39 pm

To add the line to .bashrc, press ALT+F2 and type "gedit .bashrc" into the dialog, then copy and paste the line into it before saving and exiting. You'll need to use the terminal to actually run the command, though.
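If you'd rather do it all from the terminal instead of gedit, here is a minimal sketch, assuming bash and the exact function line from the earlier post:

# Append the scrape function to ~/.bashrc, then reload it in the current shell.
cat >> ~/.bashrc <<'EOF'
function scrape { wget "$1" -qO - | sed 's/"/\n"\n/g' | sed '/http/!d'; }
EOF
source ~/.bashrc

# Example run - any page address works here:
scrape http://www.linuxmint.com

After that, the scrape command is also available in every new terminal you open.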
If you have a question that has been answered and solved, then please edit your original post and put a [SOLVED] at the end of your subject header
Hint - use a google search including the search term site:forums.linuxmint.com
emorrp1
Level 8
Posts: 2322
Joined: Thu Feb 26, 2009 8:58 pm

