how to scrape URLs from a website...

cgutzmer
Level 1
Posts: 21
Joined: Tue Dec 29, 2009 7:10 pm

how to scrape URLs from a website...

Post by cgutzmer » Fri Jan 01, 2010 11:08 pm

Is there any kind of software that will scrape a website for any and all URLs on the site? I've been searching for a while now and not really coming up with much. Any ideas appreciated.
Thanks!
Chris

cgutzmer
Level 1
Posts: 21
Joined: Tue Dec 29, 2009 7:10 pm

Re: how to scrape URLs from a website...

Post by cgutzmer » Sat Jan 02, 2010 11:52 am

Drat - no thoughts? Back to cut and paste, I suppose. It keeps me out of trouble anyway!
Chris

cgutzmer
Level 1
Posts: 21
Joined: Tue Dec 29, 2009 7:10 pm

Re: how to scrape URLs from a website...

Post by cgutzmer » Fri Jan 08, 2010 7:34 am

Hey all,
I need to ask this again, but let me add a little more info so you can see what I am doing. I own a small business selling card models, and I have a horrible time with pirates stealing my work - not really my work, but the work of all the designers who make the models I sell, whom I legally represent. I need to go through an entire website and scrape all the URLs of the illegal files hosted on sites like Hotfile, Rapidshare, etc. I spent about 20 hours working on one site over the last week, and bam, overnight the links are replenished.

I really hope someone can help me find a good solution :(
Thanks
Chris

emorrp1
Level 8
Posts: 2281
Joined: Thu Feb 26, 2009 8:58 pm

Re: how to scrape URLs from a website...

Post by emorrp1 » Sat Jan 09, 2010 6:32 am

Well, I'm of the opinion that if you make the legal way to get something the most convenient, the vast majority of people will use it rather than the illegal method - online TV, for instance. Nevertheless, I found the exercise amusing, so add the following to ~/.bashrc:

function scrape { wget "$1" -qO - | sed 's/"/\n"\n/g' | sed '/http/!d'; }

From then on you'll be able to use e.g. "scrape www.linuxmint.com" to get a list of addresses (not necessarily valid ones, though).
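A variant of the same idea, if you'd rather match the URLs directly (just a sketch: it assumes GNU grep, and the regex is deliberately rough, so it may pick up trailing punctuation):

function scrape { wget "$1" -qO - | grep -Eo 'https?://[^"'\''<> ]+' | sort -u; }

The sort -u at the end removes duplicates, which helps when the same hoster link appears many times on a page. And since you want whole sites rather than single pages, one rough approach (again only a sketch - the directory name and depth are made up, adjust to taste) is to mirror the site first and then grep the saved pages:

function scrapesite {
    # mirror the site two levels deep into ./site-mirror (example name/depth)
    wget -q -r -l 2 -P site-mirror "$1"
    # pull every http(s) URL out of the saved pages and dedupe
    grep -rhEo 'https?://[^"'\''<> ]+' site-mirror | sort -u
}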
If you have a question that has been answered and solved, then please edit your original post and put [SOLVED] at the end of your subject header.
Hint - use a Google search including the search term site:forums.linuxmint.com

emorrp1
Level 8
Posts: 2281
Joined: Thu Feb 26, 2009 8:58 pm

Re: how to scrape URLs from a website...

Post by emorrp1 » Sun Jan 10, 2010 3:39 pm

To add the line to .bashrc, press ALT+F2 and type "gedit .bashrc" into the dialog, then copy and paste the line in before saving and exiting. You'll need to use the terminal to actually run the command, though.
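If you'd rather do it from the terminal instead, something like this should work (a sketch; the quoted 'EOF' delimiter stops the shell from expanding anything inside the block):

cat >> ~/.bashrc <<'EOF'
function scrape { wget "$1" -qO - | sed 's/"/\n"\n/g' | sed '/http/!d'; }
EOF
source ~/.bashrc        # reload .bashrc so the function exists in the current shell
scrape www.linuxmint.com

The source step is only needed once - any terminal you open afterwards will read ~/.bashrc automatically.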
If you have a question that has been answered and solved, then please edit your original post and put [SOLVED] at the end of your subject header.
Hint - use a Google search including the search term site:forums.linuxmint.com

