how to scrape url's from a website....

cgutzmer

Post by cgutzmer »

Is there any software that will scrape a website for all of the URLs on the site? I've been searching for a while now and not really coming up with much. Any ideas appreciated.
Thanks!
Chris
cgutzmer

Re: how to scrape url's from a website....

Post by cgutzmer »

Drat, no thoughts? Back to cut and paste, I suppose. Keeps me out of trouble anyway!
Chris
cgutzmer

Re: how to scrape url's from a website....

Post by cgutzmer »

Hey all,
I need to ask this again. Let me add a little more info so you understand what I am doing. I own a small business selling card models, and I have a horrible time with pirates stealing my work. Strictly speaking it is not my work but the work of all the designers who make the models I sell; I legally represent them. I need to go through an entire website and scrape all the URLs of the illegal files hosted on sites like hotfile, rapidshare, etc. I spent about 20 hours working on one site over the last week and bam, overnight the links were replenished.

I really hope someone can help me find a good solution :(
Thanks
Chris
emorrp1

Re: how to scrape url's from a website....

Post by emorrp1 »

Well, I'm of the opinion that if you make the legal way to get something the most convenient, the vast majority of people will use it rather than the illegal method (online TV, for instance). Nevertheless, I found the exercise amusing, so add the following line to ~/.bashrc:

    function scrape { wget -qO - "$1" | sed 's/"/\n"\n/g' | sed '/http/!d'; }

From then on you'll be able to run e.g. "scrape www.linuxmint.com" to get a list of addresses (not necessarily valid ones, though). It works by splitting the page at every double quote and then deleting every line that does not contain "http".
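
If the sed version turns up too much junk, here is a slightly stricter sketch of the same idea. It is only an illustration: the function names extract_urls and scrape_links are made up for this example, it assumes a grep that supports the -o option (GNU grep does), and it only catches http and https addresses that sit inside double quotes in the page source.

```shell
# extract_urls: read HTML on standard input and print every double-quoted
# http:// or https:// address, one per line, with duplicates removed.
# (Illustrative helper name, not part of any package.)
extract_urls() {
    grep -oE '"https?://[^"]+"' | tr -d '"' | sort -u
}

# scrape_links: fetch one page quietly with wget, writing it to standard
# output, and list the addresses found in it.
# (Also an illustrative name; the page address is the first argument.)
scrape_links() {
    wget -qO - "$1" | extract_urls
}
```

Usage would be along the lines of "scrape_links www.linuxmint.com". Links written with single quotes, or relative links without http in front, will be missed by this pattern.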
emorrp1

Re: how to scrape url's from a website....

Post by emorrp1 »

To add the line to .bashrc, press ALT+F2 and type "gedit .bashrc" into the dialog, then paste the line into the file, save, and exit. You'll need to open a new terminal to actually use the command, though.
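
For the case Chris describes (many pages, and only links to certain file hosters), one possible extension is sketched below. Everything named here is an assumption for illustration: the functions scrape_pages and filter_hosters, the input file pages.txt (one page address per line) and the output file links.txt are all hypothetical, and the hoster names are simply the ones mentioned earlier in the thread.

```shell
# scrape_pages: read page addresses from standard input, one per line,
# fetch each with wget, and print every double-quoted http(s) address
# found, with duplicates removed. (Illustrative helper name.)
scrape_pages() {
    while IFS= read -r page; do
        wget -qO - "$page" | grep -oE '"https?://[^"]+"' | tr -d '"'
    done | sort -u
}

# filter_hosters: keep only the input lines that mention one of the hoster
# names given as arguments. Matching is case-insensitive, and the names are
# joined into one extended regular expression, so a dot in a name matches
# any character. At least one name must be given. (Illustrative helper name.)
filter_hosters() {
    hoster_pattern=$(printf '%s|' "$@")
    hoster_pattern=${hoster_pattern%|}
    grep -iE "$hoster_pattern"
}

# Example, assuming a hypothetical pages.txt listing the pages to check:
#   scrape_pages < pages.txt | filter_hosters hotfile rapidshare > links.txt
```

This does not discover the pages of a site by itself; the list of pages still has to come from somewhere, such as a sitemap or forum index saved by hand.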