Scraping For Fun And Profit – Emails Galore!

Hey Guys!

So someone asked me if I could lend some advice on how to achieve their goal.

It all boiled down to this – they needed a list of email addresses for local businesses advertising on a popular coupon site.

Sound tough? Surprisingly not – and the following video will walk you through how we achieved the task quickly and painlessly, using ScrapeBox. Now, before everyone goes NUTS and tells me that ScrapeBox is just a tool for comment spam, let me say this – I don’t condone spam in any way; it’s up to the user to decide how they want to use the tools they have.

ScrapeBox is far more useful as a research tool than a spam tool anyway!

The Process

1. Keywords

Now – this isn’t conventional keyword research – here we want to create a list of keywords that describe the businesses we want to target on the coupon site. In this case, the keywords selected were the local zip codes. Your goal here should be to get fewer than 1,000 results per keyword – if you’re getting more than that, you may need to use more specific keywords. If you’re not targeting anything specific, just use a generic keyword list, like a, b, c… aa, ab, ac… aaa, aab, aac… and so on.
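If you’d rather script that generic keyword list than type it out, here’s a minimal Python sketch – the file name keywords.txt is just an assumption; ScrapeBox will import any plain text file with one keyword per line:

```python
import itertools
import string

def generic_keywords(max_length=2):
    """Yield every lowercase letter combination up to max_length characters."""
    for length in range(1, max_length + 1):
        for combo in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(combo)

# Write one keyword per line, ready to import into ScrapeBox.
with open("keywords.txt", "w") as f:
    for kw in generic_keywords(max_length=2):
        f.write(kw + "\n")
```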

2. First Scrape

We combine our keyword list with site:groupon.com to scrape all pages on groupon.com that match our keyword(s). If the email addresses you wanted were on these pages, you could easily run this list through the email scraper and be finished – in our case, the email addresses are on external sites linked to from groupon.com – so we must go through some additional steps.
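If you’d rather prepare the site:-restricted queries outside ScrapeBox, a few lines of Python will do it – keywords.txt and queries.txt are hypothetical file names:

```python
# Prefix every keyword with the site: operator so the harvester
# only returns pages from groupon.com. File names are assumptions.
with open("keywords.txt") as src, open("queries.txt", "w") as dst:
    for line in src:
        keyword = line.strip()
        if keyword:
            dst.write(f"site:groupon.com {keyword}\n")
```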


3. Second Scrape And Cleanup

We take our harvested URL list from step #2 and run it through the Link Extractor addon for ScrapeBox to get all the external links from those pages – this gives us a list of the websites of every company listed on groupon.com that matches our keywords. We’re getting pretty close – but now we want to clean up our list. We want to trim the URLs back to the root, then remove any duplicates. We do this by loading the list of external links from the previous step into the harvester, then using the Trim to Root and Remove Duplicate Domains functions.
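If you ever need to do that cleanup outside ScrapeBox, here’s a rough Python equivalent of the trim-to-root and duplicate-domain removal – the same idea, not ScrapeBox’s actual code, with assumed file names:

```python
# Trim each URL back to its root and keep only one URL per domain.
from urllib.parse import urlparse

seen = set()
with open("external_links.txt") as src, open("roots.txt", "w") as dst:
    for line in src:
        url = line.strip()
        if not url:
            continue
        parsed = urlparse(url)
        domain = parsed.netloc.lower()
        if domain and domain not in seen:
            seen.add(domain)
            dst.write(f"{parsed.scheme}://{domain}/\n")
```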

4. Third Scrape

OK – now we take our list from the previous step and pull it into Notepad – what we want to do here is replace the “http://” and “https://” parts of all our links with “site:”. This will allow us to scrape Bing for a list of all the internal pages of each of these sites. Also, we want to reduce the results per keyword to 10 and blank out the footprint.
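The Notepad find-and-replace works fine, but if your list is big, the same substitution is one regex in Python – roots.txt and site_queries.txt are assumed file names:

```python
# Turn each root URL into a site: query so the harvester returns
# that site's internal pages.
import re

with open("roots.txt") as src, open("site_queries.txt", "w") as dst:
    for line in src:
        url = line.strip()
        if url:
            dst.write(re.sub(r"^https?://", "site:", url) + "\n")
```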

5. Scrape #4 – Emails!

Last step! We use the “Grab Emails From Harvested URL List” function in ScrapeBox to get all the email addresses that appear on any of the pages in the list of URLs we scraped in the previous step – there will be the odd garbage entry, but that only takes a second to tidy up.
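For the curious, here’s a minimal sketch of what an email grabber does under the hood – fetch each page and regex out anything that looks like an address. This illustrates the technique; it’s not ScrapeBox’s actual implementation, and internal_pages.txt is an assumed file name:

```python
import re
import urllib.request

# Loose pattern for things that look like email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

emails = set()
with open("internal_pages.txt") as f:
    for line in f:
        url = line.strip()
        if not url:
            continue
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue  # skip pages that fail to load
        emails.update(EMAIL_RE.findall(html))

for email in sorted(emails):
    print(email)
```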

Congratulations on making it to the end – if you have any questions about scraping, or anything else, please get in touch – I love helping people out!
