Idea - how to get emails from the internet?
Let’s see, a few things that have been done:
- Friends’ word of mouth
- Massive distribution of flyers in specific areas
- Facebook Fan Page
- Mudah.my Ads
A few things in progress:
- Emails and official letters to potential customers
- Massive distribution (lolz, I try to avoid that ‘spam’ word) of collective emails
Okay, for the last point, I’d like your ideas on how we can get a list of emails from the internet. Any ideas? I have one email list from a friend’s collection, but the numbers don’t seem good enough. A friend suggested Google, but that would be too broad, as I need a specific site to dig effectively.
So I came to think of crawling some social sites such as Facebook, Myspace or Friendster. Facebook seems hard, as people don’t usually post their email addresses and Facebook also restricts what is visible by default, so it can’t be Facebook. Facebook would be worth a try (as the emails will always be valid), but the only emails that can be crawled are the ones on our own friends list.

Above: A request from a non-programmer friend for its visual output… It’s just crappy code but very useful. Many can code better.
It’s been a while since I logged in to my Friendster account. And I don’t feel like logging in to Friendster now as I have a better social account, Facebook. So, forget Friendster. Let’s see, Myspace. Hm.. I guess this one may be a potential leak of some emails, as most ‘gediks’ gurls/people tend to exchange their email addresses or Yahoo accounts in the comment section of their Myspace pages. So, let’s do some crawling.. Eh. I forgot how to code a crawler.. It’s been a very long time since I coded something. Let’s do it ad hoc then. Here’s my stupid sneak peek code for the crawler:
import re
import urllib.request

def readlink(link):
    url = urllib.request.urlopen(link)
    urlread = url.readlines()
    for line in urlread:
        # disabling urlscanner for the time being
        # urlscanner(line)
        # crawling_comment_page(line)
        # extract_email(line)
        try:
            # working
            text = line.decode("utf-8", "ignore")
            the_email2 = re.findall(r"([\w\.\-]+@[\w\.\-]+)", text)
            for item in the_email2:
                # check whether it is already inside emails.txt
                with open("emails.txt", "r") as f:
                    Item = f.read()
                if item not in Item:
                    print(item)
                    # Control to verify whether it is an email
                    # if item.find(".") != -1:
                    # output to file
                    ...
        except AttributeError:
            pass

def urlscanner(perline):
    ...

def crawling_zombie():
    ...

crawling_zombie()
It works. But the collection is still unsatisfying. Any idea where those emails are on the net? Shall I crawl Yahoo Groups? Ah, maybe I should join Yahoo Groups for marketing. That idea just came to me. Hehe.
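For the extraction step on its own, here is a minimal sketch that works on a plain string instead of a fetched page. It reuses the same loose pattern as the crawler above, so it will still match some things that are not real addresses. The names `EMAIL_RE`, `extract_emails` and `seen` are made up for this example, and keeping a set in memory instead of re-reading emails.txt for every match is my assumption about a tidier way to do the duplicate check, not what the crawler above actually does.

```python
import re

# Same loose pattern as in the crawler; it is not a full address validator.
EMAIL_RE = re.compile(r"[\w.\-]+@[\w.\-]+")

def extract_emails(text, seen=None):
    """Return addresses found in text that are not yet in seen, in order."""
    if seen is None:
        seen = set()
    found = []
    for candidate in EMAIL_RE.findall(text):
        # Trim punctuation the pattern swallows at the edges, e.g. "a@b.com."
        candidate = candidate.strip(".-").lower()
        local, _, domain = candidate.rpartition("@")
        # Simple control: need a local part and a dot in the domain
        if not local or "." not in domain:
            continue
        # Exact-match duplicate check (a substring check would wrongly
        # treat "a@b.com" as already present when "aa@b.com" is stored)
        if candidate not in seen:
            seen.add(candidate)
            found.append(candidate)
    return found
```

Passing the same `seen` set on every call keeps the list unique across many pages, and the set can be filled from an existing file once at startup.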