Build your own Google Alerts substitute (Python)

Mon 20 May 2013

I loved Google Alerts while it worked. When it stopped, I was more or less forced to create my own notifier (in the fastest possible way), so I wrote a basic Python script that, for my particular case, does the job pretty well.

The idea is the following: we search Google for the keyword or phrase we are interested in, read the total number of results from the response, and compare it to the number from the previous run (saved in a file or in an SQLite database). If the new number is greater, we send an email and save it to the file for the next comparison. You can put this script in a crontab or, on Windows, check Z-cron, and schedule it to run invisibly, say, every day.
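
The compare-and-save step can be sketched on its own, without any network access. This is a minimal sketch, not the script itself: the helper names `read_previous_count` and `check_for_increase` are made up for illustration, and the counts in the usage lines are invented.

```python
import os
import tempfile


def read_previous_count(path):
    """Return the count saved by the last run, or 0 if there is none."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except IOError:  # the file is missing on the first run
        return 0
    return int(text) if text else 0


def check_for_increase(path, current_count):
    """Save current_count and return True only if it beats the saved one."""
    if current_count <= read_previous_count(path):
        return False
    with open(path, "w") as f:
        f.write(str(current_count))
    return True


# Usage with invented counts and a throwaway file
demo_path = os.path.join(tempfile.mkdtemp(), "searched.txt")
print(check_for_increase(demo_path, 120))  # True: first run, nothing saved yet
print(check_for_increase(demo_path, 120))  # False: same count as before
print(check_for_increase(demo_path, 135))  # True: the count went up
```

Only a `True` result would trigger the email, so a count that stays flat or drops leaves the saved number untouched.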

After the email notification, I would go to Google and check the results of the last 24 hours to see what’s going on.
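
That manual check could also be turned into a link, for example to paste into the notification email. The sketch below assumes that Google's `tbs=qdr:d` URL parameter limits results to the past 24 hours; that parameter is not part of the original script and is not formally documented, so it may change. The function name `last_day_url` is made up for illustration.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def last_day_url(keyword):
    """Build an exact-phrase search URL limited to the past 24 hours."""
    params = [
        ("q", '"%s"' % keyword),  # the quotes request an exact-phrase match
        ("tbs", "qdr:d"),         # assumed filter value for "past 24 hours"
    ]
    return "https://www.google.com/search?" + urlencode(params)


print(last_day_url("Nard Ndoka"))
# https://www.google.com/search?q=%22Nard+Ndoka%22&tbs=qdr%3Ad
```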

Below is the Python script that searches for “Nard Ndoka” and sends an email to nardndoka@gmail.com when the number of Google results increases. To use the script, simply change the keyword to be searched and the email credentials accordingly.

import urllib2
import re
from bs4 import BeautifulSoup
import smtplib
import os


KEYWORD = "Nard Ndoka"

gmail_user = "nardndoka@gmail.com"
gmail_pwd = "password"
FROM = 'nardndoka@gmail.com'
TO = ['nardndoka@gmail.com']  # must be a list
SUBJECT = "New Alert"

base_path = os.path.dirname(os.path.abspath(__file__))
result_file = os.path.join(base_path, 'searched.txt')
url = "https://www.google.com/search?q=%22" + KEYWORD.replace(" ", "+") + "%22"

def send_email(final):
    TEXT = "The number of results for " + KEYWORD + " has increased. It is now " + final
    # Prepare actual message: headers, a blank line, then the body
    message = "From: %s\nTo: %s\nSubject: %s\n\n%s\n" % (
        FROM, ", ".join(TO), SUBJECT, TEXT)
    try:
        server = smtplib.SMTP("smtp.gmail.com", 587)
        server.ehlo()
        server.starttls()
        server.login(gmail_user, gmail_pwd)
        server.sendmail(FROM, TO, message)
        server.close()
        print 'Successfully sent the mail'
    except Exception as e:
        print "Failed to send mail: %s" % e


def remove_html_markup(s):
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out


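# Fetch the results page with a browser-like User-agent
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
find_html = opener.open(url).read()
soupg = BeautifulSoup(find_html, "html.parser")
# The result count is expected inside <div id="resultStats">
results = str(soupg.findAll("div", {"id": "resultStats"}))

text = remove_html_markup(results)  # remove html tags
# Remove non-digit characters; fall back to "0" if no count was found
current_result_number = re.sub(r"\D", "", text) or "0"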

try:
    with open(result_file, 'r') as f:
        # An empty file counts as zero previous results
        previous_result_number = f.read().strip() or 0
except IOError:
    # First run: no saved count yet
    previous_result_number = 0

if int(current_result_number) <= int(previous_result_number):
    print "No new results"
else:
    with open(result_file, "w") as f:
        f.write(current_result_number)
    print "New results. Sending mail.."
    send_email(current_result_number)
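
One thing to watch in the script above: it strips every non-digit character from the stats text. If that text also carries other numbers, such as a timing, their digits would be merged into the count. A hedged alternative is to take only the first number. The sketch below assumes a text of the form "About 1,234 results (0.35 seconds)", which may not match what Google actually returns; the function name `parse_result_count` is made up for illustration.

```python
import re


def parse_result_count(stats_text):
    """Return the first number in stats_text as an int, or 0 if none."""
    # Digits optionally grouped by commas, dots, or spaces:
    # 1,234 / 1.234 / 1 234
    match = re.search(r"\d+(?:[,.\s]\d{3})*", stats_text)
    if match is None:
        return 0
    # Drop the group separators before converting
    return int(re.sub(r"\D", "", match.group()))


print(parse_result_count("About 1,234 results (0.35 seconds)"))  # 1234
print(parse_result_count("[]"))  # 0
```

Returning 0 when nothing matches means a page without a count is treated as "no new results" instead of raising an error.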