Python: How can I find a URL inside the HTML body of an email read from Gmail?

I’m trying to automate a script that downloads the PDFs I regularly receive. When the PDF is attached to the email, I (think I) already have a working program.

My problem (I think) is when the email body is embedded HTML and the URL to the PDF sits inside that HTML.

The example I’m working with comes from my spam folder, but it helps to illustrate the problem.

I have the following code:

import base64
import os.path
import pickle
import time
from datetime import datetime

from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

def get_credentials(token_path, credentials_path, scopes):
    creds = None
    # Reuse the token cached by a previous run, if there is one.
    if os.path.exists(token_path):
        with open(token_path, 'rb') as token:
            creds = pickle.load(token)

    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            # An expired token with a refresh token can be renewed silently.
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, scopes)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open(token_path, 'wb') as token:
            pickle.dump(creds, token)
    return creds

def get_labels(service):
    # List every label defined in the mailbox.
    return service.users().labels().list(userId='me').execute().get('labels', [])

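def get_all_messages_id(service, labels=["INBOX"]):
    # Returns the first page of {"id": ..., "threadId": ...} dicts for the given labels.
    return (service.users().messages()
                   .list(userId='me', labelIds=labels)
                   .execute().get('messages', []))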

def get_message(message_id, service):
    # Fetch the full message resource (headers, payload, body parts).
    return service.users().messages().get(userId='me', id=message_id).execute()

def get_subject_of_message(message):
    # Each header is a dict of the form {"name": ..., "value": ...}.
    for header in message.get("payload").get("headers"):
        if header.get("name") == 'Subject':
            return header.get("value")

Then, if I run:

 >>> service = mail.login("token.pickle","credentials.json")
 >>> message_id = mail.get_all_messages_id(service)[0]
 >>> mail.get_message(message_id.get("id"),service)

I can see the text “Original Xiaomi Mi Band 4 …” as a plain string (so message_id is fine), but I can’t see the URL.

Instead, I see a very long, unreadable string.


I think the “text/html” part is what is blocking me, but I don’t know how to continue. If I had the body as real HTML, with its tags, I could use BeautifulSoup to analyze it. But all I have is this unreadable string…

Has anyone run into this problem before?

Thanks for your help

PS: If anyone wants to know how I generated token.pickle and credentials.json in order to reproduce this, see Google’s API documentation. I followed their instructions and it was easy.


That ugly string is base64-encoded content (the Gmail API uses the URL-safe variant of base64 for message bodies).

All you have to do is decode it and then parse the resulting HTML.

Try something like this:

str(base64.urlsafe_b64decode(encoded_string_here), "utf-8")
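
To make the decode-and-parse step more concrete, here is a minimal sketch of how it might look end to end. It assumes the message dict has the shape returned by get_message() above, with the HTML stored under payload (or one of its nested parts) as body.data. The helper names (find_html_data, decode_body, LinkCollector, extract_links) and the sample message are hypothetical, made up for illustration. It uses the standard library's html.parser instead of BeautifulSoup only to stay dependency-free; BeautifulSoup would work just as well on the decoded string.

```python
import base64
from html.parser import HTMLParser


def find_html_data(payload):
    """Return the base64url body of the first text/html part, or None."""
    if payload.get("mimeType") == "text/html":
        return payload.get("body", {}).get("data")
    # Multipart messages nest their parts, so search them recursively.
    for part in payload.get("parts", []):
        data = find_html_data(part)
        if data:
            return data
    return None


def decode_body(data):
    """Decode a base64url string, restoring padding in case it was stripped."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(message):
    """Return all link targets found in the HTML body of a message dict."""
    data = find_html_data(message.get("payload", {}))
    if data is None:
        return []
    collector = LinkCollector()
    collector.feed(decode_body(data))
    return collector.links


# Hypothetical stand-in for the dict returned by get_message().
sample_html = '<html><body><a href="https://example.com/file.pdf">PDF</a></body></html>'
sample_message = {
    "payload": {
        "mimeType": "multipart/alternative",
        "parts": [
            {"mimeType": "text/plain", "body": {"data": "aGVsbG8="}},
            {
                "mimeType": "text/html",
                "body": {
                    "data": base64.urlsafe_b64encode(
                        sample_html.encode("utf-8")
                    ).decode("ascii")
                },
            },
        ],
    }
}

print(extract_links(sample_message))
```

With a real message you would call extract_links(mail.get_message(message_id.get("id"), service)) and then filter the returned list for the links you care about, for example those ending in ".pdf".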


