This question was published by the Tutorial Guruji team.
I’m trying to automate downloading the PDFs I usually receive by email. If the PDF is attached, I have the right program (I suppose).
My problem (I think) arises when the email body is HTML and the URL is embedded inside that HTML instead. For example:
This one is from the spam folder, but it helps to illustrate the problem…
I have the following code in mail.py:
import pickle, os.path, base64, time
from datetime import datetime
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

def get_credentials(token_path, credentials_path, scopes):
    creds = None
    if os.path.exists(token_path):
        with open(token_path, 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, scopes)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open(token_path, 'wb') as token:
            pickle.dump(creds, token)
    return creds

def get_labels(service):
    # Labels are listed through users().labels(), not users().messages().
    return service.users().labels().list(userId='me').execute().get('labels', [])

def get_all_messages_id(service, labels=["INBOX"]):
    return service.users().messages().list(userId='me', labelIds=labels).execute().get("messages")

def get_message(message_id, service):
    return service.users().messages().get(userId='me', id=message_id).execute()

def get_subject_of_message(message):
    # Each header is a dict of the form {"name": ..., "value": ...}.
    for header in message.get("payload").get("headers"):
        if header.get("name") == 'Subject':
            return header.get("value")
Then, if I run…
>>> service = mail.login("token.pickle","credentials.json")
>>> message_id = mail.get_all_messages_id(service)[0]
>>> mail.get_message(message_id.get("id"),service)
I can see “Original Xiaomi Mi Band 4 …” as a plain string (so message_id is OK), but I can’t see its URL.
Instead, I see a very long and ugly string.
I think the “text/html” part is what is blocking me, but I don’t know how to continue. If I had it as HTML, with its tags, I could use BeautifulSoup to parse it. But all I have is this ugly string…
Has anyone run into this problem before?
Thanks for your help
PS: If anyone wants to know how I generated token.pickle and credentials.json in order to reproduce this, see Google’s API docs. I followed their instructions and it was easy.
Answer
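That ugly string is the message body in URL-safe base64 encoding (in the Gmail API response it is the data field of the payload body, or of one of the payload parts).
All you have to do is decode it and then parse the resulting HTML.
Try something like this: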
str(base64.urlsafe_b64decode(encoded_string_here), "utf-8")
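To go from the decoded string to the URL, something along these lines might work. It is only a minimal sketch: the helper names (decode_body, find_html, extract_links) and the sample message are made up for illustration, and it assumes the message dict has the usual Gmail API shape (payload with mimeType, body.data and optional nested parts). The standard-library HTMLParser is used here in place of BeautifulSoup just to keep the example dependency-free; BeautifulSoup would do the same job on the decoded HTML.

```python
import base64
from html.parser import HTMLParser


def decode_body(data):
    """Decode a URL-safe base64 string such as payload.body.data."""
    # Restore any missing '=' padding before decoding, just in case.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def find_html(payload):
    """Return the decoded text/html body, searching nested parts recursively."""
    if payload.get("mimeType") == "text/html":
        data = payload.get("body", {}).get("data")
        if data:
            return decode_body(data)
    for part in payload.get("parts", []):
        html = find_html(part)
        if html is not None:
            return html
    return None


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(message):
    """Return all link targets found in the HTML body of a message dict."""
    html = find_html(message.get("payload", {}))
    if html is None:
        return []
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical message, built locally so the sketch runs without the API.
sample_html = '<html><body><a href="https://example.com/offer">Offer</a></body></html>'
sample = {
    "payload": {
        "mimeType": "multipart/alternative",
        "parts": [
            {"mimeType": "text/plain", "body": {"data": "aGVsbG8"}},
            {
                "mimeType": "text/html",
                "body": {"data": base64.urlsafe_b64encode(sample_html.encode("utf-8")).decode("ascii")},
            },
        ],
    }
}
print(extract_links(sample))
```

With the code from the question, the call would presumably look like extract_links(mail.get_message(message_id.get("id"), service)).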