I'm trying to scrape an email address from a webpage. The address is visible in the page source (Ctrl+U), but I still can't fetch it using requests. All I get is an AttributeError. Any help on this would be appreciated.
My current attempt:
    import requests
    from bs4 import BeautifulSoup

    link = "https://www.facebook.com/pg/theultimatecollectionco/about/?ref=page_internal"

    with requests.Session() as s:
        s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36'
        r = s.get(link)
        soup = BeautifulSoup(r.text, "lxml")
        try:
            email = soup.select_one("a[href^='mailto:']").get("href")
        except AttributeError:
            email = ""
        print(email)
selenium helps here).
The easiest way is just to grep the page for any "mailto:" links with a regular expression:
    import re
    import html
    import requests

    link = "https://www.facebook.com/pg/theultimatecollectionco/about/?ref=page_internal"

    html_doc = requests.get(link).text

    for email in re.findall(r'"mailto:([^"]+)"', html_doc):
        print(html.unescape(email))
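To see what the regex and html.unescape are doing without hitting the network, here is a minimal offline sketch. The sample markup, the example.com addresses, and the helper name extract_emails are all made up for illustration, so the real page's markup may well look different. Because the pattern runs over the raw text, it also picks up links that a parser would not expose as elements, such as one sitting inside an HTML comment.

```python
import html
import re

# Hypothetical page fragment, invented for illustration only.
# "&#064;" is the HTML entity for "@"; the second link sits inside a comment.
SAMPLE_HTML = (
    '<div><a href="mailto:info&#064;example.com">Email us</a></div>'
    '<!-- <a href="mailto:sales&#064;example.com?subject=Hi">Sales</a> -->'
)

# Same pattern as above: capture everything between "mailto: and the closing quote.
MAILTO_RE = re.compile(r'"mailto:([^"]+)"')


def extract_emails(page_source):
    """Return unique addresses from "mailto:" values, in order of appearance."""
    seen = []
    for raw in MAILTO_RE.findall(page_source):
        # Decode HTML entities, then drop any "?subject=..." query part.
        address = html.unescape(raw).split("?", 1)[0]
        if address not in seen:
            seen.append(address)
    return seen


print(extract_emails(SAMPLE_HTML))
```

With the real page you would pass requests.get(link).text to extract_emails instead of SAMPLE_HTML. Stripping the query part and de-duplicating are optional extras that the original snippet does not do.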