How to handle a NoneType object while web scraping an HTML page

I want to scrape a page from Rotten Tomatoes.

I am trying to find the names of all the directors of the different movies listed on the page. So far my code runs well. However, one movie on the page, WORLD ON A WIRE, has no director listed, and when I run the code I get the error "'NoneType' object is not iterable". How can I handle such null fields while scraping web pages?

My code:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
url = 'https://editorial.rottentomatoes.com/guide/best-sci-fi-movies-of-all-time/'
r = requests.get(url, headers=headers)
content = r.content
soup = BeautifulSoup(content, 'html.parser')
director = []
for d in soup.find_all('div', attrs={'class': 'info director'}):
    for a in d.find('a'):
        director.append(a)
        print(a)


Answer

d.find('a') returns the first matching <a> tag, or None when the div contains no link at all, which is the case for WORLD ON A WIRE. Trying to iterate over that None is what raises the error. Use find_all('a') instead of find('a'): it returns a list of all matches, and an empty list when there are none, so the inner loop simply runs zero times for a movie with no director.
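
You can see the difference with a minimal sketch. The HTML below is a made-up stand-in for a row like WORLD ON A WIRE, not the page's real markup:

from bs4 import BeautifulSoup

# A hypothetical director div with no <a> tag inside it
html = '<div class="info director">Directed By: </div>'
d = BeautifulSoup(html, 'html.parser').div

print(d.find('a'))      # None -> iterating over it raises "'NoneType' object is not iterable"
print(d.find_all('a'))  # []   -> looping over it runs zero times, no error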

Your code should look like this:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
url = 'https://editorial.rottentomatoes.com/guide/best-sci-fi-movies-of-all-time/'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

director = []
for d in soup.find_all('div', attrs={'class': 'info director'}):
    # find_all() returns an empty list when the div has no <a> tag,
    # so movies with a missing director are skipped instead of raising an error
    for a in d.find_all('a'):
        director.append(a.string)
        print(a.string)
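
If you would rather keep find('a') and record a placeholder so the director list stays aligned with the movies on the page, you can check for None explicitly, for example:

for d in soup.find_all('div', attrs={'class': 'info director'}):
    a = d.find('a')                # None when the director link is missing
    if a is not None:
        director.append(a.string)
    else:
        director.append(None)      # placeholder entry for a missing director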
