404 status code when making an HTTP request with Python’s “requests” library, but the page loads fine in a browser

I am trying to scrape the content of a few websites. For some of them I get a response with status code 200, but for others I get a 404. When I open the sites that return 404 in a browser, they load fine. What am I missing here?

For example:

import requests

url_1 = "https://www.transfermarkt.com/jumplist/startseite/wettbewerb/GB1"
url_2 = "https://stackoverflow.com/questions/36516183/what-should-i-use-instead-of-urlopen-in-urllib3"

page_t = requests.get(url_1)
print(page_t.status_code)      # 404: returns a "Not Found" page

page = requests.get(url_2)
print(page.status_code)        # 200: returns a valid HTML page

Answer

The website you mentioned checks the "User-Agent" header of the request. By default, requests sends a User-Agent of the form python-requests/<version>, which this site apparently rejects. You can send a browser-like "User-Agent" instead by passing a dict of custom headers to your requests.get(..) call. The request then looks like it comes from an actual browser, and you’ll receive the expected response.

For example:

>>> import requests
>>> url = "https://www.transfermarkt.com/jumplist/startseite/wettbewerb/GB1"
>>> headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

# Make request with "User-Agent" Header
>>> response = requests.get(url, headers=headers)
>>> response.status_code
200   # success response

>>> response.text  # will return the website content
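
If you request several pages from the same site, it may be more convenient to set the header once on a Session instead of passing it to every call. The following is only a sketch of that idea: make_session and fetch are hypothetical helper names, not part of requests, and the 10-second timeout is an arbitrary assumption. The last lines build a request without sending it, so you can compare the default User-Agent with the one that would actually go out.

```python
import requests

# Browser-like User-Agent string (the same value used in the example above).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/61.0.3163.100 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Hypothetical helper: a Session that sends `user_agent` on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(url, session=None, timeout=10):
    """Hypothetical helper: GET `url`, raising requests.HTTPError on 4xx/5xx."""
    session = session or make_session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response


# What requests sends when no User-Agent is supplied ("python-requests/<version>").
default_ua = requests.utils.default_user_agent()

# Prepare (but do not send) a request to inspect the outgoing headers.
session = make_session()
prepared = session.prepare_request(
    requests.Request(
        "GET", "https://www.transfermarkt.com/jumplist/startseite/wettbewerb/GB1"
    )
)
print(default_ua)
print(prepared.headers["User-Agent"])
```

With these helpers, fetch(url).text would return the page content, and a 404 would surface as an exception rather than a status code you have to remember to check. Whether a given site accepts the request can still depend on checks other than the User-Agent.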