I’m currently working on a project to track products from several websites. I use a python scraper to retrieve all the URLs related to the listed products, and later, regularly check if these URLs are still active.
To do so I use the Python requests module, run a get request and look at the response’s status code. Usually I get 200, 301, 302 or 404 as expected, except in the following case:
This product has been removed and while opening the link (sorry it’s in French), I am briefly shown a placeholder page saying the product is not available anymore and then redirected to the home page (www.sephora.fr).
Oddly, Python still returns a 200 status code and so do various redirect tracers such as wheregoes.com or redirectdetective.com. The worst part is that the response URL still is the original, so I can’t even trace it that way.
When analyzing with Chrome DevTools and preserving the logs, I see that at some point the page is reloaded. However I’m unable to find out where.
As a reference, here’s a link to a working product:
Thank you ! Ludwig
The page has a meta tag, that redirects the page to the root URL:
<meta http-equiv="refresh" content="0; URL=/" />