Scrape Walmart search results with Python

I’m trying to scrape search results on Walmart.

For example, let’s go to the URL “https://www.walmart.com/search/?query=coffee%20machine”

and try to extract just the text from the element with the class name search-product-result, all in Python.

I’ve tried Selenium and I get asked to verify my identity. I’ve tried requests and I get the forbidden page from Walmart. I’ve tried other libraries and I’m running out of ideas. Any advice?

Answer

The data on this page is loaded by JavaScript, so the product elements are not in the HTML that requests receives, and searching for them directly with BeautifulSoup will not work.

However, the data that the page displays is present as a JSON string inside the <script> tag with id=searchContent in the page's HTML.

I extracted that <script> tag from the HTML, took its contents, and parsed the text as JSON. You can extract whatever data you need from that JSON.

Here is the code that prints the product IDs of the search results.

from bs4 import BeautifulSoup
import requests
import json

# Send a browser-like User-Agent header with the request.
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"}
url = 'https://www.walmart.com/search?query=coffee%20machine'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

# The search data is embedded as JSON in <script id="searchContent">.
tag = soup.find('script', {'id': 'searchContent'})
if tag is None:
    raise SystemExit('searchContent script not found in the response')

# tag.string is the raw JSON text inside the tag, so there is no markup to strip.
j = json.loads(tag.string)
items = j['searchContent']['preso']['items']

for item in items:
    print(item['productId'])

This prints the product IDs:

2RYLQXVZ80E8
7EYUEQ82RMBP
7A3VDQNS5R36
22GRP3PGSY4A
238DLP3R0M3W
52NMIX2M8SC5
1R4H630LRNSE
.
.
.
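
If you want to check the extraction step without hitting Walmart, here is a minimal sketch that runs on a hand-written HTML snippet. It swaps BeautifulSoup for the standard library's html.parser so it has no dependencies. The names ScriptJSONExtractor, extract_product_ids and SAMPLE_HTML, along with the product IDs in the sample, are made up for illustration. The JSON layout (searchContent, preso, items, productId) is assumed to match the code above. It returns an empty list when the script tag is missing, which is what you would expect to see if the response is a verification or forbidden page.

```python
import json
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Collect the text inside the <script> tag that has a given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script contents may arrive in several pieces, so collect them all.
        if self._inside:
            self.chunks.append(data)


def extract_product_ids(html, script_id="searchContent"):
    """Return the product IDs found in the embedded JSON, or [] if absent."""
    parser = ScriptJSONExtractor(script_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    if not raw.strip():
        return []  # no such script tag, e.g. a bot-check page
    data = json.loads(raw)
    # Assumed layout, taken from the answer's code above.
    items = data.get("searchContent", {}).get("preso", {}).get("items", [])
    return [item["productId"] for item in items if "productId" in item]


# Hypothetical sample page; the IDs are invented for this example.
SAMPLE_HTML = """
<html><body>
<script id="searchContent" type="application/json">
{"searchContent": {"preso": {"items": [
  {"productId": "AAA111", "title": "Example coffee machine"},
  {"productId": "BBB222", "title": "Another example"}
]}}}
</script>
</body></html>
"""

print(extract_product_ids(SAMPLE_HTML))
print(extract_product_ids("<html><body>Verify your identity</body></html>"))
```

With a live response you would pass r.text to extract_product_ids instead of SAMPLE_HTML.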