How else can I access a marketplace’s product information via code?

To scrape products from AliExpress, I can run window.runParams.data in Chrome’s console to access all of the information very easily. Knowing this, I used regex to scrape product information directly from AliExpress’s HTML, without having to simulate a million clicks to make the information appear on my screen and only then extract it.

I am trying to do the same thing for another website called Mercado Livre. The thing is, each product can have variations, and each variation may have its own set of images, frequently more than 10. It’s a lot of images and, unfortunately, I can’t access window.runParams.data as I did for AliExpress. This is the error I get when I try:

VM228:1 Uncaught TypeError: Cannot read property 'data' of undefined
    at <anonymous>:1:18

It probably doesn’t matter, but the variations section comes either as buttons:

https://produto.mercadolivre.com.br/MLB-1870995603-brinquedos-sensoriais-popit-bubble-fidget-52-pecas-_JM

Or dropdowns:

https://produto.mercadolivre.com.br/MLB-1862560460-kit-brinquedos-sensoriais-fidget-push-pop-it-49-pcs-_JM

What is the easiest way to scrape all of these images’ URLs using Python without having to simulate clicking? I looked at the code but got very confused, because many of the images are shared between variations, so using Ctrl + F to look for URLs and work out which variation each one belongs to was impossible.

The thumbnail URLs (for example this one) would suffice, as I can just replace the R with an F at the end of the URL to get the large version, like this.

Thank you very much!

Answer

You can do that using requests and BeautifulSoup.

When you click a variation of the product, its data is loaded from an API. You can get all the information you need from that API.

prod_var_id is the variation ID.

https://produto.mercadolivre.com.br/p/api/items?attributes={prod_var_id}&quantity=1&platform=ML&id=MLB1870995603&app=vip

Here I scrape the links to the product’s variations, extract the prod_var_id from a link, and make a GET request to the API, substituting the prod_var_id into the API URL shown above.

The prod_var_id is the part of the link that follows attributes=.

from bs4 import BeautifulSoup
import requests

# Send a browser-like user agent with the page request.
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"}
url = 'https://produto.mercadolivre.com.br/MLB-1870995603-brinquedos-sensoriais-popit-bubble-fidget-52-pecas-_JM'

# Fetch the product page and collect the variation thumbnail links.
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')
a = soup.find_all('a', class_='ui-pdp-thumbnail ui-pdp-variations--thumbnail ui-pdp-thumbnail--NONE')

# The variation ID is everything after "attributes=" in the link.
# a[0] is the first variation; iterate over a to cover the others.
prod_var_id = a[0]['href'].split('attributes=')[-1]
api_url = f'https://produto.mercadolivre.com.br/p/api/items?attributes={prod_var_id}&quantity=1&platform=ML&id=MLB1870995603&app=vip'
resp = requests.get(api_url).json()

print(api_url)
# Output:
# https://produto.mercadolivre.com.br/p/api/items?attributes=COLOR_SECONDARY_COLOR:NTJwY3MtMDE=&quantity=1&platform=ML&id=MLB1870995603&app=vip
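The code above only handles the first link, a[0]. Here is a minimal sketch of extending it to every variation, assuming each link's href carries the ID after attributes= as described. The helper names variation_id and api_urls are hypothetical, and the hrefs list is made-up sample data standing in for [tag['href'] for tag in a]; only the first ID is taken from the output above.

```python
# API URL pattern from the answer; the two placeholders are filled in per variation.
API_TEMPLATE = (
    "https://produto.mercadolivre.com.br/p/api/items"
    "?attributes={prod_var_id}&quantity=1&platform=ML&id={item_id}&app=vip"
)


def variation_id(href):
    """Return the raw text after 'attributes=' (up to the next '&'), or None."""
    _, sep, tail = href.partition("attributes=")
    if not sep:
        return None
    return tail.split("&", 1)[0]


def api_urls(hrefs, item_id):
    """Build one API URL per distinct variation ID, keeping first-seen order."""
    seen = []
    for href in hrefs:
        var_id = variation_id(href)
        if var_id and var_id not in seen:
            seen.append(var_id)
    return [API_TEMPLATE.format(prod_var_id=v, item_id=item_id) for v in seen]


# Invented sample hrefs (hypothetical); real ones would come from the page.
hrefs = [
    "https://example.com/item?attributes=COLOR_SECONDARY_COLOR:NTJwY3MtMDE=",
    "https://example.com/item?attributes=COLOR_SECONDARY_COLOR:AAAA&quantity=1",
    "https://example.com/item",  # no variation ID, skipped
]

urls = api_urls(hrefs, "MLB1870995603")
for u in urls:
    print(u)
```

Each resulting URL can then be fetched with requests.get(u).json(), as in the code above. The ID is kept exactly as it appears in the link rather than decoded and re-encoded, which mirrors the original split-based approach.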

You can make a request to this API and extract whatever data you need.
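The structure of the JSON returned by this API is not shown here, so the sketch below makes no assumption about key names: it walks whatever nested dicts and lists come back and keeps every string that looks like an image URL. The helpers find_image_urls and to_large are hypothetical, the sample payload is invented for illustration, and the R-to-F swap follows the rule stated in the question (that the letter sits just before the file extension is an assumption).

```python
import re

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp")


def find_image_urls(node):
    """Recursively collect strings that look like image URLs, without duplicates."""
    found = []

    def walk(value):
        if isinstance(value, dict):
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)
        elif isinstance(value, str):
            looks_like_image = value.startswith("http") and value.lower().endswith(IMAGE_EXTENSIONS)
            if looks_like_image and value not in found:
                found.append(value)

    walk(node)
    return found


def to_large(url):
    """Swap an 'R' just before the extension for 'F' (assumed URL shape)."""
    return re.sub(r"R(\.\w+)$", r"F\1", url)


# Invented payload (hypothetical); the real API response will look different.
sample = {
    "components": {
        "gallery": {
            "pictures": [
                {"url": "https://example.com/img/abc-R.jpg"},
                {"url": "https://example.com/img/def-R.jpg"},
            ]
        },
        "title": "not an image",
        "again": "https://example.com/img/abc-R.jpg",  # duplicate, kept once
    }
}

large = [to_large(u) for u in find_image_urls(sample)]
print(large)
```

With a real response, the same call would be [to_large(u) for u in find_image_urls(resp)], where resp is the parsed JSON from the request above. Walking the whole payload is a blunt approach; once the actual keys are known, indexing into them directly would be cleaner and would keep images grouped per variation.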