How can I scrape data from JavaScript-rendered content of a website?

I am trying to fetch the content of the product description on the Nykaa website.

URL:- https://www.nykaa.com/nykaa-skinshield-matte-foundation/p/460512?productId=460512&pps=1&skuId=460502

On this page, clicking the ‘Read More’ button in the product description section reveals some additional text at the end.

The text I want to extract is:

Explore the entire range of Foundation available on Nykaa. Shop more Nykaa Cosmetics products here. You can browse through the complete world of Nykaa Cosmetics Foundation. Alternatively, you can also find many more products from the Nykaa SkinShield Anti-Pollution Matte Foundation range.

Expiry Date: 15 February 2024

Country of Origin: India

Name of Mfg / Importer / Brand: FSN E-commerce Ventures Pvt Ltd

Address of Mfg / Importer / Brand: 104 Vasan Udyog Bhavan Sun Mill Compound Senapati Bapat Marg, Lower Parel, Mumbai City Maharashtra – 400013

After inspecting the page, I found that when I disable JavaScript in the browser, all the content of the product description vanishes. This means the content is loaded dynamically by JavaScript.
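To see what "loaded dynamically" means in practice, here is a minimal sketch using simplified, hypothetical markup (not the actual Nykaa HTML): in the raw HTML that a plain HTTP client receives, the description container is empty, and only the browser-rendered DOM contains the paragraphs.

```python
from bs4 import BeautifulSoup

# Hypothetical simplified markup: what a plain HTTP fetch would see (before JS runs)
raw_html = '<div id="content-details"></div>'

# What the DOM might look like after JavaScript has rendered the page
rendered_html = (
    '<div id="content-details">'
    '<p>Explore the entire range of Foundation available on Nykaa.</p>'
    '<p>Expiry Date: 15 February 2024</p>'
    '</div>'
)

raw = BeautifulSoup(raw_html, 'html.parser')
rendered = BeautifulSoup(rendered_html, 'html.parser')

# The static HTML contains no paragraphs, so requests + BeautifulSoup alone
# finds nothing; only the rendered DOM (e.g. via Selenium) has the text.
print(len(raw.select('#content-details p')))       # 0
print(len(rendered.select('#content-details p')))  # 2
```

This is why a browser-driving tool such as Selenium is needed here instead of `requests` alone.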

I have used Selenium for this purpose, and this is what I have tried:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import numpy as np
from random import randint
import pandas as pd
import requests
import csv

browser = webdriver.Chrome(
    r'C:\Users\paart\.wdm\drivers\chromedriver\win32\97.0.4692.71\chromedriver.exe')

browser.maximize_window()  # For maximizing window
browser.implicitly_wait(20)  # gives an implicit wait for 20 seconds

browser.get(
    "https://www.nykaa.com/nykaa-skinshield-matte-foundation/p/460512?productId=460512&pps=1&skuId=460502")


# Creates "load more" button object.
browser.implicitly_wait(20)
loadMore = browser.find_element_by_xpath(xpath="/html/body/div[1]/div/div[3]/div[1]/div[2]/div/div/div[2]")

loadMore.click()
browser.implicitly_wait(20)

desc_data = browser.find_elements_by_class_name('content-details')

for desc in desc_data:
    para_details = browser.find_element_by_xpath(
        './/*[@id="content-details"]/p[1]').text
    extra_details = browser.find_elements_by_xpath(
        './/*[@id="content-details"]/p[2]', './/*[@id="content-details"]/p[3]', './/*[@id="content-details"]/p[4]', './/*[@id="content-details"]/p[5]').text
    print(para_details, extra_details)

And this is the output that is displayed:

PS E:\Web Scraping - Nykaa> python -u "e:\Web Scraping - Nykaa\scrape_nykaa_final.py"
e:\Web Scraping - Nykaa\scrape_nykaa_final.py:16: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  browser = webdriver.Chrome(

DevTools listening on ws://127.0.0.1:1033/devtools/browser/097c0e11-6f2c-4742-a2b5-cd05bee72661
e:\Web Scraping - Nykaa\scrape_nykaa_final.py:28: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
  loadMore = browser.find_element_by_xpath(
[9312:4972:0206/110327.883:ERROR:ssl_client_socket_impl.cc(996)] handshake failed; returned -1, SSL error code 1, net_error -101
[9312:4972:0206/110328.019:ERROR:ssl_client_socket_impl.cc(996)] handshake failed; returned -1, SSL error code 1, net_error -101
Traceback (most recent call last):
  File "e:\Web Scraping - Nykaa\scrape_nykaa_final.py", line 28, in <module>
    loadMore = browser.find_element_by_xpath(
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 520, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1244, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[1]/div/div[3]/div[1]/div[2]/div/div/div[2]"}
  (Session info: chrome=97.0.4692.99)
Stacktrace:
Backtrace:
        Ordinal0 [0x00FDFDC3+2555331]
        Ordinal0 [0x00F777F1+2127857]
        Ordinal0 [0x00E72E08+1060360]
        Ordinal0 [0x00E9E49E+1238174]
        Ordinal0 [0x00E9E69B+1238683]
        Ordinal0 [0x00EC9252+1413714]
        Ordinal0 [0x00EB7B54+1342292]
        Ordinal0 [0x00EC75FA+1406458]
        Ordinal0 [0x00EB7976+1341814]
        Ordinal0 [0x00E936B6+1193654]
        Ordinal0 [0x00E94546+1197382]
        GetHandleVerifier [0x01179622+1619522]
        GetHandleVerifier [0x0122882C+2336844]
        GetHandleVerifier [0x010723E1+541697]
        GetHandleVerifier [0x01071443+537699]
        Ordinal0 [0x00F7D18E+2150798]
        Ordinal0 [0x00F81518+2168088]
        Ordinal0 [0x00F81660+2168416]
        Ordinal0 [0x00F8B330+2208560]
        BaseThreadInitThunk [0x76C9FA29+25]
        RtlGetAppContainerNamedObjectPath [0x77337A9E+286]
        RtlGetAppContainerNamedObjectPath [0x77337A6E+238]

Could anyone please help me resolve this issue, or point out any specific piece of code that I am missing to fetch the text content of the product description? It would be a big help.

Thanks 🙏🏻.

Answer

Try this:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import numpy as np
from random import randint
import pandas as pd
import requests
import csv

browser = webdriver.Chrome(
    r'C:\Users\paart\.wdm\drivers\chromedriver\win32\97.0.4692.71\chromedriver.exe')

browser.maximize_window()  # For maximizing window
browser.implicitly_wait(20)  # gives an implicit wait for 20 seconds

browser.get(
    "https://www.nykaa.com/nykaa-skinshield-matte-foundation/p/460512?productId=460512&pps=1&skuId=460502")

browser.execute_script("document.body.style.zoom='50%'")
time.sleep(1)
browser.execute_script("document.body.style.zoom='100%'")



# Creates "load more" button object.
browser.implicitly_wait(20)
loadMore = browser.find_element(By.XPATH, '//div[@class="css-mqbsar"]')  # find_element_by_* is deprecated in Selenium 4
loadMore.click()

browser.implicitly_wait(20)
desc_data = browser.find_elements(By.XPATH, '//div[@id="content-details"]/p')

# desc_data = browser.find_elements_by_class_name('content-details')
# In your previous code, 'content-details' matches a single container element,
# so iterating over it does not give you the individual paragraphs.
# Here I use an XPath that locates every <p> element under the
# div with the id="content-details" attribute.

for desc in desc_data:
    para_detail = desc.text
    print(para_detail)

# If you want a specific paragraph, try:
#  para_detail = desc_data[0].text
#  expiry_date = desc_data[1].text


And don’t just copy the full XPath from the Chrome dev tools; absolute paths are not reliable for dynamically generated content.
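To illustrate why attribute-based selectors survive layout changes while copied absolute paths do not, here is a small self-contained sketch. It uses BeautifulSoup on hypothetical markup rather than a live browser, but the idea is the same as the `//div[@id="content-details"]/p` XPath above: anchoring on a stable `id` keeps working even when the site re-nests the container.

```python
from bs4 import BeautifulSoup

# Two hypothetical versions of the same page: in the second, the site has
# wrapped the description container in extra layers, as dynamically
# rendered pages often do between releases.
page_v1 = ('<body><div><div id="content-details">'
           '<p>Country of Origin: India</p></div></div></body>')
page_v2 = ('<body><div><section><div><div id="content-details">'
           '<p>Country of Origin: India</p></div></div></section></div></body>')

for html in (page_v1, page_v2):
    soup = BeautifulSoup(html, 'html.parser')
    # A copied absolute path (body > div > div > ...) encodes the exact
    # nesting and breaks on page_v2; selecting by the stable id does not.
    paras = soup.select('div#content-details > p')
    print([p.get_text() for p in paras])  # ['Country of Origin: India'] both times
```

The same reasoning applies to the Selenium locators: prefer `id` or other stable attributes over positional paths copied from DevTools.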
