Extract a value from HTML using BeautifulSoup

I’m trying to retrieve a value from this HTML using bs4. I’m really new to data scraping, and I have tried several ways to get this value, but to no avail. The closest solution I found is this one: Extracting a value from html table using BeautifulSoup

Here is the HTML I am looking at:

<div class="dataItem_hld clearfix">
<div class="smalltxt">ROE</div>
<div name="tixStockRoe" class="value">121.362</div>
</div>

I’ve tried this so far:

from bs4 import BeautifulSoup as BS
import requests
url = "https://www.bursamarketplace.com/mkt/themarket/stock/SUPM"
html_content = requests.get(url).text
soup = BS(html_content, 'lxml')

val = soup.find_all('div', {'name': "tixStockRoe", 'class':"value"})

Before I even get to calling strip() on the value, my val variable is already empty:

In [96]: val
Out[96]: []

I’ve been searching through posts for a few hours, but I have not managed to write code that gets the value.

Also, please let me know if there are any good sources to learn about extracting data. Thanks

Update: I have edited the code thanks to the responses to this post. Now I have a new problem: the element is found, but the number 121.362 does not appear in it. Any ideas?

val = soup.find_all(attrs={'name': "tixStockRoe"})

and the output is this:

Out[14]: [<div class="value" name="tixStockRoe"><div class="loader loaderSmall"><div class="loader_hld"><img alt="" src="/img/loading.gif"/></div></div></div>]

Answer

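The data on that page is loaded by JavaScript, which is why you can’t find the value you are looking for (121.362) using BeautifulSoup.

BeautifulSoup only parses the HTML it is given; it does not execute JavaScript. What requests downloads is the initial markup, in which that div still contains only the loading spinner.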
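To see the difference concretely, here is a minimal sketch that feeds BeautifulSoup the two snippets quoted in the question: the markup as the browser shows it after JavaScript has run, and the markup that requests actually returned. The helper name extract_roe is made up for this illustration, and 'html.parser' is used only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Markup as seen in the browser once JavaScript has filled in the value.
rendered_html = """
<div class="dataItem_hld clearfix">
<div class="smalltxt">ROE</div>
<div name="tixStockRoe" class="value">121.362</div>
</div>
"""

# Markup as returned by requests.get(url).text (from the question's update).
raw_html = """
<div class="value" name="tixStockRoe"><div class="loader loaderSmall"><div class="loader_hld"><img alt="" src="/img/loading.gif"/></div></div></div>
"""


def extract_roe(html):
    """Hypothetical helper: return the text of the tixStockRoe div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find("div", attrs={"name": "tixStockRoe"})
    return node.get_text(strip=True) if node else None


rendered_value = extract_roe(rendered_html)
raw_value = extract_roe(raw_html)
print(repr(rendered_value))  # '121.362'
print(repr(raw_value))       # '' (the div exists but holds only the spinner)
```

The selector is the same in both cases; only the input HTML differs, so the fix has to happen before parsing.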

You need to use Selenium to load the page in a real browser, let the JavaScript run, and then read the rendered HTML. You can read more about web scraping using Selenium here

Here is how to scrape the value using Selenium:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome without opening a visible window.
options = Options()
options.add_argument("--headless")

# Selenium 4.6+ locates a matching chromedriver by itself; older versions
# need the path to chromedriver passed explicitly.
driver = webdriver.Chrome(options=options)

url = 'https://www.bursamarketplace.com/mkt/themarket/stock/SUPM'
driver.get(url)
time.sleep(5)  # give the page's JavaScript time to fill in the values
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # close the browser once the rendered HTML has been captured

d = soup.find('div', attrs={'name': 'tixStockRoe'})
print(d.text.strip())

Output:

121.362