BeautifulSoup Web Scraping – Can’t access and extract element

This is my first question on Stack Overflow. I am working on a web scraping project and am trying to access HTML elements with Beautiful Soup.

Can someone advise me on how to extract the following elements?

The task is to scrape all job listings from a search result page.

The job listing elements are inside the “ResultsSectionContainer”.

I want to access each <article> element and

  1. extract its id, e.g. job-item-7460756
  2. extract the href of its link where data-at="job-item-title"
  3. extract its h2 text (solved)

How can I loop through the ResultsSectionContainer and extract this information for each <article> element (each job-item id)?

The class name of the <article> element seems to be dynamically generated and changes (I guess) every time a new search is run.

<div class="ResultsSectionContainer-gdhf14-0 cxyAav">
 <article class="sc-fzowVh cUgVEH" id="job-item-7460756">
  <a class="sc-fzoiQi eRNcm" data-at="job-item-title" 
    href="/stellenangebote--Wirtschaftsinformatiker-m-w-d-mit-Schwerpunkt-ERP-Systeme-Heidelberg-Celonic-Deutschland-GmbH-Co-KG--7460756-inline.html" target="_blank">
    <h2 class="sc-fzqARJ iyolKq">
      Wirtschaftsinformatiker (m/w/d) mit Schwerpunkt ERP-Systeme
        </h2>
        </a>
  
  <article class="sc-fzowVh cUgVEH" id="job-item-7465958">
   ...

Answer

You can do it like this:

  • Select the <div> whose class name is ResultsSectionContainer-gdhf14-0.
  • Find all the <article> tags inside that <div> using .find_all(). This gives you a list of all the article tags.
  • Iterate over that list and extract the data you need.
from bs4 import BeautifulSoup

s = '''<div class="ResultsSectionContainer-gdhf14-0 cxyAav">
 <article class="sc-fzowVh cUgVEH" id="job-item-7460756">
  <a class="sc-fzoiQi eRNcm" data-at="job-item-title" 
    href="/stellenangebote--Wirtschaftsinformatiker-m-w-d-mit-Schwerpunkt-ERP-Systeme-Heidelberg-Celonic-Deutschland-GmbH-Co-KG--7460756-inline.html" target="_blank">
    <h2 class="sc-fzqARJ iyolKq">  Wirtschaftsinformatiker (m/w/d) mit Schwerpunkt ERP-Systeme
        </h2>
        </a>
  
  </div>'''

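# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(s, 'lxml')
# Locate the container <div> by its first class name.
d = soup.find('div', class_='ResultsSectionContainer-gdhf14-0')
# Each <article> inside the container is one job listing.
for i in d.find_all('article'):
    job_id = i['id']
    # The title link is identified by its data-at attribute.
    job_link = i.find('a', {'data-at': 'job-item-title'})['href']
    print(f'JOB_ID: {job_id}\nJOB_LINK: {job_link}')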

Output:

JOB_ID: job-item-7460756
JOB_LINK: /stellenangebote--Wirtschaftsinformatiker-m-w-d-mit-Schwerpunkt-ERP-Systeme-Heidelberg-Celonic-Deutschland-GmbH-Co-KG--7460756-inline.html
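
Since the question mentions that the generated class names may change between searches, here is a minimal sketch of a variant that avoids hard-coding them. It assumes that the "ResultsSectionContainer" part of the container class and the "job-item-" id prefix stay stable, which the question does not confirm. The function name extract_jobs is made up for illustration, the closing </article> tag was added to the sample for well-formedness, and the built-in 'html.parser' is used in place of 'lxml' only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question (closing </article> added).
html = '''<div class="ResultsSectionContainer-gdhf14-0 cxyAav">
 <article class="sc-fzowVh cUgVEH" id="job-item-7460756">
  <a class="sc-fzoiQi eRNcm" data-at="job-item-title"
    href="/stellenangebote--Wirtschaftsinformatiker-m-w-d-mit-Schwerpunkt-ERP-Systeme-Heidelberg-Celonic-Deutschland-GmbH-Co-KG--7460756-inline.html" target="_blank">
    <h2 class="sc-fzqARJ iyolKq">  Wirtschaftsinformatiker (m/w/d) mit Schwerpunkt ERP-Systeme
        </h2>
        </a>
 </article>
</div>'''


def extract_jobs(markup):
    """Return a list of dicts with id, href and title for each job article.

    Hypothetical helper: it matches on substrings/prefixes that are assumed
    to be stable instead of the generated class names.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    jobs = []
    # CSS attribute selectors: class contains "ResultsSectionContainer",
    # id starts with "job-item-".
    selector = 'div[class*="ResultsSectionContainer"] article[id^="job-item-"]'
    for article in soup.select(selector):
        link = article.find('a', attrs={'data-at': 'job-item-title'})
        heading = article.find('h2')
        jobs.append({
            'id': article['id'],
            # Guard against listings that lack the link or the heading.
            'href': link['href'] if link else None,
            'title': heading.get_text(strip=True) if heading else None,
        })
    return jobs


jobs = extract_jobs(html)
for job in jobs:
    print(job['id'], job['href'], job['title'], sep='\n')
```

Returning a list of dicts keeps the id, link and h2 text together per listing, so the result can be written to CSV or a DataFrame later. If the prefix assumption turns out to be wrong for the real page, the selector string is the only part that needs to change.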