BeautifulSoup won’t display description text

I’m starting a project to scrape ‘https://www.gumtree.com/cars/uk’, extract the prices of all used cars, and experiment with machine learning algorithms on that data. However, when I use the requests library together with Beautiful Soup to fetch and parse the HTML, the description text of each listing doesn’t show up.

Here’s an example: [screenshot]

Here’s the Beautiful Soup result: [screenshot]

As you can see, instead of the car’s description I get something like ‘amp;lhblk;▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄&’.

Am I doing anything wrong?

Here’s my code so far:

from bs4 import BeautifulSoup as bs
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json

# The parser name belongs to BeautifulSoup, not to requests.get
cars = requests.get('https://www.gumtree.com/cars/uk')
soup = bs(cars.content, 'lxml')

# Get the div with class 'srp-results', which wraps the search results
match = soup.find('div', class_='srp-results')
print(match)

Answer

A few things. The site you mention is dynamic, so part of the content is probably filled in or modified by a script (that’s my guess; I didn’t open it). Other times you may simply be blocked. Here is some example code. From the content I saw, I would select ‘p.listing-description’ instead, and I excluded any text with 10 or fewer distinct characters.

descrs = []
for p in soup.find_all('p', class_='listing-description'):
    if len(set(p.text)) > 10:
        descrs.append(p.text)
        print('-' * 80)
        print(p.text)

This prints each description it finds and collects the text of every matching node in the list descrs.
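If you want to try the distinct-character filter without hitting the site, here is a minimal sketch that runs on an inline HTML string. The sample markup, the `extract_descriptions` helper and its `min_distinct` parameter are made up for illustration and are not taken from the real page; it uses the built-in 'html.parser' so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, NOT copied from the real site.
# The second paragraph imitates the placeholder blocks from the question.
PLACEHOLDER = "\u2584" * 20
SAMPLE_HTML = f"""
<div class="srp-results">
  <p class="listing-description">Well maintained hatchback, full service history, one owner.</p>
  <p class="listing-description">{PLACEHOLDER}</p>
  <p class="other">Not a description paragraph.</p>
</div>
"""


def extract_descriptions(html, min_distinct=10):
    """Return the text of each p.listing-description node that has
    more than `min_distinct` distinct characters (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for p in soup.select("p.listing-description"):
        text = p.get_text(strip=True)
        # Placeholder blocks repeat one or two characters, so they fail this check.
        if len(set(text)) > min_distinct:
            texts.append(text)
    return texts


descriptions = extract_descriptions(SAMPLE_HTML)
print(descriptions)
```

Only the first paragraph should survive: the placeholder has a single distinct character, and the third paragraph does not match the selector. The threshold of 10 is just the value from the answer above, so adjust it if real descriptions turn out to be very short.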