How to extract href from list of lists

I am trying to extract the href value from each <a> tag held in a list of lists. This is the code I used to build the list:

import requests as re
from bs4 import BeautifulSoup as bs
import lxml

category_page = re.get("https://sv.epaenlinea.com/automotriz.html")
category_soup = bs(category_page.text, 'lxml')

url = [i.find_all('a') for i in category_soup.find_all('ol',{'class':'items'})[-1].find_all('li',{'class':'item'})]
print(url) 

This is the output generated:

[[<a href="https://sv.epaenlinea.com/automotriz/accesorios-exterior.html">Accesorios exterior</a>], [<a href="https://sv.epaenlinea.com/automotriz/seguridad-automotriz.html">Seguridad automotriz</a>], [<a href="https://sv.epaenlinea.com/automotriz/accesorios-interior.html">Accesorios interior</a>], [<a href="https://sv.epaenlinea.com/automotriz/limpieza-y-cuidado.html">Limpieza y cuidado</a>], [<a href="https://sv.epaenlinea.com/automotriz/lubricantes-y-aditivos.html">Lubricantes y aditivos</a>], [<a href="https://sv.epaenlinea.com/automotriz/llantas.html">Llantas</a>], [<a href="https://sv.epaenlinea.com/automotriz/baterias-y-accesorios.html">Baterías y accesorios</a>]]

This is the output I would like:

["https://sv.epaenlinea.com/automotriz/accesorios-exterior.html",
 "https://sv.epaenlinea.com/automotriz/seguridad-automotriz.html",
                                  .
                                  .
                                  .
 "https://sv.epaenlinea.com/automotriz/baterias-y-accesorios.html"]

Any ideas on how to do it?

Answer

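Each <a> element returned by find_all is a bs4 Tag, and its attrs property exposes the tag's attributes as a dict. Take the first <a> in each list item and read its 'href' key. This should work: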

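import requests
from bs4 import BeautifulSoup as bs

category_page = requests.get("https://sv.epaenlinea.com/automotriz.html")
# The 'lxml' parser needs the lxml package installed; it does not need to be imported here
category_soup = bs(category_page.text, 'lxml')

# Last <ol class="items"> on the page, then every <li class="item"> inside it
items = category_soup.find_all('ol', {'class': 'items'})[-1].find_all('li', {'class': 'item'})

# First <a> of each item; attrs is a dict of the tag's attributes
url = [i.find_all('a')[0].attrs['href'] for i in items]
print(url)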
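
One caveat with the one-liner above: find_all('a')[0] raises IndexError if a list item has no link, and attrs['href'] raises KeyError if a link has no href. Below is a minimal sketch of a more defensive variant that flattens all links in one comprehension. It runs against a small inline HTML string, which is a made-up stand-in for the real page. The helper name extract_hrefs is hypothetical, and the built-in "html.parser" is used here only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the category page markup.
SAMPLE_HTML = """
<ol class="items">
  <li class="item"><a href="https://sv.epaenlinea.com/automotriz/accesorios-exterior.html">Accesorios exterior</a></li>
  <li class="item"><a href="https://sv.epaenlinea.com/automotriz/llantas.html">Llantas</a></li>
  <li class="item">No link in this item</li>
</ol>
"""


def extract_hrefs(html):
    """Return the href of every <a> inside the last <ol class="items">."""
    soup = BeautifulSoup(html, "html.parser")
    lists = soup.find_all("ol", {"class": "items"})
    if not lists:
        # No matching list on the page: nothing to extract.
        return []
    items = lists[-1].find_all("li", {"class": "item"})
    # Nested comprehension flattens the list of lists;
    # href=True skips any <a> that has no href attribute.
    return [a["href"] for li in items for a in li.find_all("a", href=True)]


urls = extract_hrefs(SAMPLE_HTML)
print(urls)
```

If the nested list from the question has already been built, the same flattening idea applies directly to it: [a['href'] for sub in url for a in sub].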
