Extracting HTML tables and storing them in a separate file

I wrote some code to extract parts of tables, but I want to extract every <table> tag from the input and then store them in a separate HTML file.

from bs4 import BeautifulSoup

soup = BeautifulSoup(myInput)
table = soup.find('table', {'class': '*'})

I expect the code to show me all tables contained in the input text, but it outputs an error because the * is not defined.

EDIT: by * I mean every table in the file, like the wildcard in *.txt

Answer

class is the attribute you are filtering on, and BeautifulSoup compares its value literally: '*' is not a wildcard, so you have to tell soup the actual class of the table you want. Take this HTML:

<table class='HiClass'>
A
</table>
<table class='MiClass'>
B
</table>
<table class='*'>
C
</table>

For instance,

table1 = soup.find('table', {'class': '*'})
table2 = soup.find('table', {'class': 'HiClass'})

You’ll get the “C” table in table1 and the “A” table in table2.
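The literal matching described above can be checked with a small, self-contained sketch. It reuses the example HTML from this answer and assumes the built-in "html.parser" backend; the class name NoSuchClass is made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<table class='HiClass'>
A
</table>
<table class='MiClass'>
B
</table>
<table class='*'>
C
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The class value is compared literally, so '*' only matches class='*'.
table1 = soup.find("table", {"class": "*"})
table2 = soup.find("table", {"class": "HiClass"})

# A class that no table carries (hypothetical name) gives None, not an exception.
missing = soup.find("table", {"class": "NoSuchClass"})

print(table1.get_text(strip=True))
print(table2.get_text(strip=True))
print(missing)
```

Since find returns None when nothing matches, an error usually only shows up later, when the code tries to use that None as if it were a table.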

To get all tables, just use

tables = soup.find_all('table')

and you will get every element that uses the <table> tag, returned as a list.
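As a rough illustration of the difference between "all tables" and "tables with some class", here is a minimal sketch. The HTML string is invented for the example, and passing class_=True is probably the closest thing to the * wildcard from the question: it keeps only tags that have a class attribute at all.

```python
from bs4 import BeautifulSoup

# Made-up input: one table with a class, one without, and a paragraph.
html = (
    "<table class='HiClass'><tr><td>A</td></tr></table>"
    "<p>text</p>"
    "<table><tr><td>B</td></tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

# No attribute filter: every <table> in the document.
all_tables = soup.find_all("table")

# class_=True keeps only tables that carry any class attribute.
with_class = soup.find_all("table", class_=True)

print(len(all_tables))
print(len(with_class))
```

If the goal is simply every table in the file, the unfiltered call is enough and the class filter can be dropped.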

Demo:

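import requests
from bs4 import BeautifulSoup

def get_request(url):
    # Download the page and fail early on HTTP errors.
    r = requests.get(url)
    r.raise_for_status()
    # The 'html5lib' parser must be installed separately (pip install html5lib).
    soup = BeautifulSoup(r.content, 'html5lib')
    # find_all returns every <table> element on the page as a list.
    return soup.find_all('table')

url = 'https://www.w3schools.com/html/html_tables.asp'
print(get_request(url))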
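The question also asks for the tables to be stored in a separate HTML file, which the demo does not do. One possible way is sketched below. The helper name save_tables, the sample HTML and the output file name tables_only.html are all assumptions made for this example, and nested tables are skipped at the top level so they are not written twice.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def save_tables(html, out_path):
    """Write every top-level <table> found in html to out_path; return the count."""
    soup = BeautifulSoup(html, "html.parser")
    # Skip tables nested inside another table: they are already part of
    # their parent's markup and would otherwise appear twice in the output.
    tables = [t for t in soup.find_all("table") if t.find_parent("table") is None]
    # str(tag) serialises a tag back to HTML markup.
    body = "\n".join(str(t) for t in tables)
    document = f"<!DOCTYPE html>\n<html>\n<body>\n{body}\n</body>\n</html>\n"
    Path(out_path).write_text(document, encoding="utf-8")
    return len(tables)


# Hypothetical input with two top-level tables, one containing a nested table.
sample = (
    "<h1>Report</h1>"
    "<table id='a'><tr><td>1</td></tr></table>"
    "<p>x</p>"
    "<table id='b'><tr><td>"
    "<table id='inner'><tr><td>2</td></tr></table>"
    "</td></tr></table>"
)

out_file = Path(tempfile.gettempdir()) / "tables_only.html"
count = save_tables(sample, out_file)
print(count, "tables written to", out_file)
```

To use it with the demo above, the downloaded page can be passed in place of sample, for example save_tables(requests.get(url).text, "tables.html").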
