I want to extract all the data from the security bulletin tables at https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html. With my code I can only extract one table at a time; it cannot extract the data from all of the tables at once.
This is my code:
```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html"
html_content = requests.get(url).text  # assumed: the page is fetched with requests

soup = BeautifulSoup(html_content, "lxml")
print(soup.prettify())
gdp = soup.find_all("table")
table = gdp[0]  # only the first table on the page is processed
body = table.find_all("tr")
head = body[0]  # the first row holds the column names
body_rows = body[1:]

headings = []
for item in head.find_all("td"):
    item = item.text.rstrip("\n")
    headings.append(item)

all_rows = []  # list of lists, one inner list per row
for row_num in range(len(body_rows)):  # one row at a time
    row = []  # holds the entries for one row
    for row_item in body_rows[row_num].find_all("td"):
        aa = re.sub("(\xa0)|(\n)|,", "", row_item.text)
        row.append(aa)
    all_rows.append(row)

df = pd.DataFrame(data=all_rows, columns=headings)
df.to_csv('C:/Users//AdobeAir-APSB16-23 Security Update Available for Adobe AIR.csv')
df.head()
```
The output of the code is:
```
  Bulletin ID    Date Published Priority
0   APSB21-13  February 09 2021        3
```
For this code, I imported BeautifulSoup, requests, pandas and re. I hope someone can help me extract the data from all of the tables at once and convert it to CSV. Thank you.
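A minimal sketch of how the BeautifulSoup approach in the question could cover every table, assuming `bs4` is installed: loop over everything `soup.find_all("table")` returns instead of taking only the first element, then write each table with the standard `csv` module. The names `extract_tables` and `write_tables` and the `table0.csv`-style file names are hypothetical. It uses the built-in `"html.parser"` rather than `"lxml"` only to avoid an extra dependency, and it keeps commas inside cells because `csv.writer` quotes them.

```python
import csv
from pathlib import Path

from bs4 import BeautifulSoup


def extract_tables(html):
    """Return every <table> as a list of rows, each row a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = []
        # Assumes no nested tables: find_all("tr") also descends into them.
        for tr in table.find_all("tr"):
            cells = tr.find_all(["th", "td"])  # header and data cells alike
            if cells:
                # str.split() treats \xa0 and \n as whitespace, so this
                # collapses them into single spaces and trims the ends.
                rows.append([" ".join(c.get_text().split()) for c in cells])
        tables.append(rows)
    return tables


def write_tables(tables, out_dir, prefix="table"):
    """Write each table to <out_dir>/<prefix><i>.csv and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, rows in enumerate(tables):
        path = out / f"{prefix}{i}.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths


# Usage with the page from the question:
#   write_tables(extract_tables(html_content), "adobe_tables")
```

Each table goes to its own file because the tables on the page have different columns.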
You can let pandas do the heavy lifting for you with `pd.read_html`:
```python
import pandas as pd

url = 'https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html'
dfs = pd.read_html(url, header=0)  # list of DataFrames, one per <table>
dfs[1]
```
```
             Product  Affected Versions           Platform
0  Adobe Dreamweaver               20.2  Windows and macOS
1  Adobe Dreamweaver               21.0  Windows and macOS
```
P.S. `pd.read_html` returns a list of all the tables found in the HTML. For example, `dfs[0]` is the first table:
```
  Bulletin ID     Date Published  Priority
0   APSB21-13  February 09, 2021         3
```
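For the CSV part of the question: since `pd.read_html` gives a plain list, each DataFrame can be written to its own file with `DataFrame.to_csv`. This is a minimal sketch; `save_tables_to_csv`, the `adobe_tables` folder and the `apsb21-13_table0.csv` naming are hypothetical choices. Separate files are used because the tables have different columns, so stacking them into one CSV would mostly produce empty cells.

```python
from pathlib import Path

import pandas as pd


def save_tables_to_csv(dfs, out_dir="adobe_tables", prefix="apsb21-13"):
    """Write every DataFrame in dfs to its own CSV; return the paths in order.

    out_dir and prefix are hypothetical defaults; rename them as you like.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, df in enumerate(dfs):
        path = out / f"{prefix}_table{i}.csv"
        # index=False keeps pandas' 0, 1, 2 row labels out of the file
        df.to_csv(path, index=False)
        paths.append(path)
    return paths


# Usage (pd.read_html needs network access plus lxml, or bs4 + html5lib):
#   dfs = pd.read_html(url, header=0)
#   save_tables_to_csv(dfs)
```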