complex xml to csv using python [closed]

<app>
<doc>
<field name="id">013</field>
<field name="groupid">013</field>
<field name="img_url">8b4</field>
<field name="filetype">HTML</field>
<field name="url">https://calgaryherald.com/pmn/business-pmn/sally-rumbles-toward-u-s-/</field>
<field name="topic">accurate</field>
<field name="topic">additional</field>
<field name="topic">agriculture</field>
<field name="topic">area</field>
<field name="topic">biggest</field>
</doc>
<doc>
<field name="id">0131</field>
<field name="groupid">013</field>
<field name="img_url">8b</field>
<field name="filetype">HTML</field>
<field name="url">https://calgaryherald.com/pmn/business-pmn/sally-rumbles-toward</field>
<field name="topic">accurate</field>
<field name="topic">additional</field>
<field name="topic">agriculture</field>
<field name="topic">area</field>
<field name="topic">biggest1</field>
<field name="topic">biggest2</field>
<field name="topic">biggest3</field>
</doc>
</app>

I have a xml similar to this i need to convert it to a csv in python. Does anyone know how to do it and also the field name topic differs for different doc and the csv headers should be similar to the field name and for topics it should be in a single cell with comma separated.

Expected Output enter image description here

Answer

You could use an XML parser that emits element data as it parses to build the csv. On every end tag, you could either add a value to the row or write the row itself. One advantage of iterparse is that you don’t need to load the entire document into memory before processing.

import xml.etree.ElementTree as ET
import io
import csv

field_names = ["id", "groupid", "img_url", "filetype", "url", "topic"]
field_names_set = set(field_names)

with open("test.csv", "w", newline="") as out_file:
    writer = csv.DictWriter(out_file, field_names)
    writer.writeheader()
    row = {}
    topic = []
    for event, elem in ET.iterparse("test.xml"): # iterate tag end events
        if elem.tag == "doc":
            # doc elem end, write row to csv and setup for next
            row["topic"] = ",".join(topic)
            writer.writerow(row)
            row = {}
            topic = []            
        elif elem.tag == "field":
            # field elem end, add to current row
            if elem.attrib["name"] == "topic":
                topic.append(elem.text)
            else:
                row[elem.attrib["name"]] = elem.text