format excel data into database format with pandas – data cleaning

want to know if this type of data cleaning has been done in pandas. I have the following data frame (the columns continues with more dates but this is a sample)

d = {'Date' : ["Spiderman total",
"Division A",
"Division B",
"Division C",
"Superman total",
"Division A",
"Division B",
"Division C"
],
'1/4/20': [3,1,2,0,10,5,3,2],
'1/11/20':[5,2,1,2,8,4,4,0],
'1/18/20': [15,7,7,1,20,10,0,10]

}
df = pd.DataFrame(data=d)

And I want to transform that dataframe into this format (this is an example for 1/4/20 but ultimately would like for all the dates from the initial dataframe)

 b = {'date':['1/4/20','1/4/20','1/4/20','1/4/20','1/4/20','1/4/20'],'Business':['Spiderman','Spiderman','Spiderman','Superman','Superman','Superman'],
'Division':['A','B','C','A','B','C'],
'totals':[1,2,0,5,3,2]}
dt = pd.DataFrame(data=b) 

No need for Business totals as it can be obtained as the sum of the divisions but I have to put the name (in this case for example Spiderman) in a Business Column.

I would like to know if this is feasible in pandas or there is other methods I should try.

Answer

With a little post and pre processing you could use melt from Pandas to transform the data as you want.

import pandas as pd

d = {
    "Date": [
        "Spiderman total",
        "Division A",
        "Division B",
        "Division C",
        "Superman total",
        "Division A",
        "Division B",
        "Division C",
    ],
    "1/4/20": [3, 1, 2, 0, 10, 5, 3, 2],
    "1/11/20": [5, 2, 1, 2, 8, 4, 4, 0],
    "1/18/20": [15, 7, 7, 1, 20, 10, 0, 10],
}
df = pd.DataFrame(data=d)

df["Business"] = df["Date"][df["Date"].str.endswith("total")]

df = df.fillna(method="ffill")

df = df[~df["Date"].str.endswith("total")]

df = df.melt(id_vars=["Business", "Date"])

df.columns = ["Business", "Division", "Date", "Totals"]

df = df[["Date", "Business", "Division", "Totals"]]

df["Business"] = df["Business"].str.split(" ").str[0]
df["Division"] = df["Division"].str.split(" ").str[1]

print(df)
Date Business Division Totals
0 1/4/20 Spiderman A 1
1 1/4/20 Spiderman B 2
2 1/4/20 Spiderman C 0
3 1/4/20 Superman A 5
4 1/4/20 Superman B 3
5 1/4/20 Superman C 2
6 1/11/20 Spiderman A 2
7 1/11/20 Spiderman B 1
8 1/11/20 Spiderman C 2
9 1/11/20 Superman A 4
10 1/11/20 Superman B 4
11 1/11/20 Superman C 0
12 1/18/20 Spiderman A 7
13 1/18/20 Spiderman B 7
14 1/18/20 Spiderman C 1
15 1/18/20 Superman A 10
16 1/18/20 Superman B 0
17 1/18/20 Superman C 10

Note, if this data is actually in Excel you could do this using Power Query (Data>Get & Transform).