Concatenate List of Dataframes and Include Original Dataframe Names as Keys

I have code that creates a list of dataframes with the same structure. My wish is to append all of these dataframes together but add a column to the new dataframe that identifies which dataframe the row originally came from.

I easily appended the list of dataframes with:

import pandas as pd
df_rosters = pd.concat(list_of_rosters)

However, I haven’t been able to figure how to add a column with the original dataframe name or index. I’ve found a bunch of examples suggesting to use the keys argument; but each example has hardcoded keys. The size of my list is constantly changing so I need to figure out how to dynamically add in the keys.

Thanks in advance!

Answer

Let’s assign an indicator column to each DataFrame in the list. (Names can be zipped together with the list of DataFrames or created by something like enumerate):

With enumerate

pd.concat(d.assign(df_name=f'{i:02d}') for i, d in enumerate(list_of_rosters))

   0  1 df_name
0  4  7      00
1  7  1      00
2  9  5      00
0  8  1      01
1  1  8      01
2  2  6      01

Or with zip:

pd.concat(d.assign(df_name=name)
          for name, d in zip(['name1', 'name2'], list_of_rosters))

   0  1 df_name
0  4  7   name1
1  7  1   name1
2  9  5   name1
0  8  1   name2
1  1  8   name2
2  2  6   name2

Setup:

import numpy as np
import pandas as pd

np.random.seed(5)
list_of_rosters = [
    pd.DataFrame(np.random.randint(1, 10, (3, 2))),
    pd.DataFrame(np.random.randint(1, 10, (3, 2)))
]

list_of_rosters:

[   0  1
0  4  7
1  7  1
2  9  5,    
    0  1
0  8  1
1  1  8
2  2  6]