Object representation in pandas.DataFrame

Assume I have the following class, ‘MyClass’.

class MyClass:
    def __repr__(self):
        return 'Myclass()'

    def __str__(self):
        return 'Meh'

instances = [MyClass() for i in range(5)]

Some instances are created and stored in the instances variable. Now, we check its content.

>>> instances
[Myclass(), Myclass(), Myclass(), Myclass(), Myclass()]

To represent the objects, Python calls the __repr__ method. However, when the same instances variable is passed to a pandas.DataFrame, the representation of the objects changes and the __str__ method seems to be called instead.

import pandas as pd

df = pd.DataFrame(data=instances)
>>> df
     0
0  Meh
1  Meh
2  Meh
3  Meh
4  Meh

Why has the object’s representation changed? Can I determine which representation is used in the DataFrame?

Answer

The data is indeed stored as the original objects (the column has dtype object). It seems pandas just calls the __str__ method implicitly when it displays the DataFrame.

You can verify that by calling:

df[0].map(type)

It calls type for each element in the column and returns:

Out[572]: 
0    <class '__main__.MyClass'>
1    <class '__main__.MyClass'>
2    <class '__main__.MyClass'>
3    <class '__main__.MyClass'>
4    <class '__main__.MyClass'>
Name: 0, dtype: object

# likewise, you get the
# representation string of the objects
# with:
df[0].map(repr)
Out[578]: 
0    Myclass()
1    Myclass()
2    Myclass()
3    Myclass()
4    Myclass()
Name: 0, dtype: object
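
To tie the two checks together, here is a minimal self-contained sketch (it redefines the class from the question). It suggests that only the rendered text goes through __str__, while the cells still hold the original objects. The names rendered and first_cell are just illustrative.

```python
import pandas as pd


class MyClass:
    def __repr__(self):
        return 'Myclass()'

    def __str__(self):
        return 'Meh'


instances = [MyClass() for _ in range(5)]
df = pd.DataFrame(data=instances)

# The text pandas would show when the DataFrame is displayed.
rendered = df.to_string()

# The object actually stored in the first cell.
first_cell = df.iloc[0, 0]

print('Meh' in rendered)            # does the display use __str__?
print(first_cell is instances[0])   # is the stored cell the original object?
```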

Btw, if you want the instances to go into an explicitly named column, create the DataFrame like this instead:

df = pd.DataFrame({'my_instances': instances})

This way, you assign a column name.
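
As for choosing which representation is displayed: one option is the formatters argument of DataFrame.to_string, which takes a one-argument function per column. The following is a sketch under the assumption that the column is named my_instances as above, not the only way to do it.

```python
import pandas as pd


class MyClass:
    def __repr__(self):
        return 'Myclass()'

    def __str__(self):
        return 'Meh'


instances = [MyClass() for _ in range(5)]
df = pd.DataFrame({'my_instances': instances})

# Render the column with repr instead of the default str.
# The stored objects are not modified; only the output text changes.
rendered = df.to_string(formatters={'my_instances': repr})
print(rendered)
```

If you would rather have the strings stored in the DataFrame itself, df['my_instances'].map(repr) gives you a new column of plain strings, at the cost of no longer holding the objects.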