Creating a scatterplot for a grouped pandas dataframe

I have a pandas DataFrame that I want to group by a certain column. Afterwards, I want to make a scatterplot of the grouped DataFrame. However, when I do so I get an error, because the column I grouped by is not recognized.

# Data loading and processing
import pandas as pd
import numpy as np

# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
# Set the seaborn style because it looks nicer
sns.set()

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

df2 = df.groupby(['A']).agg({'D':sum})
df2.plot.scatter(x='A', y='D')

How would I create such a scatterplot?

Answer

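By default, groupby turns the grouping column A into the index of the result, so it is no longer a column that plot.scatter can look up by name. You can either tell groupby not to set A as the index: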

# note the difference between `sum` and `'sum'`:
# the latter is vectorized
df2 = df.groupby(['A'], as_index=False).agg({'D':'sum'})

df2.plot.scatter(x='A', y='D')
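
To see what as_index=False changes, it can help to inspect the columns of the grouped result. The following is a minimal sketch rather than part of the original answer; the seeded generator `rng` and the names `grouped` and `flat` are assumptions made for illustration. It also shows that calling reset_index() on the default result gives a frame with the same columns.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

# Default groupby: the key 'A' becomes the index, so it is not a column.
grouped = df.groupby("A").agg({"D": "sum"})
print(list(grouped.columns))  # 'A' is not listed here
print(grouped.index.name)     # the key lives in the index instead

# reset_index() moves 'A' back into a regular column,
# which is what plot.scatter(x='A', ...) needs.
flat = grouped.reset_index()
print(list(flat.columns))
```

Either `flat` or the as_index=False result can then be passed to plot.scatter(x='A', y='D').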

Or you can keep A as the index and pass the index to plt.scatter directly:

df2 = df.groupby(['A']).agg({'D':'sum'})
plt.scatter(df2.index, df2['D'])
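
One difference with plt.scatter is that the axes are not labelled automatically. A possible sketch of the same plot with labels added is shown below; the Agg backend, the seeded generator `rng`, and the use of an explicit `fig, ax` pair are assumptions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for repeatability
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

# 'A' ends up as the index of the grouped result.
df2 = df.groupby("A").agg({"D": "sum"})

fig, ax = plt.subplots()
points = ax.scatter(df2.index, df2["D"])

# Label the axes by hand, taking the x label from the index name.
ax.set_xlabel(df2.index.name)
ax.set_ylabel("D")
```

In an interactive session, the backend line can be dropped and plt.show() called at the end.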
