Using list with keys for best fit line in Cufflinks in Python

I am a beginner trying to use cufflinks to produce a scatter chart. The optional argument for including a best fit line is bestfit=True. The code to produce this chart looks like this:


import pandas as pd
from plotly.offline import iplot, init_notebook_mode
import cufflinks

cufflinks.go_offline(connected=True)
init_notebook_mode(connected=True)

df = pd.read_csv('https://raw.githubusercontent.com/inferentialthinking/inferentialthinking.github.io/master/data/nba2013.csv')

df.iplot(
    z='Weight',
    x='Age in 2013',
    y='Weight',
    kind='scatter',
    mode='markers',
    xTitle='Age',
    yTitle='Weight',
    title="NBA players' weight and age",
    text='Name',
    theme='solar',
    bestfit=True,
    # categories='Position',
)

However, when I add the argument categories='Position' (that is, remove the “#” in the code above) to colour the points by category, splitting the players into guards, centers and forwards, the best fit line disappears. I get no error message; there is simply no best fit line anymore.

The cufflinks help for the bestfit argument states:

bestfit : boolean or list
    If True then a best fit line will be generated for
    all columns.
    If list then a best fit line will be generated for
    each key on the list.

I want a best fit line for each of the three categories (i.e. three best fit lines). I don’t understand how to use a list to generate a best fit line ‘for each key on the list’. If this is possible at all, could someone explain how to do it?

Any help is much appreciated!
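Reading the docstring above, the “keys” appear to be column names of the DataFrame (it contrasts them with “all columns”), not values inside a column such as Position. That is an interpretation of the help text, not something confirmed from the cufflinks source, but it suggests a list is not a direct route to per-category lines. What is being asked for, one best fit line per category, amounts to a separate least-squares straight-line fit on each group. Below is a minimal sketch with pandas and NumPy on hypothetical toy data rather than the NBA CSV; the helper name best_fit_per_category is made up for illustration and is not part of cufflinks.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the NBA CSV: within each position,
# Weight is an exact straight-line function of age, so the fits are easy to check.
df = pd.DataFrame({
    "Age in 2013": [20, 25, 30, 22, 26, 31, 21, 27, 33],
    "Weight": [200, 205, 210, 274, 282, 292, 210.5, 213.5, 216.5],
    "Position": ["Guard"] * 3 + ["Center"] * 3 + ["Forward"] * 3,
})


def best_fit_per_category(frame, x, y, category):
    """Fit y = slope * x + intercept separately for every value of `category`.

    Returns a dict mapping each category value to (slope, intercept).
    (Hypothetical helper, not part of cufflinks.)
    """
    fits = {}
    for value, group in frame.groupby(category):
        # np.polyfit with deg=1 is an ordinary least-squares straight-line fit;
        # coefficients come back highest power first: [slope, intercept].
        slope, intercept = np.polyfit(group[x], group[y], deg=1)
        fits[value] = (float(slope), float(intercept))
    return fits


fits = best_fit_per_category(df, "Age in 2013", "Weight", "Position")
for position, (slope, intercept) in fits.items():
    print(f"{position}: Weight = {slope:.2f} * age + {intercept:.2f}")
```

Each (slope, intercept) pair describes one line, so three categories give three lines that could be drawn over the scatter.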

Answer

I really like cufflinks, but what you’re aiming to do here is easier using plotly express:

fig = px.scatter(df,
                 x='Age in 2013',
                 y='Height',
                 size='Weight',
                 template='plotly_dark',
                 color_discrete_sequence=colors[1:],
                 color='Position',
                 trendline='ols',
                 title='NBA Players weight and age')

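This approach resembles that of cufflinks in many ways. The main differences in the arguments are that px.scatter uses size where cufflinks uses z, and color where cufflinks uses categories. And, of course, px.scatter produces a separate trendline for each category of Position, because trendline='ols' is fitted once for every group defined by the color argument.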


# imports
import pandas as pd
import plotly.express as px

# data
df = pd.read_csv('https://raw.githubusercontent.com/inferentialthinking/inferentialthinking.github.io/master/data/nba2013.csv')

# qualitative colour palette; the first colour is skipped below
colors = px.colors.qualitative.T10

# plotly: trendline='ols' fits one OLS line per colour group (requires statsmodels)
fig = px.scatter(df,
                 x='Age in 2013',
                 y='Height',
                 size='Weight',
                 template='plotly_dark',
                 color_discrete_sequence=colors[1:],
                 color='Position',
                 trendline='ols',
                 title='NBA Players weight and age')
fig.show()
