Why are two different lengths shown when using value_counts() and shape[0]?

I am trying to find out how many records there are, and I thought there were two ways to get the total number of records. However, they give different lengths. Why is this happening?

Both ways are listed below: one uses the .shape[0] attribute, while the other calls the .value_counts() method.

df.loc[(df['rental_store_city'] == 'Woodridge') & (df['film_rental_duration'] > 5)].shape[0]

output: 3186

df.loc[(df['rental_store_city'] == 'Woodridge') & (df['film_rental_duration'] > 5)].value_counts()

output: a Series of length 3153

Answer

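It's because value_counts() does not return one entry per row. It groups identical rows together and returns one entry per unique row, along with how many times that row occurs. Duplicate rows are collapsed into a single entry, so the result is shorter than the DataFrame whenever some rows are exact duplicates of each other.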
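A minimal reproduction of the mismatch, assuming a small made-up DataFrame (only the column names come from the question; the values are hypothetical). Two of the filtered rows are identical, so value_counts() folds them into one entry, which is the same thing drop_duplicates() would do:

```python
import pandas as pd

# Hypothetical sample data: only the column names come from the question.
df = pd.DataFrame({
    "rental_store_city": ["Woodridge", "Woodridge", "Woodridge", "Lethbridge"],
    "film_rental_duration": [6, 6, 7, 6],
})

mask = (df["rental_store_city"] == "Woodridge") & (df["film_rental_duration"] > 5)
subset = df.loc[mask]

print(subset.shape[0])                # number of rows: 3
print(len(subset.value_counts()))     # number of distinct rows: 2
print(len(subset.drop_duplicates()))  # same idea via drop_duplicates: 2
```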

As you can see in the documentation:

Return a Series containing counts of unique rows in the DataFrame.

Example:

>>> df = pd.DataFrame({'a': [1, 2, 1, 3]})
>>> df
   a
0  1
1  2
2  1
3  3
>>> df.value_counts()
a
1    2
3    1
2    1
dtype: int64
>>> 

Here the two rows with a == 1 collapse into a single entry with a count of 2, so the result has 3 entries even though df has 4 rows.
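A short sketch of how the counts relate back to the row count, reusing the example data above. The counts sum back to the number of rows, with one caveat: by default DataFrame.value_counts() skips rows that contain missing values (dropna=True), so the sum can fall short when data is missing. The dropna parameter assumes pandas 1.3 or later:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 3]})
counts = df.value_counts()

print(len(df))       # 4 rows
print(len(counts))   # 3 unique rows
print(counts.sum())  # 4: the counts add back up to the row count

# Rows containing NaN are skipped by default (dropna=True),
# so the sum can be smaller than len(df) when data is missing.
df_nan = pd.DataFrame({"a": [1.0, np.nan, 1.0]})
print(df_nan.value_counts().sum())              # 2
print(df_nan.value_counts(dropna=False).sum())  # 3 (requires pandas >= 1.3)
```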

If you want the number of rows in the DataFrame, don't use value_counts(); use len() (or .shape[0], as in the question):

len(df.loc[(df['rental_store_city'] == 'Woodridge') & (df['film_rental_duration'] > 5)])
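To see where the gap comes from, DataFrame.duplicated() flags every row that repeats an earlier row across all columns. Assuming no row contains missing values, rows = unique rows + duplicated rows, so on the question's data subset.duplicated().sum() would be 3186 - 3153 = 33. The sketch below uses a hypothetical stand-in for the filtered rows:

```python
import pandas as pd

# Hypothetical stand-in for the filtered rows from the question.
subset = pd.DataFrame({
    "rental_store_city": ["Woodridge"] * 4,
    "film_rental_duration": [6, 6, 6, 7],
})

n_rows = len(subset)                      # same as subset.shape[0]
n_unique = len(subset.value_counts())     # one entry per distinct row
n_extra = int(subset.duplicated().sum())  # rows that repeat an earlier row

print(n_rows, n_unique, n_extra)          # 4 2 2
assert n_rows == n_unique + n_extra       # holds when no row contains NaN
```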