Pandas groupby where one of the group values is in a range

I want to group by A and B and find the sizes of the groups that have at least one row with 0.5 < C < 1.0. Given a dataframe like this:

A B C
1 2 0.1
1 2 0.9
1 2 1.0
2 5 0
2 5 0.1
2 5 0.2
3 4 0.6

I’d like to see something like the following returned:

A B Size
1 2 3
3 4 1

I’ve tried the following:

group = dataset.groupby(['A', 'B'])
filtered = group.filter(lambda x: 0.5 < x['C'] < 1.0)
filtered.size()

However, I get this error on the second line:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The any() function makes sense in this context as I want any value for C to be between 0.5 and 1.0 in order to count that group, but I don’t know where to put the any() call. I tried calling it on the lambda. I tried after filter(). Nothing I try works…

Answer

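The chained comparison 0.5 < x['C'] < 1.0 expands to (0.5 < x['C']) and (x['C'] < 1.0), and "and" needs a single truth value from a whole Series, which is what raises the error. Compare element-wise with & instead, then call any() on the result so each group yields one boolean. That boolean Series can then index the group sizes (df here is the question's dataset):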

grouped = df.groupby(['A', 'B'])
grouped.size()[grouped.apply(lambda g: ((g['C'] > 0.5) & (g['C'] < 1.0)).any())]

prints

A  B
1  2    3
3  4    1
dtype: int64
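
If you would rather keep the filter() approach from the question, the any() call goes inside the function passed to filter(), so that it returns a single True or False per group. Below is a minimal sketch under that assumption. The helper name has_c_in_range and the Size column label are made up for illustration, and the sample frame is rebuilt from the table in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 3],
    "B": [2, 2, 2, 5, 5, 5, 4],
    "C": [0.1, 0.9, 1.0, 0.0, 0.1, 0.2, 0.6],
})


def has_c_in_range(group):
    # Hypothetical helper: one boolean per group, True if any C value
    # lies strictly between 0.5 and 1.0.
    c = group["C"]
    return bool(((c > 0.5) & (c < 1.0)).any())


# filter() keeps every row of each group for which the function is True.
kept_rows = df.groupby(["A", "B"]).filter(has_c_in_range)

# filter() returns plain rows, not groups, so regroup before counting.
sizes = kept_rows.groupby(["A", "B"]).size().reset_index(name="Size")
print(sizes)
```

Note that filter() returns a regular DataFrame, which is why calling size() directly on its result does not give per-group counts; the second groupby is what produces the A, B, Size table asked for.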
