I want to find the sizes of the groups that have at least one row with `0.5 < C < 1.0`. Given a dataframe like this:

A | B | C |
---|---|---|
1 | 2 | 0.1 |
1 | 2 | 0.9 |
1 | 2 | 1.0 |
2 | 5 | 0 |
2 | 5 | 0.1 |
2 | 5 | 0.2 |
3 | 4 | 0.6 |

I’d like to see something like the following returned:

A | B | Size |
---|---|---|
1 | 2 | 3 |
3 | 4 | 1 |

I’ve tried the following:

    group = dataset.groupby(['A', 'B'])
    filtered = group.filter(lambda x: 0.5 < x['C'] < 1.0)
    filtered.size()

However, I get this error on the second line:

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The `any()` function makes sense in this context, as I want *any* value of `C` to be between 0.5 and 1.0 in order to count that group, but I don’t know where to put the `any()` call. I tried calling it on the lambda. I tried after `filter()`. Nothing I try works…

## Answer

Use `any` in the boolean indexing of the `groupby`:

    grouped = df.groupby(['A', 'B'])
    grouped.size()[grouped.apply(lambda g: ((g['C'] > 0.5) & (g['C'] < 1.0)).any())]

prints

    A  B
    1  2    3
    3  4    1
    dtype: int64
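As for why the original attempt fails: `0.5 < x['C'] < 1.0` is evaluated by Python as `(0.5 < x['C']) and (x['C'] < 1.0)`, and `and` asks a whole Series for a single truth value, which is what raises the `ValueError`. Combining the two comparisons with `&` and reducing with `.any()` avoids that. The following is a minimal, self-contained sketch of two variants, assuming the sample data from the question. The names `in_range`, `kept`, `has_match`, `sizes` and `sizes_alt` are just illustrative, and the `reset_index(name="Size")` step is only there to get the `A | B | Size` layout asked for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 3],
    "B": [2, 2, 2, 5, 5, 5, 4],
    "C": [0.1, 0.9, 1.0, 0.0, 0.1, 0.2, 0.6],
})

# Variant 1: keep the question's filter() approach, but make the lambda
# return one bool per group by combining the comparisons with & and
# reducing with .any().
kept = df.groupby(["A", "B"]).filter(
    lambda g: ((g["C"] > 0.5) & (g["C"] < 1.0)).any()
)
sizes = kept.groupby(["A", "B"]).size().reset_index(name="Size")

# Variant 2: build the row-wise mask once, reduce it per group with any(),
# and use the result to index the group sizes (no lambda involved).
in_range = (df["C"] > 0.5) & (df["C"] < 1.0)
has_match = in_range.groupby([df["A"], df["B"]]).any()
sizes_alt = df.groupby(["A", "B"]).size()[has_match].reset_index(name="Size")

print(sizes)
```

Both variants should yield the same two rows here. The second one may be preferable on larger frames since it avoids calling a Python function per group, though I have not benchmarked it.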