I have a DataFrame with a time index and a few numeric columns, some of which contain missing values. For example:

    timeindex ColA ColB ColC
    00:02:00 454 436 4334
    00:04:00 653
    00:06:00 3423 4354
    00:08:00 3432
    00:10:00 2343
    00:12:00 32432 23423

I would like to create a subset of the dataframe such that for every consecutive group of 3 rows, it picks the row that has the lowest number of missing values. So for the above df, the subsetdf would look like:

    timeindex ColA ColB ColC
    00:02:00 454 436 4334
    00:12:00 32432 23423

Can you advise how I can achieve this, please?

## Answer

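Use `df.filter` to select the value columns, flag the non-missing cells with `notnull`, `sum` on axis 1 to count them per row, and finally use `groupby.idxmax` on blocks of 3 consecutive rows to pick the row with the most non-missing values in each block: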

    import numpy as np

    # count non-missing values per row, then take the fullest row of every 3
    idx = (df.assign(count=df.filter(like="Col").notnull().sum(1))
             .groupby(np.arange(len(df)) // 3)["count"].idxmax())
    print(df.loc[idx])

      timeindex ColA ColB ColC
    0 00:02:00 454 436 4334
    5 00:12:00 32432 23423
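For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is an assumption: the question does not show which column each value in the incomplete rows belongs to, so the `NaN` placement below is made up, and `pick_most_complete` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

# Assumed sample data: which cells are missing is a guess, since the
# question only shows how many values each row has.
df = pd.DataFrame({
    "timeindex": ["00:02:00", "00:04:00", "00:06:00",
                  "00:08:00", "00:10:00", "00:12:00"],
    "ColA": [454, 653, 3423, np.nan, np.nan, 32432],
    "ColB": [436, np.nan, 4354, 3432, np.nan, 23423],
    "ColC": [4334, np.nan, np.nan, np.nan, 2343, np.nan],
})


def pick_most_complete(frame, size=3, prefix="Col"):
    """Hypothetical helper: keep the fullest row of each block of `size` rows."""
    # Number of non-missing values per row in the selected columns.
    filled = frame.filter(like=prefix).notna().sum(axis=1)
    # Block label per row: 0, 0, 0, 1, 1, 1, ...
    blocks = np.arange(len(frame)) // size
    # Index label of the row with the highest count in each block.
    idx = filled.groupby(blocks).idxmax()
    return frame.loc[idx]


subset = pick_most_complete(df)
print(subset)
```

Two things to keep in mind: `idxmax` returns the first row when several rows in a block tie, and it returns index labels, so the index should be unique for `df.loc[idx]` to give back exactly one row per block. If the blanks in your data are empty strings rather than `NaN`, `notna` will count them as present, so convert them first (for example with `df.replace("", np.nan)`).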