Python – Get the Latest Non-NaN Value of Each Variable in a DataFrame, Based on Iteration

How can I pick, for each unique_id, the non-NaN value of each variable from its latest iteration? My DataFrame looks like this:

unique_id iteration variable_1 variable_2
111 1 apple NaN
111 2 NaN table
111 3 orange NaN
111 4 pear NaN
111 5 NaN chair

The expected output is:

unique_id iteration variable_1 variable_2
111 5 pear chair

The real data will also contain many unique_id values.

Answer

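You can forward-fill the missing values within each unique_id, after sorting by iteration so that the fill follows iteration order:

df = df.sort_values(["unique_id", "iteration"])
df[["variable_1", "variable_2"]] = df.groupby("unique_id")[["variable_1", "variable_2"]].ffill()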

See the pandas documentation on this fill method:

Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward.
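To see why the fill has to respect the ids when there are several of them, here is a small sketch. The second id (222) and its values are made up for illustration and are not part of the question's data, and the rows are assumed to be already ordered by iteration within each id. It compares a plain forward fill with a per-id fill.

```python
import numpy as np
import pandas as pd

# Illustrative data: id 222 and its values are hypothetical
df = pd.DataFrame({
    "unique_id": [111, 111, 222, 222],
    "iteration": [1, 2, 1, 2],
    "variable_1": ["apple", np.nan, np.nan, "kiwi"],
    "variable_2": [np.nan, "table", np.nan, np.nan],
})
value_cols = ["variable_1", "variable_2"]

# Plain forward fill: values from id 111 leak into the rows of id 222
plain = df.ffill()

# Per-id forward fill: each id is filled only from its own earlier rows
grouped = df.copy()
grouped[value_cols] = df.groupby("unique_id")[value_cols].ffill()
```

In the plain version, the first row of id 222 picks up "apple" and "table" from id 111; in the grouped version it stays NaN, because id 222 has no earlier value of its own.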


After that, sort in descending order and drop duplicates on unique_id, which keeps only one entry per “group” (the row with the highest iteration):

df = df.sort_values(["unique_id", "iteration"], ascending=False)
df = df.drop_duplicates("unique_id")
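Putting the steps together, a minimal end-to-end sketch on the question's sample data could look like the following. The helper name latest_non_nan and its parameters are hypothetical, chosen only for this illustration. It sorts in ascending order and keeps the last row per id, which should be equivalent to sorting in descending order and keeping the first.

```python
import numpy as np
import pandas as pd

def latest_non_nan(df, id_col="unique_id", order_col="iteration"):
    """Hypothetical helper: one row per id with the latest non-NaN value of each other column."""
    value_cols = [c for c in df.columns if c not in (id_col, order_col)]
    # Order rows so that the forward fill follows the iteration order within each id
    out = df.sort_values([id_col, order_col]).copy()
    # Fill each id's gaps from that id's own earlier rows only
    out[value_cols] = out.groupby(id_col)[value_cols].ffill()
    # After the fill, the last row of each id carries the latest values
    return out.drop_duplicates(id_col, keep="last").reset_index(drop=True)

# Sample data from the question
df = pd.DataFrame({
    "unique_id": [111] * 5,
    "iteration": [1, 2, 3, 4, 5],
    "variable_1": ["apple", np.nan, "orange", "pear", np.nan],
    "variable_2": [np.nan, "table", np.nan, np.nan, "chair"],
})
result = latest_non_nan(df)
print(result)
```

If an id never has a value for some variable, that variable stays NaN in the result for that id.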