Python, Pandas: Keep only the newest and unique data inside dataframe

Good evening,

The objects in my dataframe can appear any number of times, always with different additional data and at least a unique timestamp (the date column alone is not unique), something like this:

id    |    object    |    additional_data    |    date          |    timestamp
1     |    item_a    |    ...                |    2014-04-15    |    10:16:22
2     |    item_a    |    ...                |    2014-04-10    |    18:19:01
3     |    item_a    |    ...                |    2014-04-10    |    17:59:43
4     |    item_b    |    ...                |    2014-04-13    |    10:16:22
5     |    item_c    |    ...                |    2014-04-15    |    00:01:59
6     |    item_c    |    ...                |    2014-04-14    |    08:46:00
7     |    item_d    |    ...                |    2014-04-15    |    10:12:47

Is it possible to filter the dataframe so that only the newest row for each unique object remains? For example, like this:

id    |    object    |    additional_data    |    date          |    timestamp
1     |    item_a    |    ...                |    2014-04-15    |    10:16:22
4     |    item_b    |    ...                |    2014-04-13    |    10:16:22
5     |    item_c    |    ...                |    2014-04-15    |    00:01:59
7     |    item_d    |    ...                |    2014-04-15    |    10:12:47

Thanks for all your help and have a great day!

Answer

First, sort your dataframe by the 'date' and 'timestamp' columns in descending order using sort_values(), so that the newest row of each object comes first:

df = df.sort_values(by=['date', 'timestamp'], ascending=[False, False])

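Now use the drop_duplicates() method, which by default keeps the first row it finds for each object (after the sort, the newest one):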

df=df.drop_duplicates(subset=['object'],ignore_index=True)
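To see the two steps working together, here is a minimal sketch on the sample data from the question. The additional_data column is left out because its contents are not given, and the final sort by id is only an assumption about the desired output order. It also assumes the dates and times are stored as zero-padded strings (or real datetime values), so that sorting them puts the newest entries first.

```python
import pandas as pd

# Sample data copied from the question (additional_data omitted, contents unknown)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "object": ["item_a", "item_a", "item_a", "item_b", "item_c", "item_c", "item_d"],
    "date": ["2014-04-15", "2014-04-10", "2014-04-10", "2014-04-13",
             "2014-04-15", "2014-04-14", "2014-04-15"],
    "timestamp": ["10:16:22", "18:19:01", "17:59:43", "10:16:22",
                  "00:01:59", "08:46:00", "10:12:47"],
})

newest = (
    df.sort_values(by=["date", "timestamp"], ascending=[False, False])  # newest rows first
      .drop_duplicates(subset=["object"])   # keep the first (newest) row per object
      .sort_values("id")                    # assumption: restore the original id order
      .reset_index(drop=True)
)

print(newest)
```

The rows that remain are ids 1, 4, 5 and 7, which matches the expected result in the question.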

Alternatively, you can do this with sort_values() and groupby():

df = df.sort_values(by=['date', 'timestamp'], ascending=[False, False]).groupby('object', as_index=False).first()
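One caveat with the groupby() variant: first() takes the first non-missing value of each column separately, so if the newest row has a gap (NaN/None) in additional_data, the result can mix in a value from an older row. If that matters for your data, groupby(...).head(1) keeps the whole newest row as it is. The small frame below is made up for illustration only.

```python
import pandas as pd

# Hypothetical data: the newest item_a row has no additional_data
df = pd.DataFrame({
    "object": ["item_a", "item_a", "item_b"],
    "additional_data": [None, "old", "x"],
    "date": ["2014-04-15", "2014-04-10", "2014-04-13"],
    "timestamp": ["10:16:22", "18:19:01", "10:16:22"],
})

ordered = df.sort_values(by=["date", "timestamp"], ascending=[False, False])

# first() skips missing values per column: item_a gets "old" from the older row
mixed = ordered.groupby("object", as_index=False).first()

# head(1) returns the first (newest) row of each group unchanged, gaps included
whole = ordered.groupby("object").head(1)

print(mixed)
print(whole)
```

If your data has no missing values, both variants give the same rows, and drop_duplicates() is usually the simplest choice.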