pandas: group a DataFrame by a column and create a new column if a particular value exists within the group

Initial dataframe

df = pd.DataFrame({'value_text': ['type1', 'type1', 'type1','type2','type2','type3','type3','type4','type4','type5','type6'],
                   'year': [2016,2017,2018,2018,2019,2019,2020,2019,2021,2020,2021]})

   value_text  year
0       type1  2016
1       type1  2017
2       type1  2018
3       type2  2018
4       type2  2019
5       type3  2019
6       type3  2020
7       type4  2019
8       type4  2021
9       type5  2020
10      type6  2021

I need to group by "value_text" and create a new column "value": if any row in a group has the current year in "year", every row in that group should be flagged Active; otherwise every row should be flagged Inactive.

Example: for type1, none of the years in the group is the current year (2021 in this example), so all of its rows are flagged Inactive.

Similarly, the type4 group contains the current year, so all of its rows are flagged Active.

Output:

   value_text  year     value
0       type1  2016  Inactive
1       type1  2017  Inactive
2       type1  2018  Inactive
3       type2  2018  Inactive
4       type2  2019  Inactive
5       type3  2019  Inactive
6       type3  2020  Inactive
7       type4  2019    Active
8       type4  2021    Active
9       type5  2020  Inactive
10      type6  2021    Active

Answer

Probably not the most pythonic way to achieve this, but it works:

import pandas as pd
import datetime as dt

df = pd.DataFrame({'value_text': ['type1', 'type1', 'type1','type2','type2','type3','type3','type4','type4','type5','type6'],
                   'year': [2016,2017,2018,2018,2019,2019,2020,2019,2021,2020,2021]})

   value_text  year
0       type1  2016
1       type1  2017
2       type1  2018
3       type2  2018
4       type2  2019
5       type3  2019
6       type3  2020
7       type4  2019
8       type4  2021
9       type5  2020
10      type6  2021

Group the years for each value:

map_df = df.groupby('value_text').agg({'year':list}).reset_index()

  value_text                year
0      type1  [2016, 2017, 2018]
1      type2        [2018, 2019]
2      type3        [2019, 2020]
3      type4        [2019, 2021]
4      type5              [2020]
5      type6              [2021]

Create a dictionary that maps each value to 'Active' if the current year is in its list of years, and to 'Inactive' otherwise:

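# Look up the current year once instead of once per group (the output below is for 2021)
current_year = dt.date.today().year
map_dict = dict(zip(map_df['value_text'], map_df['year'].apply(lambda x: 'Active' if current_year in x else 'Inactive')))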

{'type1': 'Inactive',
 'type2': 'Inactive',
 'type3': 'Inactive',
 'type4': 'Active',
 'type5': 'Inactive',
 'type6': 'Active'}
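
Because the mapping depends on the year in which the code runs, the dictionary above is only reproduced when the current year is 2021. As a minimal sketch, the same steps can be run with the year pinned in a variable; the name `current_year` and the hard-coded 2021 are assumptions for illustration, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'value_text': ['type1', 'type1', 'type1', 'type2', 'type2', 'type3',
                                  'type3', 'type4', 'type4', 'type5', 'type6'],
                   'year': [2016, 2017, 2018, 2018, 2019, 2019, 2020, 2019, 2021, 2020, 2021]})

# Assumption: pin the reference year so the result matches the output shown here.
# In real use this would typically be dt.date.today().year.
current_year = 2021

# Collect the years of each value_text into a list, as in the step above
map_df = df.groupby('value_text').agg({'year': list}).reset_index()

# Map each value_text to 'Active' if the pinned year appears in its list of years
map_dict = {
    text: 'Active' if current_year in years else 'Inactive'
    for text, years in zip(map_df['value_text'], map_df['year'])
}
print(map_dict)
```

Pinning the year this way also makes the step easy to test, since the result no longer changes from one calendar year to the next.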

Apply the map to create a new column ‘value’:

df['value'] = df['value_text'].map(map_dict)

   value_text  year     value
0       type1  2016  Inactive
1       type1  2017  Inactive
2       type1  2018  Inactive
3       type2  2018  Inactive
4       type2  2019  Inactive
5       type3  2019  Inactive
6       type3  2020  Inactive
7       type4  2019    Active
8       type4  2021    Active
9       type5  2020  Inactive
10      type6  2021    Active
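
A shorter alternative, offered as a sketch rather than as the method above: compare each year with the current year, then use `groupby(...).transform('any')` to broadcast "does this group contain the current year?" back to every row. The variable names `current_year` and `has_current` are made up for illustration, and the year is pinned to 2021 so the result matches the table above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value_text': ['type1', 'type1', 'type1', 'type2', 'type2', 'type3',
                                  'type3', 'type4', 'type4', 'type5', 'type6'],
                   'year': [2016, 2017, 2018, 2018, 2019, 2019, 2020, 2019, 2021, 2020, 2021]})

# Assumption: pinned for reproducibility; replace with dt.date.today().year in real use
current_year = 2021

# True for every row whose value_text group has at least one row in the current year
has_current = df['year'].eq(current_year).groupby(df['value_text']).transform('any')

# Turn the boolean flag into the Active / Inactive labels
df['value'] = np.where(has_current, 'Active', 'Inactive')
print(df)
```

This avoids building the intermediate list column and the dictionary, and it keeps the original row order because `transform` returns a result aligned with the original index.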