I have a pandas DataFrame and want to drop all rows whose start date is earlier than 2019 or later than 2020. Of course I can just iterate over it, check the condition, and drop the row by index if the condition is False. For example:
for index, row in df.iterrows():
    # extract year from date format YYYY-MM-DD
    year = int(row['START_DATE'][:4])
    # remove all dates before and after 2019/2020
    if not (year >= 2019 and year <= 2020):
        df = df.drop(index)
But my goal is to write more efficient code, and that is where I am stuck. I came up with the following line:
df = df.drop(df[(int(df.START_DATE[:4]) < 2019) & (int(df.START_DATE[:4]) > 2020)].index)
but I get a TypeError: cannot convert the series to <class ‘int’> and don’t know how to convert the values to an int in this short statement.
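First make sure the START_DATE column has a datetime dtype (convert it with pd.to_datetime), then filter by your condition. Note that the two bounds have to be combined with | (OR) rather than &, since no year is both before 2019 and after 2020. ~ is the element-wise NOT operator on the boolean mask: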
df["START_DATE"] = pd.to_datetime(df["START_DATE"])
df = df[~((df["START_DATE"].dt.year < 2019) | (df["START_DATE"].dt.year > 2020))]
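As a rough sketch of the same idea, the TypeError comes from calling int() on a whole Series; the element-wise equivalents are .str[:4] and .astype(int). The sample data and the VALUE column below are made up for illustration, and only the START_DATE column name comes from the question. Series.between is inclusive on both ends by default, so it keeps 2019 and 2020.

```python
import pandas as pd

# Hypothetical sample data; only the START_DATE column name comes from the question.
df = pd.DataFrame({
    "START_DATE": ["2018-12-31", "2019-01-01", "2020-06-15", "2021-01-01"],
    "VALUE": [1, 2, 3, 4],
})

# Option 1: keep the strings, slice the year element-wise with .str, then cast.
years = df["START_DATE"].str[:4].astype(int)
kept_str = df[years.between(2019, 2020)]

# Option 2: parse to datetime once and use the .dt accessor.
dates = pd.to_datetime(df["START_DATE"])
kept_dt = df[dates.dt.year.between(2019, 2020)]
```

Both options select the rows to keep instead of dropping the unwanted ones, which avoids the double negation. Option 1 assumes every value is a string in YYYY-MM-DD form; Option 2 is probably the safer choice if the column is later used for other date arithmetic.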