New column added to DataFrame keeps disappearing

I need to clean my data by dropping defective rows. To do this, I created a Day_number column that stores each row's day number for use in the filtering.

df = pd.read_csv(file_loc)
cols = list(df.columns.values)
cols
df = df[[cols[-1]]+ [cols[9]]]
df.rename(columns={'mean-ghi': 'GHI'}, inplace = True)
df_filtered_nan = df.drop(df[df.GHI.isnull()].index)
df_filtered_nan['Day_number'] = df.Datetime.apply(lambda x: x.day)
df_filtered_nan.reset_index(drop = True, inplace = True)

This results in the following dataframe for df_filtered_nan:

[Image: dataframe for df_filtered_nan]

However, when I carry out the filtering steps, for some reason the Day_number column disappears and I get an attribute error.

months_days_killed_count = {}
days_killed_count = 0
for day in df_filtered_nan.Day_number.unique():
    daywise = len(df_filtered_nan[df_filtered_nan.Day_number == day])
    print(daywise)
    if daywise != 1440:
        df_filtered_nan = df.drop(df_filtered_nan[df_filtered_nan.Day_number == day].index)
        days_killed_count += 1
    else:
        continue
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-117-c78d4b39ef13> in <module>
     24 days_killed_count = 0
     25 for day in df_filtered_nan.Day_number.unique():
---> 26     daywise = len(df_filtered_nan[df_filtered_nan.Day_number == day])
     27     print(daywise)
     28     if daywise != 1440:

D:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5137             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5138                 return self[name]
-> 5139             return object.__getattribute__(self, name)
   5140 
   5141     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'Day_number'

And now, when I print the df_filtered_nan dataframe, I get:

[Image: df_filtered_nan dataframe printed after the filtering]

So, I can’t for the life of me figure out what I have done wrong. Please help 🙂

Answer

In this line

df_filtered_nan = df.drop(df_filtered_nan[df_filtered_nan.Day_number == day].index)

you are dropping rows from df, not from df_filtered_nan, and assigning the result back to df_filtered_nan. So whenever the if condition is true and this line runs, df_filtered_nan becomes a cropped version of df, which doesn't have the Day_number column; hence the error on the next iteration of the for loop. (Since df_filtered_nan had its index reset, those index labels may not even point at the same rows in df.)
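To see the effect in isolation, here is a minimal sketch on made-up data (the three-row frame and the names `wrong` and `right` are my own, not from the question). It assumes the Datetime column already holds parsed datetimes.

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV in the question.
df = pd.DataFrame({
    "Datetime": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02"]),
    "GHI": [1.0, 2.0, 3.0],
})

# Only the filtered copy gets the extra column; the original df never has it.
df_filtered_nan = df.dropna(subset=["GHI"]).copy()
df_filtered_nan["Day_number"] = df_filtered_nan["Datetime"].dt.day

rows = df_filtered_nan[df_filtered_nan.Day_number == 2].index

wrong = df.drop(rows)               # built from df: no Day_number column
right = df_filtered_nan.drop(rows)  # built from df_filtered_nan: column kept

print("Day_number" in wrong.columns)  # False
print("Day_number" in right.columns)  # True
```

Whatever you assign back to df_filtered_nan has the columns of the frame you called drop on, which is why the column seems to vanish.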

You probably need

df_filtered_nan = df_filtered_nan.drop(df_filtered_nan[df_filtered_nan.Day_number == day].index)

i.e. drop from df_filtered_nan. Also, IMO this would be more readable if you split it into two steps:

row_indexes_to_drop = df_filtered_nan[df_filtered_nan.Day_number == day].index
df_filtered_nan = df_filtered_nan.drop(row_indexes_to_drop)
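If you'd rather avoid reassigning the frame inside a loop at all, one possible alternative is to count the rows per day once and filter with a boolean mask. This is only a sketch: the function name `drop_incomplete_days`, the `expected_rows` parameter and the toy data are my own inventions, and like the loop in the question it groups by Day_number alone.

```python
import pandas as pd

def drop_incomplete_days(frame, expected_rows=1440):
    """Return (filtered frame, number of days dropped).

    A day is kept only if it has exactly `expected_rows` rows.
    """
    # Number of rows for each distinct Day_number value.
    counts = frame["Day_number"].value_counts()
    # Look up each row's day count, so the mask lines up with the rows.
    rows_per_day = frame["Day_number"].map(counts)
    filtered = frame[rows_per_day == expected_rows].reset_index(drop=True)
    days_killed_count = int((counts != expected_rows).sum())
    return filtered, days_killed_count

# Hypothetical toy data: days 1 and 3 are "complete" with 2 rows each.
toy = pd.DataFrame({
    "Day_number": [1, 1, 2, 3, 3],
    "GHI": [0.1, 0.2, 0.3, 0.4, 0.5],
})
filtered, days_killed_count = drop_incomplete_days(toy, expected_rows=2)
print(filtered["Day_number"].tolist())  # [1, 1, 3, 3]
print(days_killed_count)                # 1
```

Because the mask is built from the same frame it filters, there is no second dataframe to mix up, and the Day_number column is kept.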