Pandas column disappears after using bfill

I have a dataframe made from a subset of the columns of a much larger dataframe.

I have used the bfill function to fill missing date values in certain columns. However, in one case, one of these columns contains only null values, and after bfill that column disappears.

import glob

import pandas as pd

for f in glob.glob("Raw/*.xlsx"):
    xls = pd.ExcelFile(f)
    df1 = xls.parse(sheet_name=0)
    #print(df1.shape)
    new = df1.filter(['Actual Start',
                      'FOR CLIENT SUBMISSION / APPROVAL-Issued for Self Discipline Check (SDC)(Actual Finish Date)',
                      'FOR CLIENT SUBMISSION / APPROVAL-Issued for Inter-disciplinary Check  (IDC)(Actual Finish Date)',
                      'FOR CLIENT SUBMISSION / APPROVAL-Submission to client(Actual Finish Date)',
                      'FOR CLIENT SUBMISSION / APPROVAL-Reviewed by Client(Actual Finish Date)',
                      'FOR CLIENT SUBMISSION / APPROVAL-Approved by Client (FAC)(Actual Finish Date)',
                      'FOR CLIENT SUBMISSION / APPROVAL-Issue for Construction(Actual Finish Date)',
                      'FOR PR with offer Evaluation-Discipline Completed (SDC)(Actual Finish Date)',
                      'FOR PR with offer Evaluation-Issuance to procurement team (FRB)(Actual Finish Date)',
                      'FOR PR with offer Evaluation-Tech. Evaluation Completion(Actual Finish Date)'], axis=1)
    #print(new.shape)
    
    cols = new.columns
    new[cols] = new[cols].apply(pd.to_datetime).bfill(axis=1)
    print(cols)

OUTPUT: the column 'FOR CLIENT SUBMISSION / APPROVAL-Approved by Client (FAC)(Actual Finish Date)' is no longer in the dataframe:

Index(['Actual Start',
       'FOR CLIENT SUBMISSION / APPROVAL-Issued for Self Discipline Check (SDC)(Actual Finish Date)',
       'FOR CLIENT SUBMISSION / APPROVAL-Issued for Inter-disciplinary Check  (IDC)(Actual Finish Date)',
       'FOR CLIENT SUBMISSION / APPROVAL-Submission to client(Actual Finish Date)',
       'FOR CLIENT SUBMISSION / APPROVAL-Reviewed by Client(Actual Finish Date)',
       'FOR CLIENT SUBMISSION / APPROVAL-Issue for Construction(Actual Finish Date)',
       'FOR PR with offer Evaluation-Discipline Completed (SDC)(Actual Finish Date)',
       'FOR PR with offer Evaluation-Issuance to procurement team (FRB)(Actual Finish Date)',
       'FOR PR with offer Evaluation-Tech. Evaluation Completion(Actual Finish Date)'],
      dtype='object')

Answer

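I don't see a problem that is directly related to bfill, although it is hard to be sure without sample data. Note that cols = new.columns is evaluated before bfill runs, so the Index you printed is exactly what filter returned: the column was already missing at that point. The way you select columns is also non-idiomatic: filter silently skips any label that does not exactly match a column header (a stray or missing space is enough), whereas indexing with df1[cols] raises a KeyError that names the missing column. Does the following work for you?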

import glob

import pandas as pd

# Columns of interest
cols = ['Actual Start',
        'FOR CLIENT SUBMISSION / APPROVAL-Issued for Self Discipline Check (SDC)(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Issued for Inter-disciplinary Check  (IDC)(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Submission to client(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Reviewed by Client(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Approved by Client (FAC)(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Issue for Construction(Actual Finish Date)',
        'FOR PR with offer Evaluation-Discipline Completed (SDC)(Actual Finish Date)',
        'FOR PR with offer Evaluation-Issuance to procurement team (FRB)(Actual Finish Date)',
        'FOR PR with offer Evaluation-Tech. Evaluation Completion(Actual Finish Date)']

for f in glob.glob("Raw/*.xlsx"):
    xls = pd.ExcelFile(f)
    df1 = xls.parse(sheet_name=0)
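    # Select the columns of interest; unlike filter(), this raises a KeyError
    # if a name in cols does not exactly match a column header
    new1 = df1[cols]
    # Convert every selected column to datetime
    new2 = new1.apply(pd.to_datetime)
    # Backward-fill down each column (from the rows below)
    new3 = new2.bfill(axis=0)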
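The difference between the two selection styles is easy to reproduce on a toy frame. This is a minimal sketch with made-up column names ('A', 'B', 'C'), where the header 'B ' carries a trailing space to mimic a near-miss. The helper report_missing is hypothetical (not part of pandas); it uses difflib.get_close_matches from the Python standard library to suggest existing headers that look like the requested name.

```python
import difflib

import pandas as pd

# Toy frame standing in for df1; the header 'B ' has a trailing space
df = pd.DataFrame({"A": [1, 2], "B ": [3, 4], "C": [5, 6]})
wanted = ["A", "B", "C"]

# filter() keeps only the labels that exist and silently drops 'B'
filtered = df.filter(wanted, axis=1)
print(list(filtered.columns))  # ['A', 'C']

# Plain indexing does not guess: it raises a KeyError naming the missing label
try:
    df[wanted]
except KeyError as exc:
    print("KeyError:", exc)


def report_missing(frame, names):
    """Hypothetical helper: map each requested name that is not an exact
    column label to the closest existing labels (possibly an empty list)."""
    labels = [str(c) for c in frame.columns]
    return {
        name: difflib.get_close_matches(name, labels, n=3, cutoff=0.6)
        for name in names
        if name not in frame.columns
    }


print(report_missing(df, wanted))  # {'B': ['B ']}
```

Running report_missing(df1, cols) on one of the real files should point at the header that does not match the FAC column name exactly.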

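Update: bfill stands for backward fill; it fills NaN gaps with the next valid observation along the given axis. With df.bfill(axis=0), each NaN is filled from the rows below it in the same column; with axis=1, it is filled from the columns to its right in the same row. If you want to fill each column downwards, call df.bfill(axis=0). (I believe that the axis=1 in your code was not what you intended.) Note that with axis=0, a column containing only NaN values remains empty after bfill. Either way, bfill cannot remove a column: it returns a DataFrame with exactly the same columns as its input.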
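The effect of the axis argument can be checked on a small example. This is a minimal sketch with made-up column names and dates, where 'mid' plays the role of the all-empty FAC column. The astype call pins every column to the same datetime resolution; that is my own precaution, not something taken from your code.

```python
import pandas as pd

# Made-up milestone dates; 'mid' is entirely empty, like the FAC column
raw = pd.DataFrame(
    {
        "start": ["2020-01-01", None, "2020-01-03"],
        "mid": [None, None, None],
        "end": ["2020-02-01", "2020-02-02", None],
    }
)
# Same conversion as in the question, pinned to a single resolution
df = raw.apply(pd.to_datetime).astype("datetime64[ns]")

# axis=0: each NaT takes the next valid value below it in the same column
down = df.bfill(axis=0)

# axis=1: each NaT takes the next valid value to its right in the same row
across = df.bfill(axis=1)

# Neither call adds or removes columns
print(list(down.columns), list(across.columns))

# With axis=0 the all-empty column stays empty
print(down["mid"].isna().all())  # True

# With axis=1 'mid' is filled from 'end', except where 'end' is empty too
print(across["mid"])
```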