For Loop in Python Error: The truth value of a Series is ambiguous

Why doesn't this for loop work?

I want to create a new Delivery Year column from the columns below. They contain a lot of NaNs, so the loop should go through the columns and return the first non-NaN value. The best case is Delivery Date; if that is missing, use Build Year; and if that is missing too, fall back to In Service Date, the date the machine was put into service.

import pandas as pd

df = pd.DataFrame({"Platform ID": [1, 2, 3, 4],
                   "Delivery Date": [str(2009), float("nan"), float("nan"), float("nan")],
                   "Build Year": [float("nan"), str(2009), float("nan"), float("nan")],
                   "In Service Date": [float("nan"), str("14-11-2010"), str("14-11-2009"), float("nan")]})
df.dtypes
df

def delivery_year(delivery_year, build_year, service_year):
    out = []
    for i in range(0,len(delivery_year)):
        if delivery_year.notna():
            out[i].append(delivery_year)
        if (delivery_year[i].isna() and build_year[i].notna()):
            out[i].append(build_year)
        elif build_year[i].isna():
            out[i].append(service_year.str.strip().str[-4:])
        else:
            out[i].append(float("nan"))
    return out

df["Delivery Year"] = delivery_year(df["Delivery Date"], df["Build Year"], df["In Service Date"])

When I run this function I get this error and I do not know why…

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The expected output is a new Delivery Year column (shown as an image in the original post).

Answer

Update 3

I rewrote your function in the same style as yours, without changing the logic or the types of your columns, so you can compare the two versions:

def delivery_year(delivery_date, build_year, service_year):
    out = []
    for i in range(len(delivery_date)):
        if pd.notna(delivery_date[i]):
            out.append(delivery_date[i])
        elif pd.isna(delivery_date[i]) and pd.notna(build_year[i]):
            out.append(build_year[i])
        elif pd.isna(build_year[i]) and pd.notna(service_year[i]):
            out.append(service_year[i].strip()[-4:])
        else:
            out.append(float("nan"))
    return out

df["Delivery Year"] = delivery_year(df["Delivery Date"],
                                    df["Build Year"],
                                    df["In Service Date"])

Notes:

  1. I renamed your first parameter because delivery_year is also the name of your function, which can be confusing.

  2. I replaced the .isna() and .notna() methods with their equivalent functions, pd.isna(...) and pd.notna(...). Each element delivery_date[i] is a plain str or float, and neither type has those methods.

  3. The second if became an elif, so that only one branch appends a value for each row.
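To illustrate note 2, here is a small sketch of the difference between checking a whole Series and checking one element. The Series s and the variable names are made up for this illustration:

```python
import pandas as pd

# Illustrative Series: one string value and one missing value.
s = pd.Series(["2009", float("nan")], name="Delivery Date")

# On a whole Series, notna() returns one boolean per row.
mask = s.notna()

# On a single element, pd.notna() returns a single boolean that `if` can use.
first_is_present = pd.notna(s[0])
second_is_present = pd.notna(s[1])

# The elements themselves are a str and a float; neither has a .notna() method.
has_method = hasattr(s[0], "notna") or hasattr(s[1], "notna")

print(mask.tolist(), first_is_present, second_is_present, has_method)
```

This is why the loop calls pd.notna(delivery_date[i]) instead of delivery_date[i].notna().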

Update 2

Use combine_first to replace your function. combine_first keeps the values of the first Series ('Delivery Date') and fills its NaN entries with the values at the same index in the second Series. You can chain the calls to build 'Delivery Year'.

df['Delivery Year'] = (
    df['Delivery Date']
    .combine_first(df['Build Year'])
    .combine_first(df['In Service Date'].str[-4:])
)

Output:

>>> df
   Platform ID Delivery Date Build Year In Service Date Delivery Year
0            1          2009        NaN             NaN          2009
1            2           NaN       2009      14-11-2010          2009
2            3           NaN        NaN      14-11-2009          2009
3            4           NaN        NaN             NaN           NaN
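For anyone who wants to reproduce this table, the following is a self-contained sketch that rebuilds the sample frame from the question and applies the same chain. The nan shorthand and the intermediate service_year variable are my own additions, and the .str.strip() call is taken from the question's function rather than from the snippet above:

```python
import pandas as pd

nan = float("nan")  # shorthand used only in this sketch

df = pd.DataFrame({
    "Platform ID": [1, 2, 3, 4],
    "Delivery Date": ["2009", nan, nan, nan],
    "Build Year": [nan, "2009", nan, nan],
    "In Service Date": [nan, "14-11-2010", "14-11-2009", nan],
})

# Last four characters of the service date, after trimming whitespace.
# Missing dates stay missing.
service_year = df["In Service Date"].str.strip().str[-4:]

# Take Delivery Date, then fill gaps from Build Year, then from the service year.
df["Delivery Year"] = (
    df["Delivery Date"]
    .combine_first(df["Build Year"])
    .combine_first(service_year)
)

print(df)
```

Note that the order of the chain encodes the priority: the first Series wins wherever it has a value, which is why row 1 gets its year from Build Year and not from In Service Date.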

Update

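You forgot the [i]. Because each element is a plain str or float, the check also has to use pd.notna(...) rather than the .notna() method:

if pd.notna(delivery_year[i]):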

The truth value of a Series is ambiguous:

>>> delivery_year.notna()
0     True  # <- 2009
1    False  # <- NaN
2    False
3    False
Name: Delivery Date, dtype: bool

Should pandas treat the Series as True (because of 2009) or as False (because of the NaN values)? It cannot decide, so it raises the error.

You have to reduce the result to a single boolean with .any() or .all():

>>> delivery_year.notna().any()
True  # because there is at least one non-NaN value.

>>> delivery_year.notna().all()
False  # because at least one value is NaN.
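The error and both reductions can be reproduced with a short sketch like this one. The Series name delivery_date and the raised and message variables exist only for this illustration:

```python
import pandas as pd

# Same values as the 'Delivery Date' column in the question.
delivery_date = pd.Series(
    ["2009", float("nan"), float("nan"), float("nan")],
    name="Delivery Date",
)

try:
    # notna() yields a 4-element boolean Series, which `if` cannot evaluate.
    if delivery_date.notna():
        pass
    raised = False
except ValueError as err:
    raised = True
    message = str(err)
    print(message)

# Reducing the Series to one boolean makes the condition well defined.
any_present = delivery_date.notna().any()  # is at least one value present?
all_present = delivery_date.notna().all()  # is every value present?
print(any_present, all_present)
```

Inside the loop you do not want either reduction, though: the goal there is a per-row check, which is what indexing with [i] gives you.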