How to use np.where in creating new column using previous rows?

I’m kind of new in python and I’ve been working on migrating my excel to pandas because it cannot run a hundred of thousands of rows. I have a table that looks like this in excel: Excel table

Where column A and B are inputs and column C is output. The formula for column C is =IF(B2="new",A2,C3)

If ‘Status’ is equal to “new” the result will be the value in column A

And if ‘Status’ is not equal to “new” the result will be the previous row of C

I tried doing it using np.where and .shift(-1) using this code

df['Previous'] = np.where (df['Status']=='new', df['Count'], df['Previous'].shift(-1))

but it seems I am receiving this error Key error: ‘Previous’ It seems that I need to define the column ‘Previous’ first.

I tried searching the stackoverflow but most of the time the related solutions are somewhat based in complex problems and I was not able to pattern it to my simple problem.

Thanks a lot for your help.

This is the df.columns looks like Index([‘Count’, ‘Status’], dtype=’object’)

This is the result of my code once run. Result

Answer

Since you are creating new column Previous and this column is still not yet defined when you use it in the definition of itself, in the np.where() statement, you will get an error.

Also, your question is actually not taking a “previous” value since when you are handling the first row, there is no previous value for the first row and even when processing 2nd and 3rd rows the value is still not defined until we go to the 4th row.

So, the solution need to set a kind of temporary non-deterministic value while processing rows still with unknown value and set these non-deterministic values afterwards when some values are defined. In this case, we can set those temporary non-deterministic values as np.nan and then back-fill using .bfill() with defined values afterwards. We use backward fill because we are filling values of rows with index 0, 1, 2 by value on row with index 3.

To solve it, you can try the following:

df['Previous'] = np.where(df['Status']=='new', df['Count'], np.nan)
df['Previous'] = df['Previous'].bfill().astype(int)


print(df)

    Count Status  Previous
0       4    old         1
1       3    old         1
2       2    old         1
3       1    new         1
4      40    old        10
5      30    old        10
6      20    old        10
7      10    new        10
8     400    old       100
9     300    old       100
10    200    old       100
11    100    new       100

Here, I assumed the dtype of column Count is integer. If it is of string type, then you don’t need to use the .astype(int) in the code above.

Alternatively, you can also do it in one step using .where() on column Count, instead of np.where() as follows:

df['Previous'] = df['Count'].where(df['Status'] =='new').bfill().astype(int)


print(df)

    Count Status  Previous
0       4    old         1
1       3    old         1
2       2    old         1
3       1    new         1
4      40    old        10
5      30    old        10
6      20    old        10
7      10    new        10
8     400    old       100
9     300    old       100
10    200    old       100
11    100    new       100

Similarly, no need to use .astype(int) in the code above if column Count is of string type.

.where() is to: “Replace values where the condition is False”. This is some how equivalent to “Retain values where the condition is True”. So when the condition is True, we use the values of original Count column. Then, you probably would ask: “What if the condition False and what value to replace?” The answer can be seen from the official document and can be found from the 2nd parameter showing other=nan. When the condition is False, the value specified in the 2nd parameter other (if any) will be used. If the 2nd parameter is not specified, it defaults to nan. Hence, in our case, we don’t specify the 2nd parameter for when the condition is False, nan will be used for the values. Therefore, same effect as we specify np.nan for the False condition in the np.where() call.