# Cumulative sum in pandas starting with a zero and ending with the sum of all but the last entry, respecting groups


The question is published by the Tutorial Guruji team.

In the DataFrame below, I want to create a new column C holding the cumulative sum of B within each group of column A. Each sum must start at zero and include only the values up to the penultimate entry of that group, so every row gets the sum of the rows before it in its group.

   A  B
0  1  5
1  1  6
2  2  3
3  2  4
4  2  5
5  3  2
5  3  7
6  4  3

So, my result should be:

   A  B  C
0  1  5  0
1  1  6  5
2  2  3  0
3  2  4  3
4  2  5  7
5  3  2  0
5  3  7  2
6  4  3  0

(I think this question is really obvious, but somehow I couldn't figure it out myself, nor could I find it already asked anywhere.)

One option is to use .groupby() twice, as follows:

First, take DataFrameGroupBy.shift() of B grouped by A. Within each group of A, every value of B moves down one row, so the first entry of each group becomes NaN and is later set to 0 by .fillna().

Then group the shifted values by A again and apply GroupBy.cumsum(), so that the running sum restarts within each group of A and gives the desired output:

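df['C'] = (df.groupby('A')['B'].shift()          # previous B within each group; NaN on the first row
             .groupby(df['A']).cumsum()          # running sum per group; NaN rows stay NaN
             .fillna(0, downcast='infer')        # first row of each group becomes 0
          )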
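The two grouped steps can be inspected one at a time. The following is a minimal sketch on the question's data, assuming a default integer index rather than the index labels printed in the question. The names `shifted` and `running` are illustrative only, and `.astype(int)` is used in place of `downcast='infer'` on the assumption that your pandas version deprecates that argument of `fillna`.

```python
import pandas as pd

# Sample data from the question (default RangeIndex is an assumption;
# the index labels printed in the question are not reproduced here).
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 3, 3, 4],
    "B": [5, 6, 3, 4, 5, 2, 7, 3],
})

# Step 1: shift B down by one row within each group of A.
# The first row of every group has no predecessor and becomes NaN.
shifted = df.groupby("A")["B"].shift()

# Step 2: cumulative sum of the shifted values, again per group of A.
# cumsum skips NaN by default, so the leading NaN of each group stays NaN.
running = shifted.groupby(df["A"]).cumsum()

# Step 3: replace the leading NaN with 0 and cast back to integers
# (used here instead of fillna(0, downcast='infer')).
df["C"] = running.fillna(0).astype(int)

# Show the intermediate columns next to the final result.
print(df.assign(shifted=shifted, running=running))
```

Printing the intermediate columns side by side makes it easy to confirm that only the first row of each group is NaN before the final fill.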

This solution is vectorized and also supports non-contiguous groups, that is, groups whose rows are not adjacent in the DataFrame.

Result:

print(df)

   A  B  C
0  1  5  0
1  1  6  5
2  2  3  0
3  2  4  3
4  2  5  7
5  3  2  0
5  3  7  2
6  4  3  0
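The non-contiguous case mentioned above can be checked with a small sketch. The interleaved data below is hypothetical and not from the question. The block also cross-checks the result against an equivalent formulation (the per-group inclusive cumulative sum minus the current value), which is a different technique from the one in the answer and is used here only as a sanity check. As before, `.astype(int)` stands in for `downcast='infer'` under the assumption of a recent pandas version.

```python
import pandas as pd

# Hypothetical data: the rows of groups 1 and 2 are interleaved.
df = pd.DataFrame({"A": [1, 2, 1, 2, 1], "B": [5, 3, 6, 4, 1]})

# Same two-step groupby as in the answer.
df["C"] = (
    df.groupby("A")["B"].shift()     # previous B within each group
      .groupby(df["A"]).cumsum()     # running sum per group
      .fillna(0)                     # first row of each group
      .astype(int)
)

# Sanity check: the sum of all earlier rows in a group equals the
# inclusive running sum minus the current row's value.
cross_check = df.groupby("A")["B"].cumsum() - df["B"]
assert (df["C"] == cross_check).all()

print(df)
```

Because both groupby calls key on the values of A rather than on row position, the rows do not need to be sorted by A first.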

## Edit

If you group by more than one column and get a KeyError, check the syntax of the second .groupby() call. For example, to group by the two columns year and income, you can use:

df['C'] = (df.groupby(['year', 'income'])['B'].shift()
             .groupby([df['year'], df['income']]).cumsum()
             .fillna(0, downcast='infer')
          )

pandas accepts grouping keys either as column labels, e.g. 'year', or as the columns themselves, e.g. df['year']. However, the label form only works when the object being grouped is df itself. In the second .groupby() above, the object being grouped is the Series returned by .shift(), which has no year or income column, so the labels cannot be resolved and we need to pass the columns df['year'] and df['income'] instead.
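The difference between the two key forms can be seen in a short sketch. The data below is hypothetical; only the column names year, income and B come from the text above. The label form on the shifted Series is expected to raise the KeyError described above, and `.astype(int)` again stands in for `downcast='infer'` under the assumption of a recent pandas version.

```python
import pandas as pd

# Hypothetical data using the column names from the text.
df = pd.DataFrame({
    "year":   [2020, 2020, 2020, 2021, 2021],
    "income": ["low", "low", "high", "low", "low"],
    "B":      [5, 6, 3, 4, 2],
})

# First groupby: df itself is grouped, so plain column labels work.
shifted = df.groupby(["year", "income"])["B"].shift()

# Second groupby with labels: `shifted` is a Series with no 'year' or
# 'income' column, so the labels cannot be resolved.
try:
    shifted.groupby(["year", "income"]).cumsum()
except KeyError as err:
    print("KeyError:", err)

# Second groupby with the key columns passed as Series: this works,
# because the keys are matched to `shifted` through the shared index.
df["C"] = (
    shifted.groupby([df["year"], df["income"]]).cumsum()
           .fillna(0)
           .astype(int)
)
print(df)
```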
