Integrating a pandas DataFrame yields an array of a different length. How can I store it in the same DataFrame?

I have a pandas dataframe, e.g.

    x  y
0   0  3
1   1  3
2   2  2
3   4  3
4   5  4
5   7  3
6   8  1
7  10  2

Now I want to calculate the cumulative integral up to each data point using scipy.integrate.cumtrapz. If I run

>>> cumtrapz(df.x,df.y)
array([  0. ,  -1.5,   1.5,   6. ,   0. , -15. ,  -6. ])
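For reference, the trapezoidal rule behind these numbers can be reproduced with plain NumPy. Note that cumtrapz takes the integrand first and the sample points second (cumtrapz(y, x)), so cumtrapz(df.x, df.y) integrates x with respect to y. If the intent was the integral of y over x, the two arguments would be swapped. The helper name cumulative_trapezoid_manual is hypothetical; this is a sketch for illustration, not SciPy's own code:

```python
import numpy as np

# Values copied from the DataFrame above
x = np.array([0, 1, 2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([3, 3, 2, 3, 4, 3, 1, 2], dtype=float)

def cumulative_trapezoid_manual(f, t):
    """Hypothetical helper: cumulative trapezoidal integral of f with respect to t."""
    # Area of each trapezoid between consecutive sample points t[i] and t[i+1]
    areas = (f[:-1] + f[1:]) / 2 * np.diff(t)
    # Running sum gives one value per interval, i.e. len(t) - 1 values
    return np.cumsum(areas)

# Same argument order as cumtrapz(df.x, df.y): integrand x, sample points y
print(cumulative_trapezoid_manual(x, y))  # values: 0, -1.5, 1.5, 6, 0, -15, -6

# Swapped order: integral of y with respect to x
print(cumulative_trapezoid_manual(y, x))  # values: 3, 5.5, 10.5, 14, 21, 23, 26
```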

I get the values I want, but I can’t insert them into the dataframe:

>>> df["z"] = cumtrapz(df.x,df.y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/steen/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3044, in __setitem__
    self._set_item(key, value)
  File "/home/steen/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3120, in _set_item
    value = self._sanitize_column(key, value)
  File "/home/steen/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3768, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "/home/steen/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 747, in sanitize_index
    raise ValueError(
ValueError: Length of values (7) does not match length of index (8)

This fails because, as the error shows, the output array has one element fewer than the dataframe has rows (7 vs. 8).
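The mismatch follows from the trapezoidal rule itself: n sample points bound only n - 1 intervals, so the cumulative result has one value per interval. A minimal sketch of two ways to make it fit without any extra argument, assuming SciPy 1.6 or later, where the same function is available as scipy.integrate.cumulative_trapezoid (the column name z_nan is only for illustration):

```python
import numpy as np
import pandas as pd
from scipy.integrate import cumulative_trapezoid

df = pd.DataFrame({"x": [0, 1, 2, 4, 5, 7, 8, 10],
                   "y": [3, 3, 2, 3, 4, 3, 1, 2]})

# Same argument order as in the question: integrand x, sample points y
res = cumulative_trapezoid(df.x.to_numpy(), df.y.to_numpy())
assert len(res) == len(df) - 1  # 7 intervals between 8 points

# Option 1: prepend 0, the integral from the first sample point to itself
df["z"] = np.concatenate(([0.0], res))

# Option 2: leave the first row as NaN and align the results with rows 1..n-1
df["z_nan"] = np.nan
df.loc[df.index[1:], "z_nan"] = res
```

Prepending 0 treats the first row as "no area accumulated yet", while NaN makes it explicit that nothing was integrated at that point; which fits better depends on how the column is used afterwards.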

How do I accomplish this?

Answer

The SciPy documentation for scipy.integrate.cumtrapz addresses this issue.

Quote:


initial : scalar, optional

If given, uses this value as the first value in the returned result. Typically this value should be 0. Default is None, which means no value at x[0] is returned and res has one element less than y along the axis of integration.


Example:

cumtrapz(df.x,df.y, initial=0)

Output:

array([  0. ,   0. ,  -1.5,   1.5,   6. ,   0. , -15. ,  -6. ])
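Because initial=0 makes the result as long as the dataframe, it can be assigned as a column directly. A sketch, assuming SciPy 1.6 or later: newer releases deprecate (and eventually remove) the cumtrapz name in favour of scipy.integrate.cumulative_trapezoid, which takes the same arguments:

```python
import pandas as pd
from scipy.integrate import cumulative_trapezoid  # newer name for cumtrapz

df = pd.DataFrame({"x": [0, 1, 2, 4, 5, 7, 8, 10],
                   "y": [3, 3, 2, 3, 4, 3, 1, 2]})

# initial=0 prepends a 0, so the result has len(df) elements and fits as a column
df["z"] = cumulative_trapezoid(df.x.to_numpy(), df.y.to_numpy(), initial=0)
print(df)
```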