Python Pandas Fast Way to Divide Row Value by Previous Value

I want to calculate daily bond returns from clean prices based on the logarithm of the bond price in t divided by the bond price in t-1. So far, I calculate it like this:

import pandas as pd
import numpy as np
#create example data
col1 = np.random.randint(0,10,size=10)
df = pd.DataFrame()
df["col1"] = col1
df["result"] = [0]*len(df)
#slow computation
for i in range(len(df)):
    if i == 0:
        df["result"][i] = np.nan
    else:
        df["result"][i] = np.log(df["col1"][i]/df["col1"][i-1])

However, since I have a large sample this takes a lot of time to compute. Is there a way to improve the code in order to make it faster?

Answer

Use Series.shift by col1 column with Series.div for division:

df["result1"] = np.log(df["col1"].div(df["col1"].shift()))
#alternative
#df["result1"] = np.log(df["col1"] / df["col1"].shift())
print (df)
   col1    result   result1
0     5       NaN       NaN
1     0      -inf      -inf
2     3       inf       inf
3     3  0.000000  0.000000
4     7  0.847298  0.847298
5     9  0.251314  0.251314
6     3 -1.098612 -1.098612
7     5  0.510826  0.510826
8     2 -0.916291 -0.916291
9     4  0.693147  0.693147

I test both solutions:

np.random.seed(0)

col1 = np.random.randint(0,10,size=10000)
df = pd.DataFrame({'col1':col1})


In [128]: %timeit df["result1"] = np.log(df["col1"] / df["col1"].shift())
865 µs ± 139 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [129]: %timeit df.assign(result=lambda x: np.log(x.col1.pct_change() + 1))
1.16 ms ± 11.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [130]: %timeit df["result1"] = np.log(df["col1"].pct_change() + 1)
1.03 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

np.random.seed(0)

col1 = np.random.randint(0,10,size=100000)
df = pd.DataFrame({'col1':col1})

In [132]: %timeit df["result1"] = np.log(df["col1"] / df["col1"].shift())
3.7 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [133]: %timeit df.assign(result=lambda x: np.log(x.col1.pct_change() + 1))
6.31 ms ± 545 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [134]: %timeit df["result1"] = np.log(df["col1"].pct_change() + 1)
3.75 ms ± 269 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)