[Python, pandas]: get km run per day per vehicle

I am trying to output a table, where I can see the km run per day per vehicle, but the calculations I am getting is wrong.

to give a small snippet of the data I have. each vehicle sends its current odometer readout to a central server a few hundred times per day.

TS                          DATE               VEH          ODOMETER
2021-03-12 12:58:15.500     2021-03-12         008              2932
2021-03-12 00:00:21.700     2021-03-12         002             64253
2021-03-12 00:02:21.500     2021-03-12         002             64254
2021-03-12 00:03:41.400     2021-03-12         002             64255
2021-03-12 00:05:38.200     2021-03-12         002             64256
...                         ...                ...             ...
2021-03-12 23:55:88.100     2021-03-12         002             64953
2021-03-12 00:05:38.200     2021-03-13         002             64954

as it should be clear in above example, the first and last odometer readout for vehicle 2 on 2021-03-12 is 64953-64253 = 700 km but the first readout on the next day is 64954, so with grouping the diff() between the first and last odometer value for the day, some km is disappearing, as shown below

def grp_odo(dfObj):
    dfObj['ODOMETER'] = dfObj['ODOMETER'].astype(int)
    dfObj["km"] = dfObj.groupby(["DATE","VEH"])["ODOMETER"].diff()

    sum_km = dfObj.groupby(["DATE","VEH"], as_index=False)["km"].sum()

    return sum_km

dfodo = grp_odo(df[['DATE','VEH', 'ODOMETER']].loc[(~pd.isna(df['ODOMETER']))])

Print the groupby odometer difference per day:

print(dfodo)

            DATE WSTRPVID     km
0     2021-01-01      001  523.0
1     2021-01-01      002  700.0
2     2021-01-01      003  781.0
3     2021-01-01      004    2.0
4     2021-01-01      005  553.0
...          ...      ...    ...
3375  2021-04-09      034  802.0
3376  2021-04-09      035  615.0
3377  2021-04-09      036  778.0
3378  2021-04-09      038  425.0
3379  2021-04-09      039  386.0

print the sum the total km run by vehicle to from start to finish of the grouped data.

print(dfodo[dfodo.VEH== "002"].sum())
km: 36796

Print the sum of highest and lowest odometer value in dataframe

print(df[df.VEH== "002"].groupby('VEH')['ODOMETER'].agg(np.ptp))

VEH
002    36898
Name: ODOMETER, dtype: int64

I would like to have an output, where 2021-03-12 is 701km, so difference between first value of 2021-03-12 and first value of 2021-03-13, is this possible?

Answer

Input:

>>> df
                       TS       DATE  VEH  ODOMETER
0 2021-03-12 12:58:15.500 2021-03-12  008    2932.0
1 2021-03-12 00:00:21.700 2021-03-12  002   64253.0
2 2021-03-12 00:02:21.500 2021-03-12  002   64254.0
3 2021-03-12 00:03:41.400 2021-03-12  002   64255.0
4 2021-03-12 00:05:38.200 2021-03-12  002   64256.0
5 2021-03-12 23:55:48.100 2021-03-12  002   64953.0
6 2021-03-12 00:05:38.200 2021-03-13  002   64954.0

Output:

>>> df.assign(TOTAL=df.groupby("VEH")["ODOMETER"].shift(-1) - df["ODOMETER"])  
      .groupby(["DATE", "VEH"]).sum()["TOTAL"]
DATE        VEH
2021-03-12  002    701.0
            008      0.0
2021-03-13  002      0.0
Name: TOTAL, dtype: float64