Merge on nearest retrospective timestamp and fill forward in pandas

I am having difficulty mastering pandas special merge functions like merge_asof().

I have two dataframes: coords – pings from an EV gps, and info – other EV attributes such as navigation destination and battery level. My objective is to merge them such that the output dataframe row number equals the sum of both dataframes’ number of rows. For example:

coords.shape
(10, 3)

coords

ts                          lat       lng
2021-01-02 16:08:24.067971  58.3019 -134.4197
2021-01-06 12:54:18.535681  58.3021 -134.4195
2021-01-08 22:15:35.036423  58.3025 -134.4195
2021-01-16 01:10:39.610540  58.3029 -134.4193
2021-01-27 12:28:45.202376  58.3030 -134.4197
2021-01-30 05:32:09.404525  58.3031 -134.4190
2021-02-08 10:39:19.686159  58.3033 -134.4187
2021-02-15 01:30:16.733921  58.3039 -134.4187
2021-02-16 12:49:55.366025  58.3040 -134.4185
2021-02-19 23:57:57.369978  58.3041 -134.4181


info.shape
(3, 3)

info

ts                          nav_to  battery
2021-01-26 12:47:52.972586  Juneau      90
2021-02-14 23:23:18.186058  Anchorage   50
2021-02-19 07:26:35.357977  Fairbanks   30

info and coord should be merged so that the timestamps ts are in consecutive order, and so that info rows should be matched to rows in coords with the nearest timestamp that comes “before”. Lastly, nav_to , battery, lat and lng should should be filled forward. The output from the examples above would be:

output
ts                          lat      lng        nav_to  battery
2021-01-02 16:08:24.067971  58.3019 -134.4197   None    NaN
2021-01-06 12:54:18.535681  58.3021 -134.4195   None    NaN
2021-01-08 22:15:35.036423  58.3025 -134.4195   None    NaN
2021-01-16 01:10:39.610540  58.3029 -134.4193   None    NaN
2021-01-26 12:47:52.972586  58.3029 -134.4193   Juneau  90.0
2021-01-27 12:28:45.202376  58.3030 -134.4197   Juneau  90.0
2021-01-30 05:32:09.404525  58.3031 -134.4190   Juneau  90.0
2021-02-08 10:39:19.686159  58.3033 -134.4187   Juneau  90.0
2021-02-14 23:23:18.186058  58.3033 -134.4187   Anchorage   50.0
2021-02-15 01:30:16.733921  58.3039 -134.4187   Anchorage   50.0
2021-02-16 12:49:55.366025  58.3040 -134.4185   Anchorage   50.0
2021-02-19 07:26:35.357977  58.3040 -134.4185   Fairbanks   30.0
2021-02-19 23:57:57.369978  58.3041 -134.4181   Fairbanks   30.0

I have tried with pd.merge_asof(coords, info, on="ts", direction="forward") but this does not produce the correct result, it fills backwards and only keeps records from coords. What are the correct commands to produce the desired result in pandas?

Answer

Try with the default direction='backward', then concat with the second dataframe

(pd.concat([pd.merge_asof(df1, df2, on='ts'), df2])
   .sort_values('ts')
)

Output:

                          ts      lat       lng     nav_to  battery
0 2021-01-02 16:08:24.067971  58.3019 -134.4197        NaN      NaN
1 2021-01-06 12:54:18.535681  58.3021 -134.4195        NaN      NaN
2 2021-01-08 22:15:35.036423  58.3025 -134.4195        NaN      NaN
3 2021-01-16 01:10:39.610540  58.3029 -134.4193        NaN      NaN
0 2021-01-26 12:47:52.972586      NaN       NaN     Juneau     90.0
4 2021-01-27 12:28:45.202376  58.3030 -134.4197     Juneau     90.0
5 2021-01-30 05:32:09.404525  58.3031 -134.4190     Juneau     90.0
6 2021-02-08 10:39:19.686159  58.3033 -134.4187     Juneau     90.0
1 2021-02-14 23:23:18.186058      NaN       NaN  Anchorage     50.0
7 2021-02-15 01:30:16.733921  58.3039 -134.4187  Anchorage     50.0
8 2021-02-16 12:49:55.366025  58.3040 -134.4185  Anchorage     50.0
2 2021-02-19 07:26:35.357977      NaN       NaN  Fairbanks     30.0
9 2021-02-19 23:57:57.369978  58.3041 -134.4181  Fairbanks     30.0

You can then choose to bfill the lat and lng columns. Or you can just merge_asof twice:

(pd.concat([pd.merge_asof(df1, df2, on='ts'), 
            pd.merge_asof(df2, df1, on='ts')
           ])
   .sort_values('ts')
)

Output:

                          ts      lat       lng     nav_to  battery
0 2021-01-02 16:08:24.067971  58.3019 -134.4197        NaN      NaN
1 2021-01-06 12:54:18.535681  58.3021 -134.4195        NaN      NaN
2 2021-01-08 22:15:35.036423  58.3025 -134.4195        NaN      NaN
3 2021-01-16 01:10:39.610540  58.3029 -134.4193        NaN      NaN
0 2021-01-26 12:47:52.972586  58.3029 -134.4193     Juneau     90.0
4 2021-01-27 12:28:45.202376  58.3030 -134.4197     Juneau     90.0
5 2021-01-30 05:32:09.404525  58.3031 -134.4190     Juneau     90.0
6 2021-02-08 10:39:19.686159  58.3033 -134.4187     Juneau     90.0
1 2021-02-14 23:23:18.186058  58.3033 -134.4187  Anchorage     50.0
7 2021-02-15 01:30:16.733921  58.3039 -134.4187  Anchorage     50.0
8 2021-02-16 12:49:55.366025  58.3040 -134.4185  Anchorage     50.0
2 2021-02-19 07:26:35.357977  58.3040 -134.4185  Fairbanks     30.0
9 2021-02-19 23:57:57.369978  58.3041 -134.4181  Fairbanks     30.0

Leave a Reply

Your email address will not be published. Required fields are marked *