Converting Pandas DataFrame dates so that I can pick out particular dates

I have two dataframes that I need to merge.

        Date  Greenland  Antarctica
0    2002.29       0.00        0.00
1    2002.35      68.72       19.01
2    2002.62    -219.32      -59.36
3    2002.71    -242.83       46.55
4    2002.79    -209.12       63.31
..       ...        ...         ...
189  2020.79   -4928.78    -2542.18
190  2020.87   -4922.47    -2593.06
191  2020.96   -4899.53    -2751.98
192  2021.04   -4838.44    -3070.67
193  2021.12   -4900.56    -2755.94

[194 rows x 3 columns] 

and

             Date  Mean Sea Level
0     1993.011526          -38.75
1     1993.038692          -39.77
2     1993.065858          -39.61
3     1993.093025          -39.64
4     1993.120191          -38.72
...           ...             ...
1021  2020.756822           62.83
1022  2020.783914           62.93
1023  2020.811006           62.98
1024  2020.838098           63.00
1025  2020.865190           63.00

[1026 rows x 2 columns]

My ultimate goal is to pull out the data from the second dataframe (the Mean Sea Level column) that comes from (roughly) the same time frame as the dates in the first dataframe, and then merge it into the first dataframe.

However, the only ways I can think of to select certain dates involve first converting all of the dates in the Date columns of both dataframes to something Pandas recognizes, and I have been unable to figure out how to do that. I worked out some code (below) that converts an individual date to a more common date format, but it's been difficult to apply it to all of the dates in a dataframe. I’m also not sure whether I can then get Pandas to convert the result to a date format it recognizes.

from datetime import datetime

def fraction2datetime(year_fraction: float) -> datetime:
    year = int(year_fraction)
    fraction = year_fraction - year
    first = datetime(year, 1, 1)
    aux = datetime(year + 1, 1, 1)
    return first + (aux - first)*fraction

I also looked at pandas.to_datetime but I don’t see a way to have it read the format the dates are initially in.

So does anyone have any guidance on this? Firstly with the conversion of dates, but also with the task of picking out the dates from the second dataframe if possible. Any help would be greatly appreciated.

Answer

Suppose you have these two dataframes:

df1:

      Date  Greenland  Antarctica
0  2020.79   -4928.78    -2542.18
1  2020.87   -4922.47    -2593.06
2  2020.96   -4899.53    -2751.98
3  2021.04   -4838.44    -3070.67
4  2021.12   -4900.56    -2755.94

df2:

          Date  Mean Sea Level
0  2020.756822           62.83
1  2020.783914           62.93
2  2020.811006           62.98
3  2020.838098           63.00
4  2020.865190           63.00

To convert the dates:

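from datetime import datetime

import pandas as pd


def fraction2datetime(year_fraction: float) -> datetime:
    year = int(year_fraction)
    fraction = year_fraction - year
    first = datetime(year, 1, 1)    # start of the year
    aux = datetime(year + 1, 1, 1)  # start of the next year
    # scale the length of the year (365 or 366 days) by the fractional part
    return first + (aux - first) * fraction


# apply the conversion to every value in the Date column
df1["Date"] = df1["Date"].apply(fraction2datetime)
df2["Date"] = df2["Date"].apply(fraction2datetime)
print(df1)
print(df2)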

Prints:

                        Date  Greenland  Antarctica
0 2020-10-16 03:21:35.999999   -4928.78    -2542.18
1 2020-11-14 10:04:47.999997   -4922.47    -2593.06
2 2020-12-17 08:38:24.000001   -4899.53    -2751.98
3 2021-01-15 14:23:59.999999   -4838.44    -3070.67
4 2021-02-13 19:11:59.999997   -4900.56    -2755.94
                        Date  Mean Sea Level
0 2020-10-03 23:55:28.012795           62.83
1 2020-10-13 21:54:02.073603           62.93
2 2020-10-23 19:52:36.134397           62.98
3 2020-11-02 17:51:10.195198           63.00
4 2020-11-12 15:49:44.255992           63.00
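
If the row-by-row .apply becomes slow on larger frames, the same conversion can probably be written with whole-column operations. The following is only a sketch under that assumption: fraction_to_datetime is a hypothetical helper name (not part of pandas), and it relies on pd.to_datetime assembling dates from a frame with year, month and day columns.

```python
import numpy as np
import pandas as pd


def fraction_to_datetime(col: pd.Series) -> pd.Series:
    """Convert a column of fractional years (e.g. 2020.79) to datetimes."""
    year = np.floor(col).astype("int64")
    fraction = col - year
    # January 1st of each year and of the following year
    start = pd.to_datetime(pd.DataFrame({"year": year, "month": 1, "day": 1}))
    end = pd.to_datetime(pd.DataFrame({"year": year + 1, "month": 1, "day": 1}))
    # (end - start) is 365 or 366 days, so leap years are handled per row
    return start + (end - start) * fraction


converted = fraction_to_datetime(pd.Series([2020.0, 2020.5, 2021.5]))
print(converted)
```

It would be used the same way as above, for example df1["Date"] = fraction_to_datetime(df1["Date"]).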

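For the join, you can use pd.merge_asof, which matches each row of df1 to the row of df2 with the closest key. Both dataframes must be sorted by Date. For example, this joins on the nearest date within a 30-day tolerance (adjust tolerance and direction as needed):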

x = pd.merge_asof(
    df1, df2, on="Date", tolerance=pd.Timedelta(days=30), direction="nearest"
)
print(x)

Prints (rows with no df2 date within 30 days get NaN):

                        Date  Greenland  Antarctica  Mean Sea Level
0 2020-10-16 03:21:35.999999   -4928.78    -2542.18           62.93
1 2020-11-14 10:04:47.999997   -4922.47    -2593.06           63.00
2 2020-12-17 08:38:24.000001   -4899.53    -2751.98             NaN
3 2021-01-15 14:23:59.999999   -4838.44    -3070.67             NaN
4 2021-02-13 19:11:59.999997   -4900.56    -2755.94             NaN