Pandas DataFrame DateTimes vs Timestamps

I have a DataFrame with a single day column:

|     | day                       |
|----:|:--------------------------|
|   0 | 2021-08-28 00:00:00+00:00 |
|   1 | 2021-08-28 02:00:00+00:00 |
|   2 | 2021-08-28 04:00:00+00:00 |
| ... |                       ... |
|   n | 2021-08-28 16:00:00+00:00 |

>>> df.dtypes
day    datetime64[ns, UTC]
dtype: object

I noticed that pandas returns different date types depending on whether I index or sample, and the values have to be converted before they can be compared.

Index Query

>>> df.day[0]
Timestamp('2021-08-28 00:00:00+0000', tz='UTC')

>>> type(df.day[0])
pandas._libs.tslibs.timestamps.Timestamp

Sample Query

>>> df.day.sample(1).values[0]
numpy.datetime64('2021-09-04T12:00:00.000000000')

>>> type(df.day.sample(1).values[0])
numpy.datetime64

What’s going on? Why does pandas use different data-types in the two scenarios?

  • Python: 3.8.10
  • Pandas: 1.2.5

Answer

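Pandas stores datetimes internally as NumPy's datetime64 type rather than as Timestamp objects (Timestamp is a datetime.datetime subclass). The reason is simple: performance. When you retrieve a single value from the Series, though, pandas returns a Timestamp, which is more convenient to work with since it supports all datetime.datetime methods. In your sample query the difference comes from .values, not from sample(): .values exposes the underlying NumPy array, so indexing into it gives you a raw numpy.datetime64 (with the timezone dropped) instead of a Timestamp.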
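To make this concrete, here is a minimal sketch. The three-row DataFrame is a hypothetical stand-in for the one in the question, and the variable names are mine. It shows that element access on the Series gives a Timestamp, that going through .values gives a numpy.datetime64, and one way to compare the two.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: tz-aware UTC datetimes.
df = pd.DataFrame(
    {
        "day": pd.to_datetime(
            ["2021-08-28 00:00", "2021-08-28 02:00", "2021-08-28 04:00"],
            utc=True,
        )
    }
)

# Element access on the Series boxes the value as a pandas Timestamp.
by_label = df.day[0]

# sample() returns a Series, so element access on it is boxed as well.
sampled = df.day.sample(1)
boxed = sampled.iloc[0]

# .values drops down to the NumPy array; its elements are numpy.datetime64
# and carry no timezone (the values are expressed in UTC).
raw = sampled.values[0]

print(type(by_label).__name__, type(boxed).__name__, type(raw).__name__)

# To compare, convert the raw value back to a tz-aware Timestamp.
same = boxed == pd.Timestamp(raw).tz_localize("UTC")
print(same)
```

If you want a Timestamp from a sampled row, use .iloc[0] on the result instead of .values[0]; then no conversion is needed before comparing.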