How to find the trend of record counts over the hours of the day in a pandas DataFrame?

I have records flowing in at all times of the day, and I want to find out which hours have a high volume of traffic. The DataFrame has a timestamp column to aggregate over.

Sample:

device_id,created_at
6ca0747be53ab2ec,2021-08-25 18:44:14.920807
594856a5fe3e6be1,2021-08-25 19:07:27.124797
73a1a3a1a71d9f82,2021-08-26 05:47:28.962928
ad173e9c63e61dbf,2021-08-26 06:41:42.079098
bb3dbea86bc454aa,2021-08-26 07:37:31.961397
88fd5639c1a56c8d,2021-08-26 07:41:22.922490
ee98aa89a11356e1,2021-08-26 12:19:23.279145
857021fd0eaa4c90,2021-08-26 13:10:48.756936
5d6e44174413a683,2021-08-26 16:41:31.043344
533c500b5157108,2021-08-26 21:19:54.259458
bcc245d388fc3a68,2021-08-27 02:39:24.644168
3736534393362cb3,2021-08-27 04:37:20.667116
e42b2f6c95fa1aa6,2021-08-27 05:07:18.478456
ebac58cac78b3bff,2021-08-27 06:16:22.195718
48b1f777c630c2b0,2021-08-27 08:17:01.569384
6feeee41b086d11c,2021-08-27 12:24:07.861680
fa3e4345fb55b02f,2021-08-27 14:25:07.535398
38e2b3257729ff11,2021-08-27 16:32:51.820661
80d0161bf1f6031b,2021-08-27 16:41:17.051059
8e1948470da25661,2021-08-27 17:11:30.639900
a43da665fbc53a27,2021-08-28 02:32:01.094762
fd016db27f85f7fb,2021-08-28 03:31:01.735643
78150257d277ebc3,2021-08-28 04:45:56.973692

Required result:

hour,traffic
0,0
1,0
2,1
3,1
4,1
5,1
6,1
7,2
8,1
9,0
10,0
11,0
12,1
13,1
14,1
15,0
16,1.5
17,1
18,1
19,1
20,0
21,1
22,0
23,0

I got as far as df.groupby( [df["created_at"].dt.hour, df["created_at"].dt.date] ).count(), but how do I perform the final aggregation to get the mean count per hour?

Answer

You need to group by twice: first by (hour, date) to count the records per hour on each day, then by the hour alone to average those counts across days:

(df.groupby( [df["created_at"].dt.hour, df["created_at"].dt.date])
   ['device_id'].count()
   .groupby(level=0).mean()   # level 0 is the hour; both levels share the
                              # name "created_at", so group by level number
   .sort_values(ascending=False)
)

Output:

created_at
7     2.0
16    1.5
2     1.0
3     1.0
4     1.0
5     1.0
6     1.0
8     1.0
12    1.0
13    1.0
14    1.0
17    1.0
18    1.0
19    1.0
21    1.0
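Two practical details: the .dt accessor only works once created_at is an actual datetime column (use parse_dates when reading the CSV), and the result above omits hours with no traffic, while the required result lists all 24 hours. A self-contained sketch covering both, using the sample from the question as an inline CSV string (an assumption for illustration; in practice you would read your real file):

```python
import io
import pandas as pd

# The sample from the question, inlined so the example is self-contained.
csv = """device_id,created_at
6ca0747be53ab2ec,2021-08-25 18:44:14.920807
594856a5fe3e6be1,2021-08-25 19:07:27.124797
73a1a3a1a71d9f82,2021-08-26 05:47:28.962928
ad173e9c63e61dbf,2021-08-26 06:41:42.079098
bb3dbea86bc454aa,2021-08-26 07:37:31.961397
88fd5639c1a56c8d,2021-08-26 07:41:22.922490
ee98aa89a11356e1,2021-08-26 12:19:23.279145
857021fd0eaa4c90,2021-08-26 13:10:48.756936
5d6e44174413a683,2021-08-26 16:41:31.043344
533c500b5157108,2021-08-26 21:19:54.259458
bcc245d388fc3a68,2021-08-27 02:39:24.644168
3736534393362cb3,2021-08-27 04:37:20.667116
e42b2f6c95fa1aa6,2021-08-27 05:07:18.478456
ebac58cac78b3bff,2021-08-27 06:16:22.195718
48b1f777c630c2b0,2021-08-27 08:17:01.569384
6feeee41b086d11c,2021-08-27 12:24:07.861680
fa3e4345fb55b02f,2021-08-27 14:25:07.535398
38e2b3257729ff11,2021-08-27 16:32:51.820661
80d0161bf1f6031b,2021-08-27 16:41:17.051059
8e1948470da25661,2021-08-27 17:11:30.639900
a43da665fbc53a27,2021-08-28 02:32:01.094762
fd016db27f85f7fb,2021-08-28 03:31:01.735643
78150257d277ebc3,2021-08-28 04:45:56.973692
"""

# parse_dates makes created_at a datetime64 column, which .dt requires
df = pd.read_csv(io.StringIO(csv), parse_dates=["created_at"])

traffic = (df.groupby([df["created_at"].dt.hour, df["created_at"].dt.date])
             ["device_id"].count()
             .groupby(level=0).mean()           # level 0 is the hour
             .reindex(range(24), fill_value=0)  # quiet hours become 0
             .rename_axis("hour"))
```

The reindex step is what turns the sparse result into the full 0-23 series shown in the required result.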

Alternative

Here is an alternative using groupby + unstack:

(df.assign(date=df["created_at"].dt.date, hour=df["created_at"].dt.hour)
   .set_index(['date', 'hour'])['device_id']
   .groupby(level=[0,1]).count()    # records per (date, hour)
   .unstack(level=0).mean(axis=1)   # dates become columns; average each hour's row
   .sort_values(ascending=False)
)
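As a quick sanity check, both approaches produce the same per-hour means; a sketch on a tiny hypothetical frame (mean(axis=1) skips the NaNs that unstack introduces for dates where an hour has no records, matching the double-groupby behaviour):

```python
import pandas as pd

# Tiny hypothetical frame: hour 7 occurs on one date, hour 16 on two.
df = pd.DataFrame({
    "device_id": list("abcde"),
    "created_at": pd.to_datetime([
        "2021-08-26 07:37:31", "2021-08-26 07:41:22",
        "2021-08-26 16:41:31", "2021-08-27 16:32:51",
        "2021-08-27 16:41:17",
    ]),
})

# First approach: group by (hour, date), then average over level 0 (hour).
a = (df.groupby([df["created_at"].dt.hour, df["created_at"].dt.date])
       ["device_id"].count()
       .groupby(level=0).mean())

# Second approach: unstack dates into columns, then take the row-wise mean.
b = (df.assign(date=df["created_at"].dt.date, hour=df["created_at"].dt.hour)
       .set_index(["date", "hour"])["device_id"]
       .groupby(level=[0, 1]).count()
       .unstack(level=0).mean(axis=1))

assert a.sort_index().equals(b.sort_index())
```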