append sequence number with padded zeroes to a series using pandas

I have a dataframe as shown below:

import pandas as pd

df = pd.DataFrame({'person_id': [101,101,101,101,202,202,202],
                   'login_date': ['5/7/2013 09:27:00 AM','09/08/2013 11:21:00 AM','06/06/2014 08:00:00 AM','06/06/2014 05:00:00 AM','12/11/2011 10:00:00 AM','13/10/2012 12:00:00 AM','13/12/2012 11:45:00 AM']})
df.login_date = pd.to_datetime(df.login_date)
df['logout_date'] = df.login_date + pd.Timedelta(days=5)
df['login_id'] = [1,1,1,1,8,8,8]

As you can see in the sample dataframe, each person's login_id stays the same even though their login and logout dates are different.

For example, person 101 has logged in and out at 4 different timestamps, but has the same login_id each time, which is incorrect.

Instead, I would like to generate a new login_id column where each person gets a new login_id per login but retains the 1st login_id as a prefix in their subsequent logins, so we can tell it's a sequence.

I tried the below, but it doesn't work well:

df.groupby(['person_id','login_date','logout_date'])['login_id'].rank(method="first", ascending=True) + 100000

I expect my output to be as shown below. You can see how 1 and 8, the 1st login_id for each person, are retained in their subsequent login_ids. We just append a sequence, 00001, 00002, etc., incrementing by one for each of that person's rows.

Please note that I would like to apply this to big data, and the login_ids may not be single digits in the real data. For example, the 1st login_id could be a random number like 576869578; in that case, the subsequent login_id would be 57686957800001. Whatever the 1st login_id is for that person, append 00001, 00002, etc. based on the number of rows that person has.

[expected output image]
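Just to make the rule concrete, here is a tiny illustrative snippet (made-up numbers, not my real data) showing the numbering I am after for one person's subsequent logins:

first_id = 576869578                    # 1st login_id for one person, as above
for seq in range(1, 4):                 # that person's 2nd, 3rd and 4th logins
    print(int(f"{first_id}{seq:05d}"))  # 57686957800001, 57686957800002, 57686957800003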

Answer

Update 2: Just realized my previous answers also rescaled the first index of each group (e.g. login_id 1 became 100000). Here is a version that uses GroupBy.transform() to apply the scaling only to the subsequent indexes:

# Position of each row within its (person_id, login_id) group: 0, 1, 2, ...
cumcount = df.groupby(['person_id','login_id']).login_id.cumcount()
# shift().mul(100000) leaves the first row as NaN and scales the rest by 100000;
# fillna(x.min()) restores the original id on that first row, then cumcount adds the sequence.
df.login_id = df.groupby(['person_id','login_id']).login_id.transform(
    lambda x: x.shift().mul(100000).fillna(x.min())
).add(cumcount)

#   person_id           login_date          logout_date  login_id
# 0       101  2013-05-07 09:27:00  2013-05-12 09:27:00         1
# 1       101  2013-09-08 11:21:00  2013-09-13 11:21:00    100001
# 2       101  2014-06-06 08:00:00  2014-06-11 08:00:00    100002
# 3       101  2014-06-06 05:00:00  2014-06-11 05:00:00    100003
# 4       202  2011-12-11 10:00:00  2011-12-16 10:00:00         8
# 5       202  2012-10-13 00:00:00  2012-10-18 00:00:00    800001
# 6       202  2012-12-13 11:45:00  2012-12-18 11:45:00    800002
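If you would rather build the suffix as an explicitly zero-padded string (so the padding width stays fixed regardless of the size of the ids), a minimal alternative sketch, assuming df still holds the original login_id values from the question, could be:

import pandas as pd

# Assumes df is the original frame from the question, with login_id untouched.
cumcount = df.groupby(['person_id','login_id']).login_id.cumcount()
# First login of each group keeps its original id; later logins get the id
# concatenated with a five-digit, zero-padded sequence number.
padded = df.login_id.astype(str) + cumcount.astype(str).str.zfill(5)
df['login_id'] = padded.where(cumcount > 0, df.login_id.astype(str)).astype('int64')

This should give the same result as the transform() version above, with the first row of each group left as 1 and 8.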

Update: A faster option is to build the sequence with GroupBy.cumcount():

cumcount = df.groupby(['person_id','login_id']).login_id.cumcount()
df.login_id = df.login_id.mul(100000).add(cumcount)

#   person_id           login_date          logout_date  login_id
# 0       101  2013-05-07 09:27:00  2013-05-12 09:27:00    100000
# 1       101  2013-09-08 11:21:00  2013-09-13 11:21:00    100001
# 2       101  2014-06-06 08:00:00  2014-06-11 08:00:00    100002
# 3       101  2014-06-06 05:00:00  2014-06-11 05:00:00    100003
# 4       202  2011-12-11 10:00:00  2011-12-16 10:00:00    800000
# 5       202  2012-10-13 00:00:00  2012-10-18 00:00:00    800001
# 6       202  2012-12-13 11:45:00  2012-12-18 11:45:00    800002

You can build the sequence in a GroupBy.apply():

# For each (person_id, login_id) group, emit min_id * 100000 + 0, 1, 2, ...
df.login_id = df.groupby(['person_id','login_id']).login_id.apply(
    lambda x: pd.Series([x.min()*100000 + seq for seq in range(len(x))], x.index)
)
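This produces the same numbering as the cumcount version above (100000, 100001, ... for person 101 and 800000, 800001, 800002 for person 202), but the per-group lambda will generally be slower than the vectorized cumcount approach on a large frame.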