Pandas compare value with previous row with filtration condition Code Answer


The question is published on by Tutorial Guruji team.

I have a DataFrame with information about employee salaries. It has more than 900,000 rows.

Sample:

+----+-------------+---------------+----------+
|    |   table_num | name          |   salary |
|----+-------------+---------------+----------|
|  0 |      001234 | John Johnson  |     1200 |
|  1 |      001234 | John Johnson  |     1000 |
|  2 |      001235 | John Johnson  |     1000 |
|  3 |      001235 | John Johnson  |     1200 |
|  4 |      001235 | John Johnson  |     1000 |
|  5 |      001235 | Steve Stevens |     1000 |
|  6 |      001236 | Steve Stevens |     1200 |
|  7 |      001236 | Steve Stevens |     1200 |
|  8 |      001236 | Steve Stevens |     1200 |
+----+-------------+---------------+----------+

dtypes:

table_num: string
name: string
salary: float

I need to add a column that shows whether the salary increased or decreased compared with the previous row. I'm using the shift() function to compare values between rows.
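For a single employee, the shift() comparison can be sketched on a plain Series. This is an illustrative sketch: it uses np.sign as a stand-in for the 1/-1/0 lambda in the script below, and the three salaries are taken from John Johnson's 001235 rows in the sample.

```python
import numpy as np
import pandas as pd

# Salaries of one employee in row order (John Johnson, table_num 001235)
salary = pd.Series([1000.0, 1200.0, 1000.0])

# Difference with the previous row; the first row has no predecessor, so it is NaN
delta = salary - salary.shift(1)

# np.sign maps positive/negative/zero differences to 1/-1/0 and keeps NaN;
# fillna(0) then marks the first row as "unchanged"
inc = np.sign(delta).fillna(0).astype(int)

print(inc.tolist())  # [0, 1, -1]
```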

The main problem is filtering and iterating over every unique employee across the whole dataset.

My script takes about three and a half hours to run.

How can I make it faster?

My script:

import pandas as pd

# Unique combinations of 'table_num' and 'name': the same 'table_num' can
# belong to different names, and the same name sometimes appears with
# different 'table_num' values
names_df = df[['table_num', 'name']].drop_duplicates()

# Extract each (table_num, name) pair and process its rows separately
for i in range(len(names_df)):    ### Bottleneck of the whole script ###
    t = names_df.iloc[i, 0]
    n = names_df.iloc[i, 1]

    # Select this employee's rows (a full scan of df on every iteration)
    mask = (df['table_num'] == t) & (df['name'] == n)
    sal = df.loc[mask, 'salary']

    # Use shift() to compare each salary with the previous row and map the
    # difference to 1 (increase), -1 (decrease) or 0 (unchanged / first row)
    diff_sal = (sal - sal.shift(1)).apply(lambda x: 1 if x > 0 else (-1 if x < 0 else 0))
    df.loc[diff_sal.index, 'inc'] = diff_sal.values
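The per-employee filtering exists because a plain shift() over the whole column would compare rows that belong to different (table_num, name) pairs. The sketch below, on the sample data, shows that pitfall and a loop-free variant using groupby(...).shift(1), which shifts within each group in a single pass. It assumes, as the script does, that rows within each pair are already in chronological order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'table_num': ['001234', '001234', '001235', '001235', '001235',
                  '001235', '001236', '001236', '001236'],
    'name': ['John Johnson'] * 5 + ['Steve Stevens'] * 4,
    'salary': [1200., 1000., 1000., 1200., 1000., 1000., 1200., 1200., 1200.],
})
keys = ['table_num', 'name']

# Naive: shift() over the whole column compares row 6 (Steve, 001236)
# with row 5 (Steve, 001235), which is a different pair
naive = np.sign(df['salary'] - df['salary'].shift(1)).fillna(0).astype(int)

# Grouped: shift() restarts inside every (table_num, name) group, so the
# first row of each group has no predecessor (NaN, then filled with 0)
prev = df.groupby(keys)['salary'].shift(1)
df['inc'] = np.sign(df['salary'] - prev).fillna(0).astype(int)

print(naive.tolist())      # [0, -1, 0, 1, -1, 0, 1, 0, 0]
print(df['inc'].tolist())  # [0, -1, 0, 1, -1, 0, 0, 0, 0]
```

Only row 6 differs, and the grouped result matches the expected "inc" column of the sample output.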

Sample input data:

import pandas as pd

df = pd.DataFrame({'table_num': ['001234', '001234', '001235', '001235', '001235', '001235', '001236', '001236', '001236'],
                   'name': ['John Johnson', 'John Johnson', 'John Johnson', 'John Johnson', 'John Johnson',
                            'Steve Stevens', 'Steve Stevens', 'Steve Stevens', 'Steve Stevens'],
                   'salary': [1200., 1000., 1000., 1200., 1000., 1000., 1200., 1200., 1200.]})

Sample output:

+----+-------------+---------------+----------+-------+
|    |   table_num | name          |   salary |   inc |
|----+-------------+---------------+----------+-------|
|  0 |      001234 | John Johnson  |     1200 |     0 |
|  1 |      001234 | John Johnson  |     1000 |    -1 |
|  2 |      001235 | John Johnson  |     1000 |     0 |
|  3 |      001235 | John Johnson  |     1200 |     1 |
|  4 |      001235 | John Johnson  |     1000 |    -1 |
|  5 |      001235 | Steve Stevens |     1000 |     0 |
|  6 |      001236 | Steve Stevens |     1200 |     0 |
|  7 |      001236 | Steve Stevens |     1200 |     0 |
|  8 |      001236 | Steve Stevens |     1200 |     0 |
+----+-------------+---------------+----------+-------+

Answer

Use groupby together with diff, so that each row is compared only with the previous row of the same (table_num, name) pair:

df['inc'] = df.groupby(['table_num', 'name'])['salary'].diff().fillna(0.0)
df.loc[df['inc'] > 0.0, 'inc'] = 1.0
df.loc[df['inc'] < 0.0, 'inc'] = -1.0
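Running these three lines on the sample input reproduces the expected "inc" column (as floats). A possible compact variant, shown here as an editorial alternative rather than part of the original answer, replaces the two .loc assignments with np.sign and casts the result to integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'table_num': ['001234', '001234', '001235', '001235', '001235',
                  '001235', '001236', '001236', '001236'],
    'name': ['John Johnson'] * 5 + ['Steve Stevens'] * 4,
    'salary': [1200., 1000., 1000., 1200., 1000., 1000., 1200., 1200., 1200.],
})

# The answer's approach: per-group difference with the previous row,
# first row of each group filled with 0, then clipped to -1/0/1 via .loc
df['inc'] = df.groupby(['table_num', 'name'])['salary'].diff().fillna(0.0)
df.loc[df['inc'] > 0.0, 'inc'] = 1.0
df.loc[df['inc'] < 0.0, 'inc'] = -1.0

# Variant: np.sign maps each difference to 1/-1/0 in one step (NaN stays NaN
# until fillna); astype(int) gives integer labels like the sample output
inc_sign = (np.sign(df.groupby(['table_num', 'name'])['salary'].diff())
            .fillna(0)
            .astype(int))

print(df['inc'].tolist())  # [0.0, -1.0, 0.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
print(inc_sign.tolist())   # [0, -1, 0, 1, -1, 0, 0, 0, 0]
```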
