Assigning Values Depending on a Condition

I have a database that tracks service and installation revenue for a construction company. I am trying to track technician turnovers, wherein a tech sets a lead for the salesman, and sales revenue. In this example, pulled from the database, Justin is the technician who created the lead; Sean is the salesman who went to the call and, in this case, sold it; and, finally, Victor is the one who installed the job. All salesman have 0 for their totals as the company doesn’t collect until the installation is completed. If the project_id == 0 then that means no lead was created off of the original call. A project number can span across at least 2 rows and at most 3 rows.

Example:

         project_id    emp_name     client_name     ...    invoice_date    total
...         ...          ...            ...         ...        ...          ...
15315     26173042      Justin         Bill W       ...     2021-03-26     169.95
...         ...          ...            ...         ...        ...          ...
15322     26173042      Sean           Bill W       ...     2021-03-27       0
...         ...          ...            ...         ...        ...          ...
15347     26173042      Victor         Bill W       ...     2021-04-01     17235
15348        0          Justin         Jane D       ...     2021-04-01      285
...         ...          ...            ...         ...        ...          ...

I want to give Justin credit for setting up the call that leads to a sale of $17.2k, and I want to give credit to Sean for selling the job. I would prefer to create a new column, called sales_total, that captures the sales figure.

The desired output:

         project_id    emp_name     client_name     ...    invoice_date    total    sales_total
...         ...          ...            ...         ...        ...          ...         ...
15315     26173042      Justin         Bill W       ...     2021-03-26     169.95      17235
...         ...          ...            ...         ...        ...          ...         ...
15322     26173042      Sean           Bill W       ...     2021-03-27       0         17235
...         ...          ...            ...         ...        ...          ...         ...
15347     26173042      Victor         Bill W       ...     2021-04-01     17235       17235
15348        0          Justin         Jane D       ...     2021-04-01      285          0
...         ...          ...            ...         ...        ...          ...         ... 

I have tried using pd.groupby, pd.sort_values, and pd.iloc; but with no success. I can’t quite figure out how to assign the proper value to the desired rows. If anyone knows a solution or could point me in the proper direction it would be much appreciated.

Answer

If you have this dataframe (supposing the records are sorted = the installation job is last):

   index  project_id emp_name client_name invoice_date     total
0  15315    26173042   Justin      Bill_W   2021-03-26    169.95
1  15322    26173042     Sean      Bill_W   2021-03-27      0.00
2  15347    26173042   Victor      Bill_W   2021-04-01  17235.00
3  15348           0   Justin      Jane_D   2021-04-01    285.00

Then:

df["sales_total"] = df.groupby("project_id")["total"].transform("last")
print(df)

Creates sales_total column:

   index  project_id emp_name client_name invoice_date     total  sales_total
0  15315    26173042   Justin      Bill_W   2021-03-26    169.95      17235.0
1  15322    26173042     Sean      Bill_W   2021-03-27      0.00      17235.0
2  15347    26173042   Victor      Bill_W   2021-04-01  17235.00      17235.0
3  15348           0   Justin      Jane_D   2021-04-01    285.00        285.0