I have a database that tracks service and installation revenue for a construction company. I am trying to track technician turnovers, wherein a tech sets a lead for the salesman, and sales revenue. In this example, pulled from the database, Justin is the technician who created the lead; Sean is the salesman who went to the call and, in this case, sold it; and, finally, Victor is the one who installed the job. All salesman have 0 for their totals as the company doesn’t collect until the installation is completed. If the project_id == 0
then that means no lead was created off of the original call. A project number can span across at least 2 rows and at most 3 rows.
Example:
project_id emp_name client_name ... invoice_date total ... ... ... ... ... ... ... 15315 26173042 Justin Bill W ... 2021-03-26 169.95 ... ... ... ... ... ... ... 15322 26173042 Sean Bill W ... 2021-03-27 0 ... ... ... ... ... ... ... 15347 26173042 Victor Bill W ... 2021-04-01 17235 15348 0 Justin Jane D ... 2021-04-01 285 ... ... ... ... ... ... ...
I want to give Justin credit for setting up the call that leads to a sale of $17.2k, and I want to give credit to Sean for selling the job. I would prefer to create a new column, called sales_total
, that captures the sales figure.
The desired output:
project_id emp_name client_name ... invoice_date total sales_total ... ... ... ... ... ... ... ... 15315 26173042 Justin Bill W ... 2021-03-26 169.95 17235 ... ... ... ... ... ... ... ... 15322 26173042 Sean Bill W ... 2021-03-27 0 17235 ... ... ... ... ... ... ... ... 15347 26173042 Victor Bill W ... 2021-04-01 17235 17235 15348 0 Justin Jane D ... 2021-04-01 285 0 ... ... ... ... ... ... ... ...
I have tried using pd.groupby
, pd.sort_values
, and pd.iloc
; but with no success. I can’t quite figure out how to assign the proper value to the desired rows. If anyone knows a solution or could point me in the proper direction it would be much appreciated.
Answer
If you have this dataframe (supposing the records are sorted = the installation job is last):
index project_id emp_name client_name invoice_date total 0 15315 26173042 Justin Bill_W 2021-03-26 169.95 1 15322 26173042 Sean Bill_W 2021-03-27 0.00 2 15347 26173042 Victor Bill_W 2021-04-01 17235.00 3 15348 0 Justin Jane_D 2021-04-01 285.00
Then:
df["sales_total"] = df.groupby("project_id")["total"].transform("last") print(df)
Creates sales_total
column:
index project_id emp_name client_name invoice_date total sales_total 0 15315 26173042 Justin Bill_W 2021-03-26 169.95 17235.0 1 15322 26173042 Sean Bill_W 2021-03-27 0.00 17235.0 2 15347 26173042 Victor Bill_W 2021-04-01 17235.00 17235.0 3 15348 0 Justin Jane_D 2021-04-01 285.00 285.0