I have a dataframe with data from ecommerce panel. It has orders and returns mixed together. Each row has orderID – it’s the same number for normal orders and for corresponding returns that come back from customers.
My data looks like this:
For each return I want to find a ‘Shop’ column value that corresponds to original order. For example : ‘orderID’ == 44 comes twice: once as return (with ‘Shop’ == 0) and once as normal order (with ‘Shop’ == 1). I want to replace all the 0 values with ‘Shop’ column with values from earlier orders
My desired output looks like this:
I know how to do it in Google Sheets (first I filter table removing ‘Shop’==0 values and then I vlookup for numbers in this filtered array)
I know how to filter this table using Pandas but I don’t know how to write it.
I assume that I will need to write a temporary column first, where I store both types of values – for normal orders (just copied) and for returns.
Original dataframe is 1 000 000+ rows
Thank you for any advice!
IIUC, using map:
m = df.query('Shop != 0').set_index('orderID')['Shop'] df['Shop'] = df['orderID'].map(m) print(df)
orderID Shop Revenue Note 0 44 1 -32 Return 1 45 3 -100 Return 2 44 1 14 NaN 3 45 3 20 Something else 4 46 2 50 NaN 5 47 1 80 Something 6 48 2 222 NaN
Create a pd.Series using
query to filter out zero shops then
map shops to orderID.
This works if there is a 1-1 shop to order mapping. If you have multiple shops per order, then you’ll need logic to determine which shop valid.
If you have duplicate order to the same shop, then you need to drop_duplicates first.