How to add a new column to a DataFrame and fill its values based on a condition in Python

Question:

I have a table with company names and the value of each order they placed:

Order Id  Company Id        Company Name        Date        Order Value
3455      80EYLOKP9E762WKG  Chimera-Chasing     18-02-2017  2345
4875      TLEXR1HZWTUTBHPB  Mellow Ezra         30-07-2015  3245
8425      839FKFW2LLX4LMBB  Chimera-Chasing     27-05-2016  4566
4837      97OX39BGVMHODLJM  Worst Mali          27-09-2018  5674
3434      5T4LGH4XGBWOD49Z  Indonesian Grigory  14-01-2016  7654

I need to add a new column containing each company's segment, based on the total value of its orders.

I decided to divide the companies into 4 segments (Prime, Platinum, Gold, Silver).

My approach was to first aggregate this table into a new table with the total order value for each company, with this code:

seg = orders.loc[:,['Company Name', 'Order Value']].groupby('Company Name').sum()

Outcome:

Company Name      Order Value
’48 Wills         65325
10-Day Causes     85473
10-Hour Leak      83021
Youngish Mark’S   120343
10-Year-Old Alba  97968

Then I used conditions to create a new column with segments based on the total order value, and added this column to the aggregated dataframe "seg", with this code:

conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]

values = ['Prime', 'Platinum', 'Gold', 'Silver']

seg['Segment'] = np.select(conditions, values)

Now I need to add this Segment column to the original dataframe (orders), matching rows where the company name in seg equals the company name in orders, but I don't know how to do that.

Asked By: mostafa ibrahim

Answers:

I believe what you want is DataFrame.merge (see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html):

orders = orders.merge(seg, on=['Company Name'], how='left')

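Note that both dataframes contain an 'Order Value' column, so the merged dataframe will end up with two of them (by default pandas suffixes them as 'Order Value_x' and 'Order Value_y'). To avoid this, I would add the following line before the merge: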

seg = seg.rename(columns={'Order Value': 'Total Order Value'})
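
If only the Segment column is needed, a lookup with Series.map may be a lighter alternative to the merge, because seg is indexed by company name after the groupby. The following is a minimal sketch rather than part of the approach above; the sample rows and the segment labels assigned in seg are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, smaller than the real orders table.
orders = pd.DataFrame({
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing'],
    'Order Value': [2345, 3245, 4566],
})

# Stand-in for the aggregated frame: indexed by company name,
# with a Segment column already assigned (labels are illustrative).
seg = pd.DataFrame(
    {'Segment': ['Gold', 'Silver']},
    index=pd.Index(['Chimera-Chasing', 'Mellow Ezra'], name='Company Name'),
)

# map() looks each company name up in seg's index and returns its segment;
# a name missing from seg would become NaN.
orders['Segment'] = orders['Company Name'].map(seg['Segment'])
print(orders)
```

This adds only the one column, so there is no name clash to rename and nothing to drop afterwards.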

Full example:

import pandas as pd
import numpy as np

data = {
    'Order ID': ['3455', '4875', '8425', '4837', '3434'],
    'Company ID': ['80EYLOKP9E762WKG', 'TLEXR1HZWTUTBHPB', '839FKFW2LLX4LMBB', '97OX39BGVMHODLJM', '5T4LGH4XGBWOD49Z'],
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing', 'Worst Mali', 'Indonesian Grigory'],
    'Date': ['18-02-2017', '30-07-2015', '27-05-2016', '27-09-2018', '14-01-2016'],
    'Order Value': [2345, 3245, 4566, 5674, 7654]
}

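orders = pd.DataFrame(data=data)

# Total order value per company; 'Company Name' becomes the index of seg.
seg = orders.loc[:, ['Company Name', 'Order Value']].groupby('Company Name').sum()

# Segment thresholds on the per-company total.
conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]

values = ['Prime', 'Platinum', 'Gold', 'Silver']

# 'Unknown' is an assumed placeholder label for rows matching no condition.
# A string default keeps the result dtype consistent with the labels;
# recent NumPy versions may reject the implicit integer default of 0 here.
seg['Segment'] = np.select(conditions, values, default='Unknown')

# Rename so the total does not collide with the per-order 'Order Value'.
seg = seg.rename(columns={'Order Value': 'Total Order Value'})

# Left merge keeps every order and attaches its company's total and segment.
orders = orders.merge(seg, on=['Company Name'], how='left')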

print(orders)
  Order ID        Company ID        Company Name        Date  Order Value  Total Order Value Segment
0     3455  80EYLOKP9E762WKG     Chimera-Chasing  18-02-2017         2345               6911  Silver
1     4875  TLEXR1HZWTUTBHPB         Mellow Ezra  30-07-2015         3245               3245  Silver
2     8425  839FKFW2LLX4LMBB     Chimera-Chasing  27-05-2016         4566               6911  Silver
3     4837  97OX39BGVMHODLJM          Worst Mali  27-09-2018         5674               5674  Silver
4     3434  5T4LGH4XGBWOD49Z  Indonesian Grigory  14-01-2016         7654               7654  Silver
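
Every row lands in Silver here because the five sample orders add up to far less than the 88174 threshold for any company. To check that the thresholds behave as intended, here is a minimal sketch using the five totals shown in the question's aggregated output plus one made-up value (130000) to reach Prime. It relies on np.select returning the choice for the first condition that matches, so the upper bounds can be dropped; the default label 'Unknown' is an assumption of this sketch, not something from the question.

```python
import numpy as np
import pandas as pd

totals = pd.Series(
    [65325, 85473, 83021, 120343, 97968, 130000],  # last value is made up
    name='Total Order Value',
)

# np.select picks the first matching condition, so checking from the
# highest threshold down makes the explicit upper bounds unnecessary.
conditions = [
    totals >= 124485,
    totals >= 105503,
    totals >= 88174,
    totals < 88174,
]
values = ['Prime', 'Platinum', 'Gold', 'Silver']

# 'Unknown' (an assumed label) covers rows matching no condition, e.g. NaN totals.
segments = np.select(conditions, values, default='Unknown')
print(list(segments))
```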

You can delete the 'Total Order Value' column with the following line if you do not want it:

orders = orders.drop(labels=['Total Order Value'], axis=1)
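
As an alternative to aggregating and merging back, groupby(...).transform('sum') can compute each company's total directly on the rows of orders, and pd.cut can then bin that total into segments. This is a sketch of a different technique from the merge shown above, using the thresholds from the question; the company names and order values below are hypothetical, chosen so that more than one segment appears.

```python
import numpy as np
import pandas as pd

# Hypothetical orders, with values large enough to span several segments.
orders = pd.DataFrame({
    'Company Name': ['A', 'B', 'A', 'C'],
    'Order Value': [60000, 90000, 70000, 50000],
})

# transform('sum') returns one total per row, aligned with orders.
total = orders.groupby('Company Name')['Order Value'].transform('sum')

# right=False makes each bin include its lower edge, matching the >= tests.
orders['Segment'] = pd.cut(
    total,
    bins=[-np.inf, 88174, 105503, 124485, np.inf],
    labels=['Silver', 'Gold', 'Platinum', 'Prime'],
    right=False,
)
print(orders)
```

Note that pd.cut produces a categorical column; calling .astype(str) on it gives plain strings if that is preferred.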
Answered By: Michael Castle