Assign new column in DataFrame based on if value is in a certain value range

Question:

I have two DataFrames as follows:

import numpy as np
import pandas as pd

df_discount = pd.DataFrame(data={'Graduation' : np.arange(0,1000,100), 'Discount %' : np.arange(0,50,5)})
df_values = pd.DataFrame(data={'Sum' : [20,801,972,1061,1251]})


My goal is to add a new column df_values['New Sum'] that applies the corresponding discount to df_values['Sum'] based on the value of df_discount['Graduation']. A discount applies when the Sum is >= the Graduation; if several Graduation values qualify, the highest one is used.

Examples: Sum 801 should get a discount of 40% resulting in 480.6, Sum 1061 gets 45% resulting in 583.55.

I know I could write a function with if/else conditions that returns the values. However, is there a better way to do this when there are very many different conditions?

Asked By: Minfetli


Answers:

You can use pandas.DataFrame.mask: wherever the condition is True, it replaces the value. For that to work, though, your Sum column has to be inside the first DataFrame.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mask.html
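
One way this idea might look in practice is sketched below. It is only an illustration, not necessarily what the answer had in mind: it uses Series.mask (the Series counterpart of DataFrame.mask) in a loop over the thresholds, and the helper name pct is made up for the example. It assumes df_discount is sorted by Graduation in ascending order, so that later (higher) thresholds overwrite earlier ones.

```python
import numpy as np
import pandas as pd

df_discount = pd.DataFrame({
    'Graduation': np.arange(0, 1000, 100), 'Discount %': np.arange(0, 50, 5)
})
df_values = pd.DataFrame({'Sum': [20, 801, 972, 1061, 1251]})

# Hypothetical helper column: start with no discount for every row
pct = pd.Series(0, index=df_values.index)

# Walk the thresholds in ascending order; mask() replaces the value
# wherever the condition is True, so the highest matching threshold wins
for graduation, discount in zip(df_discount['Graduation'], df_discount['Discount %']):
    pct = pct.mask(df_values['Sum'] >= graduation, discount)

df_values['New Sum'] = df_values['Sum'] * (1 - pct / 100)
print(df_values)
```

This keeps the two DataFrames separate, but it loops once per threshold in Python, so it is likely to be slower than a single vectorized call when there are many thresholds.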

Answered By: Lanre

You could check whether pd.merge_asof() works for you:

df_discount = pd.DataFrame({
    'Graduation': np.arange(0, 1000, 100), 'Discount %': np.arange(0, 50, 5)
})
df_values = pd.DataFrame({'Sum': [20, 100, 101, 350, 801, 972, 1061, 1251]})

df_values = (
    pd.merge_asof(
        df_values, df_discount,
        left_on="Sum", right_on="Graduation",
        direction="backward"
    )
    .assign(New_Sum=lambda df: df["Sum"] * (1 - df["Discount %"] / 100))
    .drop(columns=["Graduation", "Discount %"])
)

Result (without the last .drop(columns=...) to see what’s happening):

    Sum  Graduation  Discount %  New_Sum
0    20           0           0    20.00
1   100         100           5    95.00
2   101         100           5    95.95
3   350         300          15   297.50
4   801         800          40   480.60
5   972         900          45   534.60
6  1061         900          45   583.55
7  1251         900          45   688.05
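
One caveat: pd.merge_asof() requires both DataFrames to be sorted by their key columns, which happens to be true for the data above. If Sum is not sorted, a possible workaround (a sketch, not part of the original answer) is to sort before merging and restore the original row order afterwards. The intermediate column name index comes from reset_index(), and the unsorted sample values are made up for the example.

```python
import numpy as np
import pandas as pd

df_discount = pd.DataFrame({
    'Graduation': np.arange(0, 1000, 100), 'Discount %': np.arange(0, 50, 5)
})
# Deliberately unsorted sample values (an assumption for this sketch)
df_values = pd.DataFrame({'Sum': [1061, 20, 972, 801]})

result = (
    pd.merge_asof(
        # Keep the original position in a column, then sort by the key
        df_values.reset_index().sort_values('Sum'),
        df_discount,
        left_on='Sum', right_on='Graduation',
        direction='backward'
    )
    # Restore the original row order
    .set_index('index')
    .sort_index()
    .rename_axis(None)
    .assign(New_Sum=lambda df: df['Sum'] * (1 - df['Discount %'] / 100))
)
print(result)
```
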
Answered By: Timus

pandas.cut() is made for problems like this where you need to segment your data into bins (i.e. discount % based on value range).

First define the column, the bin edges, and the corresponding labels.

# The column we need to segment
col = df_values['Sum']

# The bin edges: [0, 100, 200, ..., 900, np.inf]. With right=False below, these
# mean [0, 100), [100, 200), ..., [900, inf), so a Sum equal to a Graduation
# already gets that Graduation's discount (Sum >= Graduation).
graduation = np.append(df_discount['Graduation'], np.inf)

# For each range, the corresponding label (i.e. discount)
discount = df_discount['Discount %']

Now call pandas.cut() and do the discount calculation.

df_values['Discount %'] = pd.cut(col,
                                 graduation,
                                 labels=discount,
                                 right=False)

# pd.cut returns a categorical; convert it to int for the calculation
df_values['Discount %'] = df_values['Discount %'].astype(int)
df_values['New Sum'] = df_values['Sum'] * (1 - df_values['Discount %'] / 100)

    Sum  Discount %  New Sum
0    20           0    20.00
1   801          40   480.60
2   972          45   534.60
3  1061          45   583.55
4  1251          45   688.05
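
The sample data contains no Sum that falls exactly on a Graduation value, so the handling of bin edges does not show up in the result above. The sketch below compares the two settings of the right parameter of pd.cut() on boundary values; the variable names boundary, right_closed and left_closed are made up for the illustration.

```python
import numpy as np
import pandas as pd

df_discount = pd.DataFrame({
    'Graduation': np.arange(0, 1000, 100), 'Discount %': np.arange(0, 50, 5)
})
bins = np.append(df_discount['Graduation'], np.inf)
labels = df_discount['Discount %'].tolist()

# Values that sit exactly on a bin edge
boundary = pd.Series([0, 100, 900])

# Default right=True: intervals are (0, 100], (100, 200], ..., (900, inf]
right_closed = pd.cut(boundary, bins, labels=labels)

# right=False: intervals are [0, 100), [100, 200), ..., [900, inf)
left_closed = pd.cut(boundary, bins, labels=labels, right=False)

print(pd.DataFrame({'Sum': boundary,
                    'right=True': right_closed,
                    'right=False': left_closed}))
```

With the default, a Sum of 0 falls outside every interval (NaN) and a Sum of 100 still lands in the first bin; with right=False each edge value gets the discount of its own Graduation, which matches the "Sum >= Graduation" rule from the question.
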
Answered By: viggnah