increase value by constant based on number of times a value occurs in another column

Question:

I have:

df=pd.DataFrame({'col1':['x','x','x','x','x','y','y','y','y','y','y','y'],
                 'value':[0,0,0,0,0,0,0,0,0,0,0,0]})

enter image description here

I would like:

enter image description here

the value column to increase by a constant value depending on the number of times it appears in col1. for each occurrence of x, it increases by 100, and for each occurrence of y it increases by 150

Asked By: frank

||

Answers:

We’ll start by getting the cumulative count for each item in col1:

df['value'] = df.groupby('col1').cumcount()

Next, we need to apply the multiplication by item:

multiples = {
    'x': 100,
    'y': 150
}
for col, value in multiples.items():
    index = df['col1'] == col
    df.loc[index,'value'] *= value

Giving the final result:

    col1    value
0   x   0
1   x   100
2   x   200
3   x   300
4   x   400
5   y   0
6   y   150
7   y   300
8   y   450
9   y   600
10  y   750
11  y   900
Answered By: SNygard

EDIT: SNygard beat me to it, but I try to present a solution that makes use of pandas’ broadcasting architecture and bypasses the inneficiencies of iteration.


It is said that when you iterate over a dataframe’s rows, you lose pandas’ efficiency by using it to a purpose it was not intended for.

Here’s how I would do it:

import pandas as pd

col1_to_value_hash = {
    'x': 100,
    'y': 150
}

df = pd.DataFrame({
    'col1': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
    'value': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
})

cumcount = df.groupby('col1').cumcount()


df['value'] = cumcount * df['col1'].apply(lambda x: col1_to_value_hash[x])

Answered By: Victor Valente
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.