How to calculate relative values based on position of reference flag

Question:

I have a simple table with an IsReference column that flags the base value of Pmpp for each Item. I want to calculate all other Pmpp values relative to that base value, grouped by Item, as shown in the picture below. Similarly, I would like to calculate the difference between dates relative to the reference date, and so on. I would appreciate hints on how to do this in Python. Below is the code I started with.
Best regards

[Image: table of the expected result]

import pandas as pd

d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"], 
 'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
 'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
 'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023", "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"]
}

df = pd.DataFrame(data = d)

# find location of reference
ref = df['Pmpp'][df['IsReference'] == 1].values

# calculate relative values
df['Pmpp_norm'] = df.groupby('Item')['Pmpp'].apply(lambda x: x/ref)
Asked By: Przem

Answers:

I think this should work. It looks up the reference row of each Item and then divides (or subtracts) row by row:

import pandas as pd

d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"], 
 'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
 'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
 'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023", "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"]
}

df = pd.DataFrame(data=d)

# Convert TimeStamp column to datetime format
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], format='%d.%m.%Y')

# Calculate the reference Pmpp for each Item
reference_pmpp = df.loc[df['IsReference'] == 1, ['Item', 'Pmpp']].set_index('Item')['Pmpp']

# Calculate relative Pmpp values
df['Pmpp_norm'] = df.apply(lambda x: x['Pmpp'] / reference_pmpp.loc[x['Item']], axis=1)

# Calculate the reference TimeStamp for each Item
reference_timestamp = df.loc[df['IsReference'] == 1, ['Item', 'TimeStamp']].set_index('Item')['TimeStamp']

# Calculate the difference between dates based on reference date
df['Date_diff'] = df.apply(lambda x: (x['TimeStamp'] - reference_timestamp.loc[x['Item']]).days, axis=1)

print(df)

Answered By: cconsta1
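
The row-wise apply calls in the answer above can also be written as a merge, which avoids calling a Python function for every row. This is only a sketch under assumptions, not the answerer's code: the column names Pmpp_ref and TimeStamp_ref and the variables refs and merged are made up for illustration, and it assumes every Item has exactly one reference row.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["dmc1"] * 4 + ["dmc2"] * 4,
    "Pmpp": [3, 4, 3, 1, 2, 4, 3, 1],
    "IsReference": [0, 1, 0, 0, 1, 0, 0, 0],
    "TimeStamp": ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023",
                  "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"],
})
df["TimeStamp"] = pd.to_datetime(df["TimeStamp"], format="%d.%m.%Y")

# One row per Item holding its reference values (the *_ref names are hypothetical)
refs = (
    df.loc[df["IsReference"] == 1, ["Item", "Pmpp", "TimeStamp"]]
      .rename(columns={"Pmpp": "Pmpp_ref", "TimeStamp": "TimeStamp_ref"})
)

# validate="many_to_one" raises if an Item has more than one reference row
merged = df.merge(refs, on="Item", how="left", validate="many_to_one")

# Plain column arithmetic replaces the two apply(axis=1) calls
merged["Pmpp_norm"] = merged["Pmpp"] / merged["Pmpp_ref"]
merged["Date_diff"] = (merged["TimeStamp"] - merged["TimeStamp_ref"]).dt.days

print(merged[["Item", "Pmpp", "Pmpp_norm", "Date_diff"]])
```

An Item without any reference row would end up with NaN in Pmpp_norm and Date_diff instead of raising an error, so check for that separately if it can happen in your data.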

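You have to broadcast the reference value to all rows of its group. where() keeps Pmpp only on the reference rows (NaN elsewhere), and transform('max') then repeats that single value across every row of the same Item: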

ref = df['Pmpp'].where(df['IsReference'] == 1).groupby(df['Item']).transform('max')

df['Pmpp_norm'] = df['Pmpp'] / ref

Output:

>>> df
   Item  Pmpp  IsReference   TimeStamp  Pmpp_norm
0  dmc1     3            0  22.02.2023       0.75
1  dmc1     4            1  25.02.2023       1.00
2  dmc1     3            0  28.02.2023       0.75
3  dmc1     1            0   3.03.2023       0.25
4  dmc2     2            1  24.02.2023       1.00
5  dmc2     4            0  25.02.2023       2.00
6  dmc2     3            0   2.03.2023       1.50
7  dmc2     1            0   5.03.2023       0.50

>>> ref
0    4.0
1    4.0
2    4.0
3    4.0
4    2.0
5    2.0
6    2.0
7    2.0
Name: Pmpp, dtype: float64

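To see how the broadcast works, it may help to split the one-liner into its two steps. The sketch below is illustrative only; the names masked and pmpp_norm are not part of the answer, and it assumes exactly one reference row per Item.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["dmc1"] * 4 + ["dmc2"] * 4,
    "Pmpp": [3, 4, 3, 1, 2, 4, 3, 1],
    "IsReference": [0, 1, 0, 0, 1, 0, 0, 0],
})

# Step 1: keep Pmpp only on reference rows; every other row becomes NaN
masked = df["Pmpp"].where(df["IsReference"] == 1)

# Step 2: per Item, take the max (NaN is skipped) and repeat it on every row
ref = masked.groupby(df["Item"]).transform("max")

# Side-by-side view of the intermediate results
print(pd.DataFrame({"Item": df["Item"], "masked": masked, "ref": ref}))

pmpp_norm = df["Pmpp"] / ref
```

Because each group contains a single non-NaN value after step 1, 'max' is just a convenient way to pick it; 'min' would return the same value.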
Update

You can also use a mapping Series indexed by Item:

ref = df[df['IsReference'] == 1].set_index('Item')['Pmpp']

df['Pmpp_norm'] = df['Pmpp'] / df['Item'].map(ref)

Output:

>>> df
   Item  Pmpp  IsReference   TimeStamp  Pmpp_norm
0  dmc1     3            0  22.02.2023       0.75
1  dmc1     4            1  25.02.2023       1.00
2  dmc1     3            0  28.02.2023       0.75
3  dmc1     1            0   3.03.2023       0.25
4  dmc2     2            1  24.02.2023       1.00
5  dmc2     4            0  25.02.2023       2.00
6  dmc2     3            0   2.03.2023       1.50
7  dmc2     1            0   5.03.2023       0.50

>>> ref
Item
dmc1    4
dmc2    2
Name: Pmpp, dtype: int64
Answered By: Corralien
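
The question also asks about date differences relative to the reference date. The same mapping idea should carry over to the TimeStamp column; the following is a sketch under assumptions rather than part of either answer. The names is_ref, ref_counts and ref_ts are illustrative, and the up-front check exists because the lookup only makes sense when each Item has exactly one reference row.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["dmc1"] * 4 + ["dmc2"] * 4,
    "IsReference": [0, 1, 0, 0, 1, 0, 0, 0],
    "TimeStamp": ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023",
                  "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"],
})
df["TimeStamp"] = pd.to_datetime(df["TimeStamp"], format="%d.%m.%Y")

is_ref = df["IsReference"] == 1

# Assumption: exactly one reference row per Item; fail early otherwise
ref_counts = is_ref.groupby(df["Item"]).sum()
if not (ref_counts == 1).all():
    bad_items = list(ref_counts[ref_counts != 1].index)
    raise ValueError(f"Items without exactly one reference row: {bad_items}")

# Item -> reference date, then look it up for every row
ref_ts = df.loc[is_ref].set_index("Item")["TimeStamp"]
df["Date_diff"] = (df["TimeStamp"] - df["Item"].map(ref_ts)).dt.days

print(df[["Item", "TimeStamp", "Date_diff"]])
```

Negative values mean the row is dated before the reference row of its Item.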