# How to calculate relative values based on position of reference flag

## Question:

I have a simple table with an `IsReference` column whose flag marks the row holding the base value of `Pmpp`. Within each `Item` group, all other `Pmpp` values should be expressed relative to that base value, as shown in the picture below. Similarly, I would like to calculate the difference between each date and the reference date, etc. I would appreciate any hints on how to do that in Python. Below is the code I started with.
Best regards

``````python
import pandas as pd

d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023", "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"]
}

df = pd.DataFrame(data = d)

# find location of reference
ref = df['Pmpp'][df['IsReference'] == 1].values

# calculate relative values
df['Pmpp_norm'] = df.groupby('Item')['Pmpp'].apply(lambda x: x/ref)
``````
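A quick way to see why the last line cannot work: `ref` holds every reference value in the frame (one per `Item`), so inside each group a four-row `x` is divided by a two-element array with no row-by-row pairing. A minimal check, reusing the question's data without the `TimeStamp` column:

```python
import pandas as pd

d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
     'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
     'IsReference': [0, 1, 0, 0, 1, 0, 0, 0]}
df = pd.DataFrame(data=d)

# Same lookup as in the question: all reference values, not one value per row
ref = df['Pmpp'][df['IsReference'] == 1].values
print(ref)  # [4 2]

# Each Item group has four rows, so x / ref has nothing to align against
group_sizes = df.groupby('Item').size()
print(group_sizes.tolist())  # [4, 4]
```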

## Answer:

This should work: build a per-`Item` lookup of the reference values, then use it for every row of that `Item`.

``````python
import pandas as pd

d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023", "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"]
}

df = pd.DataFrame(data=d)

# Convert TimeStamp column to datetime format
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], format='%d.%m.%Y')

# Calculate the reference Pmpp for each Item
reference_pmpp = df.loc[df['IsReference'] == 1, ['Item', 'Pmpp']].set_index('Item')['Pmpp']

# Calculate relative Pmpp values
df['Pmpp_norm'] = df.apply(lambda x: x['Pmpp'] / reference_pmpp.loc[x['Item']], axis=1)

# Calculate the reference TimeStamp for each Item
reference_timestamp = df.loc[df['IsReference'] == 1, ['Item', 'TimeStamp']].set_index('Item')['TimeStamp']

# Calculate the difference between dates based on reference date
df['Date_diff'] = df.apply(lambda x: (x['TimeStamp'] - reference_timestamp.loc[x['Item']]).days, axis=1)

print(df)

``````
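The row-wise `apply` calls above run a Python lambda once per row. A vectorized sketch of the same two calculations, assuming exactly one reference row per `Item`, looks up each row's reference with `Series.map` instead:

```python
import pandas as pd

d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
     'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
     'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
     'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023",
                   "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"]}
df = pd.DataFrame(data=d)
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], format='%d.%m.%Y')

# Reference rows indexed by Item (assumes exactly one reference row per Item)
ref_rows = df.loc[df['IsReference'] == 1].set_index('Item')

# Series.map aligns each row's Item with its reference value, no per-row lambda
df['Pmpp_norm'] = df['Pmpp'] / df['Item'].map(ref_rows['Pmpp'])
df['Date_diff'] = (df['TimeStamp'] - df['Item'].map(ref_rows['TimeStamp'])).dt.days

print(df)
```

`Date_diff` is negative for rows dated before the reference (for example `-3` for dmc1 on 22.02.2023).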

## Answer:

You have to broadcast the reference value to all rows of its `Item` group:

``````python
# Keep Pmpp only on reference rows (NaN elsewhere), then spread it over each Item group
ref = df['Pmpp'].where(df['IsReference'] == 1).groupby(df['Item']).transform('max')

df['Pmpp_norm'] = df['Pmpp'] / ref
``````

Output:

``````
>>> df
   Item  Pmpp  IsReference   TimeStamp  Pmpp_norm
0  dmc1     3            0  22.02.2023       0.75
1  dmc1     4            1  25.02.2023       1.00
2  dmc1     3            0  28.02.2023       0.75
3  dmc1     1            0   3.03.2023       0.25
4  dmc2     2            1  24.02.2023       1.00
5  dmc2     4            0  25.02.2023       2.00
6  dmc2     3            0   2.03.2023       1.50
7  dmc2     1            0   5.03.2023       0.50

>>> ref
0    4.0
1    4.0
2    4.0
3    4.0
4    2.0
5    2.0
6    2.0
7    2.0
Name: Pmpp, dtype: float64
``````
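The same `where` + `transform` pattern also covers the date difference mentioned in the question. A sketch, assuming `TimeStamp` is first parsed with `pd.to_datetime(..., format='%d.%m.%Y')`:

```python
import pandas as pd

d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
     'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
     'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
     'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023",
                   "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"]}
df = pd.DataFrame(data=d)
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], format='%d.%m.%Y')

is_ref = df['IsReference'] == 1

# where() blanks non-reference rows (NaN for numbers, NaT for dates);
# transform('max') then spreads the single remaining value over each group
ref_pmpp = df['Pmpp'].where(is_ref).groupby(df['Item']).transform('max')
ref_date = df['TimeStamp'].where(is_ref).groupby(df['Item']).transform('max')

df['Pmpp_norm'] = df['Pmpp'] / ref_pmpp
df['Date_diff'] = (df['TimeStamp'] - ref_date).dt.days

print(df)
```

`'max'` is only a way to pick the single non-missing value per group; `'first'` would work equally well, since both skip NaN/NaT.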

Update

You can also use the reference rows as a lookup Series (indexed by `Item`) with `Series.map`:

``````python
# Reference rows indexed by Item act as the lookup table
ref = df[df['IsReference'] == 1].set_index('Item')['Pmpp']

df['Pmpp_norm'] = df['Pmpp'] / df['Item'].map(ref)
``````

Output:

``````
>>> df
   Item  Pmpp  IsReference   TimeStamp  Pmpp_norm
0  dmc1     3            0  22.02.2023       0.75
1  dmc1     4            1  25.02.2023       1.00
2  dmc1     3            0  28.02.2023       0.75
3  dmc1     1            0   3.03.2023       0.25
4  dmc2     2            1  24.02.2023       1.00
5  dmc2     4            0  25.02.2023       2.00
6  dmc2     3            0   2.03.2023       1.50
7  dmc2     1            0   5.03.2023       0.50

>>> ref
Item
dmc1    4
dmc2    2
Name: Pmpp, dtype: int64
``````
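Both approaches assume every `Item` has exactly one reference row. With `map`, an `Item` without a reference row silently gets NaN, and duplicate reference rows leave the lookup Series with a non-unique index. A hypothetical helper, `reference_lookup` (not part of pandas), can check this before building the lookup; it also assumes `IsReference` only holds 0 or 1:

```python
import pandas as pd


def reference_lookup(df, col):
    """Map each Item to its reference value in `col` (hypothetical helper).

    Raises ValueError unless every Item has exactly one IsReference == 1 row.
    """
    # Assumes IsReference is 0/1, so the group sum counts reference rows
    ref_counts = df.groupby('Item')['IsReference'].sum()
    bad = ref_counts[ref_counts != 1]
    if not bad.empty:
        raise ValueError(f"Items without exactly one reference row: {list(bad.index)}")
    return df.loc[df['IsReference'] == 1].set_index('Item')[col]


d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
     'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
     'IsReference': [0, 1, 0, 0, 1, 0, 0, 0]}
df = pd.DataFrame(data=d)

df['Pmpp_norm'] = df['Pmpp'] / df['Item'].map(reference_lookup(df, 'Pmpp'))

# Flag a second reference row for dmc1: the helper refuses to build the lookup
bad_df = df.copy()
bad_df.loc[0, 'IsReference'] = 1
error_message = None
try:
    reference_lookup(bad_df, 'Pmpp')
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # Items without exactly one reference row: ['dmc1']
```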