How to calculate relative values based on position of reference flag
Question:
I have a simple table with an IsReference column that flags the base value of Pmpp for each Item. All other Pmpp values in the same Item group should be expressed relative to that base value, as shown in the picture below. Similarly, I would like to calculate the difference between dates relative to the reference date, etc. I would appreciate hints on how to do that in Python. Below is the code that I started with.
Best regards
import pandas as pd
d = {
    'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
    'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
    'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
    'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023", "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"],
}
df = pd.DataFrame(data = d)
# find location of reference
ref = df['Pmpp'][df['IsReference'] == 1].values
# calculate relative values
df['Pmpp_norm'] = df.groupby('Item')['Pmpp'].apply(lambda x: x/ref)
Answers:
I think this should work:
import pandas as pd
d = {
    'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
    'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
    'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
    'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023", "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"],
}
df = pd.DataFrame(data=d)
# Convert TimeStamp column to datetime format
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], format='%d.%m.%Y')
# Calculate the reference Pmpp for each Item
reference_pmpp = df.loc[df['IsReference'] == 1, ['Item', 'Pmpp']].set_index('Item')['Pmpp']
# Calculate relative Pmpp values
df['Pmpp_norm'] = df.apply(lambda x: x['Pmpp'] / reference_pmpp.loc[x['Item']], axis=1)
# Calculate the reference TimeStamp for each Item
reference_timestamp = df.loc[df['IsReference'] == 1, ['Item', 'TimeStamp']].set_index('Item')['TimeStamp']
# Calculate the difference between dates based on reference date
df['Date_diff'] = df.apply(lambda x: (x['TimeStamp'] - reference_timestamp.loc[x['Item']]).days, axis=1)
print(df)
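The lookups above rely on every Item having exactly one row with IsReference == 1. That holds for the sample data, but it is an assumption about real data. A minimal sketch of a check that could be run first (the names ref_counts and bad_items are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
    'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
    'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
})

# Count the flagged rows per Item; every group should have exactly one.
ref_counts = df.groupby('Item')['IsReference'].sum()
bad_items = ref_counts[ref_counts != 1]

# Fail early instead of producing ambiguous lookups later on.
if not bad_items.empty:
    raise ValueError(f"Items without exactly one reference row: {list(bad_items.index)}")
```

If an Item had two flagged rows, the Series indexed by Item would contain duplicate labels and the per-row lookup would no longer return a single value.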
You have to broadcast the reference value to all rows:
ref = df['Pmpp'].where(df['IsReference'] == 1).groupby(df['Item']).transform('max')
df['Pmpp_norm'] = df['Pmpp'] / ref
Output:
>>> df
   Item  Pmpp  IsReference   TimeStamp  Pmpp_norm
0  dmc1     3            0  22.02.2023       0.75
1  dmc1     4            1  25.02.2023       1.00
2  dmc1     3            0  28.02.2023       0.75
3  dmc1     1            0   3.03.2023       0.25
4  dmc2     2            1  24.02.2023       1.00
5  dmc2     4            0  25.02.2023       2.00
6  dmc2     3            0   2.03.2023       1.50
7  dmc2     1            0   5.03.2023       0.50
>>> ref
0    4.0
1    4.0
2    4.0
3    4.0
4    2.0
5    2.0
6    2.0
7    2.0
Name: Pmpp, dtype: float64
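To see why the one-liner broadcasts the reference, it may help to split it into its two steps. This is only a sketch on the sample data from the question; the intermediate name masked is introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
    'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
    'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
})

# Step 1: keep Pmpp only on the reference rows; all other rows become NaN.
masked = df['Pmpp'].where(df['IsReference'] == 1)

# Step 2: within each Item, 'max' skips NaN, so it returns the single
# remaining value, and transform repeats it on every row of the group.
ref = masked.groupby(df['Item']).transform('max')

df['Pmpp_norm'] = df['Pmpp'] / ref
```

With exactly one reference per Item, 'max' here simply means "the only non-NaN value"; 'first' or 'min' would presumably give the same result under that assumption.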
Update
You can also use a mapping Series indexed by Item:
ref = df[df['IsReference'] == 1].set_index('Item')['Pmpp']
df['Pmpp_norm'] = df['Pmpp'] / df['Item'].map(ref)
Output:
>>> df
   Item  Pmpp  IsReference   TimeStamp  Pmpp_norm
0  dmc1     3            0  22.02.2023       0.75
1  dmc1     4            1  25.02.2023       1.00
2  dmc1     3            0  28.02.2023       0.75
3  dmc1     1            0   3.03.2023       0.25
4  dmc2     2            1  24.02.2023       1.00
5  dmc2     4            0  25.02.2023       2.00
6  dmc2     3            0   2.03.2023       1.50
7  dmc2     1            0   5.03.2023       0.50
>>> ref
Item
dmc1    4
dmc2    2
Name: Pmpp, dtype: int64
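The same mapping idea should carry over to the date difference mentioned in the question. A minimal sketch, assuming the TimeStamp strings are day.month.year and each Item has exactly one reference row (ref_ts and Date_diff are names chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"],
    'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
    'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023",
                  "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"],
})

# Parse the strings so that subtraction yields timedeltas.
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], format='%d.%m.%Y')

# One reference timestamp per Item, indexed by Item.
ref_ts = df[df['IsReference'] == 1].set_index('Item')['TimeStamp']

# Map each row's Item to its reference date and take the difference in days.
df['Date_diff'] = (df['TimeStamp'] - df['Item'].map(ref_ts)).dt.days
```

Negative values mean the measurement was taken before the reference date.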