pandas: add values which were calculated after grouping to a column in the original dataframe
Question:
I have a pandas dataframe and want to add a value to a new column (‘new’) for every row of each group produced by .groupby() on another column (‘A’).
At the moment I am doing it in several steps:
1- loop through all unique values of column A
2- calculate the value to add (run a function on a different column, e.g. ‘B’)
3- store the value I want to add to ‘new’ in a separate list (only one value per group!)
4- zip the list of unique groups (.groupby('A').unique()) with the list of values
5- loop again through the zipped values to store them in the dataframe.
This is very inefficient and takes a long time to run.
Is there a native pandas way to do it in fewer steps that will run faster?
Example code:
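import numpy as np

# df and myfunction are defined elsewhere
mylist = []
df_groups = df.groupby('A')
groups = df['A'].unique()
for group in groups:
    g = df_groups.get_group(group)
    idxmin = g.index.min()  # smallest index label in the group
    example = g.loc[idxmin]  # one representative row per group
    mylist.append(myfunction(example['B']))  # heavy function, run once per group
zipped = zip(groups, mylist)
df['new'] = np.nan
for group, val in zipped:
    # write the group's value to every row of that group
    df.loc[df['A'] == group, 'new'] = val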
A better way to do that would be highly appreciated.
EDIT 1:
I could just run myfunction on all rows of the dataframe, but since it's a heavy function, that would also take very long, so I would prefer to run it as few times as possible (that is, once per group).
Answers:
If this is what you are asking for, please try groupby() with transform(). I am using the min function here as an example; you can replace it with your own.
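import pandas as pd

data = {
    "calories": [400, 300, 300, 400],
    "duration": [50, 40, 45, 35]
}
# load data into a DataFrame object:
df = pd.DataFrame(data)
# transform returns one value per original row, so the result can be assigned directly as a column
# ('min' is passed as a string; newer pandas versions warn when given the builtin min)
df['min_value_duration'] = df.groupby('calories')['duration'].transform('min')
print(df)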
Reference: https://www.analyticsvidhya.com/blog/2020/03/understanding-transform-function-python/
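Since EDIT 1 asks for the heavy function to run only once per group, here is a minimal sketch of one way that might be done: pick one representative row per group, call the function on it, and map the results back onto column ‘A’. The sample data and the body of myfunction are made up for illustration (the real function is not shown in the question), and the calls list exists only to show how often the function runs.

```python
import pandas as pd

calls = []  # illustration only: records every input myfunction receives

def myfunction(value):
    # hypothetical stand-in for the asker's heavy function
    calls.append(value)
    return value * 2

# made-up sample data with two groups in column 'A'
df = pd.DataFrame({"A": ["x", "y", "x", "y"], "B": [1, 2, 3, 4]})

# one representative 'B' per group: the row with the smallest index label,
# mirroring g.loc[g.index.min()] from the question
reps = df.sort_index().drop_duplicates(subset="A").set_index("A")["B"]

# run the heavy function exactly once per group
per_group = {key: myfunction(b) for key, b in reps.items()}

# broadcast each group's value to all of its rows
df["new"] = df["A"].map(per_group)
print(df)
```

The final map() is a plain lookup, so the cost of myfunction scales with the number of groups rather than the number of rows.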