Pandas Column not found after doing an aggregation function

Question:

I have an aggregation function which totals rows in a certain column based on an ID. After being able to correctly aggregate my rows, I wanted to select only the relevant columns, but I keep getting an error saying my ID column isn’t found.

Full Code:

import pandas as pd
  
# initialize list of lists
data = [['A29', 112, 10, 0.3], ['A29', 112, 15, 0.1], ['A29', 112, 14, 0.22], ['A29', 88, 33, 0.09], ['A29', 88, 29, 0.1], ['A29', 88, 6, 0.2]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Id', 'Cores', 'Provisioning', 'Utilization'])

df['total'] = df['Provisioning'] * df['Utilization']

df = df[['Id', 'Cores', 'total']]
aggregation_functions = {'Cores': 'first', 'total': 'sum'}
df_new = df.groupby(df['Id']).aggregate(aggregation_functions)

df_new['total1'] = df_new['total'] / 3
print(df_new)  # the printed DataFrame shows the Id column
print(df_new.columns)  # doesn't print the Id column

df_new = df_new[['Id', 'total1']]  # Error: Id column not found

I’m not sure what is happening here. A line above, I print the dataframe and the Id column is present. However, when I try selecting it, it returns an error saying it isn’t found. How can I fix this issue?

Asked By: ReactNewbie123


Answers:

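You should pass as_index=False in the call to .groupby(). By default, groupby makes the grouping key (here Id) the index of the result rather than a regular column. That is why Id shows up when you print the DataFrame but is missing from df_new.columns and cannot be selected with df_new[['Id', 'total1']]: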

df_new = df.groupby(df['Id'], as_index=False).aggregate(aggregation_functions)
Answered By: BrokenBenchmark
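
For illustration, here is a minimal sketch of the behaviour described in the answer, using the data from the question. It is not part of the original answer: it groups by the column name 'Id' rather than the Series df['Id'], and it also shows reset_index() as an alternative way to turn the index back into a column. The variable names by_index, option1 and option2 are made up for this example.

```python
import pandas as pd

data = [
    ['A29', 112, 10, 0.3], ['A29', 112, 15, 0.1], ['A29', 112, 14, 0.22],
    ['A29', 88, 33, 0.09], ['A29', 88, 29, 0.1], ['A29', 88, 6, 0.2],
]
df = pd.DataFrame(data, columns=['Id', 'Cores', 'Provisioning', 'Utilization'])
df['total'] = df['Provisioning'] * df['Utilization']
aggregation_functions = {'Cores': 'first', 'total': 'sum'}

# Default behaviour: the group key becomes the index, not a column.
by_index = df.groupby('Id').aggregate(aggregation_functions)
print('Id' in by_index.columns)  # False
print(by_index.index.name)       # Id

# Option 1: keep the key as a regular column from the start.
option1 = df.groupby('Id', as_index=False).aggregate(aggregation_functions)

# Option 2 (assumed alternative): group as before, then move the index
# back into a column with reset_index().
option2 = by_index.reset_index()

# With Id available as a column, the selection from the question works.
for result in (option1, option2):
    result['total1'] = result['total'] / 3
    print(result[['Id', 'total1']])
```

Either option should leave Id available as a normal column, so the final selection no longer raises a KeyError. Which one to use is mostly a matter of taste; as_index=False avoids creating the intermediate index in the first place.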