Don't make a second level when aggregating in Pandas

Question:

Indexes and levels in Pandas still drive me nuts when using it.
My dataframe structure looks like that:

Indexstruktur   Datensatz   Workload    Selectivity     Dimensionen     Anzahl Datenpunkte  Operation   Measure     Wert

I now group by all columns except the last one ("Wert"). This one will be used to aggregate, where I calculate average and standard deviation:

mean_df = df.groupby(["Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure"]).agg({"Wert":['mean','std']})
mean_df.reset_index(inplace=True)

The result are two levels:

index   Indexstruktur   Datensatz   Workload    Selectivity     Dimensionen     Anzahl Datenpunkte  Operation   Measure     Wert
                                                                                                                       mean     std

How can I get rid of "Wert" and just make "avg" and "std" two columns in the same level as the rest?

mean_df.columns returns:

MultiIndex([(     'Indexstruktur',     ''),
            (         'Datensatz',     ''),
            (          'Workload',     ''),
            (       'Selectivity',     ''),
            (       'Dimensionen',     ''),
            ('Anzahl Datenpunkte',     ''),
            (         'Operation',     ''),
            (           'Measure',     ''),
            (              'Wert', 'mean'),
            (              'Wert',  'std')],
           )

I tried reset_index, droplevel, as_index=False, but nothing changes anything.
What is the solution for this?

Asked By: Hemmelig

||

Answers:

Maybe this should work for you
mean_df = df.groupby(["Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure"])["Wert"].agg(['mean','std']).reset_index() This would still have
"Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure" as columns though, but also agg and std. If you don’t want that just do
mean_df = df.groupby(["Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure"])["Wert"].agg(['mean','std']).reset_index(drop=True)

Answered By: Simon Hawe
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.