Turn Pandas Multi-Index into column
Question:
I have a dataframe with 2 index levels:
value
Trial measurement
1 0 13
1 3
2 4
2 0 NaN
1 12
3 0 34
Which I want to turn into this:
Trial measurement value
1 0 13
1 1 3
1 2 4
2 0 NaN
2 1 12
3 0 34
How can I best do this?
I need this because I want to aggregate the data as instructed here, but I can’t select my columns like that if they are in use as indices.
Answers:
The reset_index() is a pandas DataFrame method that will transfer index values into the DataFrame as columns. The default setting for the parameter is drop=False (which will keep the index values as columns).
All you have to do call .reset_index()
after the name of the DataFrame:
df = df.reset_index()
This doesn’t really apply to your case but could be helpful for others (like myself 5 minutes ago) to know. If one’s multindex have the same name like this:
value
Trial Trial
1 0 13
1 3
2 4
2 0 NaN
1 12
3 0 34
df.reset_index(inplace=True)
will fail, cause the columns that are created cannot have the same names.
So then you need to rename the multindex with df.index = df.index.set_names(['Trial', 'measurement'])
to get:
value
Trial measurement
1 0 13
1 1 3
1 2 4
2 0 NaN
2 1 12
3 0 34
And then df.reset_index(inplace=True)
will work like a charm.
I encountered this problem after grouping by year and month on a datetime-column(not index) called live_date
, which meant that both year and month were named live_date
.
As @cs95 mentioned in a comment, to drop only one level, use:
df.reset_index(level=[...])
This avoids having to redefine your desired index after reset.
There may be situations when df.reset_index()
cannot be used (e.g., when you need the index, too). In this case, use index.get_level_values()
to access index values directly:
df['Trial'] = df.index.get_level_values(0)
df['measurement'] = df.index.get_level_values(1)
This will assign index values to individual columns and keep the index.
See the docs for further info.
Short and simple
df2 = pd.DataFrame({'test_col': df['test_col'].describe()})
df2 = df2.reset_index()
Similar to Alex solution in a more generalized form. It keeps the indexes untouched and adds index level as a new columns with its name.
for i in df.index.names:
df[i] = df.index.get_level_values(i)
which gives
value Trial measurement
Trial measurement
1 0 13 1 0
1 3 1 1
2 4 1 2
...
A solution that might be helpful in cases when not every column has multiple index levels:
df.columns = df.columns.map(''.join)
I have a dataframe with 2 index levels:
value
Trial measurement
1 0 13
1 3
2 4
2 0 NaN
1 12
3 0 34
Which I want to turn into this:
Trial measurement value
1 0 13
1 1 3
1 2 4
2 0 NaN
2 1 12
3 0 34
How can I best do this?
I need this because I want to aggregate the data as instructed here, but I can’t select my columns like that if they are in use as indices.
The reset_index() is a pandas DataFrame method that will transfer index values into the DataFrame as columns. The default setting for the parameter is drop=False (which will keep the index values as columns).
All you have to do call .reset_index()
after the name of the DataFrame:
df = df.reset_index()
This doesn’t really apply to your case but could be helpful for others (like myself 5 minutes ago) to know. If one’s multindex have the same name like this:
value
Trial Trial
1 0 13
1 3
2 4
2 0 NaN
1 12
3 0 34
df.reset_index(inplace=True)
will fail, cause the columns that are created cannot have the same names.
So then you need to rename the multindex with df.index = df.index.set_names(['Trial', 'measurement'])
to get:
value
Trial measurement
1 0 13
1 1 3
1 2 4
2 0 NaN
2 1 12
3 0 34
And then df.reset_index(inplace=True)
will work like a charm.
I encountered this problem after grouping by year and month on a datetime-column(not index) called live_date
, which meant that both year and month were named live_date
.
As @cs95 mentioned in a comment, to drop only one level, use:
df.reset_index(level=[...])
This avoids having to redefine your desired index after reset.
There may be situations when df.reset_index()
cannot be used (e.g., when you need the index, too). In this case, use index.get_level_values()
to access index values directly:
df['Trial'] = df.index.get_level_values(0)
df['measurement'] = df.index.get_level_values(1)
This will assign index values to individual columns and keep the index.
See the docs for further info.
Short and simple
df2 = pd.DataFrame({'test_col': df['test_col'].describe()})
df2 = df2.reset_index()
Similar to Alex solution in a more generalized form. It keeps the indexes untouched and adds index level as a new columns with its name.
for i in df.index.names:
df[i] = df.index.get_level_values(i)
which gives
value Trial measurement
Trial measurement
1 0 13 1 0
1 3 1 1
2 4 1 2
...
A solution that might be helpful in cases when not every column has multiple index levels:
df.columns = df.columns.map(''.join)