How to make first row turn into second level MultiIndex
Question:
I have an existing DataFrame that looks like this:
1 | 1 | 1 | 2 | 2 | 2 | 2
--------------------------------------------------------
| abc | def | ghi | jkl | mno | pqr | stu
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
I’ve been trying this for some time, but with no success.
The repeated ones and twos are already a one-level MultiIndex.
I know that if I add another level they will merge together, but I am having a hard time transforming that first row into the second level of the MultiIndex.
Is there a simple way of doing this?
Desired output:
1 | 2
| abc | def | ghi | jkl | mno | pqr | stu
--------------------------------------------------------
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
| 1.00 | 2.00 | 3.00 | 4.00 | 5.00 | 6.00 | 7.00
Any help would be much appreciated!
Thanks
Answers:
I think you need MultiIndex.from_arrays, and then filter out the first row with DataFrame.iloc indexing:
df = pd.MultiIndex.from_arrays(df.columns, df.iloc[0])
df = df.iloc[1:]
Using T and set_index:
df.T.set_index(0,append=True).T
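For anyone who wants to try the transpose approach, here is a minimal sketch. The sample frame is an assumption reconstructed from the question (two numeric rows instead of five), and the name `result` is only illustrative. Note that set_index consumes row 0 by default, so no separate iloc step is needed, but the new level ends up named 0.

```python
import pandas as pd

# Hypothetical sample data mirroring the question: row 0 holds the labels
# that should become the second column level.
df = pd.DataFrame(
    [["abc", "def", "ghi", "jkl", "mno", "pqr", "stu"],
     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]],
    columns=[1, 1, 1, 2, 2, 2, 2],
)

# Transpose so the column labels become the index, append the former row 0
# (now column 0) as a second index level, then transpose back.
result = df.T.set_index(0, append=True).T

# The appended level carries the name 0; clear it if it is not wanted.
result.columns = result.columns.set_names([None, None])

print(result)
```

The remaining rows keep their original labels (1, 2, ...), so a reset_index(drop=True) may be wanted afterwards.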
In addition to jezrael’s answer: the idea was correct, and it needs just a few changes to work. Thanks, jezrael.
index = np.array([df.columns.values, df.iloc[0].values])
df.columns = pd.MultiIndex.from_arrays(index)
df = df.iloc[1:]
The solution proposed by Jezrael requires some corrections:
- df.columns and df.iloc[0] should be passed together, as a list, in the first argument of from_arrays, not as two separate arguments.
- The source of the second level of the MultiIndex (df.iloc[0]) should be supplemented with .values. Otherwise this MultiIndex level inherits the name (0), which is the index value of row 0.
- The resulting MultiIndex should be assigned to df.columns, not to the whole df.
So the whole solution should be:
df.columns = pd.MultiIndex.from_arrays([df.columns, df.iloc[0].values])
df = df.iloc[1:]
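To check this end to end, here is a self-contained sketch. The sample frame is an assumption rebuilt from the question, and the reset_index and astype calls at the end are optional extras that are not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data mirroring the question: row 0 holds the labels
# that should become the second column level.
df = pd.DataFrame(
    [["abc", "def", "ghi", "jkl", "mno", "pqr", "stu"],
     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]],
    columns=[1, 1, 1, 2, 2, 2, 2],
)

# Existing labels form the top level, row 0 the second level.
# .values drops the Series name (0), so the new level stays unnamed.
df.columns = pd.MultiIndex.from_arrays([df.columns, df.iloc[0].values])

# Drop the row that was promoted to a header.
df = df.iloc[1:]

# Optional (assumption): renumber the rows from 0 and restore a numeric
# dtype, since the columns were object dtype while they held strings.
df = df.reset_index(drop=True).astype(float)

print(df)
print(df[1])  # selects the abc/def/ghi group under top-level label 1
```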