Reshaping a dataframe every nth column
Question:
I have two datasets. After merging them horzontally, and sorting the columns with the following code, I get the dataset below:
df=
X
Y
5.2
6.5
3.3
7.6
df_year=
X
Y
2014
2014
2015
2015
df_all_cols = pd.concat([df, df_year], axis = 1)
sorted_columns = sorted(df_all_cols.columns)
df_all_cols_sort = df_all_cols[sorted_columns]
X
X
Y
Y
5.2
2014
6.5
2014
3.3
2015
7.6
2015
I am trying to make my data look like this, by stacking the dataset every 2 columns.
name
year
Variable
5.2
2014
X
3.3
2015
X
6.5
2014
Y
7.6
2015
Y
Answers:
One approach could be as follows:
0 1
0 X 5.2 2014
Y 6.5 2014
1 X 3.3 2015
Y 7.6 2015
- Next, use
df.sort_index
to sort on the original column names (i.e. "X, Y"
, now appearing as index level 1
), and get rid of index level 0
(df.droplevel
).
- Finally, use
df.reset_index
with drop=False
to insert index as a column and rename all the columns with df.rename
.
res = (pd.concat([df.stack(),df_year.stack()], axis=1)
.sort_index(level=1)
.droplevel(0)
.reset_index(drop=False)
.rename(columns={'index':'Variable',0:'name',1:'year'})
)
# change the order of cols
res = res.iloc[:, [1,2,0]]
print(res)
name year Variable
0 5.2 2014 X
1 3.3 2015 X
2 6.5 2014 Y
3 7.6 2015 Y
I have two datasets. After merging them horzontally, and sorting the columns with the following code, I get the dataset below:
df=
X | Y |
---|---|
5.2 | 6.5 |
3.3 | 7.6 |
df_year=
X | Y |
---|---|
2014 | 2014 |
2015 | 2015 |
df_all_cols = pd.concat([df, df_year], axis = 1)
sorted_columns = sorted(df_all_cols.columns)
df_all_cols_sort = df_all_cols[sorted_columns]
X | X | Y | Y |
---|---|---|---|
5.2 | 2014 | 6.5 | 2014 |
3.3 | 2015 | 7.6 | 2015 |
I am trying to make my data look like this, by stacking the dataset every 2 columns.
name | year | Variable |
---|---|---|
5.2 | 2014 | X |
3.3 | 2015 | X |
6.5 | 2014 | Y |
7.6 | 2015 | Y |
One approach could be as follows:
0 1
0 X 5.2 2014
Y 6.5 2014
1 X 3.3 2015
Y 7.6 2015
- Next, use
df.sort_index
to sort on the original column names (i.e."X, Y"
, now appearing asindex level 1
), and get rid ofindex level 0
(df.droplevel
). - Finally, use
df.reset_index
withdrop=False
to insert index as a column and rename all the columns withdf.rename
.
res = (pd.concat([df.stack(),df_year.stack()], axis=1)
.sort_index(level=1)
.droplevel(0)
.reset_index(drop=False)
.rename(columns={'index':'Variable',0:'name',1:'year'})
)
# change the order of cols
res = res.iloc[:, [1,2,0]]
print(res)
name year Variable
0 5.2 2014 X
1 3.3 2015 X
2 6.5 2014 Y
3 7.6 2015 Y