pandas add index to column name
Question:
I have data frame with about 100 columns that repeat itself because the data is organized by weeks, it looks something like that:
hours
hours
clicks
clicks
days
days
minutes
minutes
week 1
week 2
week 1
week 2
week 1
week 2
week 1
week 2
2
2
2
3
6
2
2
3
1
7
6
3
8
2
9
3
I would like the output to look like this:
hours_w1
hours_w2
clicks_w1
clicks_w2
days_w1
days_w2
minutes_w1
minutes_w2
2
2
2
3
6
2
2
3
1
7
6
3
8
2
9
3
I know I can just rename the columns but because I have over 100 columns I’m looking for a more efficient way.
I tried to use add_suffix but had only managed to add the same suffix to all columns, when what I need is a different index for each week.
any idea how to do this?
Thanks!!
Answers:
first you should change the first row:
df.iloc[0] = df.iloc[0].apply(lambda x:'w1' if x == 'week 1' else 'w2')
Then you can merge it with the column name like this:
df.columns = [i + '_' + j for i, j in zip(df.columns, df.iloc[0])]
And then you can remove the first row:
df = df.iloc[1:]
Extract the suffixes from the first row then add them to the column names and finally remove the first row.
# To fix mangle_dup_cols
df.columns = df.columns.str.split('.').str[0]
suffixes = '_' + df.iloc[0].str[0] + df.iloc[0].str[-1]
df.columns += suffixes
df = df.iloc[1:]
Output:
>>> df
hours_w1 hours_w2 clicks_w1 clicks_w2 days_w1 days_w2 minutes_w1 minutes_w2
1 2 2 2 3 6 2 2 3
2 1 7 6 3 8 2 9 3
I have data frame with about 100 columns that repeat itself because the data is organized by weeks, it looks something like that:
hours | hours | clicks | clicks | days | days | minutes | minutes |
---|---|---|---|---|---|---|---|
week 1 | week 2 | week 1 | week 2 | week 1 | week 2 | week 1 | week 2 |
2 | 2 | 2 | 3 | 6 | 2 | 2 | 3 |
1 | 7 | 6 | 3 | 8 | 2 | 9 | 3 |
I would like the output to look like this:
hours_w1 | hours_w2 | clicks_w1 | clicks_w2 | days_w1 | days_w2 | minutes_w1 | minutes_w2 |
---|---|---|---|---|---|---|---|
2 | 2 | 2 | 3 | 6 | 2 | 2 | 3 |
1 | 7 | 6 | 3 | 8 | 2 | 9 | 3 |
I know I can just rename the columns but because I have over 100 columns I’m looking for a more efficient way.
I tried to use add_suffix but had only managed to add the same suffix to all columns, when what I need is a different index for each week.
any idea how to do this?
Thanks!!
first you should change the first row:
df.iloc[0] = df.iloc[0].apply(lambda x:'w1' if x == 'week 1' else 'w2')
Then you can merge it with the column name like this:
df.columns = [i + '_' + j for i, j in zip(df.columns, df.iloc[0])]
And then you can remove the first row:
df = df.iloc[1:]
Extract the suffixes from the first row then add them to the column names and finally remove the first row.
# To fix mangle_dup_cols
df.columns = df.columns.str.split('.').str[0]
suffixes = '_' + df.iloc[0].str[0] + df.iloc[0].str[-1]
df.columns += suffixes
df = df.iloc[1:]
Output:
>>> df
hours_w1 hours_w2 clicks_w1 clicks_w2 days_w1 days_w2 minutes_w1 minutes_w2
1 2 2 2 3 6 2 2 3
2 1 7 6 3 8 2 9 3