Renaming columns using numbers from a range in python/pandas
Question:
I’m stuck with the following situation. I’m pretty sure I’m missing something simple, but I tried a lot of suggestions here and at other sites, and haven’t found what I’m looking for.
I have a dataframe with a lot of randomly named columns (courtesy of the provided CSV file). I would like to rename these columns using digits from the range function.
Since I’m renaming all columns, I could do it directly using
df.columns = [str(x) for x in range(1,2000)]
However, hypothetically, could I do it through the rename() function? Maybe using a lambda? I have tried many different variations, but I’m getting all sorts of errors.
I’m looking for the syntax to give me the equivalent of
df.rename(columns= (str(x) for x in range(1,2000)))
where rename assigns the name to the columns sequentially based on the given range.
The above doesn’t work. But is there a way to make it work?
Thank you!
Answers:
You can pass a dict to rename’s columns kwarg:
df.rename(columns={x:y for x,y in zip(df.columns,range(0,len(df.columns)))})
That will take:
>>> df
ID1 ID2 POS1 POS2 TYPE TYPEVAL
1 A 001 1 5 COLOR RED
2 A 001 1 5 WEIGHT 50KG
3 A 001 1 5 HEIGHT 160CM
4 A 002 6 19 FUTURE YES
5 A 002 6 19 PRESENT NO
6 B 001 26 34 COLOUR BLUE
7 B 001 26 34 WEIGHT 85KG
8 B 001 26 34 HEIGHT 120CM
9 C 001 10 13 MOBILE NOKIA
10 C 001 10 13 TABLET ASUS
And give you:
>>> df.rename(columns={x:y for x,y in zip(df.columns,range(0,len(df.columns)))})
0 1 2 3 4 5
1 A 001 1 5 COLOR RED
2 A 001 1 5 WEIGHT 50KG
3 A 001 1 5 HEIGHT 160CM
4 A 002 6 19 FUTURE YES
5 A 002 6 19 PRESENT NO
6 B 001 26 34 COLOUR BLUE
7 B 001 26 34 WEIGHT 85KG
8 B 001 26 34 HEIGHT 120CM
9 C 001 10 13 MOBILE NOKIA
10 C 001 10 13 TABLET ASUS
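For the string names counting from 1 that the question asks for, the same dict idea can be sketched as below. The small frame and the names mapping and renamed are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical frame standing in for the CSV with arbitrary column names.
df = pd.DataFrame({"ID1": ["A", "B"], "POS1": [1, 26], "TYPE": ["COLOR", "WEIGHT"]})

# Map each existing label to its 1-based position, stored as a string.
mapping = {old: str(i) for i, old in enumerate(df.columns, start=1)}

# rename returns a new DataFrame; df itself keeps its original labels.
renamed = df.rename(columns=mapping)

print(list(renamed.columns))
print(list(df.columns))
```

Note that a dict keyed on the old labels assumes those labels are unique. If two columns share a name, both end up with the same new name, and in that case assigning to df.columns directly is the safer route.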
If you just want to rename the columns using numbers, this is probably the easiest way to do it:
df.columns = np.arange(len(df.columns))
#-- or --
df.columns = list(range(len(df.columns)))
Demo:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A':['a', 'b', 'c'], 'B': ['d','e','f'], 'C': ['g','h','i']})
print(df)
A B C
0 a d g
1 b e h
2 c f i
Renaming the columns:
df.columns = np.arange(len(df.columns))
print(df)
0 1 2
0 a d g
1 b e h
2 c f i
df.columns = range(len(df.columns))
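One thing to watch with direct assignment is that the number of new labels has to match the number of columns exactly, so a hard-coded range(1,2000) only works for a frame with 1999 columns. A minimal sketch, assuming a small made-up frame, that sizes the range from the frame and uses string names starting at 1:

```python
import pandas as pd

# Hypothetical three-column frame for illustration.
df = pd.DataFrame({"A": ["a", "b"], "B": ["d", "e"], "C": ["g", "h"]})

# A hard-coded range that does not match the column count is rejected.
try:
    df.columns = [str(i) for i in range(1, 2000)]
except ValueError as exc:
    print("rejected:", exc)

# Deriving the range from the frame gives exactly one label per column.
df.columns = [str(i) for i in range(1, len(df.columns) + 1)]
print(list(df.columns))
```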