Replacing values in Pandas Dataframe Column with new text and ascending number, using python
Question:
Python beginner here. I have values in a Pandas DataFrame column that I would like to replace with new text, followed by an ascending number at the end of that text. The zone column below needs to be changed so that the rows are still grouped by zone.
This is an example of how my Dataframe currently looks:
value | section | zone |
---|---|---|
1 | red | 25 |
2 | red | 25 |
3 | grey | 28 |
4 | blue | 35 |
5 | yellow | 35 |
6 | yellow | 35 |
7 | blue | 50 |
8 | green | 50 |
These are the changes I would like to make in the ‘zone’ column:
value | section | zone |
---|---|---|
1 | red | Zone1 |
2 | red | Zone1 |
3 | grey | Zone2 |
4 | blue | Zone3 |
5 | yellow | Zone3 |
6 | yellow | Zone3 |
7 | blue | Zone4 |
8 | green | Zone4 |
I’m not quite sure how to handle this problem. I’m assuming I need to use some sort of dataframe.replace(). I am not too skilled with Python yet, so I hope this question makes sense.
Answers:
Try:
df['zone'] = 'Zone' + (df.groupby('zone').ngroup() + 1).astype(str)
print(df)
Prints:
   value section   zone
0      1     red  Zone1
1      2     red  Zone1
2      3    grey  Zone2
3      4    blue  Zone3
4      5  yellow  Zone3
5      6  yellow  Zone3
6      7    blue  Zone4
7      8   green  Zone4
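For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same groupby/ngroup idea. The DataFrame is rebuilt from the sample table in the question, and the intermediate name `group_number` is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "value": [1, 2, 3, 4, 5, 6, 7, 8],
        "section": ["red", "red", "grey", "blue", "yellow", "yellow", "blue", "green"],
        "zone": [25, 25, 28, 35, 35, 35, 50, 50],
    }
)

# ngroup() numbers the groups starting at 0, so add 1 to start at Zone1.
# group_number is an illustrative name, not part of the original answer.
group_number = df.groupby("zone").ngroup() + 1

# Convert the numbers to strings so they can be appended to the "Zone" prefix.
df["zone"] = "Zone" + group_number.astype(str)

print(df)
```

Note that groupby sorts the group keys by default, so the numbering follows ascending zone value rather than row order. With this sample data the two coincide.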
You can use Series.rank with method="dense" to rank your zones. Cast to int before casting to string, because rank returns floats and the labels would otherwise end in a decimal point and zero (for example, a trailing .0).
df["zone2"] = "Zone" + df.zone.rank(method="dense").astype("int").astype("str")
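As a rough, runnable illustration of the dense-rank approach, the sketch below rebuilds the sample data from the question and writes the labels to a new column. The names `dense_rank` and `zone_label` are made up for this example, and the prefix is capitalised to match the output the question asks for.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "value": [1, 2, 3, 4, 5, 6, 7, 8],
        "section": ["red", "red", "grey", "blue", "yellow", "yellow", "blue", "green"],
        "zone": [25, 25, 28, 35, 35, 35, 50, 50],
    }
)

# Dense ranking gives equal values the same rank and leaves no gaps
# between ranks. The result is a float Series (1.0, 1.0, 2.0, ...).
dense_rank = df["zone"].rank(method="dense")

# Cast to int first so the label reads "Zone1" instead of "Zone1.0".
# zone_label is an illustrative column name; the original zone column is kept.
df["zone_label"] = "Zone" + dense_rank.astype(int).astype(str)

print(df)
```

Keeping the original zone column alongside the new labels makes it easy to check that each numeric zone maps to exactly one label.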