Pandas – Replace cell values using a conditional (normalising string input for gender)
Question:
Example data
id
Gender
Age
1
F
22
2
Fem
18
3
male
45
4
She/Her
30
5
Male
25
6
Non-bianary
26
7
M
18
8
female
20
9
Male
56
I want to be able to standardise this somewhat by replacing all cells with an ‘F’ in them with ‘Female’, and all cells with ‘M’ in them with ‘Male’. I know the first step is to cast the whole column into capitals
df.Gender = df.Gender.str.capitalize()
and I know that I can do it value-by-value with
df['Gender'] = df['Gender'].replace(['F', 'Fem', 'Female'], 'Female')
but is there a way to do this somewhat programmatically?
such as
df.Gender = df.Gender.str.capitalise()
for i in df.Gender:
if 'F' in str(i):
#pd.replace call something like...
df[df.Gender == i] = 'Female'
#I know that line is very wrong
elif 'M' in str(i)...
Answers:
Try using regex:
import re
df["Gender"] = df["Gender"].str.replace(
r"^FS*$", "Female", flags=re.I, regex=True
)
print(df)
Prints:
id Gender Age
0 1 Female 22
1 2 Female 18
2 3 male 45
3 4 She/Her 30
4 5 Male 25
5 6 Non-bianary 26
6 7 M 18
7 8 Female 20
8 9 Male 56
Yes, you can loop through the df like that:
for indx, row in df.iterrows:
if row["Gender"] == "F": #Or other conditions
df.loc[index,"Gender"] = "Female"
else:
pass #or whatever condition u want to add
Is this what u asked for ?
Although its more efficient to do like that @Andrej Kesely Answered
df['Gender'][df['Gender'].isin(['F', 'Fem', 'Female'])] = 'Female'
Example data
id | Gender | Age |
---|---|---|
1 | F | 22 |
2 | Fem | 18 |
3 | male | 45 |
4 | She/Her | 30 |
5 | Male | 25 |
6 | Non-bianary | 26 |
7 | M | 18 |
8 | female | 20 |
9 | Male | 56 |
I want to be able to standardise this somewhat by replacing all cells with an ‘F’ in them with ‘Female’, and all cells with ‘M’ in them with ‘Male’. I know the first step is to cast the whole column into capitals
df.Gender = df.Gender.str.capitalize()
and I know that I can do it value-by-value with
df['Gender'] = df['Gender'].replace(['F', 'Fem', 'Female'], 'Female')
but is there a way to do this somewhat programmatically?
such as
df.Gender = df.Gender.str.capitalise()
for i in df.Gender:
if 'F' in str(i):
#pd.replace call something like...
df[df.Gender == i] = 'Female'
#I know that line is very wrong
elif 'M' in str(i)...
Try using regex:
import re
df["Gender"] = df["Gender"].str.replace(
r"^FS*$", "Female", flags=re.I, regex=True
)
print(df)
Prints:
id Gender Age
0 1 Female 22
1 2 Female 18
2 3 male 45
3 4 She/Her 30
4 5 Male 25
5 6 Non-bianary 26
6 7 M 18
7 8 Female 20
8 9 Male 56
Yes, you can loop through the df like that:
for indx, row in df.iterrows:
if row["Gender"] == "F": #Or other conditions
df.loc[index,"Gender"] = "Female"
else:
pass #or whatever condition u want to add
Is this what u asked for ?
Although its more efficient to do like that @Andrej Kesely Answered
df['Gender'][df['Gender'].isin(['F', 'Fem', 'Female'])] = 'Female'