How to replace countries other than 'India' and 'U.S.A' by 'Other' in pandas dataframe?
Question:
I have the following df:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Q0_0': ["India", "Algeria", "India", "U.S.A", "Morocco", "Tunisia", "U.S.A", "France", "Russia", "Algeria"],
    'Q1_1': [np.random.randint(1, 100) for i in range(10)],
    'Q1_2': np.random.random(10),
    'Q1_3': np.random.randint(2, size=10),
    'Q2_1': [np.random.randint(1, 100) for i in range(10)],
    'Q2_2': np.random.random(10),
    'Q2_3': np.random.randint(2, size=10)
})
It displays as follows:
Q0_0 Q1_1 Q1_2 Q1_3 Q2_1 Q2_2 Q2_3
0 India 21 0.326856 0 51 0.520506 0
1 Algeria 7 0.504580 1 43 0.953744 1
2 India 67 0.327273 1 34 0.840453 1
3 U.S.A 49 0.056478 0 67 0.309559 1
4 Morocco 71 0.743913 1 76 0.240706 1
5 Tunisia 31 0.060707 1 78 0.576598 0
6 U.S.A 25 0.588239 1 61 0.133856 1
7 France 99 0.991723 0 85 0.274825 1
8 Russia 9 0.846950 1 61 0.279948 1
9 Algeria 79 0.176326 1 78 0.881051 1
I need to change every country other than India and U.S.A to Other in column Q0_0.
Desired output
Q0_0 Q1_1 Q1_2 Q1_3 Q2_1 Q2_2 Q2_3
0 India 21 0.326856 0 51 0.520506 0
1 Other 7 0.504580 1 43 0.953744 1
2 India 67 0.327273 1 34 0.840453 1
3 U.S.A 49 0.056478 0 67 0.309559 1
4 Other 71 0.743913 1 76 0.240706 1
5 Other 31 0.060707 1 78 0.576598 0
6 U.S.A 25 0.588239 1 61 0.133856 1
7 Other 99 0.991723 0 85 0.274825 1
8 Other 9 0.846950 1 61 0.279948 1
9 Other 79 0.176326 1 78 0.881051 1
I tried to use pandas.Series.str.replace() but it didn't work.
Any help from your side will be highly appreciated, thanks.
Answers:
You can use pandas.Series.mask with pandas.Series.fillna:
df["Q0_0"] = df["Q0_0"].mask(~df["Q0_0"].isin(["India", "U.S.A"])).fillna("Other")
# Output :
print(df)
Q0_0 Q1_1 Q1_2 Q1_3 Q2_1 Q2_2 Q2_3
0 India 43 0.681795 0 36 0.772289 0
1 Other 85 0.695352 1 14 0.989219 1
2 India 69 0.684015 1 85 0.687373 0
3 U.S.A 10 0.175235 1 52 0.825989 1
4 Other 90 0.998192 0 59 0.482667 0
5 Other 27 0.723308 0 90 0.054042 1
6 U.S.A 38 0.973819 0 69 0.536380 1
7 Other 10 0.815710 1 2 0.134707 1
8 Other 38 0.238863 1 1 0.872125 1
9 Other 96 0.078010 0 84 0.650347 0
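The same idea can probably be written in one step with pandas.Series.where, which keeps values where the condition holds and substitutes a replacement elsewhere. The sketch below is only an illustration: it uses a cut-down frame with just the Q0_0 column, and the name keep is introduced here for convenience, not taken from the question.

```python
import pandas as pd

# Cut-down stand-in for the question's frame; only the country column matters here.
df = pd.DataFrame({
    "Q0_0": ["India", "Algeria", "India", "U.S.A", "Morocco",
             "Tunisia", "U.S.A", "France", "Russia", "Algeria"],
})

keep = ["India", "U.S.A"]  # hypothetical name: values to leave untouched

# where() keeps entries for which the condition is True and
# writes "Other" everywhere else, so no fillna() step is needed.
df["Q0_0"] = df["Q0_0"].where(df["Q0_0"].isin(keep), "Other")

print(df["Q0_0"].tolist())
```

Compared with mask followed by fillna, this avoids creating intermediate missing values, which may be easier to read.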
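You could use str.replace, but note that this call only replaces 'Algeria'; every other country would need its own call:
df['Q0_0'] = df['Q0_0'].str.replace('Algeria', 'Other')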
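Because a literal str.replace targets one country at a time, a boolean mask may be a better fit when the list of countries to replace is not known in advance. The following is a minimal sketch, assuming a reduced frame with made-up Q1_3 values; the names keep and is_other are illustrative only.

```python
import pandas as pd

# Reduced stand-in frame; the Q1_3 values are made up for illustration.
df = pd.DataFrame({
    "Q0_0": ["India", "Algeria", "India", "U.S.A", "Morocco",
             "Tunisia", "U.S.A", "France", "Russia", "Algeria"],
    "Q1_3": [0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
})

keep = ["India", "U.S.A"]            # countries to leave as they are
is_other = ~df["Q0_0"].isin(keep)    # True for every row with any other country

# Assign "Other" only to the selected rows of Q0_0; other columns are untouched.
df.loc[is_other, "Q0_0"] = "Other"

print(df)
```

This modifies the column in place and does not require naming Algeria, Morocco, and the rest individually.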