how to create a new column based on string from different columns
Question:
I have a dataframe look like this:
type
city
1
dki jakarta
2
jawa barat
3
jawa tengah
4
jawa timur
5
sulawesi
I want to create a new column called city_group
based on the city.
- dki jakarta, jawa barat: jabo, jabar
- jawa tengah, jawa tengah: jateng, jatim
- sulawesi: others
The desire dataframe would be like this:
type
city
city_group
1
dki jakarta
jabo, jabar
2
jawa barat
jabo, jabar
3
jawa tengah
jateng, jatim
4
jawa timur
jateng, jatim
5
sulawesi
others
So far, what I have done is with this script below but I did not get how to put multiple string in the condition.
df.loc[df['city'].str.contains("dki jakarta),'city_group'] = 'jabo, jabar'
How can I get the desired dataframe with pandas?
Thank you in advance
Answers:
Create dictionary of lists for mapping by keys and in dict comprehenion flatten it, so possible use Series.map
with Series.fillna
– all values not matched in dictionary create others
values in column city_group
:
d = {'jabo, jabar':['dki jakarta','jawa barat'],
'jateng, jatim':['jawa tengah','jawa timur']}
d1 = {x: k for k, v in d.items() for x in v}
df['city_group'] = df['city'].map(d1).fillna('others')
print (df)
type city city_group
0 1 dki jakarta jabo, jabar
1 2 jawa barat jabo, jabar
2 3 jawa tengah jateng, jatim
3 4 jawa timur jateng, jatim
4 5 sulawesi others
I have a dataframe look like this:
type | city |
---|---|
1 | dki jakarta |
2 | jawa barat |
3 | jawa tengah |
4 | jawa timur |
5 | sulawesi |
I want to create a new column called city_group
based on the city.
- dki jakarta, jawa barat: jabo, jabar
- jawa tengah, jawa tengah: jateng, jatim
- sulawesi: others
The desire dataframe would be like this:
type | city | city_group |
---|---|---|
1 | dki jakarta | jabo, jabar |
2 | jawa barat | jabo, jabar |
3 | jawa tengah | jateng, jatim |
4 | jawa timur | jateng, jatim |
5 | sulawesi | others |
So far, what I have done is with this script below but I did not get how to put multiple string in the condition.
df.loc[df['city'].str.contains("dki jakarta),'city_group'] = 'jabo, jabar'
How can I get the desired dataframe with pandas?
Thank you in advance
Create dictionary of lists for mapping by keys and in dict comprehenion flatten it, so possible use Series.map
with Series.fillna
– all values not matched in dictionary create others
values in column city_group
:
d = {'jabo, jabar':['dki jakarta','jawa barat'],
'jateng, jatim':['jawa tengah','jawa timur']}
d1 = {x: k for k, v in d.items() for x in v}
df['city_group'] = df['city'].map(d1).fillna('others')
print (df)
type city city_group
0 1 dki jakarta jabo, jabar
1 2 jawa barat jabo, jabar
2 3 jawa tengah jateng, jatim
3 4 jawa timur jateng, jatim
4 5 sulawesi others