how to create a new column based on string from different columns

Question:

I have a dataframe look like this:

type city
1 dki jakarta
2 jawa barat
3 jawa tengah
4 jawa timur
5 sulawesi

I want to create a new column called city_group based on the city.

  • dki jakarta, jawa barat: jabo, jabar
  • jawa tengah, jawa tengah: jateng, jatim
  • sulawesi: others

The desire dataframe would be like this:

type city city_group
1 dki jakarta jabo, jabar
2 jawa barat jabo, jabar
3 jawa tengah jateng, jatim
4 jawa timur jateng, jatim
5 sulawesi others

So far, what I have done is with this script below but I did not get how to put multiple string in the condition.

df.loc[df['city'].str.contains("dki jakarta),'city_group'] = 'jabo, jabar'

How can I get the desired dataframe with pandas?
Thank you in advance

Asked By: yangyang

||

Answers:

Create dictionary of lists for mapping by keys and in dict comprehenion flatten it, so possible use Series.map with Series.fillna – all values not matched in dictionary create others values in column city_group:

d = {'jabo, jabar':['dki jakarta','jawa barat'],
     'jateng, jatim':['jawa tengah','jawa timur']}

d1 = {x: k for k, v in d.items() for x in v}
df['city_group'] = df['city'].map(d1).fillna('others')
print (df)
   type         city     city_group
0     1  dki jakarta    jabo, jabar
1     2   jawa barat    jabo, jabar
2     3  jawa tengah  jateng, jatim
3     4   jawa timur  jateng, jatim
4     5     sulawesi         others
Answered By: jezrael