LEFT ON Case When in Pandas

Question:

i wanted to ask that if in SQL I can do like JOIN ON CASE WHEN, is there a way to do this in Pandas?

disease = [
{"City":"CH","Case_Recorded":5300,"Recovered":2839,"Deaths":2461},
{"City":"NY","Case_Recorded":1311,"Recovered":521,"Deaths":790},
{"City":"TX","Case_Recorded":1991,"Recovered":1413,"Deaths":578},
{"City":"AT","Case_Recorded":3381,"Recovered":3112,"Deaths":269},
{"City":"TX","Case_Recorded":3991,"Recovered":2810,"Deaths":1311},
{"City":"LA","Case_Recorded":2647,"Recovered":2344,"Deaths":303},
{"City":"LA","Case_Recorded":4410,"Recovered":3344,"Deaths":1066}
]

region = {"North": ["AT"], "West":["TX","LA"]}

So what i have is 2 dummy dict and i have already converted it to become dataframe, first is the name of the cities with the case,and I’m trying to figure out which region the cities belongs to.

Region|City
North|AT
West|TX
West|LA
None|NY
None|CH

So what i thought in SQL was using left on case when, and if the result is null when join with North region then join with West region.
But if there are 15 or 30 region in some country, it’d be problems i think

Asked By: PiPio

||

Answers:

I’m sorry, I’m not exactly sure what’s your expected result, can you express more? if your expected result is just getting the city’s region there is no need for conditional joining? for ex: you can transform the city-region table into per city per region per row and direct join with the main df

disease = [
{"City":"CH","Case_Recorded":5300,"Recovered":2839,"Deaths":2461},
{"City":"NY","Case_Recorded":1311,"Recovered":521,"Deaths":790},
{"City":"TX","Case_Recorded":1991,"Recovered":1413,"Deaths":578},
{"City":"AT","Case_Recorded":3381,"Recovered":3112,"Deaths":269},
{"City":"TX","Case_Recorded":3991,"Recovered":2810,"Deaths":1311},
{"City":"LA","Case_Recorded":2647,"Recovered":2344,"Deaths":303},
{"City":"LA","Case_Recorded":4410,"Recovered":3344,"Deaths":1066}
]

region = [
{'City':'AT','Region':"North"},
{'City':'TX','Region':"West"},
{'City':'LA','Region':"West"}
]

df = pd.DataFrame(disease)
df_reg = pd.DataFrame(region)

df.merge( df_reg , on = 'City' , how = 'left' )
Answered By: humphrey chan

Use:

#get City without duplicates
df1 = pd.DataFrame(disease)[['City']].drop_duplicates()

#create DataFrame from region dictionary
region = {"North": ["AT"], "West":["TX","LA"]}
df2 = pd.DataFrame([(k, x) for k, v in region.items() for x in v], 
                    columns=['Region','City'])

#append not matched cities to df2
out = pd.concat([df2, df1[~df1['City'].isin(df2['City'])]])
print (out)
  Region City
0  North   AT
1   West   TX
2   West   LA
0    NaN   CH
1    NaN   NY

If order is not important:

out = df2.merge(df1, how = 'right')
print (out)
  Region City
0    NaN   CH
1    NaN   NY
2   West   TX
3  North   AT
4   West   LA
Answered By: jezrael
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.