LEFT ON Case When in Pandas

Question

i wanted to ask that if in SQL I can do like JOIN ON CASE WHEN, is there a way to do this in Pandas?

disease = [
{"City":"CH","Case_Recorded":5300,"Recovered":2839,"Deaths":2461},
{"City":"NY","Case_Recorded":1311,"Recovered":521,"Deaths":790},
{"City":"TX","Case_Recorded":1991,"Recovered":1413,"Deaths":578},
{"City":"AT","Case_Recorded":3381,"Recovered":3112,"Deaths":269},
{"City":"TX","Case_Recorded":3991,"Recovered":2810,"Deaths":1311},
{"City":"LA","Case_Recorded":2647,"Recovered":2344,"Deaths":303},
{"City":"LA","Case_Recorded":4410,"Recovered":3344,"Deaths":1066}
]

region = {"North": ["AT"], "West":["TX","LA"]}

So what i have is 2 dummy dict and i have already converted it to become dataframe, first is the name of the cities with the case,and I’m trying to figure out which region the cities belongs to.

Region|City
North|AT
West|TX
West|LA
None|NY
None|CH

So what i thought in SQL was using left on case when, and if the result is null when join with North region then join with West region.
But if there are 15 or 30 region in some country, it’d be problems i think

Asked By: PiPio

||

Source

Answer 1

I’m sorry, I’m not exactly sure what’s your expected result, can you express more? if your expected result is just getting the city’s region there is no need for conditional joining? for ex: you can transform the city-region table into per city per region per row and direct join with the main df

disease = [
{"City":"CH","Case_Recorded":5300,"Recovered":2839,"Deaths":2461},
{"City":"NY","Case_Recorded":1311,"Recovered":521,"Deaths":790},
{"City":"TX","Case_Recorded":1991,"Recovered":1413,"Deaths":578},
{"City":"AT","Case_Recorded":3381,"Recovered":3112,"Deaths":269},
{"City":"TX","Case_Recorded":3991,"Recovered":2810,"Deaths":1311},
{"City":"LA","Case_Recorded":2647,"Recovered":2344,"Deaths":303},
{"City":"LA","Case_Recorded":4410,"Recovered":3344,"Deaths":1066}
]

region = [
{'City':'AT','Region':"North"},
{'City':'TX','Region':"West"},
{'City':'LA','Region':"West"}
]

df = pd.DataFrame(disease)
df_reg = pd.DataFrame(region)

df.merge( df_reg , on = 'City' , how = 'left' )

Answered By: humphrey chan

Answer 2

Use:

#get City without duplicates
df1 = pd.DataFrame(disease)[['City']].drop_duplicates()

#create DataFrame from region dictionary
region = {"North": ["AT"], "West":["TX","LA"]}
df2 = pd.DataFrame([(k, x) for k, v in region.items() for x in v], 
                    columns=['Region','City'])

#append not matched cities to df2
out = pd.concat([df2, df1[~df1['City'].isin(df2['City'])]])
print (out)
  Region City
0  North   AT
1   West   TX
2   West   LA
0    NaN   CH
1    NaN   NY

If order is not important:

out = df2.merge(df1, how = 'right')
print (out)
  Region City
0    NaN   CH
1    NaN   NY
2   West   TX
3  North   AT
4   West   LA

Answered By: jezrael

LEFT ON Case When in Pandas

Question:

Answers: