Seaborn bar graph after group by to create top N x label in pandas
Question:
I want to create a seaborn bar graph, but before plotting I need to build a DataFrame ordered by a column value (descending). To explain further, here is the data:
City | Complaint Type | Value |
---|---|---|
ARVERNE | Blocked Driveway | 50 |
ARVERNE | Derelict Vehicle | 32 |
ARVERNE | Disorderly Youth | 2 |
ARVERNE | Drinking | 1 |
ASTORIA | Animal Abuse | 170 |
ASTORIA | Bike/Roller/Skate Chronic | 16 |
ASTORIA | Blocked Driveway | 3436 |
ASTORIA | Derelict Vehicle | 426 |
BAYSIDE | Animal Abuse | 53 |
BAYSIDE | Blocked Driveway | 514 |
BAYSIDE | Derelict Vehicle | 231 |
BAYSIDE | Disorderly Youth | 2 |
BELLEROSE | Animal Abuse | 15 |
BELLEROSE | Bike/Roller/Skate Chronic | 1 |
BELLEROSE | Blocked Driveway | 138 |
BELLEROSE | Derelict Vehicle | 120 |
BREEZY POINT | Animal Abuse | 2 |
BREEZY POINT | Blocked Driveway | 3 |
BREEZY POINT | Derelict Vehicle | 3 |
BREEZY POINT | Illegal Parking | 16 |
Now I want to create a graph of the top 3 cities by complaints together with their major complaint types. If I use the top 2 complaint types, my data should look like this:
City | Complaint Type | Value |
---|---|---|
ASTORIA | Blocked Driveway | 3436 |
ASTORIA | Derelict Vehicle | 426 |
BAYSIDE | Blocked Driveway | 514 |
BAYSIDE | Derelict Vehicle | 231 |
BELLEROSE | Blocked Driveway | 138 |
BELLEROSE | Derelict Vehicle | 120 |
ARVERNE | Blocked Driveway | 50 |
ARVERNE | Derelict Vehicle | 32 |
BREEZY POINT | Illegal Parking | 16 |
BREEZY POINT | Derelict Vehicle | 3 |
Here you can see that the data is grouped by City, the cities are ordered by value in descending order, and only the 2 major complaint types per city are kept. Can you please help me build this data and/or plot the graph in pandas?
I have tried some code that selects the top 2 complaints per city, but I am unable to sort the cities by value. When I sort the data by value, I lose the grouping. Below is the code I am currently using:
df1 = df.groupby(['City','Complaint Type']).size().reset_index(name = 'size')
df2 = df1.sort_values(by = ['City', 'size'], ascending = [True, False]).groupby('City').head(3)
Answers:
To get the dataframe the way you want, first get the top 3 cities: sort by Value in descending order and call unique() on the City column. Then filter the dataframe to these 3 cities, sort it (using sort_values), and take the top 2 complaints per city (using groupby and head()). This gives you 6 rows. If you want them grouped by city in that order, use pd.Categorical() on the City column. Finally, plot the graph using seaborn catplot. Code is below.
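import pandas as pd
import seaborn as sns

# Get the top 3 cities (ranked by their largest Value) as an array
top3 = df.sort_values(by='Value', ascending=False)['City'].unique()[:3]
print(top3)
# Filter to these cities and pick the top 2 entries per city
# (.copy() avoids a possible SettingWithCopyWarning on the assignment below)
df1 = df[df['City'].isin(top3)].sort_values('Value', ascending=False).groupby('City').head(2).copy()
# If you want the data grouped by city, use categorical ordering
df1['City'] = pd.Categorical(df1['City'], categories=top3)
# Sort by City, then by Value, so rows within each city stay in descending order
df1 = df1.sort_values(['City', 'Value'], ascending=[True, False])
print(df1)
# Finally, plot your graph
sns.catplot(data=df1, kind='bar', x='Complaint Type', y='Value', hue='City')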
Note that you can swap x and hue if you want each city on the x-axis.
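For reference, the swapped version mentioned in the note could look roughly like the sketch below. It rebuilds the six rows of df1 by hand so it runs on its own; the Agg backend line is only an assumption so that the sketch works without a display, and you can drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, not needed in a notebook
import pandas as pd
import seaborn as sns

# The six rows of df1 from the answer, typed in by hand for this sketch
df1 = pd.DataFrame({
    "City": ["ASTORIA", "ASTORIA", "BAYSIDE", "BAYSIDE", "BELLEROSE", "BELLEROSE"],
    "Complaint Type": ["Blocked Driveway", "Derelict Vehicle"] * 3,
    "Value": [3436, 426, 514, 231, 138, 120],
})

# Keep the cities in ranked order rather than alphabetical order
top3 = ["ASTORIA", "BAYSIDE", "BELLEROSE"]
df1["City"] = pd.Categorical(df1["City"], categories=top3, ordered=True)

# Cities on the x-axis, one bar colour per complaint type
g = sns.catplot(data=df1, kind="bar", x="City", y="Value", hue="Complaint Type")
```

With this layout each city gets one group of bars, which is usually easier to read when the question is "what are the main complaints in each city".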
Outputs
top3
['ASTORIA' 'BAYSIDE' 'BELLEROSE']
df1
         City    Complaint Type  Value
6     ASTORIA  Blocked Driveway   3436
7     ASTORIA  Derelict Vehicle    426
9     BAYSIDE  Blocked Driveway    514
10    BAYSIDE  Derelict Vehicle    231
14  BELLEROSE  Blocked Driveway    138
15  BELLEROSE  Derelict Vehicle    120
Plot
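The expected table in the question lists all five cities, ordered by their largest value, with two complaint types each. If that is what you are after, here is a minimal sketch of one way to build it. The names n_cities and n_complaints are made up for this example, and ranking cities by their largest single Value is an assumption about what "top city" means; ranking by the sum per city would only need max() changed to sum().

```python
import pandas as pd

# Sample rows copied from the question's table
rows = [
    ("ARVERNE", "Blocked Driveway", 50), ("ARVERNE", "Derelict Vehicle", 32),
    ("ARVERNE", "Disorderly Youth", 2), ("ARVERNE", "Drinking", 1),
    ("ASTORIA", "Animal Abuse", 170), ("ASTORIA", "Bike/Roller/Skate Chronic", 16),
    ("ASTORIA", "Blocked Driveway", 3436), ("ASTORIA", "Derelict Vehicle", 426),
    ("BAYSIDE", "Animal Abuse", 53), ("BAYSIDE", "Blocked Driveway", 514),
    ("BAYSIDE", "Derelict Vehicle", 231), ("BAYSIDE", "Disorderly Youth", 2),
    ("BELLEROSE", "Animal Abuse", 15), ("BELLEROSE", "Bike/Roller/Skate Chronic", 1),
    ("BELLEROSE", "Blocked Driveway", 138), ("BELLEROSE", "Derelict Vehicle", 120),
    ("BREEZY POINT", "Animal Abuse", 2), ("BREEZY POINT", "Blocked Driveway", 3),
    ("BREEZY POINT", "Derelict Vehicle", 3), ("BREEZY POINT", "Illegal Parking", 16),
]
df = pd.DataFrame(rows, columns=["City", "Complaint Type", "Value"])

# Hypothetical parameters: how many cities and complaint types to keep
n_cities = 5
n_complaints = 2

# Rank cities by their largest single Value (assumption, see above)
city_order = (
    df.groupby("City")["Value"].max()
    .sort_values(ascending=False)
    .head(n_cities)
    .index
)

top = (
    df[df["City"].isin(city_order)]
    # An ordered categorical makes sort_values follow the ranking, not the alphabet
    .assign(City=lambda d: pd.Categorical(d["City"], categories=city_order, ordered=True))
    .sort_values(["City", "Value"], ascending=[True, False])
    .groupby("City", observed=True)
    .head(n_complaints)
    .reset_index(drop=True)
)
print(top)
```

BREEZY POINT has two complaint types tied at 3, so which of them ends up in the second slot depends on the sort's tie-breaking. Setting n_cities = 3 gives the six-row result from the answer.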