Python pandas sort_values not working properly
Question:
When I sort a DataFrame by a column's values and print it with the head() function, it shows duplicated rows instead of the desired result:
regions = country_features['world_region']
happines = []
counts = []
reg = []
for region in regions:
    hap = country_features.loc[country_features['world_region'] == region, 'happiness_score'].mean()
    count = len(country_features[country_features['world_region'] == region])
    happines.append(hap)
    counts.append(count)
    reg.append(region)
region_happines = pd.DataFrame({'region': reg,
                                'happiness_score': happines,
                                'country_count': counts})
region_happines
region_happines.happiness_score = pd.to_numeric(region_happines.happiness_score)
sorted = region_happines.sort_values(by='happiness_score', ascending=False)
sorted.head(5)
I want to sort the DataFrame by column value, and I expected the result to be sorted correctly.
Answers:
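The duplicated rows come from the loop: it iterates over every row of country_features['world_region'], not over the unique regions, so each region is appended once per country. The first part of the solution can be simplified with groupby and named aggregation: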
print(country_features)
  world_region  happiness_score
0         reg1                5
1         reg1                1
2         reg2               10
3         reg2                1
4         reg2                3
region_happines = (country_features.groupby('world_region', as_index=False)
                   .agg(happiness_score=('happiness_score', 'mean'),
                        country_count=('happiness_score', 'size'))
                   .rename(columns={'world_region': 'region'}))
print(region_happines)
  region  happiness_score  country_count
0   reg1         3.000000              2
1   reg2         4.666667              3
Because the happiness_score column already holds the per-group averages as floats, it does not need to be converted with pd.to_numeric and can be sorted directly:
out = region_happines.sort_values(by='happiness_score', ascending=False)
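For anyone who wants to run the whole thing end to end, here is a minimal self-contained sketch of the steps above. The sample frame is made-up data mirroring the printed example, and the names looped and top are only illustrative; treat it as one way to do it, assuming a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Hypothetical sample data mirroring the printed example frame.
country_features = pd.DataFrame({
    "world_region": ["reg1", "reg1", "reg2", "reg2", "reg2"],
    "happiness_score": [5, 1, 10, 1, 3],
})

# Iterating over the raw column visits each region once per country,
# which is where the repeated rows in the question come from.
looped = list(country_features["world_region"])

# One row per region: mean score and number of countries.
region_happines = (
    country_features.groupby("world_region", as_index=False)
    .agg(
        happiness_score=("happiness_score", "mean"),
        country_count=("happiness_score", "size"),
    )
    .rename(columns={"world_region": "region"})
)

# The mean is already a float column, so pd.to_numeric is not needed.
out = region_happines.sort_values(by="happiness_score", ascending=False)
top = out.head(5)
print(top)
```

Storing the result as out (or top) rather than sorted also avoids shadowing Python's built-in sorted function.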