# Which regions have the lowest literacy rates?

## Question:

I need to group/merge each city of the same name and calculate its overall percentage, to see which city amongst them has the lowest % literacy rate.
Code:

``````Python
import pandas as pd
df = pd.DataFrame({'Cities': ["Cape Town", "Cape Town", "Cape Town", "Tokyo", "Cape Town", "Tokyo", "Mumbai", "Belgium", "Belgium" ],
'LiteracyRate': [0.05, 0.35, 0.2, 0.11, 0.15, 0.2, 0.65, 0.35, 0.45]})

print(df)
``````

For example:

``````   Cities      LiteracyRate
0  Cape Town   0.05
1  Cape Town   0.35
2  Cape Town   0.2
3  Tokyo       0.11
4  Cape Town   0.15
5  Tokyo       0.2
6  Mumbai      0.65
7  Belgium     0.35
8  Belgium     0.45
``````

I’m expecting this:

``````   Cities     LiteracyRate   %LiteracyRate
0  Cape Town  0.75            75
1  Tokyo      0.31            31
2  Mumbai     0.65            65
3  Belgium    0.8             80
``````

So I tried this code below but it’s not giving me desirable results, the countries with similar names are still not merged. And the percentages ain’t right.

``````# Calculate the percentage
df["%LiteracyRate"] = (df["LiteracyRate"]/df["LiteracyRate"].sum())*100

# Show the DataFrame
print(df)
``````

You can use groupby() in pandas, to join cities with the same names and sum() to calculate %

``````df = df.groupby('Cities').sum()
``````

Than you can format results using

``````df['%LiteracyRate'] = (df['LiteracyRate']*100).round().astype(int)

df = df.reset_index()
``````

To sort them by literacy rate you can

``````df = df.sort_values(by='%LiteracyRate')

df = df.reset_index()
``````

Hope this helps!