Aggregate function in pandas dataframe not working appropriately

Question:

I'm trying to sum one column after grouping by another column. The code looks right to me, but the output is wildly different from what I expect. So I tried a simple min() on the same groupby, and that output is also completely different from the expected one. Did I do something wrong?

Below is the df as displayed. I grouped it by lga_desc, and when I take the minimum value within each group, I get the wrong output.

|Taxable Income |lga_desc|
|300,000,450    |Alpine  |
|240,000        |Alpine  |
|700,000        |Alpine  |
|260,000,450    |Ararat  |
|469,000        |Ararat  |
|5,200,000      |Ararat  |


df = df.groupby('lga_desc')
df = df['Taxable Income'].min()

Output when applying the min function:

lga_desc
Alpine           700,000
Ararat           469,000

These are the wrong outputs for the given dataframe.

Thank you for the help!

Update: After checking my code again, it turns out that when I imported this file, all the numbers became strings. So, a lesson: make sure your numbers are actual numbers, not strings! 🙂
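
One way to catch this at import time is sketched below. It assumes the file is a CSV read with pd.read_csv (the question does not say which format or reader was used), and the csv_text string is a made-up stand-in for the real file. Checking dtypes right after import shows whether the column is numeric, and the thousands=',' argument asks read_csv to parse comma-grouped numbers directly.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file (assumption).
csv_text = '''Taxable Income,lga_desc
"300,000,450",Alpine
"240,000",Alpine
"700,000",Alpine
"260,000,450",Ararat
"469,000",Ararat
"5,200,000",Ararat
'''

# Default import: the comma-grouped values are read as strings, not numbers.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)

# Declaring the thousands separator lets read_csv parse them as integers.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=',')
print(parsed.dtypes)

# With a numeric column, the grouped minimum is computed numerically.
grouped_min = parsed.groupby('lga_desc')['Taxable Income'].min()
print(grouped_min)
```

Printing df.dtypes straight after loading is a cheap habit that would have surfaced the problem before any aggregation.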

Asked By: LeoniA29

Answers:

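The values are strings, so min() compares them as text rather than as numbers. Remove the thousands separators and convert the column to int first: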

df['Taxable Income'] = df['Taxable Income'].str.replace(',', '').astype(int)
result = df.groupby('lga_desc')['Taxable Income'].min().reset_index()

OUTPUT:

  lga_desc  Taxable Income
0  Alpine            240000
1  Ararat            469000
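
For a self-contained check, the sketch below rebuilds the sample rows from the question as strings and applies the same cleaning idea. It swaps astype(int) for pd.to_numeric with errors='coerce', which is a variation and not part of the answer above: unparseable values become NaN instead of raising. The names cleaned and summary are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question; the values are strings, as after the import.
df = pd.DataFrame({
    'Taxable Income': ['300,000,450', '240,000', '700,000',
                       '260,000,450', '469,000', '5,200,000'],
    'lga_desc': ['Alpine', 'Alpine', 'Alpine', 'Ararat', 'Ararat', 'Ararat'],
})

# Strip the thousands separators (literal match, no regex), then convert.
# errors='coerce' turns anything unparseable into NaN instead of raising.
cleaned = df['Taxable Income'].str.replace(',', '', regex=False)
df['Taxable Income'] = pd.to_numeric(cleaned, errors='coerce')

# Both aggregations from the question, computed on the numeric column.
summary = df.groupby('lga_desc')['Taxable Income'].agg(['min', 'sum'])
print(summary)
```

If the real column can contain blanks or text such as "n/a", checking df['Taxable Income'].isna().sum() after the conversion shows how many rows failed to parse.
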
Answered By: Nk03