How does pandas.qcut deal with remainder values?

Question:

import pandas as pd

game_num = range(1, 102)
player_name = ['Fred'] * 101

data = {'name': player_name, 'game_num': game_num}  # renamed from `dict`, which shadows the built-in
df = pd.DataFrame(data)

df['percentile_bin'] = pd.qcut(df['game_num'], 100, labels=list(range(1, 101)))
  • Problem

If I run df.percentile_bin.nunique(), I get 98, which indicates that two percentile bins are empty.
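A minimal sketch for reproducing this check, assuming a recent pandas: `value_counts` on a categorical column also reports labels that have zero rows, so it shows which percentile labels are empty and which hold more than one row. The exact empty labels can depend on the pandas/NumPy version, so no output is claimed here.

```python
import pandas as pd

# Rebuild the question's frame (`data` instead of `dict`, which shadows the built-in).
df = pd.DataFrame({"name": ["Fred"] * 101, "game_num": range(1, 102)})
df["percentile_bin"] = pd.qcut(df["game_num"], 100, labels=list(range(1, 101)))

# value_counts on a categorical lists every category, including unused ones.
counts = df["percentile_bin"].value_counts(sort=False)
print(df["percentile_bin"].nunique(), "labels in use")
print("empty labels:", counts[counts == 0].index.tolist())
print("labels holding more than one row:", counts[counts > 1].index.tolist())
```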

For instance, game_num 2 is allocated to the 1st percentile_bin along with game_num 1. Why is this?

I expected pd.qcut(df['game_num'], 100, labels=list(range(1, 101))) to create 100 percentile bins for this DataFrame, each holding one row, with exactly one bin holding an extra row (because there are 101 rows).
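Under exact arithmetic this expectation is almost right. The sketch below uses Python's `fractions` module and assumes qcut's default linear-interpolation quantiles (fractional position `p * (n - 1)` in the sorted data); `exact_quantile` is a hypothetical helper written for illustration. The 101 exact edges come out as the integers 1, 2, ..., 101. qcut also closes its first interval on the left (it bins with `include_lowest=True`), so 1 and 2 both land in the first bin and every other bin gets exactly one value: that is where the extra row goes.

```python
from fractions import Fraction

values = list(range(1, 102))  # game_num 1..101
n, q = len(values), 100


def exact_quantile(p):
    """Linear-interpolation quantile at probability p, in exact rational arithmetic."""
    pos = p * (n - 1)          # fractional position in the sorted values
    lo = int(pos)              # floor, since pos >= 0
    if lo + 1 >= n:            # p == 1: the maximum
        return Fraction(values[-1])
    return values[lo] + (pos - lo) * (values[lo + 1] - values[lo])


edges = [exact_quantile(Fraction(k, q)) for k in range(q + 1)]
print(edges == list(range(1, 102)))  # True: every exact edge is an integer

# Assign values like qcut: intervals are (lo, hi], but the first one also includes lo.
counts = []
for i in range(q):
    lo, hi = edges[i], edges[i + 1]
    counts.append(sum(1 for v in values if (v >= lo if i == 0 else v > lo) and v <= hi))
print(counts[0], set(counts[1:]))  # 2 {1}: only the first bin holds two values
```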

  • Desired Output

I really need 100 bins (percentiles) rather than 98. I don’t necessarily have 100 values per individual: some individuals have 100 values (the minimum), others have hundreds of thousands. I would like to reduce each individual's values to 100 percentiles that represent their performance over a year in 100 "chunks". (It is hard to specify a bin size because I do not know how many values a given individual has: 100 or 1,000,000?)


Asked By: TunaFishLies


Answers:

It’s caused by rounding errors in IEEE 754 floating-point arithmetic when qcut computes the bin edges (the quantiles).

This can be seen in the bin edges that pandas.qcut() returns when retbins=True:

import pandas as pd

cats, bins = pd.qcut(range(1, 102), 100, retbins=True)
for e in bins:
    print(e)

This outputs the following (excerpt):

...
28.0
29.000000000000004
29.999999999999996
31.0
...
54.0
56.00000000000001
57.00000000000001
58.00000000000001
58.99999999999999
60.0
...

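So no integer falls inside the intervals (29.000000000000004, 29.999999999999996] and (58.00000000000001, 58.99999999999999]: 30 and 59 land in the following intervals instead. These two categories stay empty, which is why nunique() returns 98 rather than 100.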
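To confirm this on your own installation (the exact edge values can differ between pandas/NumPy versions), this sketch compares the returned edges with the integers they would be in exact arithmetic, then counts, for each interval, the integers qcut would put there: right-closed intervals, with the first one also including its left edge. The number of empty intervals it finds accounts for the gap between 100 and nunique().

```python
import numpy as np
import pandas as pd

# Edges that qcut actually uses for game_num 1..101 in 100 quantile bins.
cats, bins = pd.qcut(range(1, 102), 100, retbins=True)

# In exact arithmetic edge k would be k + 1; list the ones that drifted.
exact = np.arange(1, 102, dtype=float)
drifted = [(k, edge) for k, edge in enumerate(bins) if edge != exact[k]]
print(len(drifted), "edges differ from their exact integer value")

# Replicate qcut's assignment: (lo, hi], with the first interval also closed on the left.
values = np.arange(1, 102)
empty = []
for i, (lo, hi) in enumerate(zip(bins[:-1], bins[1:])):
    inside = (values <= hi) & ((values >= lo) if i == 0 else (values > lo))
    if not inside.any():
        empty.append((lo, hi))
print("empty intervals:", empty)

# Every empty interval is one label that nunique() never sees.
print(pd.Series(cats).nunique(), "non-empty bins")
```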

If you just want 100 intervals, each containing at least one value, you can use pandas.cut() like this.

import pandas as pd
cats = pd.cut(range(1, 102), 100)
Answered By: relent95
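pd.cut splits the value range into equal-width bins, which works here because game_num is evenly spaced, so equal-width bins coincide with equal-count bins; for unevenly spread values, equal-width bins can be empty or very uneven. For the stated goal (split each individual's rows, in order, into 100 chunks regardless of how many rows they have), one alternative that is not part of the answer above is to bin by position with integer arithmetic, which avoids floating-point edges entirely. A sketch, assuming each individual has at least 100 rows (the stated minimum) and that game_num gives the order; `add_percentile_bins` is a hypothetical helper name.

```python
import pandas as pd


def add_percentile_bins(df, group_col="name", order_col="game_num", n_bins=100):
    """Hypothetical helper: label each row 1..n_bins by its position within its group.

    Assumes every group has at least n_bins rows, so that every label is used.
    """
    df = df.sort_values([group_col, order_col]).copy()
    pos = df.groupby(group_col).cumcount()                  # 0 .. size - 1 within each group
    size = df[group_col].map(df[group_col].value_counts())  # number of rows in each row's group
    # Integer floor division: no floating-point edges, hence no empty bins.
    df["percentile_bin"] = pos * n_bins // size + 1
    return df


# Two hypothetical individuals with different numbers of games.
df = pd.DataFrame({
    "name": ["Fred"] * 101 + ["Ann"] * 250,
    "game_num": list(range(1, 102)) + list(range(1, 251)),
})
binned = add_percentile_bins(df)
print(binned.groupby("name")["percentile_bin"].nunique())  # 100 for both
```

With at least n_bins rows per group, every label from 1 to n_bins is used, and bin sizes within a group differ by at most one row.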