How to bin nan values using pd.cut

Question:

I am trying to write code that creates bins from a DataFrame (account_raw) that contains blank values. My problem is that the blank values end up in my first bin, labelled '0k to 25k'. What I want to do is create a separate bin for blank values. Any ideas how to fix this? Thanks

import pandas as pd

Bucket = [0, 25000, 50000, 100000,
          200000, 300000, 999999999999]
Label = ['0k to 25k', '25k - 50k',
         '50k - 100k', '100k - 200k',
         '200k - 300k', 'More than 300k']

account_raw['LoanGBVBuckets'] = pd.cut(
    account_raw['IfrsBalanceEUR'],
    bins=Bucket,
    labels=Label,
    include_lowest=True).astype(str)
Asked By: Andreas


Answers:

I think the simplest approach is to post-process the values after pd.cut and assign a custom category to every row where the IfrsBalanceEUR column is missing:

account_raw['LoanGBVBuckets'] = pd.cut(account_raw['IfrsBalanceEUR'],
                                      bins=Bucket,
                                      labels=Label,
                                      include_lowest=True).astype(str)

account_raw.loc[account_raw['IfrsBalanceEUR'].isna(), 'LoanGBVBuckets'] = 'missing values'
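A self-contained version of this post-processing approach, run on a small made-up DataFrame (the sample balances are assumptions for illustration only). Because the .loc assignment overwrites the missing rows afterwards, it does not matter which string astype(str) produced for them.

```python
import numpy as np
import pandas as pd

# Made-up sample data: one missing balance plus two real ones
account_raw = pd.DataFrame({'IfrsBalanceEUR': [np.nan, 100, 100000]})

Bucket = [0, 25000, 50000, 100000, 200000, 300000, 999999999999]
Label = ['0k to 25k', '25k - 50k', '50k - 100k',
         '100k - 200k', '200k - 300k', 'More than 300k']

# Bin first; pd.cut does not assign NaN inputs to any bin
account_raw['LoanGBVBuckets'] = pd.cut(account_raw['IfrsBalanceEUR'],
                                      bins=Bucket,
                                      labels=Label,
                                      include_lowest=True).astype(str)

# Then overwrite every row whose source balance is missing
account_raw.loc[account_raw['IfrsBalanceEUR'].isna(), 'LoanGBVBuckets'] = 'missing values'

print(account_raw)
```

Note that this leaves LoanGBVBuckets as a plain string column, so the bucket ordering from pd.cut is lost.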

EDIT:

Tested in pandas 0.25.0: missing values come out as NaN in the output. To replace them, the new category must first be added with cat.add_categories, and only then can fillna use it:

import numpy as np
import pandas as pd

account_raw = pd.DataFrame({'IfrsBalanceEUR': [np.nan, 100, 100000]})

Bucket = [0, 25000, 50000, 100000, 200000, 300000, 999999999999]
Label = ['0k to 25k', '25k - 50k', '50k - 100k', 
         '100k - 200k', '200k - 300k', 'More than 300k']

account_raw['LoanGBVBuckets'] = pd.cut(account_raw['IfrsBalanceEUR'],
                                      bins=Bucket,
                                      labels=Label,
                                      include_lowest=True)
print(account_raw)
   IfrsBalanceEUR LoanGBVBuckets
0             NaN            NaN
1           100.0      0k to 25k
2        100000.0     50k - 100k

# Register the new category first, then fill the NaNs with it
account_raw['LoanGBVBuckets'] = (account_raw['LoanGBVBuckets'].cat
                                 .add_categories('missing values')
                                 .fillna('missing values'))
print(account_raw)
   IfrsBalanceEUR  LoanGBVBuckets
0             NaN  missing values
1           100.0       0k to 25k
2        100000.0      50k - 100k
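The same add_categories + fillna steps can be wrapped in a small reusable function. This is a sketch only: cut_with_missing and its missing_label parameter are hypothetical names, and the sample Series is made up.

```python
import numpy as np
import pandas as pd


def cut_with_missing(values, bins, labels, missing_label='missing values'):
    """Hypothetical helper: bin with pd.cut, then give NaNs their own category."""
    binned = pd.cut(values, bins=bins, labels=labels, include_lowest=True)
    # fillna only accepts values that are already categories, so add it first
    return binned.cat.add_categories(missing_label).fillna(missing_label)


Bucket = [0, 25000, 50000, 100000, 200000, 300000, 999999999999]
Label = ['0k to 25k', '25k - 50k', '50k - 100k',
         '100k - 200k', '200k - 300k', 'More than 300k']

balances = pd.Series([np.nan, 100, 100000])
buckets = cut_with_missing(balances, Bucket, Label)
print(buckets)
print(list(buckets.cat.categories))
```

Unlike the .astype(str) route, the result stays an ordered categorical: the original bucket order is kept and the extra category is appended at the end, so sorting by the column follows the bucket order.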
Answered By: jezrael