Symmetric number of bins in qcut around zero

Question:

I have a pandas DataFrame with a different number of integers and NaNs in each row. I would like to allocate the values in each row into 8 bins: 4 bins for negative values and 4 bins for positive values per row. So there will be a different number of values in each bin per row. Any hints on how to adjust the qcut function for that? Thanks!

Asked By: Ekaterina


Answers:

If I understand correctly, you could just do a qcut on positive values and a qcut on negative values.

For example, given the dataframe:

>>> df
        vals
0  -0.456460
1   0.448368
2   0.186750
3   1.056617
4  -0.035620
5  -0.609843
6   0.126376
7   0.160817
8  -1.495441
9   0.730763
10 -0.005071
11  0.677918
12 -0.779553
13  0.717374
14  2.250258
15 -0.801028
16  0.306408
17  0.538970
18 -2.120528
19  1.066903

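Use two qcut calls, one for the positive values and one for the negative values. Values exactly equal to zero match neither mask, so they are left unbinned (NaN):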

df.loc[df.vals > 0,'bin'] = pd.qcut(df.loc[df.vals > 0,'vals'], q=4)

df.loc[df.vals < 0,'bin'] = pd.qcut(df.loc[df.vals < 0,'vals'], q=4)

As a result, the values are binned into 8 unique bins, 4 for positive values and 4 for negative values:

>>> df
        vals                 bin
0  -0.456460    (-0.695, -0.351]
1   0.448368      (0.276, 0.608]
2   0.186750      (0.125, 0.276]
3   1.056617       (0.812, 2.25]
4  -0.035620  (-0.351, -0.00507]
5  -0.609843    (-0.695, -0.351]
6   0.126376      (0.125, 0.276]
7   0.160817      (0.125, 0.276]
8  -1.495441    (-2.122, -0.975]
9   0.730763      (0.608, 0.812]
10 -0.005071  (-0.351, -0.00507]
11  0.677918      (0.608, 0.812]
12 -0.779553    (-0.975, -0.695]
13  0.717374      (0.608, 0.812]
14  2.250258       (0.812, 2.25]
15 -0.801028    (-0.975, -0.695]
16  0.306408      (0.276, 0.608]
17  0.538970      (0.276, 0.608]
18 -2.120528    (-2.122, -0.975]
19  1.066903       (0.812, 2.25]
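
To see how the values are distributed, here is a minimal, self-contained sketch that rebuilds the example data and counts the members of each bin. The names `s`, `pos_bins`, `neg_bins`, `pos_counts` and `neg_counts` are illustrative and not part of the original answer. Each side is binned on its own, so the bin sizes differ between sides: 12 positive values give 3 per bin, and 8 negative values give 2 per bin.

```python
import pandas as pd

# Example values copied from the dataframe shown above.
vals = [
    -0.456460, 0.448368, 0.186750, 1.056617, -0.035620,
    -0.609843, 0.126376, 0.160817, -1.495441, 0.730763,
    -0.005071, 0.677918, -0.779553, 0.717374, 2.250258,
    -0.801028, 0.306408, 0.538970, -2.120528, 1.066903,
]
s = pd.Series(vals, name="vals")

# Quartile bins computed separately on each side of zero.
pos_bins = pd.qcut(s[s > 0], q=4)
neg_bins = pd.qcut(s[s < 0], q=4)

# sort=False keeps the counts in bin order instead of ordering by frequency.
pos_counts = pos_bins.value_counts(sort=False)
neg_counts = neg_bins.value_counts(sort=False)

print(pos_counts)
print(neg_counts)
```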

You can sort the bins to visualize them like this, allowing you to see 4 bins for positive values and 4 bins for negative values:

np.sort(df['bin'].unique())

array([Interval(-2.1219999999999999, -0.97499999999999998, closed='right'),
       Interval(-0.97499999999999998, -0.69499999999999995, closed='right'),
       Interval(-0.69499999999999995, -0.35099999999999998, closed='right'),
       Interval(-0.35099999999999998, -0.0050699999999999999, closed='right'),
       Interval(0.125, 0.27600000000000002, closed='right'),
       Interval(0.27600000000000002, 0.60799999999999998, closed='right'),
       Interval(0.60799999999999998, 0.81200000000000006, closed='right'),
       Interval(0.81200000000000006, 2.25, closed='right')], dtype=object)
Answered By: sacuL

In case you just want integer labels for the bins instead of Categorical intervals (positive values get labels 0 to 3, negative values get labels -4 to -1):

df.loc[df.vals > 0, 'bin'] = pd.qcut(df.loc[df.vals > 0, 'vals'], q=4, labels=False)
df.loc[df.vals < 0, 'bin'] = pd.qcut(df.loc[df.vals < 0, 'vals'], q=4, labels=False) - 4
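
Since the question asks for binning within each row, the same idea could be wrapped in a helper and applied row by row. This is only a sketch under assumptions: `signed_quantile_bins`, `demo` and `binned` are hypothetical names, labels run from -4 to -1 for negatives and from 1 to 4 for positives (unlike the 0 to 3 labels above), a side with fewer than `q` values is left as NaN, and `duplicates="drop"` means rows with tied values may end up with fewer than `q` bins on a side.

```python
import numpy as np
import pandas as pd


def signed_quantile_bins(row: pd.Series, q: int = 4) -> pd.Series:
    """Label negatives -q..-1 and positives 1..q by quantile within one row.

    Zeros and NaNs stay NaN. Hypothetical helper, not a pandas function.
    """
    out = pd.Series(np.nan, index=row.index, dtype=float)
    neg = row[row < 0]
    pos = row[row > 0]
    # Assumption: a side with fewer than q values is not binned at all.
    if len(neg) >= q:
        # labels=False yields 0..q-1; shifting by -q gives -q..-1.
        out.loc[neg.index] = pd.qcut(neg, q=q, labels=False, duplicates="drop") - q
    if len(pos) >= q:
        # Shifting by +1 gives 1..q, mirroring the negative labels.
        out.loc[pos.index] = pd.qcut(pos, q=q, labels=False, duplicates="drop") + 1
    return out


# Made-up data: each row has its own mix of negatives, positives, zeros and NaNs.
demo = pd.DataFrame(
    [
        [-4, -3, -2, -1, 1, 2, 3, 4, np.nan, 0],
        [-10, -20, -30, -40, -50, -60, -70, -80, 5, np.nan],
    ]
)

# axis=1 passes one row at a time, so quantiles are computed per row.
binned = demo.apply(signed_quantile_bins, axis=1)
print(binned)
```

In the second row there is only one positive value, so it stays NaN under the assumption stated above.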
Answered By: luca