sklearn KBinsDiscretizer returns 0 for all bins

Question:

I have an ndarray of shape (1, 188). I am trying to bin it with KBinsDiscretizer, but every value is assigned to bin 0.

    import numpy as np
    from sklearn.preprocessing import KBinsDiscretizer

    # transform the dataset with KBinsDiscretizer
    enc = KBinsDiscretizer(n_bins=18, encode='ordinal', strategy='uniform')

    # np.float was removed in recent NumPy releases; the builtin float is equivalent
    test = np.asarray(list_event_MES).astype(float).reshape(1, -1)
    print(test)
    print(test.shape)

    enc.fit(test)
    test2 = enc.transform(test)
    print(test2.tolist())

This returns zero for every bin.

Input matrix:
[[0.13614053 0.14069501 0.08270327 0.26015096 0.15958708 0.16834299
0.14913976 0.11897561 0.23232807 0.0892398 0.1637264 0.17120459
0.19350733 0.18131615 0.20117186 0.1586006 0.19068352 0.24293008 . ….
0.2112216 0.21829195 0.28169516 0.27585681 0.27317305 0.1849694
0.23402622 0.24994829 0.20873297 0.25534803 0.15556027 0.27226802
0
0.14180543 0.24001428]]

Shape:
(1, 188)

Warnings (one for each of the 188 columns):
/miniconda3/lib/python3.7/site-packages/sklearn/preprocessing/_discretization.py:159: UserWarning: Feature 0 is constant and will be replaced with 0.
"replaced with 0." % jj)
/miniconda3/lib/python3.7/site-packages/sklearn/preprocessing/_discretization.py:159: UserWarning: Feature 1 is constant and will be replaced with 0.
"replaced with 0." % jj)
/miniconda3/lib/python3.7/site-packages/sklearn/preprocessing/_discretization.py:159: UserWarning: Feature 2 is constant and will be replaced with 0.
"replaced with 0." % jj)

Asked By: ZheFrench


Answers:

From the shape of your array, (1, 188), we can infer that there is only 1 sample and 188 features. As per the documentation of KBinsDiscretizer, it bins continuous data into intervals, and it does so at the feature level: for each feature (in other words, each column of your data) it computes the bin intervals and then bins the values in that column. An example is shown below:

from sklearn.preprocessing import KBinsDiscretizer

# 4 samples (rows), 4 features (columns)
X = [[-2, 1, -4,   -1],
     [-1, 2, -3, -0.5],
     [ 0, 3, -2,  0.5],
     [ 1, 4, -1,    2]]
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
est.fit(X)

Xt = est.transform(X)
Xt

array([[ 0., 0., 0., 0.],
       [ 1., 1., 1., 0.],
       [ 2., 2., 2., 1.],
       [ 2., 2., 2., 2.]])
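
To see the per-column intervals behind this output, the fitted estimator's bin_edges_ attribute can be inspected. The following is a minimal sketch that reuses the X from the example; the comparison against np.linspace is only an illustration of what strategy='uniform' amounts to (evenly spaced edges between each column's minimum and maximum), not a copy of the library's internals.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[-2, 1, -4,   -1],
              [-1, 2, -3, -0.5],
              [ 0, 3, -2,  0.5],
              [ 1, 4, -1,    2]])

n_bins = 3
est = KBinsDiscretizer(n_bins=n_bins, encode='ordinal', strategy='uniform')
est.fit(X)

# bin_edges_ holds one array of edges per feature (column).
for j, edges in enumerate(est.bin_edges_):
    # Illustration: evenly spaced edges between the column's min and max.
    expected = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
    print(j, edges, np.allclose(edges, expected))
```

For the first column this gives the edges -2, -1, 0, 1, so each column gets its own set of three intervals.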

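In this example, the discretizer computes the bin intervals for each column separately and then bins that column's values. In your case there is only one data point per feature, so each feature is constant (its minimum equals its maximum) and there is nothing to compute bins from; this is what the warnings are telling you, and why every value is replaced with 0. If you instead reshape your data to (188, 1), i.e. 188 samples and 1 feature, it works as expected, as shown below: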

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

enc = KBinsDiscretizer(n_bins=18, encode='ordinal', strategy='uniform')
# 188 samples, 1 feature (random data, so the exact bins vary between runs)
list_event_MES = np.random.normal(0, 2, 188).reshape(-1, 1)
test = np.asarray(list_event_MES)
print(test.shape)
(188, 1)

enc.fit(test)
test2 = enc.transform(test)

print(test2[0:5])

[[12.]
 [12.]
 [ 7.]
 [ 9.]
 [ 3.]]
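
Applied to the array from the question, the change comes down to reshape(-1, 1) instead of reshape(1, -1). The sketch below contrasts the two shapes side by side. The values array is a hypothetical stand-in for list_event_MES (random numbers in a similar range), and the warnings filter is only there to keep the 188 "Feature i is constant" messages out of the output.

```python
import warnings

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical stand-in for list_event_MES: 188 values between 0 and 0.3.
rng = np.random.default_rng(0)
values = rng.uniform(0.0, 0.3, size=188)

enc = KBinsDiscretizer(n_bins=18, encode='ordinal', strategy='uniform')

# Shape (1, 188): one sample and 188 constant features, so every bin is 0.
row = values.reshape(1, -1)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the "Feature i is constant" warnings
    as_row = enc.fit_transform(row)

# Shape (188, 1): 188 samples of a single feature, binned into 18 intervals.
col = values.reshape(-1, 1)
as_col = enc.fit_transform(col)

print(as_row.shape, np.unique(as_row))
print(as_col.shape, as_col.min(), as_col.max())
```

With the column layout, the smallest value lands in bin 0 and the largest in bin 17, which is the spread the question was expecting.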