sklearn KBinsDiscretizer returns 0 for all bins
Question:
I have an ndarray of shape (1, 188). I'm trying to create bins using KBinsDiscretizer, but it gives me back only bins labelled 0.
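import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# transform the dataset with KBinsDiscretizer
enc = KBinsDiscretizer(n_bins=18, encode='ordinal', strategy='uniform')
# list_event_MES holds the 188 measured values; np.float was removed in NumPy 1.24, so use float
test = np.asarray(list_event_MES, dtype=float).reshape(1, -1)  # shape (1, 188)
print(test)
print(test.shape)
enc.fit(test)
test2 = enc.transform(test)
print(test2.tolist())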
This returns 0 for every value.
Input matrix:
[[0.13614053 0.14069501 0.08270327 0.26015096 0.15958708 0.16834299
  0.14913976 0.11897561 0.23232807 0.0892398  0.1637264  0.17120459
  0.19350733 0.18131615 0.20117186 0.1586006  0.19068352 0.24293008
  ...
  0.2112216  0.21829195 0.28169516 0.27585681 0.27317305 0.1849694
  0.23402622 0.24994829 0.20873297 0.25534803 0.15556027 0.27226802
  ...
  0.14180543 0.24001428]]
Shape:
(1, 188)
Warnings (one for each of the 188 columns):
/miniconda3/lib/python3.7/site-packages/sklearn/preprocessing/_discretization.py:159: UserWarning: Feature 0 is constant and will be replaced with 0.
"replaced with 0." % jj)
/miniconda3/lib/python3.7/site-packages/sklearn/preprocessing/_discretization.py:159: UserWarning: Feature 1 is constant and will be replaced with 0.
"replaced with 0." % jj)
/miniconda3/lib/python3.7/site-packages/sklearn/preprocessing/_discretization.py:159: UserWarning: Feature 2 is constant and will be replaced with 0.
"replaced with 0." % jj)
Answers:
From the shape of your array, (1, 188), we can infer that there is only 1 sample and 188 features. As per the documentation of KBinsDiscretizer, it bins continuous data into intervals, and it does so at the feature level: for each feature (in other words, for each column of your data), KBinsDiscretizer computes the bin intervals and then bins your data. An example is shown below:
from sklearn.preprocessing import KBinsDiscretizer

X = [[-2, 1, -4, -1],
     [-1, 2, -3, -0.5],
     [ 0, 3, -2, 0.5],
     [ 1, 4, -1, 2]]
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
est.fit(X)
Xt = est.transform(X)
Xt
array([[0., 0., 0., 0.],
       [1., 1., 1., 0.],
       [2., 2., 2., 1.],
       [2., 2., 2., 2.]])
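For strategy='uniform', computing the bin intervals amounts to splitting each column's [min, max] range into n_bins equal-width intervals. The sketch below reproduces the 4-by-4 example with plain NumPy. `uniform_bins` is a hypothetical helper written only for illustration (it is not part of scikit-learn) and mirrors the idea rather than sklearn's exact internals. Its constant-column branch imitates the behaviour described by the warning in the question, which is what happens to every column of a (1, 188) array.

```python
import numpy as np


def uniform_bins(column, n_bins):
    """Hypothetical helper: equal-width binning of a single column."""
    column = np.asarray(column, dtype=float)
    if column.min() == column.max():
        # Constant column: there is no range to split, so (like sklearn's
        # "is constant and will be replaced with 0" warning) map everything to bin 0.
        return np.zeros_like(column), None
    # n_bins equal-width intervals over [min, max] need n_bins + 1 edges.
    edges = np.linspace(column.min(), column.max(), n_bins + 1)
    # Digitize against the inner edges only: the minimum lands in bin 0
    # and the maximum in bin n_bins - 1.
    return np.digitize(column, edges[1:-1]).astype(float), edges


X = np.array([[-2, 1, -4, -1],
              [-1, 2, -3, -0.5],
              [0, 3, -2, 0.5],
              [1, 4, -1, 2]])

# Bin each column independently, as KBinsDiscretizer does per feature.
Xt = np.column_stack([uniform_bins(X[:, j], 3)[0] for j in range(X.shape[1])])
print(Xt)

# One sample with three features (first values from the question's input):
# every column holds a single value, so every column is constant.
row = np.array([[0.13614053, 0.14069501, 0.08270327]])
row_bins = np.column_stack([uniform_bins(row[:, j], 18)[0] for j in range(row.shape[1])])
print(row_bins)  # [[0. 0. 0.]]
```

The first printout matches the `Xt` shown above; the second shows why a single-row input can only ever produce bin 0.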
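In this example, the discretizer computes the bin intervals for each column separately and then bins that column's values. In your case you have just one data point per feature, so each feature is constant (its minimum equals its maximum) and there is no range to split into bins; that is why every feature triggers the "is constant and will be replaced with 0" warning and every output is 0. If instead your data has shape (188, 1), i.e. 188 samples and 1 feature, it works as expected: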
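import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

enc = KBinsDiscretizer(n_bins=18, encode='ordinal', strategy='uniform')
# 188 random samples of a single feature, shape (188, 1)
list_event_MES = np.random.normal(0, 2, 188).reshape(-1, 1)
test = np.asarray(list_event_MES)
print(test.shape)
(188, 1)
enc.fit(test)
test2 = enc.transform(test)
test2[0:5]  # bin indices of the first five samples; values vary because the data is random
array([[12.],
       [12.],
       [ 7.],
       [ 9.],
       [ 3.]])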
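Applied to the question's setup, the fix is to reshape with reshape(-1, 1) instead of reshape(1, -1). The sketch below compares both layouts side by side. Because the original values are only partially shown, it uses synthetic data as a stand-in for list_event_MES (an assumption: 188 values roughly in the 0.08 to 0.29 range visible in the printout) and inspects the fitted `bin_edges_` attribute.

```python
import warnings

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in for list_event_MES (assumption: 188 values roughly in
# the 0.08-0.29 range seen in the question's printout).
rng = np.random.default_rng(0)
list_event_MES = rng.uniform(0.08, 0.29, size=188).tolist()
values = np.asarray(list_event_MES, dtype=float)

wide = values.reshape(1, -1)  # 1 sample x 188 features: the question's layout
tall = values.reshape(-1, 1)  # 188 samples x 1 feature (same as wide.T)

# Wide layout: every feature is constant, so every output is bin 0.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)  # silence the 188 "constant" warnings
    wide_bins = KBinsDiscretizer(n_bins=18, encode='ordinal',
                                 strategy='uniform').fit_transform(wide)

# Tall layout: one feature whose range is split into 18 equal-width bins.
enc = KBinsDiscretizer(n_bins=18, encode='ordinal', strategy='uniform')
tall_bins = enc.fit_transform(tall)

print(wide_bins.max())          # 0.0
print(tall.shape)               # (188, 1)
print(enc.bin_edges_[0].shape)  # (19,): 18 bins need 19 edges
print(np.unique(tall_bins))     # bin indices in the range 0..17
```

Only the reshape direction changes; n_bins, encode and strategy stay exactly as in the question.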