TypeError with sklearn train_test_split [does not make sense]

Question:

I don't understand why train_test_split is throwing a TypeError.
According to the docs it accepts arrays, and I thought y was a NumPy array.

import tensorflow as tf
from sklearn.model_selection import train_test_split

# create X (features) and y (one-hot encoded labels)
X = cvd_patient_data.drop("CVDriskindicator", axis=1)
y = tf.one_hot(cvd_patient_data["CVDriskindicator"], depth=5)

# Create train and test data
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)
X_train.shape, y_train.shape

Here's the output when I inspect y:

<tf.Tensor: shape=(302, 5), dtype=float32, numpy=
array([[0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       ...,
       [0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0.]], dtype=float32)>

Error description from train_test_split:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-d0bc4bd8803a> in <module>
      1 # Create train nd test data
----> 2 X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)
      3 X_train.shape, y_train.shape

5 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py in _check_index(idx)
    905     # TODO(slebedev): IndexError seems more appropriate here, but it
    906     # will break `_slice_helper` contract.
--> 907     raise TypeError(_SLICE_TYPE_ERROR + ", got {!r}".format(idx))
    908 
    909 

TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got array([132, 202, 196,  75, 176,  59,  93,   6, 177,  30,  22, 258,  56,
       242, 114, 286, 281, 197, 158, 164, 244,  84,  66, 113, 167, 250,
        19, 143,  79, 144, 124,  72,  15,  10, 163, 155,  97,  68, 229,
        37,  16, 126, 290, 272,  67, 108,  69,  31, 178, 154, 230, 294,
        18, 185,  96, 183, 148,  86, 253, 288, 206, 287, 170, 234, 211,
        55, 186, 297, 210, 129,  38, 239, 173, 140, 112, 172, 117, 279,
       273, 165, 180, 182,   2, 115, 147, 181, 120, 215, 262, 127,  74,
        29,  83, 248, 107, 157, 208, 133, 194, 221,  65, 203,  85, 218,
       159,  12,  35,  28, 142, 195, 131, 226,  51,  95, 213, 225,  41,
        89, 222, 136,  26, 295, 141, 238,   0, 285, 274, 100, 261, 103,
       171,  98,  36,  61, 150, 264, 233, 247,  11, 298, 200, 269,  27,
       224,   4, 122,  32, 209, 162, 237, 259, 138,  62, 135, 128, 292,
         8,  70, 266,  64,  44, 240, 156,  40, 123, 277, 216, 153,  23,
       263, 110,  81, 207, 212,  39, 245, 293, 260, 199,  14,  47,  94,
       265, 227, 275, 201, 161,  43, 217, 145, 190, 220, 256,   3, 105,
        53,   1,  49,  80, 205,  34,  91,  52, 241,  13,  88, 166, 296,
       134, 289, 243,  54,  50, 174, 189, 300, 187, 169,  58,  48, 235,
       252,  21, 160, 276, 191, 257, 149, 130, 151,  99,  87, 214, 121,
       301,  20, 188,  71, 106, 270, 102])
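The index array in the error message holds 241 entries, which matches the size of an 80% training split of 302 rows: train_test_split shuffles the row positions and then selects rows by indexing each input with an integer array. The sketch below imitates that shuffle-and-index step with NumPy only. The helper shuffle_split_indices is hypothetical and simplified; it is not sklearn's implementation and will not reproduce sklearn's split for random_state=42.

```python
import numpy as np


def shuffle_split_indices(n_samples, test_size=0.2, seed=42):
    """Hypothetical helper: a simplified shuffle-and-split, not sklearn's code.

    Returns (train_idx, test_idx) as NumPy integer arrays.
    """
    rng = np.random.default_rng(seed)
    n_test = int(np.ceil(test_size * n_samples))  # round the test share up
    perm = rng.permutation(n_samples)             # shuffled row positions
    return perm[n_test:], perm[:n_test]


# Dummy one-hot labels with the same shape as the question's y: (302, 5)
y = np.eye(5, dtype=np.float32)[np.arange(302) % 5]

train_idx, test_idx = shuffle_split_indices(len(y))

# Indexing with an integer array works on a NumPy ndarray...
y_train, y_test = y[train_idx], y[test_idx]
print(y_train.shape, y_test.shape)  # (241, 5) (61, 5)
# ...whereas tf.Tensor indexing rejects an index array like train_idx,
# which is the TypeError shown in the traceback above.
```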
Asked By: Jordan TheDodger


Answers:

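y is not a NumPy array but a TensorFlow tensor (tf.Tensor). train_test_split selects rows by indexing each input with a NumPy array of integer positions (the array printed in the error message), and a tf.Tensor does not accept that kind of index. Convert y to a NumPy array first: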

y = tf.one_hot(cvd_patient_data["CVDriskindicator"],depth=5).numpy()
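For reference, here is a self-contained sketch of the whole corrected flow. It uses synthetic data in place of cvd_patient_data (the real columns are not shown), and it builds the one-hot labels with NumPy's np.eye instead of tf.one_hot so it runs without TensorFlow. This swap is an assumption: it gives the same result only for integer labels in 0..4; labels of 5 or more raise an IndexError and negative labels wrap around, whereas tf.one_hot returns an all-zero row for out-of-range labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for cvd_patient_data (assumed): 302 rows, 4 features,
# integer class labels in the range 0-4.
rng = np.random.default_rng(0)
X = rng.normal(size=(302, 4))
labels = rng.integers(0, 5, size=302)

# NumPy equivalent of tf.one_hot(labels, depth=5).numpy() for labels 0-4:
# row i of the 5x5 identity matrix is the one-hot vector for class i.
y = np.eye(5, dtype=np.float32)[labels]

# Both inputs are plain ndarrays, so row selection by index array works.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, y_train.shape)  # (241, 4) (241, 5)
```

Whether y comes from np.eye or from tf.one_hot(...).numpy(), what matters is that train_test_split receives an object that supports NumPy-style integer-array indexing.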
Answered By: LucG