Trying to make a one hot encoding function using numpy

Question:

I’m trying to write a one-hot encoding function using NumPy:

def one_hot(indices):
    mapping = dict([(value, key) for key, value in dict(enumerate([y for x in np.unique(np.vstack({tuple(row) for row in indices}), axis=0).tolist() for y in x])).items()])
    for key in mapping.keys():
        indices[indices == key] = mapping[key]
    print(indices)

However, I get the following error:

machine-learning% python3 driver.py
Shape of train set is (216, 13)
Shape of test set is (54, 13)
Shape of train label is (216, 1)
Shape of test labels is (54, 1)
Traceback (most recent call last):
  File "/home/user/Documents/IKP-HomomorphicEncryption/driver.py", line 1109, in <module>
    main()
  File "/home/user/Documents/IKP-HomomorphicEncryption/driver.py", line 1082, in main
    one_hot(X)
  File "/home/user/Documents/IKP-HomomorphicEncryption/driver.py", line 52, in one_hot
    mapping_reversed = dict(enumerate([y for x in np.unique(np.vstack({tuple(row) for row in indices}), axis=0).tolist() for y in x]))
  File "<__array_function__ internals>", line 200, in vstack
  File "/home/user/.local/lib/python3.9/site-packages/numpy/core/shape_base.py", line 296, in vstack
    return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 23 and the array at index 1 has size 4

I realize that this means the dimensions don’t match. But when I print the data it appears as though all of the rows are the same length.

Asked By: clickerticker48

Answers:

Disclaimer: this answer does not explain your error; instead, it shows a simple way to implement a one-hot encoding function.

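You can pass return_inverse=True to np.unique as a starting point. Besides the sorted unique values, it then also returns, for each input element, the index of its value among those unique values: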

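import numpy as np

def get_dummies(data):
    # label: sorted unique values; index: position of each element's value in label
    label, index = np.unique(data, return_inverse=True)
    # Compare every index against each class id: one row per element, one column per class
    return (index[:, None] == np.arange(len(label))).astype(int)

arr = ['dog', 'cat', 'fish', 'fish', 'cat', 'dog']
out = get_dummies(arr)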

Output:

>>> out
array([[0, 1, 0],   # dog
       [1, 0, 0],   # cat
       [0, 0, 1],   # fish
       [0, 0, 1],   # fish
       [1, 0, 0],   # cat
       [0, 1, 0]])  # dog
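
If you also need to map the columns back to the original categories, a variant along the following lines may help. This is only a sketch, not part of the original function: the name one_hot_with_labels is made up for illustration, and it assumes the input can be flattened to a 1-D array of categories. It indexes into an identity matrix instead of using the broadcast comparison, which yields the same encoding.

```python
import numpy as np

def one_hot_with_labels(data):
    """Return (labels, one-hot matrix) for a sequence of categories.

    Hypothetical helper for illustration; assumes the input can be
    flattened to a 1-D array.
    """
    # Flatten so the inverse indices are 1-D.
    values = np.asarray(data).ravel()
    labels, inverse = np.unique(values, return_inverse=True)
    # Row i of the identity matrix is the one-hot vector for class i.
    encoded = np.eye(len(labels), dtype=int)[inverse]
    return labels, encoded

labels, encoded = one_hot_with_labels(['dog', 'cat', 'fish', 'fish', 'cat', 'dog'])

# The position of the 1 in each row indexes into labels,
# so the original categories can be recovered.
decoded = labels[encoded.argmax(axis=1)]
```

Here labels holds the sorted categories (one per column of encoded), and decoded reproduces the input list, which is a quick way to check that the encoding is consistent.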
Answered By: Corralien