Trying to make a one hot encoding function using numpy
Question:
I’m trying to make a one hot encoding function using numpy:
def one_hot(indices):
    mapping = dict([(value, key) for key, value in dict(enumerate([y for x in np.unique(np.vstack({tuple(row) for row in indices}), axis=0).tolist() for y in x])).items()])
    for key in mapping.keys():
        indices[indices == key] = mapping[key]
    print(indices)
However, I get the following error:
machine-learning% python3 driver.py
Shape of train set is (216, 13)
Shape of test set is (54, 13)
Shape of train label is (216, 1)
Shape of test labels is (54, 1)
Traceback (most recent call last):
  File "/home/user/Documents/IKP-HomomorphicEncryption/driver.py", line 1109, in <module>
    main()
  File "/home/user/Documents/IKP-HomomorphicEncryption/driver.py", line 1082, in main
    one_hot(X)
  File "/home/user/Documents/IKP-HomomorphicEncryption/driver.py", line 52, in one_hot
    mapping_reversed = dict(enumerate([y for x in np.unique(np.vstack({tuple(row) for row in indices}), axis=0).tolist() for y in x]))
  File "<__array_function__ internals>", line 200, in vstack
  File "/home/user/.local/lib/python3.9/site-packages/numpy/core/shape_base.py", line 296, in vstack
    return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 23 and the array at index 1 has size 4
I realize that this means the dimensions don’t match. But when I print the data it appears as though all of the rows are the same length.
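One way to test that impression is to measure what the comprehension actually iterates over, rather than what the printed data looks like. The sketch below is only a diagnostic: the helper name row_lengths and the sample data are hypothetical and are not part of the original driver.py. It returns the set of lengths of tuple(row) for every item that iteration yields. A regular 2-D array gives a single length, while a container that yields string labels when iterated gives one length per label, because tuple() splits a string into characters. A plain dict behaves this way (it yields its keys), and so does a pandas DataFrame (it yields its column labels). Whether X is such a container is not stated in the question, so treat that as an assumption to check.

```python
import numpy as np

def row_lengths(indices):
    """Return the set of lengths of tuple(row) for each item yielded by iterating `indices`."""
    return {len(tuple(row)) for row in indices}

# A regular 2-D NumPy array yields rows of equal length.
X = np.arange(12).reshape(4, 3)
print(row_lengths(X))  # {3}

# Iterating a dict yields its keys, not rows of data.
# tuple("age") has 3 characters and tuple("chest_pain_type") has 15,
# so the "rows" have different lengths and cannot be stacked.
table = {"age": [1, 2], "chest_pain_type": [3, 4]}
print(row_lengths(table))
```

If the function reports more than one length for X, the input to np.vstack is ragged and the ValueError above is the expected result.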
Answers:
Disclaimer: this answer does not explain your error; instead, it implements a simple one-hot encoding function.
You can use return_inverse=True as a parameter of np.unique as a starting point:
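import numpy as np

def get_dummies(data):
    # label: sorted unique values; index: position of each item within label
    label, index = np.unique(data, return_inverse=True)
    # compare every index with every label position to build the one-hot rows
    return (index[:, None] == np.arange(len(label))).astype(int)

arr = ['dog', 'cat', 'fish', 'fish', 'cat', 'dog']
out = get_dummies(arr)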
Output:
>>> out
array([[0, 1, 0],   # dog
       [1, 0, 0],   # cat
       [0, 0, 1],   # fish
       [0, 0, 1],   # fish
       [1, 0, 0],   # cat
       [0, 1, 0]])  # dog
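The function above handles a 1-D sequence, while the question's data is 2-D (the train set has shape (216, 13)). A minimal sketch of one way to extend the same np.unique idea is to encode each column separately and concatenate the results. The name one_hot_columns and the sample array are hypothetical, and the sketch assumes every column is categorical, which the question does not confirm.

```python
import numpy as np

def one_hot_columns(X):
    """One-hot encode each column of a 2-D array separately, then join the blocks."""
    X = np.asarray(X)
    blocks = []
    for col in X.T:  # iterating the transpose walks over the columns
        labels, inverse = np.unique(col, return_inverse=True)
        inverse = inverse.ravel()  # keep the inverse indices 1-D
        blocks.append((inverse[:, None] == np.arange(len(labels))).astype(int))
    return np.hstack(blocks)

X = np.array([["dog", "red"],
              ["cat", "blue"],
              ["dog", "blue"]])
encoded = one_hot_columns(X)
print(encoded.shape)  # (3, 4): two categories per column
print(encoded)
```

Each column contributes as many output columns as it has distinct values, so continuous numeric columns would need to be left out or binned first.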