Cannot cast array data from dtype('O') in np.bincount

Question:

Unfortunately I cannot share the data I am now using, so this question will not contain an MWE.

I have this code:

import numpy as np

def baseline(labels):
    # dummy classifier returning the most common label in labels
    print(labels.shape)
    print(type(labels))
    print(type(labels[0]))
    print(type(labels[2]))
    print(labels)
    counts = np.bincount(labels)
    value = np.argmax(counts)

This code runs fine with most input files containing the labels. However, on a subset of files, I get the error:

Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

I do not understand this error. The output is:

(891,)
<class 'numpy.ndarray'>
<class 'int'>
<class 'int'>
[0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0
 0 1 1 0 1 0 0 0 1 0 1 0 1 1 1 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 1 1 1 0 0 0 1
 0 0 0 0 1 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 1 1
 1 1 1 1 0 0 0 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 0 1 1
 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0
 0 0 0 1 1 1 0 1 1 0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0
 0 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 1 0 1 1
 1 0 0 1 0 1 0 0 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 1 0 0 1 0
 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 1 1 0 0 1 1 0 1 0 0 0 0 1 0 0
 1 0 1 0 1 1 1 1 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 0 1 0 1 0 1 1 0 1
 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0 1 1 1 1 0 0 0 0
 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 0 1 1
 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 0 1
 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 1 1 0 1 1 0
 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 1 0
 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0
 0 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 1 1 0
 1 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 0 0
 1 1 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 1 1 1 1 0 1 1 1 0 0 1 0 1 0 1 0 1 0 0
 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0
 0 0 1 1 0 1 0 1 0 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 0 1 0 0 1 0 0 1
 0 0 1 0 1 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 0 1 1 0 1 0 1 1 1 0
 0 0 0 1 1 1 0 1 0 1 0 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 1 1 0 1
 1 0 0 1 0 1 0 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 0 1 1 1 1 1 0 0 0 0 1 0
 0 0 1]
Traceback (most recent call last):
  File "07_training_test.py", line 577, in <module>
    fire.Fire(main)
  File "/home/user/miniconda3/envs/proj/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/user/miniconda3/envs/proj/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/user/miniconda3/envs/proj/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "07_training_test.py", line 554, in main
    res = process_file(fn, parameters, config)
  File "07_training_test.py", line 434, in process_file
    value_train, train_acc = utils.baseline(full_labels.loc[train_i].to_numpy())
  File "/home/user/workspace/proj/src/pipeline_paper/utils.py", line 186, in baseline
    counts = np.bincount(labels)
  File "<__array_function__ internals>", line 5, in bincount
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

There are other questions about this error, but they arise in different contexts, so I was not able to solve the issue by following their answers.

Asked By: Antonio Sesto

Answers:

This is how you can reproduce your error:

>>> arr = np.random.binomial(n=1, p=0.5, size=891)
>>> arr.dtype
dtype('int64')

>>> baseline(arr)  # Works fine
>>> arr = arr.astype(object)
>>> arr.dtype
dtype('O')
>>> baseline(arr)
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

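So even if every element of your array is an integer, np.bincount raises this error when the array's dtype is "object", because NumPy will not cast object to int64 under the 'safe' rule. The <class 'int'> lines in your output point the same way: indexing an int64 array returns numpy.int64, whereas an object array returns plain Python int. Cast labels to a NumPy integer dtype explicitly before calling np.bincount. np.int8 is enough for 0/1 labels; use a wider type such as np.int64 if labels can exceed 127. E.g.: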

arr = arr.astype(np.int8)
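
As a minimal sketch of how the cast could be folded into the function itself, the helper below checks the dtype and converts only when it is object. The name most_common_label is hypothetical and not part of the question's code, and the choice of np.int64 is an assumption made to avoid overflow for larger label values.

```python
import numpy as np

def most_common_label(labels):
    """Return the most frequent non-negative integer label (hypothetical helper)."""
    labels = np.asarray(labels)
    if labels.dtype == object:
        # Object arrays store Python objects, so np.bincount refuses them.
        # astype raises if an element cannot be converted (for example None).
        labels = labels.astype(np.int64)
    counts = np.bincount(labels)
    return int(np.argmax(counts))

# An object array that happens to contain only Python ints, as in the question
obj_labels = np.array([0, 1, 1, 0, 1], dtype=object)
print(obj_labels.dtype, type(obj_labels[0]))
print(most_common_label(obj_labels))
```

Since the traceback shows the labels coming from a pandas .to_numpy() call, it may also be worth checking the dtype of that column in the files that fail; a column that pandas stores as object will produce an object array even when every value is an integer.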
Answered By: Vladimir Shitov