Correct way to test for numpy.dtype
Question:
I’m looking at a third-party lib that has the following if-test:
if isinstance(xx_, numpy.ndarray) and xx_.dtype is numpy.float64 and xx_.flags.contiguous:
    xx_[:] = ctypes.cast(xx_.ctypes._as_parameter_, ctypes.POINTER(ctypes.c_double))
It appears that xx_.dtype is numpy.float64 always fails:
>>> xx_ = numpy.zeros(8, dtype=numpy.float64)
>>> xx_.dtype is numpy.float64
False
What is the correct way to test that the dtype of a numpy array is float64?
Answers:
This is a bug in the lib.
dtype objects can be constructed dynamically, and NumPy does so all the time. There’s no guarantee anywhere that they’re interned, so constructing a dtype that already exists won’t necessarily give you the same one.
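Here’s a minimal sketch of what that means in practice, assuming nothing beyond stock NumPy: the same float64 dtype spelled three ways. Whether NumPy hands back one cached object or three is an implementation detail, so the sketch only relies on ==, which is documented.

```python
import numpy as np

# Three spellings of the same dtype: scalar type, string, Python builtin.
spellings = [np.dtype(np.float64), np.dtype("float64"), np.dtype(float)]

# Identity may or may not hold depending on NumPy's internal caching,
# so it is printed out of curiosity only and never relied on.
print([dt is spellings[0] for dt in spellings])

# Equality is the documented comparison and holds for every pair.
print(all(a == b for a in spellings for b in spellings))  # True
```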
On top of that, np.float64 isn’t actually a dtype; it’s what NumPy calls a scalar type, the type used to construct scalar objects out of array bytes. It is usually found in the type attribute of a dtype, so I’m going to call it a dtype.type. (Note that np.float64 subclasses both NumPy’s numeric tower types and Python’s numeric tower ABCs, while np.dtype of course doesn’t.)
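A few checks that make the distinction concrete. The numbers.Real line relies on NumPy registering its floating types with Python’s numbers ABCs; everything else is plain attribute and class inspection.

```python
import numbers
import numpy as np

dt = np.dtype(np.float64)

print(isinstance(dt, np.dtype))          # True: a dtype instance
print(isinstance(np.float64, np.dtype))  # False: np.float64 is not a dtype
print(isinstance(np.float64, type))      # True: it is a class (a scalar type)
print(dt.type is np.float64)             # True: the dtype's scalar type

# np.float64 sits in NumPy's scalar hierarchy and in Python's numeric tower.
print(issubclass(np.float64, np.floating))   # True
print(issubclass(np.float64, numbers.Real))  # True
print(issubclass(np.float64, float))         # True: it also subclasses float
```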
Normally, you can use these interchangeably: when you use a dtype.type (or, for that matter, a native Python numeric type) where a dtype was expected, a dtype is constructed on the fly, which, again, is not guaranteed to be interned. But of course that doesn’t mean they’re identical:
>>> np.float64 == np.dtype(np.float64) == np.dtype('float64')
True
>>> np.float64 == np.dtype(np.float64).type
True
The dtype.type usually will be identical if you’re using builtin types:
>>> np.float64 is np.dtype(np.float64).type
True
But two dtypes often are not:
>>> np.dtype(np.float64) is np.dtype('float64')
False
But again, none of that is guaranteed. (Also, note that np.float64 and float use the exact same storage, but are separate types. And of course you can also make a dtype('f8'), which is guaranteed to work the same as dtype(np.float64), but that doesn’t mean 'f8' is, or even ==, np.float64.)
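A quick sketch of that last point using only plain comparisons: a dtype built from 'f8' is equal to np.float64, and even to the string 'f8' (the dtype side parses the string), but the bare string and the scalar type have no comparison defined between them.

```python
import numpy as np

dt_from_str = np.dtype("f8")

print(dt_from_str == np.dtype(np.float64))  # True: same dtype, however spelled
print(dt_from_str == np.float64)            # True: np.float64 is converted to a dtype
print("f8" == dt_from_str)                  # True: the dtype parses the string
print("f8" == np.float64)                   # False: str vs. class falls back to identity
```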
So, it’s possible that constructing an array by explicitly passing np.float64 as its dtype argument will mean you get back the same instance when you check the dtype.type attribute, but that isn’t guaranteed. And if you pass np.dtype('float64'), or you ask NumPy to infer it from the data, or you pass a dtype string for it to parse like 'f8', etc., it’s even less likely to match. More importantly, you definitely won’t get np.float64 back as the dtype itself.
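To illustrate, the sketch below builds float64 arrays in the ways listed above and checks what comes back as .dtype. The array contents are arbitrary.

```python
import numpy as np

arrays = [
    np.zeros(4, dtype=np.float64),           # explicit scalar type
    np.zeros(4, dtype=np.dtype("float64")),  # explicit dtype object
    np.zeros(4, dtype="f8"),                 # dtype string parsed by NumPy
    np.array([0.5, 1.5, 2.5, 3.5]),          # dtype inferred from the data
]

for a in arrays:
    # a.dtype is a dtype instance, so it can never *be* the class np.float64...
    assert a.dtype is not np.float64
    # ...but it compares equal to it, and its scalar type is float64.
    assert a.dtype == np.float64
    assert a.dtype.type == np.float64

print("all checks passed")
```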
So, how should it be fixed?
Well, the docs define what it means for two dtypes to be equal, and I think that’s probably the useful thing you’re looking for here. So, just replace the is with ==:
if isinstance(xx_, numpy.ndarray) and xx_.dtype == numpy.float64 and xx_.flags.contiguous:
However, to some extent I’m only guessing that’s what you’re looking for. (The fact that it’s checking the contiguous flag implies that it’s probably going to go right into the internal storage… but then why isn’t it checking C vs. Fortran order, or byte order, or anything else?)
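If the library really is handing the buffer straight to C code that expects native-endian, C-ordered doubles, a stricter version of the fixed condition might look like this sketch. is_c_double_buffer is a hypothetical name, and the isnative test is technically redundant (a byte-swapped float64 dtype already fails == np.float64), but it states the intent explicitly.

```python
import numpy as np

def is_c_double_buffer(arr):
    """Hypothetical helper: True if arr could be passed to C as a double*."""
    return (
        isinstance(arr, np.ndarray)
        and arr.dtype == np.float64    # equality, not identity
        and arr.dtype.isnative         # native byte order, stated explicitly
        and arr.flags["C_CONTIGUOUS"]  # one C-ordered block of memory
    )

swapped = np.zeros(3, dtype=np.dtype(np.float64).newbyteorder())

print(is_c_double_buffer(np.zeros((3, 4))))               # True
print(is_c_double_buffer(np.zeros((3, 4)).T))             # False: Fortran order
print(is_c_double_buffer(np.zeros(3, dtype=np.float32)))  # False: wrong type
print(is_c_double_buffer(swapped))                        # False: byte-swapped
print(is_c_double_buffer([0.0, 1.0]))                     # False: not an ndarray
```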
Try:
import numpy as np
x = np.zeros(8, dtype=np.float64)
print(x.dtype is np.dtype(np.float64))
is tests for the identity of two objects, i.e. whether they have the same id(). It is used, for example, in is None tests, but can give surprising results when comparing integers or strings. In this case, though, there’s a further problem: x.dtype and np.float64 are not even the same class.
isinstance(x.dtype, np.dtype) # True
isinstance(np.float64, np.dtype) # False
x.dtype.__class__ # numpy.dtype (a subclass of it in newer NumPy versions)
np.float64.__class__ # type
np.float64 is actually a class (a scalar type), not a function: np.float64() produces 0.0, while x.dtype() produces an error.
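A short sketch of that difference, assuming only a standard NumPy install. The scalar type, also reachable as x.dtype.type, is a callable class, while the dtype instance is not:

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)

zero = np.float64()             # calling the scalar type builds a scalar
from_dtype = x.dtype.type(2.5)  # the same class, reached through the dtype
print(zero, from_dtype)

print(callable(np.float64))     # True: it is a class
print(callable(x.dtype))        # False: a dtype instance cannot be called

try:
    x.dtype()
except TypeError as exc:        # the error you get from calling the dtype
    print("TypeError:", exc)
```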
In my interactive tests, x.dtype is np.dtype(np.float64) returns True. But I don’t know if that’s universally the case, or just the result of some sort of local caching. The dtype documentation mentions a num attribute:
dtype.num A unique number for each of the 21 different built-in types.
Both dtypes give 12 for this num.
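A sketch for checking num yourself. The value 12 for float64 is the one reported above; float32 is included only as a contrast and gets a different number.

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)

print(x.dtype.num)               # 12
print(np.dtype(np.float64).num)  # 12
print(np.dtype("f8").num)        # 12
print(np.dtype(np.float32).num)  # 11, a different built-in type
```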
x.dtype == np.float64 tests True.
Also, using type works:
x.dtype.type is np.float64 # True
When I import ctypes and do the cast (with your xx_) I get an error:
ValueError: setting an array element with a sequence.
I don’t know enough about ctypes to understand what it is trying to do. It looks like it is doing a type conversion of the data pointer of xx_; xx_.ctypes._as_parameter_ is the same number as xx_.__array_interface__['data'][0].
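For what it’s worth, NumPy’s ctypes attribute already has a method for this kind of cast, ndarray.ctypes.data_as, which returns the data pointer converted to a given ctypes type. The sketch below assumes the goal is a double* into the array’s own buffer; that is a guess about the library’s intent, not its actual code.

```python
import ctypes
import numpy as np

xx_ = np.zeros(8, dtype=np.float64)

# A POINTER(c_double) aimed at the array's first element (no copy is made).
ptr = xx_.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

# The address matches the one reported by the array interface.
print(xx_.ctypes.data == xx_.__array_interface__["data"][0])  # True

# Writing through the pointer changes the array, since they share memory.
ptr[3] = 42.0
print(xx_[3])  # 42.0
```

Note that the pointer does not keep xx_ alive, so the array must outlive any C code that uses it.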
In the numpy test code I find these dtype tests:
issubclass(arr.dtype.type, (nt.integer, nt.bool_))
assert_(dat.dtype.type is np.float64)
assert_equal(A.dtype.type, np.unicode_)
assert_equal(r['col1'].dtype.kind, 'i')
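The same idioms work outside the test suite. In this sketch np.integer and np.bool_ stand in for the test module’s nt.integer and nt.bool_ (an assumption about what nt refers to there), and the arrays are arbitrary examples:

```python
import numpy as np

ints = np.arange(5)
bools = np.array([True, False])
floats = np.linspace(0.0, 1.0, 5)

# Is the scalar type in a family of types? (mirrors the issubclass test)
print(issubclass(ints.dtype.type, (np.integer, np.bool_)))    # True
print(issubclass(bools.dtype.type, (np.integer, np.bool_)))   # True
print(issubclass(floats.dtype.type, (np.integer, np.bool_)))  # False

# Is the scalar type exactly float64? (mirrors the assert_ test)
print(floats.dtype.type is np.float64)                        # True

# Coarser check by kind code: 'i' signed int, 'b' bool, 'f' float.
print(ints.dtype.kind, bools.dtype.kind, floats.dtype.kind)   # i b f
```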
The numpy documentation also talks about
np.issubdtype(x.dtype, np.float64)
np.issubsctype(x, np.float64)
both of which use issubclass.
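A sketch of the issubdtype route with a concrete target type, where it behaves like issubclass on the scalar types. np.issubsctype is left out because NumPy 2.0 removed it; np.issubdtype is the suggested replacement.

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)
y = np.zeros(8, dtype=np.float32)

for arr in (x, y):
    via_issubdtype = np.issubdtype(arr.dtype, np.float64)
    via_issubclass = issubclass(arr.dtype.type, np.float64)
    print(arr.dtype, via_issubdtype, via_issubclass)
# float64 True True
# float32 False False
```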
Further tracing of the C code suggests that x.dtype == np.float64 is evaluated as:
x.dtype.num == np.dtype(np.float64).num
That is, the scalar type is converted to a dtype, and the .num attributes are compared. The code is in scalarapi.c, descriptor.c and multiarraymodule.c in numpy/core/src/multiarray.
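As a rough model of that evaluation, the hypothetical helper same_num below converts both sides to dtypes and compares .num. It is only a simplification: the real == also takes byte order into account, so a byte-swapped float64 keeps num 12 but no longer compares equal.

```python
import numpy as np

def same_num(a, b):
    """Hypothetical, simplified model: convert both to dtypes, compare .num."""
    return np.dtype(a).num == np.dtype(b).num

x = np.zeros(8, dtype=np.float64)
print(same_num(x.dtype, np.float64), x.dtype == np.float64)  # True True

# Swap the byte order: the type number stays the same, equality does not.
swapped = x.dtype.newbyteorder()
print(same_num(swapped, np.float64), swapped == np.float64)  # True False
```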
I’m not sure when this API was introduced, but at least as of 2022 it looks like you can use numpy.issubdtype for the type checking part and therefore write:
if isinstance(arr, numpy.ndarray) and numpy.issubdtype(arr.dtype, numpy.floating):
    ...
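One caveat: np.floating is the abstract parent of every floating-point type, so this accepts float16 and float32 arrays too, which is broader than the original float64 check. A sketch comparing the two (both function names are hypothetical):

```python
import numpy as np

def is_float_array(arr):
    # Any floating precision: float16, float32, float64, longdouble, ...
    return isinstance(arr, np.ndarray) and np.issubdtype(arr.dtype, np.floating)

def is_float64_array(arr):
    # Exactly float64, as the original library condition intended.
    return isinstance(arr, np.ndarray) and arr.dtype == np.float64

a32 = np.ones(3, dtype=np.float32)
a64 = np.ones(3, dtype=np.float64)
ai = np.ones(3, dtype=np.int64)

print(is_float_array(a32), is_float64_array(a32))  # True False
print(is_float_array(a64), is_float64_array(a64))  # True True
print(is_float_array(ai), is_float64_array(ai))    # False False
```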