list literal and list comprehension behaving differently
Question:
I’m having a very strange issue with Python 3.7. I have a function that takes a list of document IDs and returns the Wikipedia documents they correspond to. Strangely, if I pass in a list comprehension, as I want to, it returns nothing, but if I pass in a list literal with exactly the same values, it works. Note that I’m running this in pdb, in the interactive prompt it opens when you type interact:
If I run the list comprehension, I get this list:
>>> [x[0] for x in truncated]
[3553957, 4480571, 4686346, 1955046, 4476254, 4510002, 3941950, 2991560, 5314256, 3949007]
If I run the query with this list literal, it works (data truncated for brevity):
>>> self._db.query_ids([3553957, 4480571, 4686346, 1955046, 4476254, 4510002, 3941950, 2991560, 5314256, 3949007])
[(1955046, 'Hairy_nightshade', 'Hairy nightshade is a common name for...')]
But if I combine the two expressions, it returns nothing:
>>> self._db.query_ids([x[0] for x in truncated])
[]
The function being called has no side effects; it just queries a database, so nothing changes between calls:
def query_ids(self, ids):
    """
    Returns the tokens for each document with the given ID
    """
    result = self.conn.execute(
        'SELECT doc_id, document, group_concat(tokens, " ") FROM doc WHERE doc_id in ({}) GROUP BY doc_id'.format(
            ', '.join(['?'] * len(ids))), ids)
    data = result.fetchall()
    return data
How is this possible?
If I add a print(ids) to the first line of my query_ids function, the list of IDs is printed identically both times, but the query still doesn’t work with the list comprehension:
(Pdb) self._db.query_ids([x[0] for x in truncated])
[3553957, 4480571, 4686346, 1955046, 4476254, 4510002, 3941950, 2991560, 5314256, 3949007]
[]
(Pdb) self._db.query_ids([3553957, 4480571, 4686346, 1955046, 4476254, 4510002, 3941950, 2991560, 5314256, 3949007])
[3553957, 4480571, 4686346, 1955046, 4476254, 4510002, 3941950, 2991560, 5314256, 3949007]
[(1955046, 'Hairy_nightshade', 'Hairy nightshade is a common name for several plants and may refer to...')]
Answers:
This was a strange bug, but I believe I’ve worked it out.
The issue wasn’t the type of truncated, which was a list. Rather, the elements of that list were numpy.int64 values, not Python integers:
(Pdb) !a = [x[0] for x in truncated]
(Pdb) type(a)
<class 'list'>
(Pdb) type(a[0])
<class 'numpy.int64'>
When this list of numpy.int64 values was passed into the database query, the values were effectively ignored and matched no rows, because the Python sqlite3 API doesn’t know how to deal with non-native Python types: https://docs.python.org/3/library/sqlite3.html#using-adapters-to-store-additional-python-types-in-sqlite-databases
The following Python types can thus be sent to SQLite without any problem: None, int, float, str, bytes
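The adapter mechanism described at that link can be sketched roughly as follows. This is only an illustration: DocId is a hypothetical stand-in class for a non-native integer type, and the table is a throwaway in-memory one. With numpy installed, the analogous call would presumably be sqlite3.register_adapter(numpy.int64, int).

```python
import sqlite3


class DocId:
    """Hypothetical stand-in for a non-native integer type such as numpy.int64."""

    def __init__(self, value):
        self.value = value


# Tell sqlite3 how to convert DocId into a type it binds natively (int).
# Note that adapters are registered globally for the whole process.
sqlite3.register_adapter(DocId, lambda d: int(d.value))

# Throwaway in-memory table, made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (doc_id INTEGER, document TEXT)")
conn.execute("INSERT INTO doc VALUES (?, ?)", (1955046, "Hairy_nightshade"))

# The DocId parameter is adapted to int before being bound to the query.
found = conn.execute(
    "SELECT doc_id, document FROM doc WHERE doc_id IN (?)", (DocId(1955046),)
).fetchall()
print(found)
```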
Thus, when I converted the data into native Python integers, it worked:
(Pdb) self._db.query_ids([int(x[0]) for x in truncated])
[3553957, 4480571, 4686346, 1955046, 4476254, 4510002, 3941950, 2991560, 5314256, 3949007]
[(1955046, 'Hairy_nightshade', 'Hairy nightshade is a common name for several plants and may refer to ')]
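To avoid relying on every caller remembering the int() conversion, the coercion could also live inside the query function itself. Below is a minimal sketch under some assumptions: it is a standalone function taking a connection rather than the original method, the in-memory table and its two rows are invented for the demonstration, and operator.index is my choice for the conversion (it accepts integer-like objects, which should include numpy.int64, and rejects floats).

```python
import operator
import sqlite3


def query_ids(conn, ids):
    """Return (doc_id, document, tokens) rows for the given document IDs."""
    # Coerce integer-like values to built-in int, a type sqlite3 binds natively.
    ids = [operator.index(i) for i in ids]
    if not ids:
        # An empty "IN ()" list is not valid SQL, so return early.
        return []
    placeholders = ", ".join("?" * len(ids))
    sql = (
        "SELECT doc_id, document, group_concat(tokens, ' ') "
        "FROM doc WHERE doc_id IN ({}) GROUP BY doc_id"
    ).format(placeholders)
    return conn.execute(sql, ids).fetchall()


# Throwaway in-memory table, made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (doc_id INTEGER, document TEXT, tokens TEXT)")
conn.executemany(
    "INSERT INTO doc VALUES (?, ?, ?)",
    [
        (1955046, "Hairy_nightshade", "Hairy"),
        (1955046, "Hairy_nightshade", "nightshade"),
    ],
)

rows = query_ids(conn, [3553957, 1955046])
print(rows)
```

Checking type(a[0]) rather than only printing the list remains the quickest way to spot this kind of mismatch, since the printed list looks the same for both element types.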