How to cross check a list of indices with a table containing multiple indices?
Question:
Using sqlite3 and Pandas I want to store setups of different devices in an SQL table. I have the following tables:
device
:
id
name
1
device_1
2
device_2
3
device_3
4
device_4
5
device_5
setups
:
id
sub_id
device_id
name
1
1
1
setup_1
1
2
5
setup_1
1
3
4
setup_1
2
1
5
setup_2
2
2
4
setup_2
I need a method which takes a list device_ids = [4,5]
and returns the setup-id
containing exactly the devices listed in device_ids
(neither more or less, the length and permutation of device_ids
can vary). I don’t know how to formulate a WHERE
statement that cross-checks if every device_id
of a setup is contained in the device_ids
list. I’m thinking of something like this:
device_ids = [4,5]
query = f"SELECT id FROM setups "
f"JOIN devices ON devices.id = setups.device_ids "
f"WHERE setups.device_ids IN {device_ids}"
pd.read_sql_query(query, con)
In the case of the list device_ids = [4,5]
the desired output is the id of setup_2.
Code to create these tables:
import pandas as pd
import sqlite3
con = sqlite3.connect(':memory:')
setups = {'id':[1,1,1,2,2], 'sub_id':[1,2,3,1,2],
'name':['setup_1','setup_1','setup_1', 'setup_2','setup_2'],
'device_ids':[1,5,4,5,4]}
setups = pd.DataFrame(setups)
setups.to_sql('setups', con)
print(setups)
devices = {'id':[1,2,3,4,5], 'name':['Device_1', 'Device_2', 'Device_3', 'Device_4', 'Device_4']}
devices = pd.DataFrame(devices)
devices.to_sql('devices', con)
print(devices)
Answers:
See this example :
SELECT * FROM cities WHERE NAME IN ('Moscow','Tokyo','Nairobi')
This is the way to use IN
with explicit values. Storing the values in ()
parenthesis.
But in your case device_ids
is a list and so after substitution in formatted string instead of parenthesis the explicit values are stored in []
square bracktes. Which is not correct as per the sql syntax.
But there is an object in python which uses ()
parenthesis to store values, tuple
. so just convert the list of device_ids to tuple and pass it to the formatted string.
Here is the code:
import pandas as pd
import sqlite3
con = sqlite3.connect(':memory:')
setups = {'id':[1,1,1,2,2], 'sub_id':[1,2,3,1,2],
'name':['setup_1','setup_1','setup_1', 'setup_2','setup_2'],
'device_ids':[1,5,4,5,4]}
setups = pd.DataFrame(setups)
setups.to_sql('setups', con)
#print(setups)
devices = {'id':[1,2,3,4,5], 'name':['Device_1', 'Device_2', 'Device_3', 'Device_4', 'Device_4']}
devices = pd.DataFrame(devices)
devices.to_sql('devices', con)
#print(devices)
device_ids = [4,5]
device_id_tup = tuple(device_ids)
query = f"SELECT DISTINCT id FROM setups a "
f"WHERE device_ids IN {device_id_tup} "
f"and not exists (SELECT * FROM setups b "
f"WHERE device_ids NOT IN {device_id_tup} and a.id = b.id)"
res = pd.read_sql_query(query, con)
print(list(res['id']))
And the output:
[2]
If you need an explanation of above command, let me know.
I found an answer to my question that seems to work.
device_id_tup = (4,5)
query = f"SELECT DISTINCT id FROM setups a "
f"WHERE device_ids IN {device_id_tup} "
f"Group by id "
f"having count(*) = {len(device_id_tup)} "
f"and not exists (SELECT * FROM setups b "
f"WHERE device_ids NOT IN {device_id_tup} and a.id = b.id)"
res = pd.read_sql_query(query, con)
print(list(res['id']))
The only small regret is that it doesn’t work for device_id_tup
tuples with a length of 1.
Using sqlite3 and Pandas I want to store setups of different devices in an SQL table. I have the following tables:
device
:
id | name |
---|---|
1 | device_1 |
2 | device_2 |
3 | device_3 |
4 | device_4 |
5 | device_5 |
setups
:
id | sub_id | device_id | name |
---|---|---|---|
1 | 1 | 1 | setup_1 |
1 | 2 | 5 | setup_1 |
1 | 3 | 4 | setup_1 |
2 | 1 | 5 | setup_2 |
2 | 2 | 4 | setup_2 |
I need a method which takes a list device_ids = [4,5]
and returns the setup-id
containing exactly the devices listed in device_ids
(neither more or less, the length and permutation of device_ids
can vary). I don’t know how to formulate a WHERE
statement that cross-checks if every device_id
of a setup is contained in the device_ids
list. I’m thinking of something like this:
device_ids = [4,5]
query = f"SELECT id FROM setups "
f"JOIN devices ON devices.id = setups.device_ids "
f"WHERE setups.device_ids IN {device_ids}"
pd.read_sql_query(query, con)
In the case of the list device_ids = [4,5]
the desired output is the id of setup_2.
Code to create these tables:
import pandas as pd
import sqlite3
con = sqlite3.connect(':memory:')
setups = {'id':[1,1,1,2,2], 'sub_id':[1,2,3,1,2],
'name':['setup_1','setup_1','setup_1', 'setup_2','setup_2'],
'device_ids':[1,5,4,5,4]}
setups = pd.DataFrame(setups)
setups.to_sql('setups', con)
print(setups)
devices = {'id':[1,2,3,4,5], 'name':['Device_1', 'Device_2', 'Device_3', 'Device_4', 'Device_4']}
devices = pd.DataFrame(devices)
devices.to_sql('devices', con)
print(devices)
See this example :
SELECT * FROM cities WHERE NAME IN ('Moscow','Tokyo','Nairobi')
This is the way to use IN
with explicit values. Storing the values in ()
parenthesis.
But in your case device_ids
is a list and so after substitution in formatted string instead of parenthesis the explicit values are stored in []
square bracktes. Which is not correct as per the sql syntax.
But there is an object in python which uses ()
parenthesis to store values, tuple
. so just convert the list of device_ids to tuple and pass it to the formatted string.
Here is the code:
import pandas as pd
import sqlite3
con = sqlite3.connect(':memory:')
setups = {'id':[1,1,1,2,2], 'sub_id':[1,2,3,1,2],
'name':['setup_1','setup_1','setup_1', 'setup_2','setup_2'],
'device_ids':[1,5,4,5,4]}
setups = pd.DataFrame(setups)
setups.to_sql('setups', con)
#print(setups)
devices = {'id':[1,2,3,4,5], 'name':['Device_1', 'Device_2', 'Device_3', 'Device_4', 'Device_4']}
devices = pd.DataFrame(devices)
devices.to_sql('devices', con)
#print(devices)
device_ids = [4,5]
device_id_tup = tuple(device_ids)
query = f"SELECT DISTINCT id FROM setups a "
f"WHERE device_ids IN {device_id_tup} "
f"and not exists (SELECT * FROM setups b "
f"WHERE device_ids NOT IN {device_id_tup} and a.id = b.id)"
res = pd.read_sql_query(query, con)
print(list(res['id']))
And the output:
[2]
If you need an explanation of above command, let me know.
I found an answer to my question that seems to work.
device_id_tup = (4,5)
query = f"SELECT DISTINCT id FROM setups a "
f"WHERE device_ids IN {device_id_tup} "
f"Group by id "
f"having count(*) = {len(device_id_tup)} "
f"and not exists (SELECT * FROM setups b "
f"WHERE device_ids NOT IN {device_id_tup} and a.id = b.id)"
res = pd.read_sql_query(query, con)
print(list(res['id']))
The only small regret is that it doesn’t work for device_id_tup
tuples with a length of 1.