How to cross check a list of indices with a table containing multiple indices?

Question:

Using sqlite3 and Pandas I want to store setups of different devices in an SQL table. I have the following tables:

devices:

id name
1 device_1
2 device_2
3 device_3
4 device_4
5 device_5

setups:

id sub_id device_ids name
1 1 1 setup_1
1 2 5 setup_1
1 3 4 setup_1
2 1 5 setup_2
2 2 4 setup_2

I need a method that takes a list device_ids = [4,5] and returns the id of the setup containing exactly the devices listed in device_ids (neither more nor less; the length and order of device_ids can vary). I don’t know how to formulate a WHERE clause that checks whether every device of a setup is contained in the device_ids list. I’m thinking of something like this:

device_ids = [4,5]
query = (f"SELECT id FROM setups "
         f"JOIN devices ON devices.id = setups.device_ids "
         f"WHERE setups.device_ids IN {device_ids}")
pd.read_sql_query(query, con)

In the case of the list device_ids = [4,5] the desired output is the id of setup_2.

Code to create these tables:

import pandas as pd
import sqlite3

con = sqlite3.connect(':memory:')

setups = {'id':[1,1,1,2,2], 'sub_id':[1,2,3,1,2],
         'name':['setup_1','setup_1','setup_1', 'setup_2','setup_2'],
         'device_ids':[1,5,4,5,4]}
setups = pd.DataFrame(setups)
setups.to_sql('setups', con)
print(setups)


devices = {'id':[1,2,3,4,5], 'name':['Device_1', 'Device_2', 'Device_3', 'Device_4', 'Device_5']}
devices = pd.DataFrame(devices)
devices.to_sql('devices', con)
print(devices)
Asked By: Yehla


Answers:

See this example:

SELECT * FROM cities WHERE NAME IN ('Moscow','Tokyo','Nairobi')

This is how IN is used with explicit values: the values are listed inside () parentheses.

In your case, however, device_ids is a list, so after substitution into the formatted string the values end up inside [] square brackets instead of parentheses, which is not valid SQL syntax.

Python does have an object whose string form uses () parentheses: the tuple. So just convert the list device_ids to a tuple and pass that to the formatted string.
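To make the difference concrete, here is a minimal sketch of what each object looks like once it is formatted into the query string. The variable names as_list, as_tuple and as_single are only for illustration. Note that a one-element tuple keeps its trailing comma.

```python
device_ids = [4, 5]

# A list keeps its square brackets when formatted into a string.
as_list = f"IN {device_ids}"

# A tuple is rendered with parentheses, which matches SQL's IN (...) syntax.
as_tuple = f"IN {tuple(device_ids)}"

# A one-element tuple is rendered with a trailing comma.
as_single = f"IN {tuple([4])}"

print(as_list)    # IN [4, 5]
print(as_tuple)   # IN (4, 5)
print(as_single)  # IN (4,)
```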

Here is the code:

import pandas as pd
import sqlite3

con = sqlite3.connect(':memory:')

setups = {'id':[1,1,1,2,2], 'sub_id':[1,2,3,1,2],
         'name':['setup_1','setup_1','setup_1', 'setup_2','setup_2'],
         'device_ids':[1,5,4,5,4]}
setups = pd.DataFrame(setups)
setups.to_sql('setups', con)
#print(setups)


devices = {'id':[1,2,3,4,5], 'name':['Device_1', 'Device_2', 'Device_3', 'Device_4', 'Device_5']}
devices = pd.DataFrame(devices)
devices.to_sql('devices', con)
#print(devices)


device_ids = [4,5]

device_id_tup = tuple(device_ids)

query = (f"SELECT DISTINCT id FROM setups a "
         f"WHERE device_ids IN {device_id_tup} "
         f"and not exists (SELECT * FROM setups b "
         f"WHERE device_ids NOT IN {device_id_tup} and a.id = b.id)")

res = pd.read_sql_query(query, con)

print(list(res['id']))

And the output:

[2]
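As a rough illustration of what this query checks: the NOT EXISTS part only rejects setups that contain a device outside the tuple. It does not require every listed device to be present, so a longer tuple can still match a smaller setup. The sketch below rebuilds the setups table with plain sqlite3 instead of pandas (an assumption made to keep it self-contained), and the helper name setups_within is hypothetical. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE setups (id INTEGER, sub_id INTEGER, name TEXT, device_ids INTEGER)"
)
con.executemany(
    "INSERT INTO setups VALUES (?, ?, ?, ?)",
    [(1, 1, "setup_1", 1), (1, 2, "setup_1", 5), (1, 3, "setup_1", 4),
     (2, 1, "setup_2", 5), (2, 2, "setup_2", 4)],
)

def setups_within(device_id_tup):
    # Same query as above; expects a tuple of at least two ints.
    query = (f"SELECT DISTINCT id FROM setups a "
             f"WHERE device_ids IN {device_id_tup} "
             f"and not exists (SELECT * FROM setups b "
             f"WHERE device_ids NOT IN {device_id_tup} and a.id = b.id) "
             f"ORDER BY id")
    return [row[0] for row in con.execute(query)]

print(setups_within((4, 5)))     # [2]
print(setups_within((3, 4, 5)))  # [2]  (setup 2 matches although it lacks device 3)
print(setups_within((1, 4, 5)))  # [1, 2]
```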

If you need an explanation of the above query, let me know.

Answered By: Rishabh Kumar

I found an answer to my question that seems to work.

device_id_tup = (4,5)


query = (f"SELECT DISTINCT id FROM setups a "
         f"WHERE device_ids IN {device_id_tup} "
         f"Group by id "
         f"having count(*) = {len(device_id_tup)} "
         f"and not exists (SELECT * FROM setups b "
         f"WHERE device_ids NOT IN {device_id_tup} and a.id = b.id)")
res = pd.read_sql_query(query, con)

print(list(res['id']))

The one drawback is that it doesn’t work when device_id_tup has a length of 1: Python formats a one-element tuple as (4,), and the trailing comma is not valid SQL.
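One possible way around the length-1 problem is to stop formatting the tuple into the string and build a list of ? placeholders instead, passing the values as query parameters. The following is a sketch under some assumptions, not a drop-in replacement: the function name find_setups is hypothetical, the table is rebuilt with plain sqlite3 rather than pandas, and each device is assumed to appear at most once per setup. It also uses a different formulation from the query above: a setup matches when its row count equals the number of requested devices and all of its rows are in the requested list.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE setups (id INTEGER, sub_id INTEGER, name TEXT, device_ids INTEGER)"
)
con.executemany(
    "INSERT INTO setups VALUES (?, ?, ?, ?)",
    [(1, 1, "setup_1", 1), (1, 2, "setup_1", 5), (1, 3, "setup_1", 4),
     (2, 1, "setup_2", 5), (2, 2, "setup_2", 4)],
)

def find_setups(con, device_ids):
    """Return ids of setups whose devices are exactly device_ids."""
    ids = sorted(set(device_ids))  # order and duplicates in the input do not matter
    if not ids:
        return []
    # One "?" per value, so a single value gives "IN (?)" with no trailing comma.
    placeholders = ", ".join("?" for _ in ids)
    query = (
        "SELECT id FROM setups "
        "GROUP BY id "
        "HAVING COUNT(*) = ? "                           # no extra devices in the setup
        f"AND SUM(device_ids IN ({placeholders})) = ? "  # every requested device present
        "ORDER BY id"
    )
    params = [len(ids), *ids, len(ids)]
    return [row[0] for row in con.execute(query, params)]

print(find_setups(con, [4, 5]))     # [2]
print(find_setups(con, [5, 4, 1]))  # [1]
print(find_setups(con, [4]))        # []
```

With pandas, the same query string and parameter list should be usable through the params argument of pd.read_sql_query.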

Answered By: Yehla