Efficient and fast way to search through dict of dicts
Question:
So I have a dict of working jobs each holding a dict
{
"hacker": {"crime": "high"},
"mugger": {"crime": "high", "morals": "low"},
"office drone": {"work_drive": "high", "tolerance": "high"},
"farmer": {"work_drive": "high"},
}
And I have roughly 21000 more unique jobs to handle.
How would I go about scanning through them faster?
And is there any type of data structure that makes this faster and better to scan through? Such as a lookup table for each of the tags?
I’m using Python 3.10.4.
NOTE: If it helps, everything is loaded at the start of runtime and doesn’t change at all during runtime.
Here’s my current code:
test_data = {
"hacker": {"crime": "high"},
"mugger": {"crime": "high", "morals": "low"},
"shop_owner": {"crime": "high", "morals": "high"},
"office_drone": {"work_drive": "high", "tolerance": "high"},
"farmer": {"work_drive": "high"},
}
class NULL: pass
class Conditional(object):
    def __init__(self, data):
        self.dataset = data
    def find(self, *target, **tags):
        dataset = self.dataset.items()
        if target:
            dataset = (
                (entry, data) for entry, data in dataset
                if all( (t in data) for t in target)
            )
        if tags:
            return [
                entry for entry, data in dataset
                if all(
                    (data.get(tag, NULL) == val) for tag, val in tags.items()
                )
            ]
        else:
            return [data[0] for data in dataset]
jobs = Conditional(test_data)
print(jobs.find(work_drive="high"))
>>> ['office_drone', 'farmer']
print(jobs.find("crime"))
>>> ['hacker', 'mugger', 'shop_owner']
print(jobs.find("crime", "morals"))
>>> ['mugger', 'shop_owner']
print(jobs.find("crime", morals="high"))
>>> ['shop_owner']
Answers:
To look up a first-level key in the dictionary, use either my_dict[key] or my_dict.get(key). They return the same value when the key exists; for a missing key, my_dict[key] raises KeyError while my_dict.get(key) returns None. So I think you just want to do that with your target lookup.
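As a small illustration of the two access styles, here is a sketch on a cut-down copy of the question's data. The name jobs_data and the missing "pirate" key are made up for the example.

```python
# Illustrative data, a subset of the question's test_data.
jobs_data = {
    "hacker": {"crime": "high"},
    "farmer": {"work_drive": "high"},
}

# Indexing returns the inner dict for an existing job.
hacker_tags = jobs_data["hacker"]

# .get returns None (or a supplied default) when the key is missing.
missing = jobs_data.get("pirate")
missing_default = jobs_data.get("pirate", {})

# Indexing a missing key raises KeyError instead.
try:
    jobs_data["pirate"]
    raised = False
except KeyError:
    raised = True

print(hacker_tags, missing, missing_default, raised)
```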
Then, if you want to look up which jobs include anything about one of the tags, I think making a lookup dictionary for that is reasonable. You could make a dictionary where each key maps to a list of those jobs.
The code below would be run once at the beginning and builds the lookup from test_data. It loops through the entire dictionary and, any time it encounters a tag in the values for an item, adds that item's key to the list of jobs for that tag:
lookup = dict()
for k,v in test_data.items():
    for kk,vv in v.items():
        try:
            lookup[kk].append(k)
        except KeyError:
            lookup[kk] = [k]
Output (lookup):
{'crime': ['hacker', 'mugger', 'shop_owner'],
'morals': ['mugger', 'shop_owner'],
'work_drive': ['office_drone', 'farmer'],
'tolerance': ['office_drone']}
With this lookup table, you could ask ‘Which jobs have a crime stat?’ with lookup['crime'], which would output ['hacker', 'mugger', 'shop_owner'].
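If you need jobs that carry several tags at once, one option is to intersect the per-tag lists as sets. Below is a rough sketch under that assumption; jobs_with_all_tags is a made-up helper name, and the lookup is rebuilt here (with setdefault instead of try/except) only so the snippet runs on its own.

```python
test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}

# Build the tag -> list of jobs lookup once, at startup.
lookup = {}
for job, tags in test_data.items():
    for tag in tags:
        lookup.setdefault(tag, []).append(job)


def jobs_with_all_tags(lookup, *tags):
    """Return the set of jobs that carry every tag in `tags` (hypothetical helper)."""
    if not tags:
        return set()
    # Start from the first tag's jobs and narrow down with each further tag.
    result = set(lookup.get(tags[0], ()))
    for tag in tags[1:]:
        result &= set(lookup.get(tag, ()))
    return result


print(sorted(jobs_with_all_tags(lookup, "crime", "morals")))
```

Unknown tags simply give an empty result here rather than raising KeyError.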
And is there any type of data structure that makes this faster and better to scan through?
Yes. And it is called dict =)
Just turn your dict into two dictionaries, one keyed by tag and another keyed by tag and tag value, each holding sets of jobs:
from collections import defaultdict
...
by_tag = defaultdict(set)
by_tag_value = defaultdict(lambda: defaultdict(set))
for job, tags in test_data.items():
    for tag, val in tags.items():
        by_tag[tag].add(job)
        by_tag_value[tag][val].add(job)
# example
# to search crime:high and morals
crime_high = by_tag_value["crime"]["high"]
morals = by_tag["morals"]
result = crime_high.intersection(morals) # {'mugger', 'shop_owner'}
Then look up the set for each search condition and return the jobs that are present in all of those sets.
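To tie this back to the find(*target, **tags) interface from the question, here is one possible sketch that wraps the two indexes in a class. IndexedJobs is a hypothetical name, not something from the question, and it assumes tag values are hashable (they are strings in the sample data).

```python
from collections import defaultdict

test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}


class IndexedJobs:
    """Hypothetical set-backed variant of the question's Conditional class."""

    def __init__(self, data):
        self.all_jobs = set(data)
        self.by_tag = defaultdict(set)
        self.by_tag_value = defaultdict(lambda: defaultdict(set))
        # Build both indexes once; the data does not change afterwards.
        for job, tags in data.items():
            for tag, val in tags.items():
                self.by_tag[tag].add(job)
                self.by_tag_value[tag][val].add(job)

    def find(self, *target, **tags):
        # .get avoids creating empty entries in the defaultdicts for unknown tags.
        sets = [self.by_tag.get(t, set()) for t in target]
        sets += [
            self.by_tag_value.get(tag, {}).get(val, set())
            for tag, val in tags.items()
        ]
        if not sets:
            # No conditions: every job matches, as in the original find().
            return set(self.all_jobs)
        # Intersect starting from the smallest set to keep the work low.
        smallest, *rest = sorted(sets, key=len)
        return smallest.intersection(*rest)


jobs = IndexedJobs(test_data)
print(sorted(jobs.find("crime", morals="high")))
```

Note that this returns sets, so the result order is not guaranteed, unlike the lists returned by the original find(). Wrap the result in sorted() or keep a job-order list if order matters.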