Need to filter list of dictionaries

Question:

I have one large list of dictionaries pulled from a database. I’m wondering if there is a way to search the first list of dicts to only show the first 3 digits (list would be unique to avoid duplicate entries) in one dropdown menu and, based on the selection, show the appropriate options?

Essentially the first dropdown would show selectable options "220, 221, 222, 223, 224", let’s say we selected "220", then the second dropdown would show ‘220_2_INTL_PRSTR_ET_619076’, ‘220_4_KAL_T2E_DOLE_344657’. I’m not worried about the code needed to create the dropdowns, just the filtering of the list of dicts to achieve the results.

list_of_dict = [ 
{'label': '220_2_INTL_PRSTR_ET_619076', 'value': '220_2_INTL_PRSTR_ET_619076'}, 
{'label': '220_4_KAL_T2E_DOLE_344657', 'value': '220_4_KAL_T2E_DOLE_344657'}, 
{'label': '221_1_PB_520_REF_174049', 'value': '221_1_PB_520_REF_174049'},
{'label': '222_5_KAL_T2E_YT_344991', 'value': '222_5_KAL_T2E_YT_344991'}, 
{'label': '223_2_PB_520_REF_174050', 'value': '223_2_PB_520_REF_174050'},
{'label': '224_3_PB_520_REF_174051', 'value': '224_3_PB_520_REF_174051'}]
Asked By: Rettro

Answers:

As a first step, you can use a set to collect the first three characters of each label without duplicates.

Then you can create a function to filter the list using str.startswith:

def get_options(l):
    return {d["label"][:3] for d in l}


def filter_list(l, s):
    return [d for d in l if d["label"].startswith(s)]


print(get_options(list_of_dict))
print(filter_list(list_of_dict, "220"))

Prints:

{"222", "221", "224", "220", "223"}
[
    {
        "label": "220_2_INTL_PRSTR_ET_619076",
        "value": "220_2_INTL_PRSTR_ET_619076",
    },
    {
        "label": "220_4_KAL_T2E_DOLE_344657",
        "value": "220_4_KAL_T2E_DOLE_344657",
    },
]
Answered By: Andrej Kesely

Given a prefix, the filtering code comes down to an if-statement and a str.startswith() test:

prefix = '220'
for d in list_of_dict:
    if d['label'].startswith(prefix):
        print(d)

For purposes of a dropdown menu, it may be preferable to group the data in advance using defaultdict() and extracting the prefix with slicing:

from collections import defaultdict

grouped = defaultdict(list)
for d in list_of_dict:
    prefix = d['label'][:3]
    grouped[prefix].append(d['value'])

Now you can look up results directly:

>>> grouped['220']
['220_2_INTL_PRSTR_ET_619076', '220_4_KAL_T2E_DOLE_344657']

This strategy is also faster than filtering the entire list for every lookup.
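
One caveat when using the grouped data to drive a dropdown, shown here as a minimal sketch on a shortened copy of the question's data: indexing a defaultdict with an unknown prefix silently creates an empty entry, so .get() may be the safer lookup for user-supplied input. The names first_dropdown and missing are only illustrative.

```python
from collections import defaultdict

# Shortened copy of the question's data, for illustration only.
list_of_dict = [
    {'label': '220_2_INTL_PRSTR_ET_619076', 'value': '220_2_INTL_PRSTR_ET_619076'},
    {'label': '220_4_KAL_T2E_DOLE_344657', 'value': '220_4_KAL_T2E_DOLE_344657'},
    {'label': '221_1_PB_520_REF_174049', 'value': '221_1_PB_520_REF_174049'},
]

grouped = defaultdict(list)
for d in list_of_dict:
    grouped[d['label'][:3]].append(d['value'])

# Options for the first dropdown, in a stable order.
first_dropdown = sorted(grouped)

# .get() does not insert anything for an unknown prefix,
# whereas grouped['999'] would add an empty list under '999'.
missing = grouped.get('999', [])
```

Once the grouping is finished, converting it with dict(grouped) is another way to avoid accidental insertions.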

Answered By: Raymond Hettinger

It’s fairly straightforward to turn your list into a dictionary that has exactly what you need:

list_of_dict = [
    {'label': '220_2_INTL_PRSTR_ET_619076', 'value': '220_2_INTL_PRSTR_ET_619076'},
    {'label': '220_4_KAL_T2E_DOLE_344657', 'value': '220_4_KAL_T2E_DOLE_344657'},
    {'label': '221_1_PB_520_REF_174049', 'value': '221_1_PB_520_REF_174049'},
    {'label': '222_5_KAL_T2E_YT_344991', 'value': '222_5_KAL_T2E_YT_344991'},
    {'label': '223_2_PB_520_REF_174050', 'value': '223_2_PB_520_REF_174050'},
    {'label': '224_3_PB_520_REF_174051', 'value': '224_3_PB_520_REF_174051'}
]

# it's a "one-liner", a dict comprehension split over a few lines for readability:
result = {
    p: [d for d in list_of_dict if d['label'].startswith(p)]
    for p in set(d['label'][:3] for d in list_of_dict)
}

print(result['220'])  # the contents for this prefix
print(result.keys())  # the keys for your first dropdown

Output:

[{'label': '220_2_INTL_PRSTR_ET_619076', 'value': '220_2_INTL_PRSTR_ET_619076'}, {'label': '220_4_KAL_T2E_DOLE_344657', 'value': '220_4_KAL_T2E_DOLE_344657'}]
dict_keys(['222', '220', '224', '223', '221'])

Note that the keys come out in arbitrary order, because they are collected from a set, but sorting them is straightforward.
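
For instance, sorted() can be applied to the dictionary directly. This is a small sketch; the result dictionary here is a stand-in with empty lists, not the real data.

```python
# Stand-in for the `result` dictionary, with empty lists as placeholder values.
result = {'222': [], '220': [], '224': [], '223': [], '221': []}

# Iterating a dict yields its keys, so sorted() returns a sorted list of prefixes.
options = sorted(result)
print(options)  # ['220', '221', '222', '223', '224']
```

This sorts the prefixes as strings, which matches numeric order as long as they all have the same length. For prefixes of varying length, sorted(result, key=int) would be one option, assuming every prefix is numeric.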

Instead of d['label'][:3] you could consider d['label'].split('_')[0], if the prefixes aren’t all 3 characters long, but instead are "everything before the first underscore".
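
To illustrate the difference, here is a minimal sketch. The helper name prefix_of and the second label are made up for the example and do not come from your data.

```python
def prefix_of(label):
    """Return everything before the first underscore."""
    return label.split('_')[0]


short = '220_2_INTL_PRSTR_ET_619076'
longer = '1000_1_EXAMPLE_000001'  # hypothetical label with a 4-character prefix

print(prefix_of(short))   # 220
print(prefix_of(longer))  # 1000
print(longer[:3])         # 100 (slicing cuts the longer prefix short)
```

If a label contains no underscore at all, split('_')[0] returns the whole label.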

Edit: in the comments, you asked for some additional explanation of the core bit of code:

{
    p: [d for d in list_of_dict if d['label'].startswith(p)]
    for p in set(d['label'][:3] for d in list_of_dict)
}
  • Anything of the form {..: .. for .. in ..} is a dictionary comprehension, constructing a dictionary using a very efficient loop.
  • Here it’s {p: ... for p in set(d['label'][:3] for d in list_of_dict)}. So, p loops over the elements of set(d['label'][:3] for d in list_of_dict) and for every p, a key is added to the dictionary.
  • That d['label'][:3] for d in list_of_dict is a generator expression that generates the first three characters ([:3]) of every 'label' value for every dictionary d in your list_of_dict. I.e. ['220', '220', '221', '222', etc.]. And the set() around it reduces it to have only unique elements.
  • The value part of the dictionary comprehension is a list comprehension, so a list is constructed as the value for each key p. A list comprehension looks like [.. for .. in ..] (with an optional if ..-part to filter the contents).
  • The comprehension [d for d in list_of_dict if d['label'].startswith(p)] takes each dictionary d from your list_of_dict, but only keeps it in the resulting list if d['label'].startswith(p) is True (i.e. only if d['label'] starts with p, which is the current 3-character string being used as a key).

So, it gathers all of the 3-letter prefixes in a set, and then generates a dictionary with those unique prefixes as keys, and a list of all the dictionaries that have 'label' values starting with the matching 3-letter prefix, as their value.
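
If it helps, the same logic can be written out with plain loops. This is only an unrolled sketch of the comprehension, run on a shortened copy of the data; the names prefixes and matches are illustrative.

```python
# Shortened copy of the question's data, for illustration only.
list_of_dict = [
    {'label': '220_2_INTL_PRSTR_ET_619076', 'value': '220_2_INTL_PRSTR_ET_619076'},
    {'label': '220_4_KAL_T2E_DOLE_344657', 'value': '220_4_KAL_T2E_DOLE_344657'},
    {'label': '221_1_PB_520_REF_174049', 'value': '221_1_PB_520_REF_174049'},
]

# Step 1: collect the unique 3-character prefixes (the set(...) part).
prefixes = set()
for d in list_of_dict:
    prefixes.add(d['label'][:3])

# Step 2: for each prefix, build the list of matching dictionaries
# (the list comprehension) and store it under that key.
result = {}
for p in prefixes:
    matches = []
    for d in list_of_dict:
        if d['label'].startswith(p):
            matches.append(d)
    result[p] = matches
```

Both versions scan the whole list once per prefix, which is fine for small data. For large lists, grouping in a single pass is likely to be cheaper.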

Answered By: Grismar