Using comprehensions and DataFrames to extract data from lists of dictionaries in Python

Question:

I'm sure I should be able to find this, but I have looked and can't seem to find how to handle a few of the use cases I am looking for. I want to search a list of dictionaries and either pull back a subset or count how often a value appears.

For example, from the list below I want to be able to say:

return a list of all the dictionaries that contain "WAP1" in the key "AP", or return the number of dictionaries where the key "network" = "NET1"

So, based on a logical search term, return a new list with just the first 2 dictionary items, or the number "3".

I have used wap = next((item for item in ls_dict if item['AP'] == 'WAP1'), 'none'), but this only gets the first item. I was also not sure why this does not work without "next" and instead gives me this: <generator object at 0x7f9146cba0>

At the end of the day I want to be able to search a large list for the occurrence of a MAC address and either pull out a list of all the matching dictionary objects that I can use for future operations, or simply count how many times they appear.

Thank you in advance for any guidance. I know this must be simple, but I have been looking for a while and can't figure it out.

ls_dict = [{'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF01', 'ap_mac' : 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF02', 'ap_mac' : 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP2', 'MAC': 'FF03', 'ap_mac' : 'eeeeeeeeeeee'},
           {'network': 'NET2', 'AP': 'WAP3', 'MAC': 'FF04', 'ap_mac' : 'eeeeeeeeeeee'}]
Asked By: DevilWAH


Answers:

If the list is large, you might want to use pandas and work on DataFrames. You can always convert the result back to a list of dictionaries with to_dict:

import pandas as pd

ls_dict = [{'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF01', 'ap_mac' : 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF02', 'ap_mac' : 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP2', 'MAC': 'FF03', 'ap_mac' : 'eeeeeeeeeeee'},
           {'network': 'NET2', 'AP': 'WAP3', 'MAC': 'FF04', 'ap_mac' : 'eeeeeeeeeeee'}]

df = pd.DataFrame(ls_dict)
print(df[df['AP'] == "WAP1"].to_dict(orient='records'))

Output:

[{'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF01', 'ap_mac': 'eeeeeeeeeeee'},
{'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF02', 'ap_mac': 'eeeeeeeeeeee'}]

Or:

len(df[df['network']=="NET1"]) # returns 3
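
As a hedged sketch of the counting case, the same DataFrame can also be counted with a boolean mask or with value_counts. The variable names net1_count and ap_counts are only illustrative and are not part of the original answer:

```python
import pandas as pd

ls_dict = [{'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF01', 'ap_mac': 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF02', 'ap_mac': 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP2', 'MAC': 'FF03', 'ap_mac': 'eeeeeeeeeeee'},
           {'network': 'NET2', 'AP': 'WAP3', 'MAC': 'FF04', 'ap_mac': 'eeeeeeeeeeee'}]

df = pd.DataFrame(ls_dict)

# Summing a boolean mask counts the rows where the condition holds.
net1_count = int((df['network'] == "NET1").sum())

# value_counts tallies every distinct value in the column in one pass.
ap_counts = df['AP'].value_counts().to_dict()

print(net1_count)
print(ap_counts)
```

The mask version avoids building a filtered DataFrame just to take its length, which may matter if only the count is needed.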
Answered By: Tranbi

The pandas answer is perfectly fine, but the poster and you both made comments about efficiency. If you start with a list of dictionaries, pandas does not help from a performance perspective.

Let’s create some larger data based on your example:

import random

APs= [f"WAP{i}" for i in range(1, 9)]
ls_dict = [
    {'network': 'NET1', 'AP': random.choice(APs), 'MAC': 'FF01', 'ap_mac' : 'eeeeeeeeeeee'}
    for _ in range(10_000)
]

And now some test methods that we can verify return the same results:

import pandas as pd

def t1(ls_dict):
    df = pd.DataFrame(ls_dict)
    return df[df['AP']=="WAP1"].to_dict(orient='records')

def t2(ls_dict):
    return [r for r in ls_dict if r["AP"] == "WAP1"]

print(t1(ls_dict) == t2(ls_dict))

That will (should) print True.
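
The question also asked about counting and about the generator object output, so here is a minimal sketch using the original four-item list. The names gen, first, matches and net1_count are only illustrative. A generator expression in parentheses is lazy, so printing it shows the generator object rather than the matches; next() pulls a single item from it, while a list comprehension collects all of them:

```python
ls_dict = [{'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF01', 'ap_mac': 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF02', 'ap_mac': 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP2', 'MAC': 'FF03', 'ap_mac': 'eeeeeeeeeeee'},
           {'network': 'NET2', 'AP': 'WAP3', 'MAC': 'FF04', 'ap_mac': 'eeeeeeeeeeee'}]

# Lazy: nothing is evaluated until the generator is consumed.
gen = (item for item in ls_dict if item['AP'] == 'WAP1')

# next() returns only the first match (or the default if there is none).
first = next(gen, None)

# Square brackets build the full list of matches instead.
matches = [item for item in ls_dict if item['AP'] == 'WAP1']

# Counting: add 1 for every dictionary that satisfies the condition.
net1_count = sum(1 for item in ls_dict if item['network'] == 'NET1')

print(first['MAC'])   # FF01
print(len(matches))   # 2
print(net1_count)     # 3
```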

So memory aside, let’s see how these perform using timeit:

import timeit

setup = """
import pandas as pd
import random

APs= [f"WAP{i}" for i in range(1, 9)]
ls_dict = [
    {'network': 'NET1', 'AP': random.choice(APs), 'MAC': 'FF01', 'ap_mac' : 'eeeeeeeeeeee'}
    for _ in range(10_000)
]

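# Note: this deterministic list (20,000 rows) replaces the random one above,
# so it is the data that actually gets timed.
ls_dict = [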
    {'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF01', 'ap_mac' : 'eeeeeeeeeeee'},
    {'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF02', 'ap_mac' : 'eeeeeeeeeeee'},
    {'network': 'NET1', 'AP': 'WAP2', 'MAC': 'FF03', 'ap_mac' : 'eeeeeeeeeeee'},
    {'network': 'NET2', 'AP': 'WAP3', 'MAC': 'FF04', 'ap_mac' : 'eeeeeeeeeeee'}
] * 5_000

def t1(ls_dict):
    df = pd.DataFrame(ls_dict)
    return df[df['AP']=="WAP1"].to_dict(orient='records')

def t2(ls_dict):
    return [r for r in ls_dict if r["AP"] == "WAP1"]
"""

print(f'pandas dataframe: {timeit.timeit("t1(ls_dict)", setup=setup, number=100)}')
print(f'comprehension   : {timeit.timeit("t2(ls_dict)", setup=setup, number=100)}')

On my system, that results in:

pandas dataframe: 3.5198207999928854
comprehension   : 0.08080889997654594
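
If the same large list has to be searched many times, one option (a sketch under that assumption, not something measured above) is to group the dictionaries by key once and then look values up directly. The helper build_index and the names by_ap, wap1_records and network_counts are hypothetical, introduced only for illustration:

```python
from collections import Counter, defaultdict

ls_dict = [{'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF01', 'ap_mac': 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP1', 'MAC': 'FF02', 'ap_mac': 'eeeeeeeeeeee'},
           {'network': 'NET1', 'AP': 'WAP2', 'MAC': 'FF03', 'ap_mac': 'eeeeeeeeeeee'},
           {'network': 'NET2', 'AP': 'WAP3', 'MAC': 'FF04', 'ap_mac': 'eeeeeeeeeeee'}]

def build_index(records, key):
    """Group records by the value stored under `key` (one pass over the list)."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record)
    return index

# Build the index once, then each lookup is a dictionary access.
by_ap = build_index(ls_dict, 'AP')
wap1_records = by_ap.get('WAP1', [])

# Counter tallies how often each value appears.
network_counts = Counter(record['network'] for record in ls_dict)

print(len(wap1_records))        # 2
print(network_counts['NET1'])   # 3
```

For a single search the plain comprehension is simpler; the index only pays off when the lookups are repeated.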
Answered By: JonSG