How to produce a summary of dictionary values in Python?

Question:

I have the sample list of dictionaries below:

errors = [
    {'PartitionKey': '34', 'RowKey': '14', 'Component': 'mamba', 'Environment': 'QA', 'Error': '404 not found', 'Group': 'Test', 'Job': 'cutting', 'JobType': 'automated'},
    {'PartitionKey': '35', 'RowKey': '15', 'Component': 'mamba', 'Environment': 'QA', 'Error': '404 not found', 'Group': 'Test', 'Job': 'cutting', 'JobType': 'automated'},
    {'PartitionKey': '36', 'RowKey': '16', 'Component': 'mamba', 'Environment': 'Dev', 'Error': '404 not found', 'Group': 'random', 'Job': 'moping', 'JobType': 'manual'},
    {'PartitionKey': '37', 'RowKey': '17', 'Component': 'mamba', 'Environment': 'QA', 'Error': '404 not found', 'Group': 'Test', 'Job': 'cutting', 'JobType': 'automated'},
    {'PartitionKey': '38', 'RowKey': '18', 'Component': 'mamba', 'Environment': 'Dev', 'Error': '404 not found', 'Group': 'random', 'Job': 'moping', 'JobType': 'manual'},
    {'PartitionKey': '39', 'RowKey': '19', 'Component': 'Scorpio', 'Environment': 'Dev', 'Error': '500 internal error', 'Group': 'minerva', 'Job': 'cleaning', 'JobType': 'manual'},
    {'PartitionKey': '39', 'RowKey': '19', 'Component': 'Scorpio', 'Environment': 'Dev', 'Error': '500 internal error', 'Group': 'minerva', 'Job': 'cleaning', 'JobType': 'manual'},
]

Using Python, I am trying to find, for each environment, which types of errors are observed and how many times each one occurs. Something like:

{
    'QA': {
       '404 not found': 10,
       '500 internal error': 20,
       '503 xyz': 30
    },
    'DEV': {
       '404 not found': 10,
       '500 internal error': 20,
       '503 xyz': 30    
     }  
}

I am trying to achieve this using Python's itertools.groupby. Here is a snippet of what I have tried, but it does not produce exactly what I want. Any help would be appreciated.

   import collections
   from itertools import groupby

   grouped = collections.defaultdict(list)
   newgrouped = collections.defaultdict(list)

   # First pass: bucket the records by environment
   for item in errors:
       grouped[item['Environment']].append(item)

   # Second pass: bucket the records by error type
   for key, vals in grouped.items():
       for val in vals:
           newgrouped[val['Error']].append(val)
Asked By: Hound

Answers:

You can use dict.setdefault to initialize a non-existing key with a sub-dict where error counts can be kept track of:

from operator import itemgetter

summary = {}
for env, error in map(itemgetter('Environment', 'Error'), errors):
    summary.setdefault(env, {})[error] = summary.get(env, {}).get(error, 0) + 1

Given your sample input, summary would become:

{'QA': {'404 not found': 3}, 'Dev': {'404 not found': 2, '500 internal error': 2}}
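
The same tally could also be sketched with collections.Counter, which starts missing keys at 0 so no explicit default is needed. This is only an alternative sketch, not part of the approach above. The shortened errors list is an assumption: it keeps just the two keys the summary needs, with the same Environment/Error combinations as the sample in the question.

```python
from collections import Counter, defaultdict

# Shortened stand-in for the question's data (assumption: only the
# 'Environment' and 'Error' keys matter for the summary).
errors = [
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
]

# One Counter per environment; each Counter maps error message -> count.
counts = defaultdict(Counter)
for item in errors:
    counts[item['Environment']][item['Error']] += 1

# Convert to plain nested dicts for display.
summary = {env: dict(per_env) for env, per_env in counts.items()}
print(summary)
```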

Demo: https://replit.com/@blhsing/BogusVirtualKnowledge

Answered By: blhsing

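It seems that what you want is a nested mapping, dict(dict(int)), which a defaultdict of defaultdict(int) gives you:

from collections import defaultdict

# Outer keys are environments; inner keys are error messages with int counts
group = defaultdict(lambda: defaultdict(int))
for a in errors:
    group[a['Environment']][a['Error']] += 1
print(group)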
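
Printing a nested defaultdict shows the defaultdict wrappers rather than the plain layout asked for in the question. Since defaultdict is a dict subclass, json.dumps can serialize it directly, so one way to get readable output might look like the sketch below. The shortened errors list and the name report are assumptions made for illustration.

```python
import json
from collections import defaultdict

# Shortened stand-in for the question's data (assumption: only the
# 'Environment' and 'Error' keys are needed here).
errors = [
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
]

# Count errors per environment in a nested defaultdict.
group = defaultdict(lambda: defaultdict(int))
for a in errors:
    group[a['Environment']][a['Error']] += 1

# json.dumps accepts dict subclasses, so no manual conversion is needed.
report = json.dumps(group, indent=4)
print(report)
```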
Answered By: Kyle Chen

I am not familiar with mongodb, but you can try loading the data into a DataFrame:

import pandas as pd

# DataFrame.append was removed in pandas 2.0; build the frame from the list directly
errors_df = pd.DataFrame(errors)

errors_env = errors_df.groupby(['Environment', 'Error']).count()

                                PartitionKey  RowKey  Component  Group  Job  
Environment Error                                                             
Dev         404 not found                  2       2          2      2    2   
            500 internal error             2       2          2      2    2   
QA          404 not found                  3       3          3      3    3   
Answered By: drunkfish69

With pandas, it could be done like this:

import pandas as pd

res = (pd.DataFrame(errors).groupby('Environment')['Error']
       .apply(lambda x: x.value_counts().items()).map(dict).to_dict())

>>> res
{'Dev': {'404 not found': 2, '500 internal error': 2}, 'QA': {'404 not found': 3}}
Answered By: SergFSM
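
Since the question asks about itertools.groupby specifically, here is a minimal sketch of that route. Note that groupby only merges adjacent items, so the list has to be sorted by the grouping key first. The shortened errors list and the helper name get_env are assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Shortened stand-in for the question's data (assumption: only the
# 'Environment' and 'Error' keys are needed here).
errors = [
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
]

get_env = itemgetter('Environment')

# Sort by environment so that groupby sees each environment as one run,
# then count the error messages inside each run.
summary = {
    env: dict(Counter(map(itemgetter('Error'), items)))
    for env, items in groupby(sorted(errors, key=get_env), key=get_env)
}
print(summary)
```

Because of the sort, the environments come out in alphabetical order rather than in order of first appearance.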