Grouping Python dictionaries in hierarchical form with multiple keys?

Question:

Here is my list of dicts:

[{'subtopic': 'kuku',
  'topic': 'lulu',
  'attachments': ['ttt'],
  'text': 'abc'},
 {'subtopic': 'tutu',
  'topic': 'lulu',
  'attachments': ['pipu'],
  'text': 'bubb'},
 {'subtopic': 'did',
  'topic': 'lulu',
  'attachments': ['ktop'],
  'text': 'gfg'},
 {'subtopic': 'polo',
  'topic': 'lulu',
  'attachments': ['vuvu'],
  'text': 'prolo'},
 {'subtopic': 'ssd',
  'topic': 'lulu',
  'attachments': ['jkjk'],
  'text': 'vint'},
 {'subtopic': 'plp',
  'topic': 'lulu',
  'attachments': ['fre'],
  'text': 'viw'},
 {'subtopic': 'prw',
  'topic': 'kll',
  'attachments': [],
  'text': 'kkk'},
 {'subtopic': 'prw',
  'topic': 'kll',
  'attachments': [],
  'text': 'fgfger2'}]

I would like to group by topic and subtopic to get a result with this structure (the names and values below are placeholders):

{
    "lulu": {
        "kuku": {
            'attachments': ['sample'],
            'text': ['sample']
            },
        "pupu": {
            'attachments': ['sample'],
            'text': ['sample']
            },
        "buru": {
            'attachments': ['sample1',
                            'sample2'],
            'text': ['sample1', 
                     'sample2']
        },
        "titi": {
            'attachments': ['sample'],
            'text': ['sample']
        },
        "huhu": {
            'attachments': ['sample'],
            'text': ['sample']
        }
        
    },
    "viriri": {
        "vururur": {
            'attachments': [],
            'text': ['sample']
        }
    }
} 

I am using:

import itertools
import operator

groups = ['topic', 'subtopic', "text", "attachments"]
groups.reverse()

def hierachical_data(data, groups):
    g = groups[-1]
    g_list = []
    for key, items in itertools.groupby(data, operator.itemgetter(g)):
        g_list.append({key:list(items)})
    groups = groups[0:-1]
    if(len(groups) != 0):
        for e in g_list:
            for k, v in e.items():
                e[k] = hierachical_data(v, groups)
    return g_list

print(hierachical_data(filtered_top_facts_dicts, groups))

But I get an error about hashing lists.
Please advise how to transform my JSON into the desired format.

Asked By: SteveS

||

Answers:

To group the list of dictionaries by topic and subtopic, start with an empty dictionary, then loop over the list and add each item at the appropriate nested level.

result = {}

# data is the list of dicts from the question
for item in data:
    topic = item['topic']
    subtopic = item['subtopic']

    # Create the nested levels the first time a topic/subtopic is seen
    if topic not in result:
        result[topic] = {}

    if subtopic not in result[topic]:
        result[topic][subtopic] = {'attachments': [], 'text': []}

    # attachments is already a list, so extend; text is a string, so append
    result[topic][subtopic]['attachments'].extend(item['attachments'])
    result[topic][subtopic]['text'].append(item['text'])

# Reverse the order of the sub-dictionaries within each topic
for topic, subtopics in result.items():
    result[topic] = dict(reversed(list(subtopics.items())))

Once the loops have finished, result has the format you described: topics as the outer keys, subtopics as the inner keys, and each subtopic mapping to its collected attachments and text lists.

Output:

{'AWS': {'GitHub': {'attachments': ['{"workflow.name": "view_pull_request","workflow.parameters": {"region": "us-west"}}'],
   'text': ['Sure, I can help with GitHub pull requests']},
  'S3': {'attachments': ['{"workflow.name": "aws_s3_file_copy","workflow.parameters": {"region": "us-west"}}'],
   'text': ['Sure, I can help you with the process of copying on S3']},
  'EC2': {'attachments': ['{"workflow.name": "aws_ec2_create_instance","workflow.parameters": {"region": "us-east"}}',
    '{"workflow.name": "aws_ec2_security_group_info","workflow.parameters": {"region": "us-east"}}'],
   'text': ['Sure, I can help creating an EC2 machine',
    'Sure, I can help with various information about AWS security groups']},
  'ECS': {'attachments': ['{"workflow.name": "aws_ecs_restart_service","workflow.parameters": {"region": "us-east"}}'],
   'text': ['Sure! I can help with restarting AWS ECS Service']},
  'IAM': {'attachments': ['{"workflow.name": "aws_iam_policies_info","workflow.parameters": {"region": "us-east"}}'],
   'text': ['Sure! I can help with AWS IAM policies info']}},
 'Topic Title': {'Subtopic Title': {'attachments': [],
   'text': ['This is another fact', 'This is a fact']}}}
Answered By: RJ Adriaansen

I think the cleanest solution is to use dictlib with reduce in one line:

from functools import reduce
import dictlib

reduce(
    lambda x, y: dictlib.union_setadd(x, y),
    [
        {
            x["topic"]: {
                x["subtopic"]: {
                    "attachments": x["attachments"],
                    "text": [x["text"]],
                }
            }
        }
        for x in d
    ],
)

where d is your initial list, and dictlib.union_setadd() merges two dictionaries, applying set-add logic to their values (as with str and int values). Because the merge runs inside reduce, it is applied sequentially and cumulatively across all of your list entries.

Hope this helps.
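If installing dictlib is not an option, the same reduce-and-merge idea can be sketched with the standard library alone. The merge function here is a hypothetical stand-in written for this example, not dictlib's code: it concatenates lists and overwrites any other clashing values, which may differ from what union_setadd does. It also looks up "attachments" and "text" by name instead of by position, and passes {} as reduce's initializer so an empty input list returns an empty dict.

```python
from functools import reduce

# Sample input: a trimmed copy of the list from the question (illustrative)
d = [
    {'subtopic': 'kuku', 'topic': 'lulu', 'attachments': ['ttt'], 'text': 'abc'},
    {'subtopic': 'prw', 'topic': 'kll', 'attachments': [], 'text': 'kkk'},
    {'subtopic': 'prw', 'topic': 'kll', 'attachments': [], 'text': 'fgfger2'},
]


def merge(a, b):
    """Hypothetical helper: recursively merge dict b into a copy of dict a.

    Nested dicts are merged key by key; lists under the same key are
    concatenated; any other clash keeps the value from b.
    """
    out = dict(a)
    for key, value in b.items():
        if key in out and isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = merge(out[key], value)
        elif key in out and isinstance(out[key], list) and isinstance(value, list):
            out[key] = out[key] + value
        else:
            out[key] = value
    return out


grouped = reduce(
    merge,
    # One small nested dict per record, keyed by name rather than position
    (
        {x['topic']: {x['subtopic']: {'attachments': x['attachments'], 'text': [x['text']]}}}
        for x in d
    ),
    {},
)
print(grouped)
```

merge never modifies its arguments in place, so d is left unchanged.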

Answered By: Vitali Avagyan