Using a dictionary comprehension to create a dictionary from a list of dictionaries

Question:

This is my original code; it works as I need:

import collections
import json

file_list = [
  {'path': '/path/to/file1', 'size': 100, 'time': '2022-02-15'},
  {'path': '/path/to/file2', 'size': 200, 'time': '2022-02-13'},
  {'path': '/path/to/file3', 'size': 300, 'time': '2022-02-12'},
  {'path': '/path/to/file4', 'size': 200, 'time': '2022-02-11'},
  {'path': '/path/to/file5', 'size': 100, 'time': '2022-02-1-'}]

new_dict = collections.defaultdict(list)

for file in file_list:
    new_dict[file['size']].append(file['path'])

print(json.dumps(new_dict, indent=4, sort_keys=True))

I have found that using collections.defaultdict(list) simplifies the loop, since I do not need to check whether a key already exists before appending to its list.

EDIT:

Is it possible to make this code more concise by using a dictionary comprehension to create new_dict? The collections.defaultdict(list) is catching me out.

Asked By: quantum231


Answers:

If, for some reason, you want to avoid the import of defaultdict, this is one way to do it:

import json

file_list = [
    {'path': '/path/to/file1', 'size': 100, 'time': '2022-02-15'},
    {'path': '/path/to/file2', 'size': 200, 'time': '2022-02-13'},
    {'path': '/path/to/file3', 'size': 300, 'time': '2022-02-12'},
    {'path': '/path/to/file4', 'size': 200, 'time': '2022-02-11'},
    {'path': '/path/to/file5', 'size': 100, 'time': '2022-02-1-'}]

new_dict = {}

for file in file_list:
    new_dict[file['size']] = (files := new_dict.get(file['size'], []))
    files.append(file['path'])

print(json.dumps(new_dict, indent=4, sort_keys=True))

And if you don’t like the constant reassignment of that list to the dictionary entry (I don’t really):

for file in file_list:
    if file['size'] not in new_dict:
        new_dict[file['size']] = []
    new_dict[file['size']].append(file['path'])

(Note that the conventional indent depth for Python is 4 spaces, as recommended by PEP 8; you would do others and your future self a favour by adopting that sooner rather than later.)

As indicated in the comments, a perhaps more elegant solution:

for file in file_list:
    new_dict.setdefault(file['size'], []).append(file['path'])

However, although it’s possible to come up with a single-line comprehension, it won’t be more efficient, faster, or more readable. In fact, it will likely be none of those, so what would be the point?

Shorter code is often better code if it doesn’t compromise on function or readability, but brevity should never be a goal in itself.

For example, consider this (bad) example:

c_dict = {size: [fp['path'] for fp in file_list if fp['size'] == size] for size in set(fs['size'] for fs in file_list)}

And although that’s a single line, you’d probably want to write it like this for readability, at which point you just have more code than before:

c_dict = {
    size: [fp['path'] for fp in file_list if fp['size'] == size] 
    for size in set(fs['size'] for fs in file_list)
}
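
To put the efficiency point in concrete terms, here is a rough sketch that checks both approaches give the same result and then times them. The synthetic data (2,000 files, 100 distinct sizes) and the function names are made up for illustration; actual timings will vary by machine, so none are quoted here.

```python
import timeit

# Synthetic data (an assumption for illustration): 2,000 files, 100 distinct sizes.
file_list = [{'path': f'/path/to/file{i}', 'size': i % 100} for i in range(2000)]


def group_with_loop(files):
    """Single pass: each file is visited exactly once."""
    grouped = {}
    for file in files:
        grouped.setdefault(file['size'], []).append(file['path'])
    return grouped


def group_with_comprehension(files):
    """Nested comprehension: rescans the whole list once per distinct size."""
    return {
        size: [fp['path'] for fp in files if fp['size'] == size]
        for size in {fs['size'] for fs in files}
    }


# Both approaches build the same mapping (dict equality ignores key order).
assert group_with_loop(file_list) == group_with_comprehension(file_list)

loop_time = timeit.timeit(lambda: group_with_loop(file_list), number=20)
comp_time = timeit.timeit(lambda: group_with_comprehension(file_list), number=20)
print(f'loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s')
```

The loop does work proportional to the number of files, while the comprehension does work proportional to the number of files times the number of distinct sizes, so the gap should widen as the data gets more varied.
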
Answered By: Grismar

Can’t claim it’s more "concise", but it’s a dict comprehension in the end 😀

from itertools import groupby
from operator import itemgetter
sort_list = sorted(file_list, key=itemgetter('size'))
groups = groupby(sort_list, key=itemgetter('size'))
print({k:[i['path'] for i in g] for k, g in groups})
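
A note on why the sorted() call matters: itertools.groupby only groups consecutive items with equal keys, so unsorted input yields several runs per size, and in a dict comprehension the later run silently replaces the earlier one. The sketch below uses a shortened copy of the question's data (the 'time' field is dropped for brevity) to show the difference.

```python
from itertools import groupby
from operator import itemgetter

# Shortened copy of the question's data ('time' omitted for brevity).
file_list = [
    {'path': '/path/to/file1', 'size': 100},
    {'path': '/path/to/file2', 'size': 200},
    {'path': '/path/to/file3', 'size': 300},
    {'path': '/path/to/file4', 'size': 200},
    {'path': '/path/to/file5', 'size': 100},
]
by_size = itemgetter('size')

# Without sorting, sizes 100 and 200 each appear in two separate runs,
# and the later run overwrites the earlier one in the resulting dict.
unsorted_result = {
    k: [f['path'] for f in g] for k, g in groupby(file_list, key=by_size)
}

# sorted() is stable, so paths keep their original relative order per size.
sorted_result = {
    k: [f['path'] for f in g]
    for k, g in groupby(sorted(file_list, key=by_size), key=by_size)
}

print(unsorted_result)
# {100: ['/path/to/file5'], 200: ['/path/to/file4'], 300: ['/path/to/file3']}
print(sorted_result)
# {100: ['/path/to/file1', '/path/to/file5'], 200: ['/path/to/file2', '/path/to/file4'], 300: ['/path/to/file3']}
```
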
Answered By: Kurt

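Just make sure your data is consistent, i.e. every dictionary has the same keys. Note that the comprehension below builds one list per key ('path', 'size', 'time') rather than grouping the paths by size: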

file_list = [
    {'path': '/path/to/file1', 'size': 100, 'time': '2022-02-15'},
    {'path': '/path/to/file2', 'size': 200, 'time': '2022-02-13'},
    {'path': '/path/to/file3', 'size': 300, 'time': '2022-02-12'},
    {'path': '/path/to/file4', 'size': 200, 'time': '2022-02-11'},
    {'path': '/path/to/file5', 'size': 100, 'time': '2022-02-1-'}]

new_dict = {key: [file_dict[key] for file_dict in file_list] for key in file_list[0].keys()}

For something really concise, just use a pandas DataFrame:

import pandas as pd

new_dict = pd.DataFrame(file_list).to_dict(orient='list')
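
For anyone comparing this with the question, here is a small pure-Python sketch of the two result shapes. The three-entry list is a cut-down, hypothetical version of the question's data. The column-wise form is the shape that to_dict(orient='list') produces; the grouped form is what the question's loop produces.

```python
# Cut-down, hypothetical version of the question's data.
file_list = [
    {'path': '/path/to/file1', 'size': 100},
    {'path': '/path/to/file2', 'size': 200},
    {'path': '/path/to/file4', 'size': 200},
]

# Column-wise transposition: one list per field name.
columns = {key: [d[key] for d in file_list] for key in file_list[0]}

# Grouping by size, as in the question: one list of paths per size.
grouped = {}
for d in file_list:
    grouped.setdefault(d['size'], []).append(d['path'])

print(columns)
# {'path': ['/path/to/file1', '/path/to/file2', '/path/to/file4'], 'size': [100, 200, 200]}
print(grouped)
# {100: ['/path/to/file1'], 200: ['/path/to/file2', '/path/to/file4']}
```
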
Answered By: georgwalker45