Using dictionary comprehension to create a dictionary from list of dictionaries
Question:
This is my original code; it works as I need:
import collections
import json
import yaml
file_list = [
    {'path': '/path/to/file1', 'size': 100, 'time': '2022-02-15'},
    {'path': '/path/to/file2', 'size': 200, 'time': '2022-02-13'},
    {'path': '/path/to/file3', 'size': 300, 'time': '2022-02-12'},
    {'path': '/path/to/file4', 'size': 200, 'time': '2022-02-11'},
    {'path': '/path/to/file5', 'size': 100, 'time': '2022-02-1-'},
]
new_dict = collections.defaultdict(list)
for file in file_list:
    new_dict[file['size']].append(file['path'])
print(json.dumps(new_dict, indent=4, sort_keys=True))
I have found that using collections.defaultdict(list) helps to simplify the loop code so I do not need to check if a key already exists before appending to its list.
EDIT:
Is it possible to make this code more concise by using a dictionary comprehension to create new_dict? The collections.defaultdict(list) is catching me out.
Answers:
If, for some reason, you want to avoid the import of defaultdict, this is one way to do it:
import json
file_list = [
    {'path': '/path/to/file1', 'size': 100, 'time': '2022-02-15'},
    {'path': '/path/to/file2', 'size': 200, 'time': '2022-02-13'},
    {'path': '/path/to/file3', 'size': 300, 'time': '2022-02-12'},
    {'path': '/path/to/file4', 'size': 200, 'time': '2022-02-11'},
    {'path': '/path/to/file5', 'size': 100, 'time': '2022-02-1-'},
]
new_dict = {}
for file in file_list:
    # Fetch the existing list (or a new empty one) and store it back under the key
    new_dict[file['size']] = (files := new_dict.get(file['size'], []))
    files.append(file['path'])
print(json.dumps(new_dict, indent=4, sort_keys=True))
And if you don’t like the constant reassignment of that list to the dictionary entry (I don’t really):
for file in file_list:
    if file['size'] not in new_dict:
        new_dict[file['size']] = []
    new_dict[file['size']].append(file['path'])
(Note that the conventional indent depth for Python is 4 spaces, as recommended by PEP 8; you would do others and your future self a favour by adopting that sooner rather than later.)
As indicated in the comments, a perhaps more elegant solution:
for file in file_list:
    new_dict.setdefault(file['size'], []).append(file['path'])
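In case `dict.setdefault` is unfamiliar, here is a minimal sketch of why that one-liner works. The call returns the list already stored under the key, or stores the given default and returns that same object, so the `append` lands in the list held by the dictionary. The sample data is shortened and the name `bucket` is only there for illustration.

```python
# Shortened sample data, assumed for illustration only.
file_list = [
    {'path': '/path/to/file1', 'size': 100},
    {'path': '/path/to/file2', 'size': 200},
    {'path': '/path/to/file5', 'size': 100},
]

new_dict = {}
for file in file_list:
    # Returns the list stored under the key, inserting [] first if the key is missing
    bucket = new_dict.setdefault(file['size'], [])
    bucket.append(file['path'])

print(new_dict)
# {100: ['/path/to/file1', '/path/to/file5'], 200: ['/path/to/file2']}
```

The result is a plain dict, so unlike a defaultdict it will not silently create keys when you later read a size that is not present.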
However, although it’s possible to come up with a single-line comprehension, it is unlikely to be more efficient, faster, or more readable. In fact, likely none of those, so what would be the point?
Shorter code is often better code if it doesn’t compromise on function or readability, but should never be a goal by itself.
For example, consider this (bad) one-liner, which rescans the whole list once for every distinct size:
c_dict = {size: [fp['path'] for fp in file_list if fp['size'] == size] for size in set(fs['size'] for fs in file_list)}
And although that’s a single line, you’d probably want to write it like this for readability, at which point you just have more code than before:
c_dict = {
    size: [fp['path'] for fp in file_list if fp['size'] == size]
    for size in set(fs['size'] for fs in file_list)
}
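To make the cost a bit more concrete, here is a rough sketch that checks the comprehension produces the same grouping as the plain loop and counts how often it walks the full list. The `stats` counter and the `paths_for` helper are hypothetical names added for the demonstration; they are not part of the original code.

```python
file_list = [
    {'path': '/path/to/file1', 'size': 100},
    {'path': '/path/to/file2', 'size': 200},
    {'path': '/path/to/file3', 'size': 300},
    {'path': '/path/to/file4', 'size': 200},
    {'path': '/path/to/file5', 'size': 100},
]

# Single pass: every entry is visited exactly once.
loop_dict = {}
for file in file_list:
    loop_dict.setdefault(file['size'], []).append(file['path'])

stats = {'scans': 0}  # hypothetical counter of full passes over file_list

def paths_for(size):
    """Collect the paths for one size, recording the full scan this needs."""
    stats['scans'] += 1
    return [fp['path'] for fp in file_list if fp['size'] == size]

# Same shape as the comprehension above, with the inner list moved into a helper.
c_dict = {size: paths_for(size) for size in {fs['size'] for fs in file_list}}

print(c_dict == loop_dict)  # True
print(stats['scans'])       # 3
```

So the comprehension scans the list once per distinct size, plus one more pass to build the set of sizes, while the loop gets by with a single pass.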
Can’t claim it’s more "concise", but it’s a dict comprehension in the end 😀
from itertools import groupby
from operator import itemgetter
sort_list = sorted(file_list, key=itemgetter('size'))
groups = groupby(sort_list, key=itemgetter('size'))
print({k:[i['path'] for i in g] for k, g in groups})
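The `sorted` call is not optional here: `itertools.groupby` only groups consecutive items with equal keys. A small sketch of what happens with and without it (sample data shortened; `by_size`, `unsorted_result` and `sorted_result` are names made up for this illustration):

```python
from itertools import groupby
from operator import itemgetter

file_list = [
    {'path': '/path/to/file1', 'size': 100},
    {'path': '/path/to/file2', 'size': 200},
    {'path': '/path/to/file3', 'size': 300},
    {'path': '/path/to/file4', 'size': 200},
    {'path': '/path/to/file5', 'size': 100},
]
by_size = itemgetter('size')

# Without sorting, sizes 100 and 200 each show up in two separate runs,
# and the later run overwrites the earlier one in the dict.
unsorted_result = {
    k: [f['path'] for f in g]
    for k, g in groupby(file_list, key=by_size)
}

# With sorting, equal sizes are adjacent, so each key forms exactly one group.
sorted_result = {
    k: [f['path'] for f in g]
    for k, g in groupby(sorted(file_list, key=by_size), key=by_size)
}

print(unsorted_result)
# {100: ['/path/to/file5'], 200: ['/path/to/file4'], 300: ['/path/to/file3']}
print(sorted_result)
# {100: ['/path/to/file1', '/path/to/file5'], 200: ['/path/to/file2', '/path/to/file4'], 300: ['/path/to/file3']}
```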
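Just make sure your data is consistent: every dictionary needs at least the keys of the first one, otherwise the lookup raises a KeyError.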
file_list = [
    {'path': '/path/to/file1', 'size': 100, 'time': '2022-02-15'},
    {'path': '/path/to/file2', 'size': 200, 'time': '2022-02-13'},
    {'path': '/path/to/file3', 'size': 300, 'time': '2022-02-12'},
    {'path': '/path/to/file4', 'size': 200, 'time': '2022-02-11'},
    {'path': '/path/to/file5', 'size': 100, 'time': '2022-02-1-'},
]
new_dict = {key: [file_dict[key] for file_dict in file_list] for key in file_list[0].keys()}
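Worth noting that this comprehension does not group paths by size the way the question's loop does; as far as I can tell it transposes the records into one list per field. A small sketch of what it produces and of the failure mode with inconsistent data (the shortened sample data, the `ragged` list and the extra record are assumptions made up for this illustration):

```python
file_list = [
    {'path': '/path/to/file1', 'size': 100},
    {'path': '/path/to/file2', 'size': 200},
    {'path': '/path/to/file5', 'size': 100},
]

# One list per field name, in the order the records appear
columns = {key: [file_dict[key] for file_dict in file_list] for key in file_list[0].keys()}
print(columns)
# {'path': ['/path/to/file1', '/path/to/file2', '/path/to/file5'], 'size': [100, 200, 100]}

# Hypothetical record without a 'size' key, added only to show the failure mode.
ragged = file_list + [{'path': '/path/to/file9'}]
try:
    {key: [file_dict[key] for file_dict in ragged] for key in ragged[0].keys()}
    failed = False
except KeyError:
    failed = True
print(failed)  # True
```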
For really concise, just use a pandas DataFrame?
import pandas as pd
new_dict = pd.DataFrame(file_list).to_dict(orient='list')