Convert 1D list into dictionary
Question:
I have a list with categories followed by some elements. Given that I know all the category names, is there a way to turn this into a dictionary of lists, i.e. convert:
l1 = ['cat1', 'a', 'b', 'c', 'cat2', 1, 2, 3, 'cat3', 4, 5, 6, 7, 8]
into:
l1_dic = {'cat1': ['a', 'b', 'c'], 'cat2': [1, 2, 3], 'cat3': [4, 5, 6, 7, 8]}
Edit: It is possible that the categories do NOT have a common string e.g. ‘cat1’ could be replaced by ‘Name’ while ‘cat2’ could be ‘Address’.
As I said in my original post, we do know the category names, i.e. we potentially have a list l2 such that:
l2 = ['cat1', 'cat2', 'cat3']
Once again, the category names need not necessarily have a common string.
Answers:
You can do this,
d = {}
keys = ['cat1', 'cat2', 'cat3']
for i in l1:
    if i in keys:
        key = i
        d.setdefault(i, [])
    else:
        d[key].append(i)
# Output
{'cat1': ['a', 'b', 'c'], 'cat2': [1, 2, 3], 'cat3': [4, 5, 6, 7, 8]}
You can iterate through l1 and, whenever an item exists in keys, start a new dictionary entry for it. Every other item is appended to the list of the most recent key.
Edit:
There has to be some condition that distinguishes keys from values. You can replace the membership test with another condition, such as if 'cat' in str(i).
For example:
values = {'address_1', 'location_1', 'name_1'}
...
if i in values:
..
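As a minimal sketch of that idea, the key test can be passed in as a function, so the same loop works with a set of known names or with a substring check. The names group_by_marker and is_key are hypothetical, introduced only for illustration, and ignoring items that come before the first key is an assumption of this sketch.

```python
def group_by_marker(items, is_key):
    """Group items under the most recent item for which is_key(item) is true."""
    grouped = {}
    current = None
    for item in items:
        if is_key(item):
            current = item
            grouped.setdefault(current, [])
        elif current is not None:
            grouped[current].append(item)
        # items before the first key are ignored (assumption of this sketch)
    return grouped


l1 = ['cat1', 'a', 'b', 'c', 'cat2', 1, 2, 3, 'cat3', 4, 5, 6, 7, 8]

# Condition 1: membership in a known set of names
names = {'cat1', 'cat2', 'cat3'}
by_name = group_by_marker(l1, lambda x: x in names)

# Condition 2: a substring test, as suggested above
by_substring = group_by_marker(l1, lambda x: 'cat' in str(x))

print(by_name)
print(by_substring)
```

Both calls give the same dictionary for the sample list; only the condition differs.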
This can be done with a while loop and a regular expression. I am assuming all the keys match the same pattern.
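import re
from collections import defaultdict

# l1 is your list
pat = r"pattern_string"  # placeholder: a regex that matches only the keys
i = 0
output = defaultdict(list)
while i < len(l1):
    # str() because re.match raises TypeError on non-string items such as ints
    if re.match(pat, str(l1[i])):
        key = l1[i]
        i += 1
        # check the bound first so l1[i] is never read past the end of the list
        while i < len(l1) and not re.match(pat, str(l1[i])):
            output[key].append(l1[i])
            i += 1
    else:
        i += 1  # skip items that appear before the first key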
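For the sample list, a sketch with a concrete pattern might look like the following. The pattern cat\d+ is an assumption about what the keys look like; the question does not give one. This version uses a plain for loop instead of the nested while loops, and converts each item with str() before matching because the list mixes strings and integers.

```python
import re
from collections import defaultdict

l1 = ['cat1', 'a', 'b', 'c', 'cat2', 1, 2, 3, 'cat3', 4, 5, 6, 7, 8]

# Assumed key pattern: "cat" followed by digits, matching the whole item.
pat = re.compile(r"cat\d+")

output = defaultdict(list)
key = None
for item in l1:
    if pat.fullmatch(str(item)):
        key = item
        output[key]  # touch the key so it exists even if no values follow
    elif key is not None:
        output[key].append(item)

print(dict(output))
# {'cat1': ['a', 'b', 'c'], 'cat2': [1, 2, 3], 'cat3': [4, 5, 6, 7, 8]}
```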
As you know the categories, a simple loop with tracking of the last key should work:
categories = {'cat1', 'cat2', 'cat3'}
out = {}
key = None
for item in l1:
    if item in categories:
        out[item] = []
        key = item
    else:
        out[key].append(item)
output:
{'cat1': ['a', 'b', 'c'],
'cat2': [1, 2, 3],
'cat3': [4, 5, 6, 7, 8]}
Just for fun, a functional approach to this using functools.reduce.
from functools import reduce
categories = {'cat1', 'cat2', 'cat3'}
reduce(lambda acc, x: (x, {x: [], **acc[1]}) if x in categories else
                      (k:=acc[0], {**(d:=acc[1]), k: d[k] + [x]}),
       l1, (None, dict()))[1]
# {'cat3': [4, 5, 6, 7, 8], 'cat2': [1, 2, 3], 'cat1': ['a', 'b', 'c']}
We need a tuple to track two pieces of information as we iterate: the last "key" and a dictionary storing the parsed data so far. If the current item is a key, we replace the key stored in the tuple with the new key and add an empty list to the dictionary under that key.
If the current item is not a key, we don't need to change the first element of the tuple, but we do update the dictionary with the extended list for that key.
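To make the two branches easier to follow, here is a sketch of the same reduction with the lambda written out as a named function. The name step is hypothetical. Unlike the lambda, this version places a new key last in the rebuilt dictionary, so the keys come out in their order of appearance.

```python
from functools import reduce

categories = {'cat1', 'cat2', 'cat3'}
l1 = ['cat1', 'a', 'b', 'c', 'cat2', 1, 2, 3, 'cat3', 4, 5, 6, 7, 8]


def step(acc, x):
    """One reduction step; acc is the tuple (last_key, parsed_so_far)."""
    last_key, parsed = acc
    if x in categories:
        # New key: it becomes the current key and gets an empty list.
        return x, {**parsed, x: []}
    # Value: the key is unchanged, the dictionary is rebuilt with the longer list.
    return last_key, {**parsed, last_key: parsed[last_key] + [x]}


last_key, result = reduce(step, l1, (None, {}))
print(last_key)  # cat3
print(result)
# {'cat1': ['a', 'b', 'c'], 'cat2': [1, 2, 3], 'cat3': [4, 5, 6, 7, 8]}
```

Each step builds a new dictionary rather than mutating the old one, which keeps the function pure but costs a copy per item.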
This is not the most efficient solution, but I saw in a comment that you wanted a one-liner.
Here is a two-liner:
l1 = ['cat1', 'a', 'b', 'c', 'cat2', 1, 2, 3, 'cat3', 4, 5, 6, 7, 8]
l2 = ['cat1', 'cat2', 'cat3']
dct = {l2[i]: l1[l1.index(l2[i]) + 1:l1.index(l2[i + 1])] for i in range(len(l2) - 1)}
# + 1 skips the key itself, so the last list starts at the first value after it
dct[l2[-1]] = l1[l1.index(l2[-1]) + 1:]
print(dct)
Output:
{'cat1': ['a', 'b', 'c'], 'cat2': [1, 2, 3], 'cat3': [4, 5, 6, 7, 8]}
Basically, this code goes through every element in l2, initializes it as a key of dct, and then takes the sublist of l1 between that key and the next one as the corresponding list.
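The same slicing idea can be sketched as a single comprehension by computing the key positions once and appending len(l1) as the end marker for the last key. The name positions is hypothetical, and the sketch assumes every key in l2 occurs exactly once in l1, in the same order as in l2.

```python
l1 = ['cat1', 'a', 'b', 'c', 'cat2', 1, 2, 3, 'cat3', 4, 5, 6, 7, 8]
l2 = ['cat1', 'cat2', 'cat3']

# Position of each key in l1, plus len(l1) as the end marker for the last key.
positions = [l1.index(k) for k in l2] + [len(l1)]

# Each key owns the slice between its own position and the next key's position.
dct = {k: l1[start + 1:end]
       for k, start, end in zip(l2, positions, positions[1:])}
print(dct)
# {'cat1': ['a', 'b', 'c'], 'cat2': [1, 2, 3], 'cat3': [4, 5, 6, 7, 8]}
```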
I hope this helps! Please let me know if you have any further questions/clarifications 🙂
itertools.groupby gives us an elegant way to parse the list into chunks of keys and chunks of the values that follow them, which we can then iterate over to create the desired result:
from itertools import groupby

def make_dict(data, key_names):
    result = {}
    for is_key, elements in groupby(data, lambda d: d in key_names):
        if is_key:
            for key in elements:
                result[key] = []
        else:
            result[key] = list(elements)
    return result
Let’s test it:
>>> make_dict(['cat1', 'a', 'b', 'c', 'cat2', 1, 2, 3, 'cat3', 4, 5, 6, 7, 8],
... ['cat1', 'cat2', 'cat3'])
{'cat1': ['a', 'b', 'c'], 'cat2': [1, 2, 3], 'cat3': [4, 5, 6, 7, 8]}
>>> make_dict(['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd'])
{'a': [], 'b': [], 'c': [], 'd': []}
>>> make_dict(['a', 'b', 'c', 'd'], ['a', 'b', 'c'])
{'a': [], 'b': [], 'c': ['d']}
>>> make_dict(['a', 'b', 'c', 'd'], ['a', 'c', 'd'])
{'a': ['b'], 'c': [], 'd': []}
>>> make_dict(['a', 'b', 'c', 'd'], ['a', 'b'])
{'a': [], 'b': ['c', 'd']}
Each of the elements chunks created by groupby is either a sequence of keys or a sequence of values (is_key receives the result of the lambda, which tells us which kind of chunk we have). Iterating with result[key] = [] covers the case where there are consecutive keys in the data: since there are no intervening values, every key in that group except possibly the last must have an empty list of values. When a group of values is found, it is assigned to the most recent key, exploiting the fact that for loops don't create a scope for the iteration variable.
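A short sketch of the two mechanisms this relies on, using the data from the last test case: the chunks groupby produces, and the loop variable staying bound after a for loop ends. The variable names here are illustrative only.

```python
from itertools import groupby

data = ['a', 'b', 'c', 'd']
key_names = ['a', 'b']

# groupby yields one (flag, group) pair per run of consecutive equal flags.
chunks = [(is_key, list(group))
          for is_key, group in groupby(data, lambda d: d in key_names)]
print(chunks)
# [(True, ['a', 'b']), (False, ['c', 'd'])]

# The iteration variable of a for loop is still bound after the loop finishes.
for key in ['a', 'b']:
    pass
print(key)  # b
```

This is why make_dict can assign the values chunk to key: it still holds the last key seen in the preceding chunk.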