Convert a list into a dict of prefixes with different delimiters

Question:

I am trying to convert a list of items that share three distinct prefixes (apple_, banana_, water_melon_) into a dict keyed by prefix.

The initial list looks like this:
table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3", "water_melon_1", "water_melon_2", "water_melon_3"]

My desired outcome would look like this:
{"apple": ["_1", "_2", "_3"], "banana": ["_1", "_2", "_3"], "water_melon": ["_1", "_2", "_3"]}

I’ve tried this:

prefixes = ["apple_", "banana_", "water_melon_"]

res = [[item for item in table_ids if item.startswith(prefix)] for prefix in prefixes]

However, this creates a list of lists grouped by prefix rather than a dict.
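For comparison, the attempt above is already close: turning the outer comprehension into a dict comprehension keyed by the prefix (minus its trailing underscore) gives the desired shape. This is a minimal sketch, assuming the prefixes list is known up front, every prefix ends in "_", and no prefix is the start of another (e.g. "water_" next to "water_melon_" would overlap).

```python
table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3",
             "water_melon_1", "water_melon_2", "water_melon_3"]
prefixes = ["apple_", "banana_", "water_melon_"]

# Key each group by the prefix without its trailing "_" and keep the "_N" suffix.
# Slicing at len(prefix) - 1 keeps the underscore that ends the prefix.
# Assumes no prefix is the start of another one.
res = {
    prefix[:-1]: [item[len(prefix) - 1:] for item in table_ids if item.startswith(prefix)]
    for prefix in prefixes
}
print(res)
# {'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']}
```

This still scans the whole list once per prefix, so it only makes sense when the prefixes are known and few.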

Asked By: EMA

Answers:

You can use str.rsplit and collections.defaultdict.

from collections import defaultdict

res = defaultdict(list)
for t in table_ids:
    # Split only on the last underscore: "water_melon_1" -> ["water_melon", "1"]
    prefix, suffix = t.rsplit('_', 1)
    res[prefix].append('_' + suffix)
print(res)

Output:

defaultdict(<class 'list'>, {'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']})
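The printed repr shows the defaultdict wrapper. If you want an ordinary dict like the one in the question (for example so that looking up a missing prefix raises KeyError instead of silently inserting an empty list), a minimal sketch is to copy it with dict() once the loop is done:

```python
from collections import defaultdict

table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3",
             "water_melon_1", "water_melon_2", "water_melon_3"]

res = defaultdict(list)
for t in table_ids:
    # Split only on the last underscore so "water_melon" stays intact.
    prefix, suffix = t.rsplit("_", 1)
    res[prefix].append("_" + suffix)

# Same keys and lists, but a plain dict: missing keys raise KeyError again.
plain = dict(res)
print(plain)
# {'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']}
```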

Answered By: I'mahdi

You can’t do this with a list comprehension alone because you’re trying to build a dict, not a list, and you can’t do it efficiently with a dict comprehension either, because working out which entries belong in each sublist means scanning the entire original list again for every key.
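That cost comes from rediscovering each group by scanning. If items that share a prefix are already adjacent, as they are in the question’s list, itertools.groupby can hand a dict comprehension one group at a time in a single O(N) pass. A minimal sketch, assuming that ordering; the helper name prefix_of is just illustrative:

```python
from itertools import groupby

table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3",
             "water_melon_1", "water_melon_2", "water_melon_3"]


def prefix_of(item: str) -> str:
    # Everything before the last underscore, e.g. "water_melon_1" -> "water_melon".
    return item.rpartition("_")[0]


# groupby only merges *consecutive* items with the same key. If the input were
# not already ordered by prefix, a repeated key would overwrite its earlier group
# here, so you would sort first with sorted(table_ids, key=prefix_of).
tables = {
    prefix: ["_" + item.rpartition("_")[2] for item in group]
    for prefix, group in groupby(table_ids, key=prefix_of)
}
print(tables)
```

Sorting first would make it O(N log N), so the loop-and-append version stays the simpler general choice.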

Here’s an example of how to do it by iterating over the list and appending to entries in a dictionary:

>>> table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3", "water_melon_1", "water_melon_2", "water_melon_3"]
>>> tables = {}
>>> for x in table_ids:
...     t, _, i = x.rpartition("_")
...     tables.setdefault(t, []).append("_" + i)
...
>>> tables
{'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']}

If you really wanted to do it in a nested dict/list comprehension, that’d look like:

>>> {t: ["_" + x.rpartition("_")[2] for x in table_ids if x.rpartition("_")[0] == t] for t in dict.fromkeys(x.rpartition("_")[0] for x in table_ids)}
{'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']}

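Note that the inner list comprehension scans all of table_ids once for every distinct prefix, so this is O(N*K) for N items and K prefixes (O(N^2) in the worst case), whereas the first version makes a single O(N) pass.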
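Besides the cost, a filter written as x.startswith(t) only checks that an item begins with the key, so it misgroups items when one prefix is itself the start of another. The made-up list below (not from the question) has both water_ and water_melon_ items to show the difference; comparing the full part before the last underscore is the exact test:

```python
table_ids = ["water_1", "water_2", "water_melon_1", "water_melon_2"]

# Distinct prefixes in first-seen order (dict keys preserve insertion order).
keys = dict.fromkeys(x.rpartition("_")[0] for x in table_ids)

# Prefix test: "water_melon_1".startswith("water") is True, so it leaks into "water".
by_startswith = {
    t: ["_" + x.rpartition("_")[2] for x in table_ids if x.startswith(t)]
    for t in keys
}
# Exact test: compare the part before the last underscore with the key.
by_exact = {
    t: ["_" + x.rpartition("_")[2] for x in table_ids if x.rpartition("_")[0] == t]
    for t in keys
}
print(by_startswith)  # {'water': ['_1', '_2', '_1', '_2'], 'water_melon': ['_1', '_2']}
print(by_exact)       # {'water': ['_1', '_2'], 'water_melon': ['_1', '_2']}
```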

Answered By: Samwise