How to group items of a list based on their type transition?

Question:

My input is a list:

data = [
    -1, 0,
    'a','b', 1, 2, 3,
    'c', 6,
    'd', 'e', .4, .5,
    'a', 'b', 4,
    'f', 'g',
]

I’m trying to form groups (a dictionary) where the keys are the strings and the values are the numbers right after them.

There are, however, three details I should consider:

  • The list of data I receive can sometimes have leading non-string values that should be ignored
  • The number of strings for each group is variable but the minimum is always 1
  • Some groups can appear multiple times (example: a/b)

To handle all of that, I wrote the code below:

start = list(map(type, data)).index(str)

wanted = {}
for i in data[start:]:
    strings = []
    if type(i) == str:
        strings.append(i)
        numbers = []
    else:
        numbers.append(i)
        
    wanted['/'.join(strings)] = numbers

This gives me nearly what I’m looking for:

{'a': [], 'b': [4], '': [4], 'c': [6], 'd': [], 'e': [0.4, 0.5], 'f': [], 'g': []}

Can you show me how to fix my code?

My expected output is this:

{'a/b': [1, 2, 3, 4], 'c': [6], 'd/e': [0.4, 0.5], 'f/g': []}
Asked By: VERBOSE

Answers:

You don’t need the initial step of finding start. Just roll that into the loop that does the rest of the work.

My suggestion would be to keep a current_key list that collects the strings of the current key, and a current_vals list that collects the values for that key. When you see a string directly after another string (i.e. current_vals is empty), add it to the current key. When you see a string after some values, store the finished key and its values in the result and start a new key. If the element is not a string, append it to current_vals.

result = {}
current_key = []
current_vals = []
for elem in data:
    if isinstance(elem, str):
        if current_vals:    # list isn't empty
            key = '/'.join(current_key)
            if key not in result:
                result[key] = current_vals
            else:
                result[key].extend(current_vals)
            current_key = []
            current_vals = []
        
        current_key.append(elem)

    elif current_key:   # only consider values if a key has already been encountered
        current_vals.append(elem)

# finally, store the last key-value pair (skipped if no string was ever seen)
if current_key:
    key = '/'.join(current_key)
    if key not in result:
        result[key] = current_vals
    else:
        result[key].extend(current_vals)
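
The store-or-extend step appears twice in the loop above. As a minimal sketch (not part of the original answer), it can be factored into a helper built on dict.setdefault. The names group_after_strings and flush are hypothetical, chosen only for illustration, and the helper deliberately does nothing when no string has been seen yet:

```python
def group_after_strings(data):
    """Group non-string items under the '/'-joined run of strings before them."""
    result = {}
    current_key = []   # strings of the key currently being built
    current_vals = []  # values collected for that key so far

    def flush():
        # Store or extend the values for the current key; no-op if no key exists.
        if current_key:
            result.setdefault('/'.join(current_key), []).extend(current_vals)

    for elem in data:
        if isinstance(elem, str):
            if current_vals:  # a string after values starts a new key
                flush()
                current_key.clear()
                current_vals.clear()
            current_key.append(elem)
        elif current_key:     # ignore values seen before the first key
            current_vals.append(elem)

    flush()  # store the last key-value pair
    return result


data = [
    -1, 0,
    'a', 'b', 1, 2, 3,
    'c', 6,
    'd', 'e', .4, .5,
    'a', 'b', 4,
    'f', 'g',
]
print(group_after_strings(data))
# {'a/b': [1, 2, 3, 4], 'c': [6], 'd/e': [0.4, 0.5], 'f/g': []}
```

Because flush copies the values with extend, clearing current_vals afterwards does not affect the lists already stored in result.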

Answered By: pho

Here it is 🙂 I’ve heavily commented everything

data = [
    -1, 0,
    'a','b', 1, 2, 3,
    'c', 6,
    'd', 'e', .4, .5,
    'a', 'b', 4,
    'f', 'g',
]
wanted = {}

i = 0 # Index of the data list

# Skipping any leading non-string values (ints or floats)
while i < len(data) and not isinstance(data[i], str): # While the element is not a string
    i += 1 # Increment the index

while i < len(data): # While the index is not out of range
    key = '' # Initializing the key

    while i < len(data) and isinstance(data[i], str): # While the element is a string (character)
        if len(key) > 0: # If there is already a letter in the key, add a slash
            key += '/'
        
        key += data[i] # Add the letter to the key
        i += 1 # Increment the index
    
    if key not in wanted: # If the key is not in the dictionary, add it
        wanted[key] = [] # Initializing a new list for the key
    
    while i < len(data) and not isinstance(data[i], str): # Add all the numbers to the list
        wanted[key].append(data[i]) # Add the number to the list
        i += 1 # Increment the index

print(wanted)
Answered By: Syrus

You can use itertools.groupby with a key function that tests whether the current item is a string. To skip possible leading non-string items, fetch the first group and, if its items are not strings, replace it with the next group. For each group, if its items are strings, join them into a key; otherwise extend the list under the most recent key with the items. If the last group consists of strings, set its key to an empty list as a default:

from itertools import groupby, chain

output = {}
groups = groupby(data, lambda i: isinstance(i, str))
try:
    is_str, group = next(groups)
    if not is_str:
        is_str, group = next(groups)
except StopIteration:
    pass
else:
    for is_str, group in chain([(is_str, group)], groups):
        if is_str:
            key = '/'.join(group)
        else:
            output.setdefault(key, []).extend(group)
    if is_str:
        output.setdefault(key, [])

output becomes:

{'a/b': [1, 2, 3, 4], 'c': [6], 'd/e': [0.4, 0.5], 'f/g': []}

Demo: https://ideone.com/rcOSzV
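
To make the grouping step concrete, here is a small sketch (an illustration, not part of the original answer) that only lists the runs groupby yields for the question's data with the same key function. The variable name runs is an assumption made for this example:

```python
from itertools import groupby

data = [
    -1, 0,
    'a', 'b', 1, 2, 3,
    'c', 6,
    'd', 'e', .4, .5,
    'a', 'b', 4,
    'f', 'g',
]

# Each run is (is_str, items): consecutive items sharing the same key value.
runs = [(is_str, list(group))
        for is_str, group in groupby(data, lambda i: isinstance(i, str))]

for is_str, items in runs:
    print(is_str, items)
# False [-1, 0]
# True ['a', 'b']
# False [1, 2, 3]
# True ['c']
# False [6]
# True ['d', 'e']
# False [0.4, 0.5]
# True ['a', 'b']
# False [4]
# True ['f', 'g']
```

The runs alternate between string and non-string groups, which is why only the first group (leading numbers) and the last group (trailing strings) need special handling.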

Answered By: blhsing

A variation of blhsing’s:

from itertools import groupby

output = {}
groups = groupby(data, lambda i: isinstance(i, str))
nums = []
for is_str, group in groups:
    if is_str:
        nums = output.setdefault('/'.join(group), [])
    else:
        nums += group

Or, skipping leading numbers explicitly instead of collecting them in a throwaway list:

output = {}
groups = groupby(data, lambda i: isinstance(i, str))
nums = None
for is_str, group in groups:
    if is_str:
        nums = output.setdefault('/'.join(group), [])
    elif nums is not None:
        nums += group
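
Both variants rely on two Python behaviors: dict.setdefault returns the list stored in the dictionary (inserting the default first if the key is new), and += on a list extends it in place from any iterable. A minimal sketch of just that mechanism, with made-up values and the hypothetical names again and same_object:

```python
output = {}

nums = output.setdefault('a/b', [])   # new key: stores [] and returns that same list
nums += [1, 2, 3]                     # in-place extend, so output['a/b'] changes too

again = output.setdefault('a/b', [])  # existing key: returns the stored list, default ignored
again += iter([4])                    # += on a list accepts any iterable, e.g. a groupby group

same_object = again is nums
print(output)        # {'a/b': [1, 2, 3, 4]}
print(same_object)   # True
```

This is why a key that appears twice keeps accumulating values, and why a key followed by no numbers still ends up with an empty list.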
Answered By: Kelly Bundy