How to group items of a list based on their type transition?
Question:
My input is a list:
data = [
-1, 0,
'a','b', 1, 2, 3,
'c', 6,
'd', 'e', .4, .5,
'a', 'b', 4,
'f', 'g',
]
I’m trying to form groups (dictionary) where the keys are the strings and the values are the numbers right after them.
There are however three details I should consider:
- The list of data I receive can sometimes have leading non-string values that should be ignored
- The number of strings for each group is variable but the minimum is always 1
- Some groups can appear multiple times (example: a/b)
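A minimal sketch of the first requirement on its own, using itertools.dropwhile. This is a standard-library alternative, not something the code below uses. It discards items from the front until the first string. Unlike list(map(type, data)).index(str), it does not raise ValueError when the list contains no strings.

```python
from itertools import dropwhile

data = [
    -1, 0,
    'a', 'b', 1, 2, 3,
    'c', 6,
    'd', 'e', .4, .5,
    'a', 'b', 4,
    'f', 'g',
]

# Drop items while they are not strings; everything from the first string onward is kept
trimmed = list(dropwhile(lambda item: not isinstance(item, str), data))
print(trimmed[:5])  # ['a', 'b', 1, 2, 3]

# With no strings at all, the result is simply an empty list instead of an exception
print(list(dropwhile(lambda item: not isinstance(item, str), [1, 2.5])))  # []
```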
To handle all of that, I wrote the code below:
start = list(map(type, data)).index(str)
wanted = {}
for i in data[start:]:
    strings = []
    if type(i) == str:
        strings.append(i)
        numbers = []
    else:
        numbers.append(i)
    wanted['/'.join(strings)] = numbers
This gives me nearly what I’m looking for:
{'a': [], 'b': [4], '': [4], 'c': [6], 'd': [], 'e': [0.4, 0.5], 'f': [], 'g': []}
Can you show me how to fix my code?
My expected output is this:
{'a/b': [1, 2, 3, 4], 'c': [6], 'd/e': [0.4, 0.5], 'f/g': []}
Answers:
You don’t need the initial step of finding start. Just roll that into the loop that does the rest of the work.
My suggestion would be to keep a current_key variable that tracks the strings making up the current key, and a current_vals list that tracks the values for that key. If a new string comes right after another string (i.e. current_vals is empty), add it to the current key. Otherwise, it starts a new key: store the finished key with its values, then reset both. If the element is not a string, append it to current_vals, but only once a key has been seen, so that leading non-string values are ignored.
result = {}
current_key = []   # strings making up the key currently being built
current_vals = []  # values seen since the last string
for elem in data:
    if isinstance(elem, str):
        if current_vals:  # list isn't empty, so this string starts a new key
            key = '/'.join(current_key)
            if key not in result:
                result[key] = current_vals
            else:
                result[key].extend(current_vals)
            current_key = []
            current_vals = []
        current_key.append(elem)
    elif current_key:  # only consider values if a key has already been encountered
        current_vals.append(elem)
# finally, store the last key-value pair (only if at least one string was seen)
if current_key:
    key = '/'.join(current_key)
    if key not in result:
        result[key] = current_vals
    else:
        result[key].extend(current_vals)
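For comparison, here is a sketch of a smaller fix to the question's own loop, assuming the goal is to keep its structure. The problem there is that strings = [] runs on every iteration, so each key holds only one string, and every number also reassigns the '' key. Keeping strings and numbers outside the loop, and starting a new key only when a string follows a number, gives the expected output:

```python
data = [
    -1, 0,
    'a', 'b', 1, 2, 3,
    'c', 6,
    'd', 'e', .4, .5,
    'a', 'b', 4,
    'f', 'g',
]

start = list(map(type, data)).index(str)  # raises ValueError if there are no strings
wanted = {}
strings = []    # strings seen since the last number
numbers = None  # list for the current key; None until a number follows the strings
for i in data[start:]:
    if isinstance(i, str):
        if numbers is not None:  # a number came before this string, so start a new key
            strings = []
            numbers = None
        strings.append(i)
    else:
        if numbers is None:  # first number after a run of strings
            # setdefault reuses the existing list for a repeated key such as 'a/b'
            numbers = wanted.setdefault('/'.join(strings), [])
        numbers.append(i)
# A trailing run of strings has no numbers; give it an empty list
if numbers is None and strings:
    wanted.setdefault('/'.join(strings), [])
print(wanted)  # {'a/b': [1, 2, 3, 4], 'c': [6], 'd/e': [0.4, 0.5], 'f/g': []}
```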
Here it is 🙂 I’ve heavily commented everything
data = [
-1, 0,
'a','b', 1, 2, 3,
'c', 6,
'd', 'e', .4, .5,
'a', 'b', 4,
'f', 'g',
]
wanted = {}
i = 0  # Index of the data list
# Skipping any leading non-string values (ints, floats, ...)
while i < len(data) and not isinstance(data[i], str):  # While the element is not a string
    i += 1  # Increment the index
while i < len(data):  # While the index is not out of range
    key = ''  # Initializing the key
    while i < len(data) and isinstance(data[i], str):  # While the element is a string
        if len(key) > 0:  # If there is already a string in the key, add a slash
            key += '/'
        key += data[i]  # Add the string to the key
        i += 1  # Increment the index
    if key not in wanted:  # If the key is not in the dictionary, add it
        wanted[key] = []  # Initializing a new list for the key
    while i < len(data) and not isinstance(data[i], str):  # Add all the numbers to the list
        wanted[key].append(data[i])  # Add the number to the list
        i += 1  # Increment the index
print(wanted)
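The same index-walking logic can be packaged as a function so it can be reused and checked on other inputs. This is a sketch, and the name group_after_strings is hypothetical. Its skip condition tests for "not a string", so leading floats or other non-string values are ignored too, and a list without any strings simply yields an empty dict.

```python
def group_after_strings(data):
    """Map each run of consecutive strings (joined with '/') to the values after it."""
    wanted = {}
    i = 0
    # Skip any leading non-string values
    while i < len(data) and not isinstance(data[i], str):
        i += 1
    while i < len(data):
        # Collect a run of strings into a key such as 'a/b'
        parts = []
        while i < len(data) and isinstance(data[i], str):
            parts.append(data[i])
            i += 1
        values = wanted.setdefault('/'.join(parts), [])
        # Collect the values that follow, appending to any earlier list for the same key
        while i < len(data) and not isinstance(data[i], str):
            values.append(data[i])
            i += 1
    return wanted


print(group_after_strings([-1.5, 'x', 1, 'y', 'z']))  # {'x': [1], 'y/z': []}
print(group_after_strings([1, 2, 3]))                 # {}
```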
You can use itertools.groupby with a key function that tests whether the current item is a string. To skip possible leading non-string items, fetch the first group, and if its items are not strings, fetch the next group in its place. For each group of items, if they are strings, join them into a key; otherwise, extend the list under that key with the items. If the last group consists of strings, set its value to an empty list by default:
from itertools import groupby, chain
output = {}
groups = groupby(data, lambda i: isinstance(i, str))
try:
    is_str, group = next(groups)
    if not is_str:
        is_str, group = next(groups)
except StopIteration:
    pass
else:
    for is_str, group in chain([(is_str, group)], groups):
        if is_str:
            key = '/'.join(group)
        else:
            output.setdefault(key, []).extend(group)
    if is_str:
        output.setdefault(key, [])
output then becomes:
{'a/b': [1, 2, 3, 4], 'c': [6], 'd/e': [0.4, 0.5], 'f/g': []}
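To see what the key function actually produces, the groups can be materialized into lists. Each group is a lazy iterator that stops being usable once groupby advances, hence the list() call inside the comprehension. The sample data is the one from the question:

```python
from itertools import groupby

data = [
    -1, 0,
    'a', 'b', 1, 2, 3,
    'c', 6,
    'd', 'e', .4, .5,
    'a', 'b', 4,
    'f', 'g',
]

# Each run of consecutive items with the same isinstance(..., str) result becomes one group
runs = [(is_str, list(group)) for is_str, group in groupby(data, lambda i: isinstance(i, str))]
for run in runs:
    print(run)
# (False, [-1, 0])
# (True, ['a', 'b'])
# (False, [1, 2, 3])
# (True, ['c'])
# (False, [6])
# (True, ['d', 'e'])
# (False, [0.4, 0.5])
# (True, ['a', 'b'])
# (False, [4])
# (True, ['f', 'g'])
```

The runs strictly alternate between strings and non-strings, which is why joining a string group gives the key and the following group gives its values.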
A variation of blhsing’s:
from itertools import groupby
output = {}
groups = groupby(data, lambda i: isinstance(i, str))
nums = []
for is_str, group in groups:
    if is_str:
        nums = output.setdefault('/'.join(group), [])
    else:
        nums += group
Or:
output = {}
groups = groupby(data, lambda i: isinstance(i, str))
nums = None
for is_str, group in groups:
    if is_str:
        nums = output.setdefault('/'.join(group), [])
    elif nums is not None:
        nums += group
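The nums = None variant can be wrapped in a function and checked on a few edge cases. The name group_by_transition is hypothetical. It works because setdefault returns the very list stored in output, so nums += group extends that list in place. list += accepts any iterable, including a groupby group.

```python
from itertools import groupby


def group_by_transition(data):
    """Sketch of the nums = None variant: map each run of strings to the values after it."""
    output = {}
    nums = None  # no key seen yet, so leading non-string values are ignored
    for is_str, group in groupby(data, lambda i: isinstance(i, str)):
        if is_str:
            # setdefault returns the existing list for a repeated key such as 'a/b'
            nums = output.setdefault('/'.join(group), [])
        elif nums is not None:
            # nums is the same list object stored in output, so this extends it in place
            nums += group
    return output


data = [
    -1, 0,
    'a', 'b', 1, 2, 3,
    'c', 6,
    'd', 'e', .4, .5,
    'a', 'b', 4,
    'f', 'g',
]
print(group_by_transition(data))
# {'a/b': [1, 2, 3, 4], 'c': [6], 'd/e': [0.4, 0.5], 'f/g': []}
print(group_by_transition([1, 2]))  # {}
```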