Python: How to group a list of objects by their characteristics or attributes?
Question:
I want to separate a list of objects into sublists, where objects with the same attribute/characteristic stay in the same sublist.
Suppose we have a list of strings:
["This", "is", "a", "sentence", "of", "seven", "words"]
We want to separate the strings based on their length as follows:
[['sentence'], ['a'], ['is', 'of'], ['This'], ['seven', 'words']]
The program I have come up with so far is this:
sentence = ["This", "is", "a", "sentence", "of", "seven", "words"]
word_len_dict = {}
for word in sentence:
    if len(word) not in word_len_dict.keys():
        word_len_dict[len(word)] = [word]
    else:
        word_len_dict[len(word)].append(word)
print(list(word_len_dict.values()))
I want to know if there is a better way to achieve this.
Answers:
With defaultdict(list), you can omit the key-existence check:
from collections import defaultdict
word_len_dict = defaultdict(list)
for word in sentence:
    word_len_dict[len(word)].append(word)
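To group arbitrary objects by an attribute (as the title asks), the same defaultdict pattern works with any key function. A minimal sketch: the group_by helper and the Word dataclass are hypothetical names for illustration, and operator.attrgetter reads the attribute. The printed order assumes Python 3.7+, where dicts keep insertion order.

```python
from collections import defaultdict
from dataclasses import dataclass
from operator import attrgetter


def group_by(items, key):
    """Group items into lists keyed by key(item); order within a group follows the input."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)  # plain dict, so missing keys no longer create empty lists


@dataclass
class Word:  # hypothetical example class with a "length" attribute
    text: str
    length: int


sentence = ["This", "is", "a", "sentence", "of", "seven", "words"]

# Group plain strings by their length.
by_len = group_by(sentence, len)
print(list(by_len.values()))
# [['This'], ['is', 'of'], ['a'], ['sentence'], ['seven', 'words']]

# Group objects by one of their attributes.
words = [Word(w, len(w)) for w in sentence]
by_attr = group_by(words, attrgetter("length"))
print({k: [w.text for w in v] for k, v in by_attr.items()})
```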
Now, I am not saying this is better in any way, unless you consider compact code better. Your version (which is perfectly fine, IMO) is much more readable and maintainable.
list_ = ["This", "is", "a", "sentence", "of", "seven", "words"]
# One sublist per length 0..longest; filter(None, ...) drops the empty ones
# In Python 2, filter() returns a list
result = filter(None, [[x for x in list_ if len(x) == i] for i in range(len(max(list_, key=lambda y: len(y))) + 1)])
# In Python 3, filter() returns an iterator, so wrap it in list()
result = list(filter(None, [[x for x in list_ if len(x) == i] for i in range(len(max(list_, key=lambda y: len(y))) + 1)]))
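The one-liner above reads more easily if the longest length is computed once. A sketch of the same bucket-per-length idea; the names longest and buckets are my own, and max(map(len, list_)) gives the same number as len(max(list_, key=lambda y: len(y))):

```python
list_ = ["This", "is", "a", "sentence", "of", "seven", "words"]

# Length of the longest word, computed once instead of inside the comprehension.
longest = max(map(len, list_))

# One bucket per possible length 0..longest; empty buckets are then dropped.
buckets = [[x for x in list_ if len(x) == i] for i in range(longest + 1)]
result = [b for b in buckets if b]
print(result)  # [['a'], ['is', 'of'], ['This'], ['seven', 'words'], ['sentence']]
```

It scans the whole list once per possible length, so it is best kept for short inputs with small lengths.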
Take a look at itertools.groupby(). Note that your list must be sorted first, which is more expensive than your method, OP: sorting is O(n log n), while the dict approach is O(n).
>>> from itertools import groupby
>>> l = ["This", "is", "a", "sentence", "of", "seven", "words"]
>>> [list(g) for _, g in groupby(sorted(l, key=len), len)]
[['a'], ['is', 'of'], ['This'], ['seven', 'words'], ['sentence']]
Or, if you want a dictionary:
>>> {k: list(g) for k, g in groupby(sorted(l, key=len), len)}
{1: ['a'], 2: ['is', 'of'], 4: ['This'], 5: ['seven', 'words'], 8: ['sentence']}
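The sort matters because groupby() only merges consecutive items with equal keys. A quick check on the unsorted list shows the two-letter words 'is' and 'of' landing in separate groups:

```python
from itertools import groupby

l = ["This", "is", "a", "sentence", "of", "seven", "words"]

# Without sorting, groupby starts a new group every time the key changes.
unsorted_groups = [list(g) for _, g in groupby(l, len)]
print(unsorted_groups)
# [['This'], ['is'], ['a'], ['sentence'], ['of'], ['seven', 'words']]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(l, key=len), len)]
print(sorted_groups)
# [['a'], ['is', 'of'], ['This'], ['seven', 'words'], ['sentence']]
```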
The documentation of itertools.groupby has an example that matches exactly what you want.
from itertools import groupby

keyfunc = len  # same as lambda x: len(x)
data = ["This", "is", "a", "sentence", "of", "seven", "words"]
data = sorted(data, key=keyfunc)
groups = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))  # copy each group into a list
print(groups)
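The documentation example stores list(g) rather than g itself because each group shares the underlying iterator with groupby(): once groupby() advances, the previous group yields nothing. A small demonstration (the variable names are mine):

```python
from itertools import groupby

data = sorted(["This", "is", "a", "sentence", "of", "seven", "words"], key=len)
it = groupby(data, len)

# Keep the first group's iterator without consuming it.
first_key, first_group = next(it)

# Advancing groupby moves past the first group, so its iterator is now exhausted.
second_key, second_group = next(it)

first_items = list(first_group)
second_items = list(second_group)
print(first_key, first_items)    # 1 []
print(second_key, second_items)  # 2 ['is', 'of']
```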
sentence = ["This", "is", "a", "sentence", "of", "seven", "words"]
# Distinct word lengths in ascending order
getLength = sorted(set(len(data) for data in sentence))
result = []
for length in getLength:
    result.append([data for data in sentence if length == len(data)])
print(result)
You can do this with a plain dict alone by using its setdefault method:
sentence = ["This", "is", "a", "sentence", "of", "seven", "words"]
word_len_dict = {}
for word in sentence:
    word_len_dict.setdefault(len(word), []).append(word)
What setdefault does is set the key len(word) in your dictionary if it doesn’t exist, and just retrieve the value if it does. The second argument to setdefault is the default value you want it to store along with that key.
It’s important to notice that if the key already exists, the default value passed to setdefault won’t replace the old value. This ensures that each key’s list is stored only once; after that, setdefault just returns that same list. (A fresh empty list is still built for the argument on every call; it is simply discarded when the key exists.)
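A short check of the behavior described above: the first setdefault call stores the default, and later calls return the stored list and ignore the new default:

```python
word_len_dict = {}

# Key 2 is missing, so the empty list is stored and returned.
first = word_len_dict.setdefault(2, [])
first.append("is")

# Key 2 now exists, so the stored list is returned and the new default is ignored.
second = word_len_dict.setdefault(2, ["ignored"])
second.append("of")

print(first is second)  # True
print(word_len_dict)    # {2: ['is', 'of']}
```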
If your goal is to do it in fewer lines, there are always comprehensions:
data = ["This", "is", "a", "sentence", "of", "seven", "words"]
# Get all unique length values
unique_length_vals = {len(word) for word in data}
# Get lists of same-length words; list() is needed in Python 3, where filter() returns a lazy iterator
res = [list(filter(lambda x: len(x) == lval, data)) for lval in unique_length_vals]
It might be less clear, but useful if you just want to code something quickly.
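In Python 3 the laziness of filter() interacts with the lambda: if the filter objects are kept unconsumed, every lambda looks up lval only when iterated, after the comprehension has finished, so they all see its last value. A sketch of the pitfall and the list() fix, on a two-length subset for brevity:

```python
data = ["This", "is", "a", "sentence", "of", "seven", "words"]
lengths = [1, 2]

# Lazy filters: each lambda reads lval when iterated, by which point lval == 2.
lazy = [filter(lambda x: len(x) == lval, data) for lval in lengths]
lazy_results = [list(f) for f in lazy]
print(lazy_results)   # [['is', 'of'], ['is', 'of']]

# Consuming each filter immediately uses the current lval.
eager = [list(filter(lambda x: len(x) == lval, data)) for lval in lengths]
print(eager)          # [['a'], ['is', 'of']]
```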