Splitting a list into N parts of approximately equal length
Question:
What is the best way to divide a list into roughly equal parts? For example, if the list has 7 elements and is split into 2 parts, we want to get 3 elements in one part, and the other should have 4 elements.
I’m looking for something like even_split(L, n) that breaks L into n parts.
def chunks(L, n):
    """ Yield successive n-sized chunks from L.
    """
    for i in range(0, len(L), n):
        yield L[i:i+n]
The code above gives chunks of 3, rather than 3 chunks. I could simply transpose (iterate over this and take the first element of each column, call that part one, then take the second and put it in part two, etc.), but that destroys the ordering of the items.
Answers:
This code is broken due to floating-point rounding errors. Do not use it!
assert len(chunkIt([1,2,3], 10)) == 10  # fails
Here’s one that could work:
def chunkIt(seq, num):
    avg = len(seq) / float(num)
    out = []
    last = 0.0
    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg
    return out
Testing:
>>> chunkIt(range(10), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
>>> chunkIt(range(11), 3)
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10]]
>>> chunkIt(range(12), 3)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
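The failing assertion comes from accumulating avg as a float: the running total can fall just short of len(seq), so the loop runs one extra time. A minimal sketch of the same idea that instead computes each boundary directly with integer arithmetic, so no error can accumulate (chunk_it is a hypothetical name):

```python
def chunk_it(seq, num):
    """Split seq into num contiguous chunks whose sizes differ by at most one.

    Each boundary is i * len(seq) // num, computed with integers only,
    so there is no floating-point drift from one iteration to the next.
    """
    size = len(seq)
    return [seq[i * size // num:(i + 1) * size // num] for i in range(num)]


print(len(chunk_it([1, 2, 3], 10)))  # 10 (seven of them empty)
print(chunk_it(list(range(10)), 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Because the comprehension runs exactly num times, the number of chunks is always num, even when num exceeds len(seq).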
Changing the code to yield n chunks rather than chunks of n:
def chunks(l, n):
    """ Yield n successive chunks from l.
    """
    newn = int(len(l) / n)
    for i in xrange(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]
l = range(56)
three_chunks = chunks (l, 3)
print three_chunks.next()
print three_chunks.next()
print three_chunks.next()
which gives:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]
This will assign the extra elements to the final group, which is not perfect but well within your specification of “roughly N equal parts” 🙂. By that, I mean 56 elements would be better as (19,19,18) whereas this gives (18,18,20).
You can get the more balanced output with the following code:
#!/usr/bin/python
def chunks(l, n):
    """ Yield n successive chunks from l.
    """
    newn = int(1.0 * len(l) / n + 0.5)
    for i in xrange(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]
l = range(56)
three_chunks = chunks (l, 3)
print three_chunks.next()
print three_chunks.next()
print three_chunks.next()
which outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]
[38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]
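Rounding the chunk size fixes the 56-element case, but every chunk except the last still has the same size, so all of the rounding error lands on the final chunk, which can end up much shorter or even empty. A quick size check, using a Python 3 port of the second function (renamed chunks_rounded here; the sizes helper is hypothetical):

```python
def chunks_rounded(l, n):
    """Python 3 port of the rounded-size chunks() above, renamed for clarity."""
    newn = int(1.0 * len(l) / n + 0.5)  # chunk size rounded to the nearest integer
    for i in range(0, n - 1):
        yield l[i * newn:i * newn + newn]
    yield l[n * newn - newn:]           # the last chunk takes whatever is left


def sizes(l, n):
    return [len(c) for c in chunks_rounded(l, n)]


print(sizes(list(range(56)), 3))  # [19, 19, 18]
print(sizes(list(range(10)), 4))  # [3, 3, 3, 1]
print(sizes(list(range(9)), 6))   # [2, 2, 2, 2, 1, 0]
```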
Here is one that adds None to make the lists equal length:
>>> from itertools import izip_longest
>>> def chunks(l, n):
...     """ Split l into n interleaved parts. Pads extra spaces with None
...     """
...     return list(zip(*izip_longest(*[iter(l)]*n)))
...
>>> l=range(54)
>>> chunks(l,3)
[(0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51), (1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52), (2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53)]
>>> chunks(l,4)
[(0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52), (1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53), (2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, None), (3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, None)]
>>> chunks(l,5)
[(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50), (1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51), (2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52), (3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53), (4, 9, 14, 19, 24, 29, 34, 39, 44, 49, None)]
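In Python 3, izip_longest is available as itertools.zip_longest. Note that this recipe deals the elements out round-robin (element i lands in part i % n), so the parts are not contiguous, as the output above shows. A Python 3 sketch of the same trick:

```python
from itertools import zip_longest


def chunks(l, n):
    """Split l into n interleaved parts, padding the short ones with None."""
    # [iter(l)] * n repeats ONE iterator n times, so zip_longest pulls
    # n consecutive items per row; zip(*rows) then turns rows into columns.
    return list(zip(*zip_longest(*[iter(l)] * n)))


print(chunks(list(range(6)), 4))  # [(0, 4), (1, 5), (2, None), (3, None)]
```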
You can write it fairly simply as a generator expression:
def split(a, n):
    k, m = divmod(len(a), n)
    return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))
Example:
>>> list(split(range(11), 3))
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
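A brute-force sketch for checking that the divmod version always returns exactly n contiguous parts whose sizes differ by at most one, with the larger parts first (the check helper and its ranges are arbitrary choices):

```python
def split(a, n):
    k, m = divmod(len(a), n)
    return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))


def check(max_len=50, max_parts=20):
    for length in range(max_len):
        seq = list(range(length))
        for n in range(1, max_parts):
            parts = list(split(seq, n))
            sizes = [len(p) for p in parts]
            assert len(parts) == n                       # exactly n parts
            assert sum(parts, []) == seq                 # order kept, nothing lost
            assert max(sizes) - min(sizes) <= 1          # balanced
            assert sizes == sorted(sizes, reverse=True)  # larger parts come first
    return True


print(check())  # True
```

The first m = len(a) % n parts get k + 1 elements and the rest get k, which is why the sizes never differ by more than one.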
As long as you don’t want anything silly like continuous chunks:
>>> def chunkify(lst,n):
...     return [lst[i::n] for i in xrange(n)]
...
>>> chunkify(range(13), 3)
[[0, 3, 6, 9, 12], [1, 4, 7, 10], [2, 5, 8, 11]]
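Have a look at numpy.split (it raises a ValueError unless the array divides into equal parts; numpy.array_split has no such restriction):
>>> import numpy
>>> a = numpy.array([1,2,3,4])
>>> numpy.split(a, 2)
[array([1, 2]), array([3, 4])]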
Another way would be something like this: the idea is to use grouper, but get rid of the None padding. In this case all the ‘small_parts’ are formed from elements at the start of the list, and the ‘larger_parts’ from the later part of the list; each larger part has one more element than a small part. We need to treat x as two different sub-parts.
from itertools import izip_longest
import numpy as np
def grouper(n, iterable, fillvalue=None): # This is grouper from itertools
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
def another_chunk(x,num):
    extra_ele = len(x)%num #gives number of parts that will have an extra element
    small_part = int(np.floor(len(x)/num)) #gives number of elements in a small part
    new_x = list(grouper(small_part,x[:small_part*(num-extra_ele)]))
    new_x.extend(list(grouper(small_part+1,x[small_part*(num-extra_ele):])))
    return new_x
The way I have it set up returns a list of tuples:
>>> x = range(14)
>>> another_chunk(x,3)
[(0, 1, 2, 3), (4, 5, 6, 7, 8), (9, 10, 11, 12, 13)]
>>> another_chunk(x,4)
[(0, 1, 2), (3, 4, 5), (6, 7, 8, 9), (10, 11, 12, 13)]
>>> another_chunk(x,5)
[(0, 1), (2, 3, 4), (5, 6, 7), (8, 9, 10), (11, 12, 13)]
>>>
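A Python 3 sketch of the same approach: izip_longest becomes itertools.zip_longest, and integer division // replaces np.floor, so NumPy is not needed. The fill value never shows up, because the first slice has a length that is a multiple of small_part and the second a multiple of small_part + 1.

```python
from itertools import zip_longest


def grouper(n, iterable, fillvalue=None):
    """Collect items into fixed-length tuples (the itertools grouper recipe)."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def another_chunk(x, num):
    extra_ele = len(x) % num         # number of parts that get one extra element
    small_part = len(x) // num       # number of elements in a small part
    split_at = small_part * (num - extra_ele)
    new_x = list(grouper(small_part, x[:split_at]))        # the small parts
    new_x.extend(grouper(small_part + 1, x[split_at:]))    # the larger parts
    return new_x


print(another_chunk(list(range(14)), 4))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8, 9), (10, 11, 12, 13)]
```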
Here’s another variant that spreads the “remaining” elements evenly among all the chunks, one at a time until there are none left. In this implementation, the larger chunks occur at the beginning of the process.
def chunks(l, k):
    """ Yield k successive chunks from l."""
    if k < 1:
        yield []
        return  # raising StopIteration inside a generator is an error since Python 3.7
    n = len(l)
    avg = n // k  # integer division keeps the slice bounds integers
    remainders = n % k
    start, end = 0, avg
    while start < n:
        if remainders > 0:
            end = end + 1
            remainders = remainders - 1
        yield l[start:end]
        start, end = end, end + avg
For example, generate 4 chunks from a list of 14 elements:
>>> list(chunks(list(range(14)), 4))
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10], [11, 12, 13]]
>>> list(map(len, chunks(list(range(14)), 4)))
[4, 4, 3, 3]
The same as job’s answer, but takes into account lists with fewer elements than the number of chunks.
def chunkify(lst, n):
    return [lst[i::n] for i in range(n if n < len(lst) else len(lst))]
If n (the number of chunks) is 7 and lst (the list to divide) is [1, 2, 3], the chunks are [[1], [2], [3]] instead of [[1], [2], [3], [], [], [], []].
Here is my solution:
def chunks(l, amount):
    if amount < 1:
        raise ValueError('amount must be positive integer')
    chunk_len = len(l) // amount
    leap_parts = len(l) % amount
    remainder = amount // 2  # make it symmetrical
    i = 0
    while i < len(l):
        remainder += leap_parts
        end_index = i + chunk_len
        if remainder >= amount:
            remainder -= amount
            end_index += 1
        yield l[i:end_index]
        i = end_index
Produces
>>> list(chunks([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2], [3, 4, 5], [6, 7]]
You could also use:
split=lambda x,n: x if not x else [x[:n]]+[split([] if not -(len(x)-n) else x[-(len(x)-n):],n)][0]
split([1,2,3,4,5,6,7,8,9],2)
[[1, 2], [3, 4], [5, 6], [7, 8], [9]]
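Note that the lambda above produces chunks of length n, not n chunks, and it recurses once per chunk, so long lists can hit Python's default recursion limit. A sketch of an iterative equivalent that returns the same result (split_every is a hypothetical name):

```python
def split_every(x, n):
    """Consecutive slices of length n; the last one may be shorter."""
    return [x[i:i + n] for i in range(0, len(x), n)]


print(split_every([1, 2, 3, 4, 5, 6, 7, 8, 9], 2))
# [[1, 2], [3, 4], [5, 6], [7, 8], [9]]
```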
Implementation using the numpy.linspace method.
Just specify the number of parts you want the array to be divided into. The divisions will be of nearly equal size.
Example:
import numpy as np
a = np.arange(10)
print("Input array:", a)
parts = 3
# parts + 1 evenly spaced boundaries between position 0 and a.size (positions, not values)
i = np.linspace(0, a.size, parts + 1)
i = i.astype(int)  # slice indices must be integers
split_arr = []
for ind in range(i.size - 1):
    split_arr.append(a[i[ind]:i[ind + 1]])
print("Array split into %d parts:" % parts, split_arr)
Gives:
Input array: [0 1 2 3 4 5 6 7 8 9]
Array split into 3 parts: [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8, 9])]
Using a list comprehension (note that this interleaves the elements, so the chunks are not contiguous):
def divide_list_to_chunks(list_, n):
    return [list_[start::n] for start in range(n)]
If you divide n elements into roughly k chunks, you can make n % k chunks 1 element bigger than the other chunks to distribute the extra elements.
The following code will give you the length for the chunks:
[(n // k) + (1 if i < (n % k) else 0) for i in range(k)]
Example: n=11, k=3
results in [4, 4, 3]
You can then easily calculate the start indices for the chunks:
[i * (n // k) + min(i, n % k) for i in range(k)]
Example: n=11, k=3
results in [0, 4, 8]
Using the start of the (i+1)th chunk as the end boundary, the ith chunk of list l with len n is:
l[i * (n // k) + min(i, n % k):(i+1) * (n // k) + min(i+1, n % k)]
As a final step create a list from all the chunks using list comprehension:
[l[i * (n // k) + min(i, n % k):(i+1) * (n // k) + min(i+1, n % k)] for i in range(k)]
Example: n=11, k=3, l=range(n)
results in [range(0, 4), range(4, 8), range(8, 11)]
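The same boundaries can also be built by summing the chunk lengths instead of using the closed-form start index. A sketch with itertools.accumulate (chunk_lengths and split_by_lengths are hypothetical helper names), which also shows that the running totals match the start indices above:

```python
from itertools import accumulate


def chunk_lengths(n, k):
    # n % k chunks get one extra element, as described above
    return [(n // k) + (1 if i < (n % k) else 0) for i in range(k)]


def split_by_lengths(l, k):
    bounds = [0] + list(accumulate(chunk_lengths(len(l), k)))  # running totals
    return [l[start:stop] for start, stop in zip(bounds, bounds[1:])]


n, k = 11, 3
starts = [i * (n // k) + min(i, n % k) for i in range(k)]
print(chunk_lengths(n, k))                  # [4, 4, 3]
print(starts)                               # [0, 4, 8]
print(split_by_lengths(list(range(n)), k))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
```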
Rounding the linspace and using it as an index is an easier solution than what amit12690 proposes.
function chunks=chunkit(array,num)
    index = round(linspace(0,size(array,2),num+1));
    chunks = cell(1,num);
    for x = 1:num
        chunks{x} = array(:,index(x)+1:index(x+1));
    end
end
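For readers not using MATLAB, a rough Python translation of the rounded-linspace idea without NumPy (chunkit mirrors the MATLAB name; the rest is an assumption). One caveat: Python's built-in round rounds exact halves to the nearest even integer, while MATLAB's round rounds halves away from zero, so chunk sizes can differ on ties.

```python
def chunkit(seq, num):
    """Split seq into num contiguous chunks at rounded, evenly spaced boundaries."""
    # Equivalent of MATLAB's round(linspace(0, numel, num + 1)).
    index = [round(i * len(seq) / num) for i in range(num + 1)]
    # MATLAB's array(:, index(x)+1:index(x+1)) is the 0-based slice [index[x]:index[x+1]].
    return [seq[index[x]:index[x + 1]] for x in range(num)]


print(chunkit(list(range(10)), 3))  # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
```

For 10 elements in 4 chunks, the boundaries 2.5 and 7.5 are exact ties, so this gives sizes [2, 3, 3, 2] where MATLAB would give [3, 2, 3, 2].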
This is the raison d’être for numpy.array_split*:
>>> import numpy as np
>>> print(*np.array_split(range(10), 3))
[0 1 2 3] [4 5 6] [7 8 9]
>>> print(*np.array_split(range(10), 4))
[0 1 2] [3 4 5] [6 7] [8 9]
>>> print(*np.array_split(range(10), 5))
[0 1] [2 3] [4 5] [6 7] [8 9]
*credit to Zero Piraeus in room 6
Here’s a generator that can handle any positive (integer) number of chunks. If the number of chunks is greater than the input list length some chunks will be empty. This algorithm alternates between short and long chunks rather than segregating them.
I’ve also included some code for testing the ragged_chunks function.
''' Split a list into "ragged" chunks
    The size of each chunk is either the floor or ceiling of len(seq) / chunks
    chunks can be > len(seq), in which case there will be empty chunks
    Written by PM 2Ring 2017.03.30
'''
def ragged_chunks(seq, chunks):
    size = len(seq)
    start = 0
    for i in range(1, chunks + 1):
        stop = i * size // chunks
        yield seq[start:stop]
        start = stop
# test
def test_ragged_chunks(maxsize):
    for size in range(0, maxsize):
        seq = list(range(size))
        for chunks in range(1, size + 1):
            minwidth = size // chunks
            #ceiling division
            maxwidth = -(-size // chunks)
            a = list(ragged_chunks(seq, chunks))
            sizes = [len(u) for u in a]
            deltas = all(minwidth <= u <= maxwidth for u in sizes)
            assert all((sum(a, []) == seq, sum(sizes) == size, deltas))
    return True
if test_ragged_chunks(100):
    print('ok')
We can make this slightly more efficient by moving the multiplication into the range call, but I think the previous version is more readable (and DRYer).
def ragged_chunks(seq, chunks):
    size = len(seq)
    start = 0
    for i in range(size, size * chunks + 1, size):
        stop = i // chunks
        yield seq[start:stop]
        start = stop
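A quick check of the claims above, with both versions copied in and renamed ragged_chunks_v1 and ragged_chunks_v2 so they can sit side by side: the sizes alternate between short and long, surplus chunks come out empty, and the two versions agree on non-empty input. One thing the check turns up: the second version raises ValueError on an empty sequence, because range() is then called with a step of 0.

```python
def ragged_chunks_v1(seq, chunks):
    # first version above: boundaries at i * len(seq) // chunks
    size = len(seq)
    start = 0
    for i in range(1, chunks + 1):
        stop = i * size // chunks
        yield seq[start:stop]
        start = stop


def ragged_chunks_v2(seq, chunks):
    # second version above: the multiplication moved into range()
    size = len(seq)
    start = 0
    for i in range(size, size * chunks + 1, size):
        stop = i // chunks
        yield seq[start:stop]
        start = stop


print([len(c) for c in ragged_chunks_v1(list(range(10)), 4)])  # [2, 3, 2, 3]
print(list(ragged_chunks_v1([0, 1], 4)))                       # [[], [0], [], [1]]

# The two versions agree on every non-empty input tried here...
for size in range(1, 30):
    seq = list(range(size))
    for n in range(1, 40):
        assert list(ragged_chunks_v1(seq, n)) == list(ragged_chunks_v2(seq, n))

# ...but v2 cannot handle an empty sequence: range() gets a step of 0.
try:
    list(ragged_chunks_v2([], 3))
except ValueError as exc:
    print("v2 failed on []:", exc)
```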
This splits the list into roughly equal parts with a single expression while keeping the order:
myList = list(range(18)) # given list
N = 5 # desired number of parts
[myList[(i*len(myList))//N:((i+1)*len(myList))//N] for i in range(N)]
# [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9], [10, 11, 12, 13], [14, 15, 16, 17]]
The parts differ in length by at most one element. The split of 18 into 5 parts results in 3 + 4 + 3 + 4 + 4 = 18.
My solution, easy to understand:
def split_list(lst, n):
    splitted = []
    for i in reversed(range(1, n + 1)):
        split_point = len(lst)//i
        splitted.append(lst[:split_point])
        lst = lst[split_point:]
    return splitted
And the shortest one-liner on this page (written by my girl):
def split(l, n):
    return [l[int(i*len(l)/n):int((i+1)*len(l)/n)] for i in range(n)]
import more_itertools as mit
n = 2
[list(x) for x in mit.divide(n, range(5, 11))]
# [[5, 6, 7], [8, 9, 10]]
[list(x) for x in mit.divide(n, range(5, 12))]
# [[5, 6, 7, 8], [9, 10, 11]]
Install via: pip install more_itertools
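If adding a dependency is not an option, a stdlib-only sketch that reproduces the output shown above (divide here is a hypothetical stand-in, not the more_itertools implementation): it materialises the input to learn its length, then gives the first len % n parts one extra item.

```python
from itertools import islice


def divide(n, iterable):
    items = list(iterable)            # the length must be known up front
    q, r = divmod(len(items), n)
    it = iter(items)
    # the first r parts get q + 1 items, the remaining parts get q
    return [list(islice(it, q + 1 if i < r else q)) for i in range(n)]


print(divide(2, range(5, 11)))  # [[5, 6, 7], [8, 9, 10]]
print(divide(2, range(5, 12)))  # [[5, 6, 7, 8], [9, 10, 11]]
```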
#!/usr/bin/python
first_names = ['Steve', 'Jane', 'Sara', 'Mary','Jack','Bob', 'Bily', 'Boni', 'Chris','Sori', 'Will', 'Won','Li']
def chunks(l, n):
    for i in range(0, len(l), n):
        # Create an index range for l of n items:
        yield l[i:i+n]
result = list(chunks(first_names, 5))
print result
Picked from this link, and this was what helped me. I had a pre-defined list.
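The function above yields chunks of a fixed size (five names per chunk, so three chunks for these 13 names), not five chunks. Deriving the size from the wanted number of parts with ceiling division looks like a shortcut, but it can produce unbalanced parts or even too few of them. A quick check (n_chunks_by_size is a hypothetical helper):

```python
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]


def n_chunks_by_size(l, parts):
    size = -(-len(l) // parts)  # ceiling division
    return list(chunks(l, size))


print([len(c) for c in n_chunks_by_size(list(range(13)), 5)])  # [3, 3, 3, 3, 1]
print([len(c) for c in n_chunks_by_size(list(range(9)), 6)])   # [2, 2, 2, 2, 1] (only 5 parts)
```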
Say you want to split into 5 parts (np.split requires the length to be divisible by 5; np.array_split does not):
p1, p2, p3, p4, p5 = np.split(df, 5)
I’ve written my own code for this case:
import math
def chunk_ports(port_start, port_end, portions):
    if port_end < port_start:
        return None
    total = port_end - port_start + 1
    fractions = int(math.floor(float(total) / portions))
    results = []
    # Not enough to chunk.
    if fractions < 1:
        return None
    # Reverse, so any additional items would be in the first range.
    _e = port_end
    for i in range(portions, 0, -1):
        if i == 1:
            _s = port_start
        else:
            _s = _e - fractions + 1
        results.append((_s, _e))
        _e = _s - 1
    results.reverse()
    return results
chunk_ports(1, 10, 9) would return
[(1, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10)]
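A shorter sketch of the same inclusive-range split using divmod (port_ranges is a hypothetical name). It matches the output above for (1, 10, 9), but differs in general: instead of giving all leftover ports to the first range, it spreads them one per range over the first few ranges.

```python
def port_ranges(port_start, port_end, portions):
    """Split the inclusive range [port_start, port_end] into `portions` ranges."""
    total = port_end - port_start + 1
    if portions < 1 or port_end < port_start or total < portions:
        return None  # not enough ports for at least one per range
    base, extra = divmod(total, portions)
    ranges, start = [], port_start
    for i in range(portions):
        size = base + (1 if i < extra else 0)  # the first `extra` ranges get one more
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(port_ranges(1, 10, 9))
# [(1, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10)]
print(port_ranges(1, 10, 4))  # [(1, 3), (4, 6), (7, 8), (9, 10)]
```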
This code works for me (Python 3 compatible); note that num is the size of each chunk:
def chunkify(tab, num):
    return [tab[i*num: i*num+num] for i in range(len(tab)//num+(1 if len(tab)%num else 0))]
Example (for the bytearray type, but it works for lists as well):
>>> b = bytearray(b'\x01\x02\x03\x04\x05\x06\x07\x08')
>>> chunkify(b,3)
[bytearray(b'\x01\x02\x03'), bytearray(b'\x04\x05\x06'), bytearray(b'\x07\x08')]
>>> chunkify(b,4)
[bytearray(b'\x01\x02\x03\x04'), bytearray(b'\x05\x06\x07\x08')]
This one provides chunks of at most n elements:
import math
def chunkify(lst, n):
    num_chunks = int(math.ceil(len(lst) / float(n))) if n < len(lst) else 1
    return [lst[n*i:n*(i+1)] for i in range(num_chunks)]
For example:
>>> chunkify(range(11), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
>>> chunkify(range(11), 8)
[[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10]]
I tried most of the solutions, but they didn’t work for my case, so I made a new function that works for most cases and for any type of array. Note that num is the number of items per chunk, not the number of chunks:
def chunkIt(seq, num):
    seqLen = len(seq)
    items_per_chunk = num
    out = []
    last = 0
    while last < seqLen:
        out.append(seq[last:(last + items_per_chunk)])
        last += items_per_chunk
    return out
def evenly(l, n):
    len_ = len(l)
    split_size = len_ // n
    split_size = n if not split_size else split_size
    offsets = [i for i in range(0, len_, split_size)]
    return [l[offset:offset + split_size] for offset in offsets]
Example:
l = [a for a in range(97)]
evenly(l, 10) is meant to give 10 parts, but as the output shows it actually returns 11: ten parts with 9 elements and a final one with the remaining 7.
Output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96]]
If you don’t mind the order changing, I recommend @job’s solution; otherwise, you can use this:
def chunkIt(seq, num):
    steps = int(len(seq) / float(num))
    out = []
    last = 0.0
    while last < len(seq):
        if len(seq) - (last + steps) < steps:
            until = len(seq)
            steps = len(seq) - last
        else:
            until = int(last + steps)
        out.append(seq[int(last): until])
        last += steps
    return out
Let’s say you want to split a list [1, 2, 3, 4, 5, 6, 7, 8] into 3-element lists like [[1, 2, 3], [4, 5, 6], [7, 8]], where the remaining elements, if there are fewer than 3 left, are grouped together.
my_list = [1, 2, 3, 4, 5, 6, 7, 8]
my_list2 = [my_list[i:i+3] for i in range(0, len(my_list), 3)]
print(my_list2)
Output: [[1, 2, 3], [4, 5, 6], [7, 8]]
Here the length of one part is 3. Replace 3 with your own chunk size.
1>
import numpy as np
data # your array
total_length = len(data)
separate = 10
sub_array_size = total_length // separate
safe_separate = sub_array_size * separate
splited_lists = np.split(np.array(data[:safe_separate]), separate)
# np.concatenate takes a sequence of arrays; its second positional argument is the axis
splited_lists[separate - 1] = np.concatenate((splited_lists[separate - 1],
                                              np.array(data[safe_separate:total_length])))
splited_lists # your output
2>
splited_lists = np.array_split(np.array(data), separate)
from typing import List
def chunk_array(array: List, n: int) -> List[List]:
    chunk_size = len(array) // n
    chunks = []
    i = 0
    while i < len(array):
        # the last (n-th) chunk takes everything that remains
        if len(chunks) == n - 1:
            chunks.append(array[i:])
            break
        chunks.append(array[i:i + chunk_size])
        i += chunk_size
    return chunks
Here’s my version (inspired by Max’s), another attempt at a simple, readable chunker that works:
def chunk(iterable, count): # returns a *generator* that divides `iterable` into `count` of contiguous chunks of similar size
    assert count >= 1
    return (iterable[int(_*len(iterable)/count+0.5):int((_+1)*len(iterable)/count+0.5)] for _ in range(count))
print("Chunk count: ", len(list( chunk(range(105),10))))
print("Chunks: ", list( chunk(range(105),10)))
print("Chunks: ", list(map(list,chunk(range(105),10))))
print("Chunk lengths:", list(map(len, chunk(range(105),10))))
print("Testing...")
for iterable_length in range(100):
    for chunk_count in range(1,100):
        chunks = list(chunk(range(iterable_length),chunk_count))
        assert chunk_count == len(chunks)
        assert iterable_length == sum(map(len,chunks))
        assert all(map(lambda _:abs(len(_)-iterable_length/chunk_count)<=1,chunks))
print("Okay")
Outputs:
Chunk count: 10
Chunks: [range(0, 11), range(11, 21), range(21, 32), range(32, 42), range(42, 53), range(53, 63), range(63, 74), range(74, 84), range(84, 95), range(95, 105)]
Chunks: [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]
Chunk lengths: [11, 10, 11, 10, 11, 10, 11, 10, 11, 10]
Testing...
Okay
n = len(lst)
# p is the number of parts to be divided
x = int(n/p)
i = 0
j = x
lstt = []
while (i< len(lst) or j <len(lst)):
    lstt.append(lst[i:j])
    i+=x
    j+=x
print(lstt)
This is the simplest answer if it is known that the list divides into equal parts.
The other solutions seem to be a bit long. Here is a one-liner using a list comprehension and the NumPy function array_split. array_split(list, n) will simply split the list into n parts.
import numpy as np
[x.tolist() for x in np.array_split(range(10), 3)]
def chunkify(target_list, chunk_size):
    return [target_list[i:i+chunk_size] for i in range(0, len(target_list), chunk_size)]
>>> l = [5432, 432, 67, "fdas", True, True, False, (4324,131), 876, "ofsa", 8, 909, b'765']
>>> print(chunkify(l, 3))
[[5432, 432, 67], ['fdas', True, True], [False, (4324, 131), 876], ['ofsa', 8, 909], [b'765']]
Here’s a single function that handles most of the various split cases:
def splitList(lst, into):
    '''Split a list into parts.
    :Parameters:
        into (str) = Split the list into parts defined by the following:
            '<n>parts' - Split the list into n parts.
                ex. 2 returns: [[1, 2, 3, 5], [7, 8, 9]] from [1,2,3,5,7,8,9]
            '<n>parts+' - Split the list into n equal parts with any trailing remainder.
                ex. 2 returns: [[1, 2, 3], [5, 7, 8], [9]] from [1,2,3,5,7,8,9]
            '<n>chunks' - Split into sublists of n size.
                ex. 2 returns: [[1,2], [3,5], [7,8], [9]] from [1,2,3,5,7,8,9]
            'contiguous' - The list will be split by contiguous numerical values.
                ex. 'contiguous' returns: [[1,2,3], [5], [7,8,9]] from [1,2,3,5,7,8,9]
            'range' - The values of 'contiguous' will be limited to the high and low end of each range.
                ex. 'range' returns: [[1,3], [5], [7,9]] from [1,2,3,5,7,8,9]
    :Return:
        (list)
    '''
    from string import digits, ascii_letters, punctuation
    mode = into.lower().lstrip(digits)
    digit = into.strip(ascii_letters+punctuation)
    n = int(digit) if digit else None
    if n:
        if mode=='parts':
            n = len(lst)*-1 // n*-1 #ceil
        elif mode=='parts+':
            n = len(lst) // n
        return [lst[i:i+n] for i in range(0, len(lst), n)]
    elif mode=='contiguous' or mode=='range':
        from itertools import groupby
        from operator import itemgetter
        try:
            contiguous = [list(map(itemgetter(1), g)) for k, g in groupby(enumerate(lst), lambda x: int(x[0])-int(x[1]))]
        except ValueError as error:
            print ('{} in splitList\n # Error: {} #\n {}'.format(__file__, error, lst))
            return lst
        if mode=='range':
            return [[i[0], i[-1]] if len(i)>1 else (i) for i in contiguous]
        return contiguous
r = splitList([1, '2', 3, 5, '7', 8, 9], into='2parts')
print (r) #returns: [[1, '2', 3, 5], ['7', 8, 9]]
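The 'contiguous' mode relies on a neat property: inside a run of consecutive integers, index minus value stays constant, so grouping on that difference splits the list at every gap. A stripped-down sketch of just that step (contiguous_runs is a hypothetical name):

```python
from itertools import groupby


def contiguous_runs(values):
    # enumerate yields (index, value); index - value is constant within a run
    return [[v for _, v in grp]
            for _, grp in groupby(enumerate(values), key=lambda p: p[0] - p[1])]


print(contiguous_runs([1, 2, 3, 5, 7, 8, 9]))  # [[1, 2, 3], [5], [7, 8, 9]]
```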
What is the best way to divide a list into roughly equal parts? For example, if the list has 7 elements and is split it into 2 parts, we want to get 3 elements in one part, and the other should have 4 elements.
I’m looking for something like even_split(L, n)
that breaks L
into n
parts.
def chunks(L, n):
""" Yield successive n-sized chunks from L.
"""
for i in range(0, len(L), n):
yield L[i:i+n]
The code above gives chunks of 3, rather than 3 chunks. I could simply transpose (iterate over this and take the first element of each column, call that part one, then take the second and put it in part two, etc), but that destroys the ordering of the items.
This code is broken due to rounding errors. Do not use it!!!
assert len(chunkIt([1,2,3], 10)) == 10 # fails
Here’s one that could work:
def chunkIt(seq, num):
avg = len(seq) / float(num)
out = []
last = 0.0
while last < len(seq):
out.append(seq[int(last):int(last + avg)])
last += avg
return out
Testing:
>>> chunkIt(range(10), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
>>> chunkIt(range(11), 3)
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10]]
>>> chunkIt(range(12), 3)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Changing the code to yield n
chunks rather than chunks of n
:
def chunks(l, n):
""" Yield n successive chunks from l.
"""
newn = int(len(l) / n)
for i in xrange(0, n-1):
yield l[i*newn:i*newn+newn]
yield l[n*newn-newn:]
l = range(56)
three_chunks = chunks (l, 3)
print three_chunks.next()
print three_chunks.next()
print three_chunks.next()
which gives:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]
This will assign the extra elements to the final group which is not perfect but well within your specification of “roughly N equal parts” 🙂 By that, I mean 56 elements would be better as (19,19,18) whereas this gives (18,18,20).
You can get the more balanced output with the following code:
#!/usr/bin/python
def chunks(l, n):
""" Yield n successive chunks from l.
"""
newn = int(1.0 * len(l) / n + 0.5)
for i in xrange(0, n-1):
yield l[i*newn:i*newn+newn]
yield l[n*newn-newn:]
l = range(56)
three_chunks = chunks (l, 3)
print three_chunks.next()
print three_chunks.next()
print three_chunks.next()
which outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]
[38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]
Here is one that adds None
to make the lists equal length
>>> from itertools import izip_longest
>>> def chunks(l, n):
""" Yield n successive chunks from l. Pads extra spaces with None
"""
return list(zip(*izip_longest(*[iter(l)]*n)))
>>> l=range(54)
>>> chunks(l,3)
[(0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51), (1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52), (2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53)]
>>> chunks(l,4)
[(0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52), (1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53), (2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, None), (3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, None)]
>>> chunks(l,5)
[(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50), (1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51), (2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52), (3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53), (4, 9, 14, 19, 24, 29, 34, 39, 44, 49, None)]
You can write it fairly simply as a list generator:
def split(a, n):
k, m = divmod(len(a), n)
return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))
Example:
>>> list(split(range(11), 3))
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
As long as you don’t want anything silly like continuous chunks:
>>> def chunkify(lst,n):
... return [lst[i::n] for i in xrange(n)]
...
>>> chunkify(range(13), 3)
[[0, 3, 6, 9, 12], [1, 4, 7, 10], [2, 5, 8, 11]]
Have a look at numpy.split:
>>> a = numpy.array([1,2,3,4])
>>> numpy.split(a, 2)
[array([1, 2]), array([3, 4])]
Another way would be something like this, the idea here is to use grouper, but get rid of None
. In this case we’ll have all ‘small_parts’ formed from elements at the first part of the list, and ‘larger_parts’ from the later part of the list. Length of ‘larger parts’ is len(small_parts) + 1. We need to consider x as two different sub-parts.
from itertools import izip_longest
import numpy as np
def grouper(n, iterable, fillvalue=None): # This is grouper from itertools
"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
return izip_longest(fillvalue=fillvalue, *args)
def another_chunk(x,num):
extra_ele = len(x)%num #gives number of parts that will have an extra element
small_part = int(np.floor(len(x)/num)) #gives number of elements in a small part
new_x = list(grouper(small_part,x[:small_part*(num-extra_ele)]))
new_x.extend(list(grouper(small_part+1,x[small_part*(num-extra_ele):])))
return new_x
The way I have it set up returns a list of tuples:
>>> x = range(14)
>>> another_chunk(x,3)
[(0, 1, 2, 3), (4, 5, 6, 7, 8), (9, 10, 11, 12, 13)]
>>> another_chunk(x,4)
[(0, 1, 2), (3, 4, 5), (6, 7, 8, 9), (10, 11, 12, 13)]
>>> another_chunk(x,5)
[(0, 1), (2, 3, 4), (5, 6, 7), (8, 9, 10), (11, 12, 13)]
>>>
Here’s another variant that spreads the “remaining” elements evenly among all the chunks, one at a time until there are none left. In this implementation, the larger chunks occur at the beginning the process.
def chunks(l, k):
""" Yield k successive chunks from l."""
if k < 1:
yield []
raise StopIteration
n = len(l)
avg = n/k
remainders = n % k
start, end = 0, avg
while start < n:
if remainders > 0:
end = end + 1
remainders = remainders - 1
yield l[start:end]
start, end = end, end+avg
For example, generate 4 chunks from a list of 14 elements:
>>> list(chunks(range(14), 4))
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10], [11, 12, 13]]
>>> map(len, list(chunks(range(14), 4)))
[4, 4, 3, 3]
The same as job’s answer, but takes into account lists with size smaller than the number of chuncks.
def chunkify(lst,n):
[ lst[i::n] for i in xrange(n if n < len(lst) else len(lst)) ]
if n (number of chunks) is 7 and lst (the list to divide) is [1, 2, 3] the chunks are [[0], [1], [2]] instead of [[0], [1], [2], [], [], [], []]
Here is my solution:
def chunks(l, amount):
if amount < 1:
raise ValueError('amount must be positive integer')
chunk_len = len(l) // amount
leap_parts = len(l) % amount
remainder = amount // 2 # make it symmetrical
i = 0
while i < len(l):
remainder += leap_parts
end_index = i + chunk_len
if remainder >= amount:
remainder -= amount
end_index += 1
yield l[i:end_index]
i = end_index
Produces
>>> list(chunks([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2], [3, 4, 5], [6, 7]]
You could also use:
split=lambda x,n: x if not x else [x[:n]]+[split([] if not -(len(x)-n) else x[-(len(x)-n):],n)][0]
split([1,2,3,4,5,6,7,8,9],2)
[[1, 2], [3, 4], [5, 6], [7, 8], [9]]
Implementation using numpy.linspace method.
Just specify the number of parts you want the array to be divided in to.The divisions will be of nearly equal size.
Example :
import numpy as np
a=np.arange(10)
print "Input array:",a
parts=3
i=np.linspace(np.min(a),np.max(a)+1,parts+1)
i=np.array(i,dtype='uint16') # Indices should be floats
split_arr=[]
for ind in range(i.size-1):
split_arr.append(a[i[ind]:i[ind+1]]
print "Array split in to %d parts : "%(parts),split_arr
Gives :
Input array: [0 1 2 3 4 5 6 7 8 9]
Array split in to 3 parts : [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8, 9])]
Using list comprehension:
def divide_list_to_chunks(list_, n):
return [list_[start::n] for start in range(n)]
If you divide n
elements into roughly k
chunks you can make n % k
chunks 1 element bigger than the other chunks to distribute the extra elements.
The following code will give you the length for the chunks:
[(n // k) + (1 if i < (n % k) else 0) for i in range(k)]
Example: n=11, k=3
results in [4, 4, 3]
You can then easily calculate the start indizes for the chunks:
[i * (n // k) + min(i, n % k) for i in range(k)]
Example: n=11, k=3
results in [0, 4, 8]
Using the i+1
th chunk as the boundary we get that the i
th chunk of list l
with len n
is
l[i * (n // k) + min(i, n % k):(i+1) * (n // k) + min(i+1, n % k)]
As a final step create a list from all the chunks using list comprehension:
[l[i * (n // k) + min(i, n % k):(i+1) * (n // k) + min(i+1, n % k)] for i in range(k)]
Example: n=11, k=3, l=range(n)
results in [range(0, 4), range(4, 8), range(8, 11)]
Rounding the linspace and using it as an index is an easier solution than what amit12690 proposes.
function chunks=chunkit(array,num)
index = round(linspace(0,size(array,2),num+1));
chunks = cell(1,num);
for x = 1:num
chunks{x} = array(:,index(x)+1:index(x+1));
end
end
This is the raison d’être for numpy.array_split
*:
>>> import numpy as np
>>> print(*np.array_split(range(10), 3))
[0 1 2 3] [4 5 6] [7 8 9]
>>> print(*np.array_split(range(10), 4))
[0 1 2] [3 4 5] [6 7] [8 9]
>>> print(*np.array_split(range(10), 5))
[0 1] [2 3] [4 5] [6 7] [8 9]
*credit to Zero Piraeus in room 6
Here’s a generator that can handle any positive (integer) number of chunks. If the number of chunks is greater than the input list length some chunks will be empty. This algorithm alternates between short and long chunks rather than segregating them.
I’ve also included some code for testing the ragged_chunks
function.
''' Split a list into "ragged" chunks
The size of each chunk is either the floor or ceiling of len(seq) / chunks
chunks can be > len(seq), in which case there will be empty chunks
Written by PM 2Ring 2017.03.30
'''
def ragged_chunks(seq, chunks):
size = len(seq)
start = 0
for i in range(1, chunks + 1):
stop = i * size // chunks
yield seq[start:stop]
start = stop
# test
def test_ragged_chunks(maxsize):
for size in range(0, maxsize):
seq = list(range(size))
for chunks in range(1, size + 1):
minwidth = size // chunks
#ceiling division
maxwidth = -(-size // chunks)
a = list(ragged_chunks(seq, chunks))
sizes = [len(u) for u in a]
deltas = all(minwidth <= u <= maxwidth for u in sizes)
assert all((sum(a, []) == seq, sum(sizes) == size, deltas))
return True
if test_ragged_chunks(100):
print('ok')
We can make this slightly more efficient by exporting the multiplication into the range
call, but I think the previous version is more readable (and DRYer).
def ragged_chunks(seq, chunks):
size = len(seq)
start = 0
for i in range(size, size * chunks + 1, size):
stop = i // chunks
yield seq[start:stop]
start = stop
This will do the split into equal parts by one single expression while keeping the order:
myList = list(range(18)) # given list
N = 5 # desired number of parts
[myList[(i*len(myList))//N:((i+1)*len(myList))//N] for i in range(N)]
# [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9], [10, 11, 12, 13], [14, 15, 16, 17]]
The parts will differ in not more than one element. The split of 18 into 5 parts results in 3 + 4 + 3 + 4 + 4 = 18.
My solution, easy to understand
def split_list(lst, n):
splitted = []
for i in reversed(range(1, n + 1)):
split_point = len(lst)//i
splitted.append(lst[:split_point])
lst = lst[split_point:]
return splitted
And shortest one-liner on this page(written by my girl)
def split(l, n):
return [l[int(i*len(l)/n):int((i+1)*len(l)/n-1)] for i in range(n)]
n = 2
[list(x) for x in mit.divide(n, range(5, 11))]
# [[5, 6, 7], [8, 9, 10]]
[list(x) for x in mit.divide(n, range(5, 12))]
# [[5, 6, 7, 8], [9, 10, 11]]
Install via > pip install more_itertools
.
#!/usr/bin/python
first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack', 'Bob', 'Bily', 'Boni', 'Chris', 'Sori', 'Will', 'Won', 'Li']

def chunks(l, n):
    for i in range(0, len(l), n):
        # Create an index range for l of n items:
        yield l[i:i+n]

result = list(chunks(first_names, 5))
print(result)
Picked from this link, and this is what helped me. I had a predefined list.
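Say you want to split into 5 parts (note that np.split raises a ValueError unless the length divides evenly by 5; np.array_split does not have that restriction):
import numpy as np

p1, p2, p3, p4, p5 = np.split(df, 5)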
I’ve written my own code for this case:
import math

def chunk_ports(port_start, port_end, portions):
    if port_end < port_start:
        return None
    total = port_end - port_start + 1
    fractions = int(math.floor(float(total) / portions))
    results = []
    # Not enough to chunk.
    if fractions < 1:
        return None
    # Reverse, so any additional items would be in the first range.
    _e = port_end
    for i in range(portions, 0, -1):
        if i == 1:
            _s = port_start
        else:
            _s = _e - fractions + 1
        results.append((_s, _e))
        _e = _s - 1
    results.reverse()
    return results

chunk_ports(1, 10, 9) would return
[(1, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10)]
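If you would rather spread the leftover ports one at a time over the first ranges (instead of giving the whole remainder to the first range), a divmod-based sketch could look like this. `port_ranges` is a hypothetical name; returning None when there are fewer ports than portions mirrors what chunk_ports does:

```python
def port_ranges(port_start, port_end, portions):
    """Split the inclusive range port_start..port_end into `portions` (start, end) tuples."""
    total = port_end - port_start + 1
    # Like chunk_ports: give up if there is not at least one port per range.
    if portions < 1 or total < portions:
        return None
    base, extra = divmod(total, portions)
    ranges = []
    start = port_start
    for i in range(portions):
        size = base + (1 if i < extra else 0)  # the first `extra` ranges get one more port
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(port_ranges(1, 10, 9))
# [(1, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10)]
print(port_ranges(1, 10, 4))
# [(1, 3), (4, 6), (7, 8), (9, 10)]
```

For (1, 10, 9) both functions agree. For (1, 10, 4), chunk_ports returns [(1, 4), (5, 6), (7, 8), (9, 10)] instead, because all of the remainder lands in the first range.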
This code works for me (Python 3 compatible):
def chunkify(tab, num):
    return [tab[i*num: i*num+num] for i in range(len(tab)//num+(1 if len(tab)%num else 0))]
Example (for the bytearray type, but it works for lists as well):
b = bytearray(b'\x01\x02\x03\x04\x05\x06\x07\x08')
>>> chunkify(b,3)
[bytearray(b'\x01\x02\x03'), bytearray(b'\x04\x05\x06'), bytearray(b'\x07\x08')]
>>> chunkify(b,4)
[bytearray(b'\x01\x02\x03\x04'), bytearray(b'\x05\x06\x07\x08')]
This one provides chunks with 0 <= length <= n:
import math

def chunkify(lst, n):
    num_chunks = int(math.ceil(len(lst) / float(n))) if n < len(lst) else 1
    return [lst[n*i:n*(i+1)] for i in range(num_chunks)]
For example:
>>> chunkify(list(range(11)), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
>>> chunkify(list(range(11)), 8)
[[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10]]
I tried most of the solutions, but they didn’t work for my case, so I made a new function that works for most cases and for any type of array (note that num here is the number of items per chunk):
def chunkIt(seq, num):
    seqLen = len(seq)
    items_per_chunk = num  # num is the chunk size, not the number of chunks
    out = []
    last = 0
    while last < seqLen:
        out.append(seq[last:(last + items_per_chunk)])
        last += items_per_chunk
    return out
def evenly(l, n):
    len_ = len(l)
    split_size = len_ // n
    split_size = n if not split_size else split_size
    offsets = [i for i in range(0, len_, split_size)]
    return [l[offset:offset + split_size] for offset in offsets]
Example:
l = [a for a in range(97)]
evenly(l, 10)
Because 97 // 10 = 9, every part has 9 elements except the last one, which gets the remaining 7; note that this yields 11 parts rather than 10.
Output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96]]
If you don’t mind the order being changed, I recommend @job’s solution; otherwise, you can use this:
def chunkIt(seq, num):
    # at least 1, so the loop cannot stall when num > len(seq)
    steps = max(1, int(len(seq) / float(num)))
    out = []
    last = 0.0
    while last < len(seq):
        if len(seq) - (last + steps) < steps:
            until = len(seq)
            steps = len(seq) - last
        else:
            until = int(last + steps)
        out.append(seq[int(last): until])
        last += steps
    return out
Let’s say you want to split a list [1, 2, 3, 4, 5, 6, 7, 8] into 3-element lists like [[1, 2, 3], [4, 5, 6], [7, 8]], where the last remaining elements are grouped together if there are fewer than 3 of them.
my_list = [1, 2, 3, 4, 5, 6, 7, 8]
my_list2 = [my_list[i:i+3] for i in range(0, len(my_list), 3)]
print(my_list2)
Output: [[1, 2, 3], [4, 5, 6], [7, 8]]
Here the length of one part is 3; replace 3 with your own chunk size.
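The same fixed-size idea can be written for any iterable (not only lists) with itertools.islice. This is a sketch, and `fixed_size_chunks` is a hypothetical name; on Python 3.12+, itertools.batched does essentially the same thing but yields tuples:

```python
from itertools import islice

def fixed_size_chunks(iterable, size):
    """Yield lists of `size` items from any iterable; the last list may be shorter."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))  # take up to `size` items from the shared iterator
        if not chunk:
            return
        yield chunk

print(list(fixed_size_chunks([1, 2, 3, 4, 5, 6, 7, 8], 3)))
# [[1, 2, 3], [4, 5, 6], [7, 8]]
```

Since it never calls len() or slices the input, it also works on generators and files.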
Option 1:
import numpy as np

data  # your array
total_length = len(data)
separate = 10
sub_array_size = total_length // separate
safe_separate = sub_array_size * separate
splited_lists = np.split(np.array(data[:safe_separate]), separate)
# append the leftover elements to the last part
splited_lists[separate - 1] = np.concatenate((splited_lists[separate - 1],
                                              np.array(data[safe_separate:total_length])))
splited_lists  # your output
Option 2:
splited_lists = np.array_split(np.array(data), separate)
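np.array_split puts the remainder into the leading parts: for length l split into n sections, it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n. If you want that behaviour without NumPy, a pure-Python sketch (the name `array_split_like` is hypothetical) could be:

```python
from itertools import accumulate

def array_split_like(seq, n):
    """Split seq into n contiguous parts; the first len(seq) % n parts are one longer."""
    base, extra = divmod(len(seq), n)
    sizes = [base + 1] * extra + [base] * (n - extra)
    bounds = [0, *accumulate(sizes)]  # start/stop index of every part
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

print(array_split_like(list(range(10)), 3))
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

With this rule, 56 elements into 3 parts come out as 19, 19, 18, which is the distribution the question calls better than 18, 18, 20.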
from typing import List

def chunk_array(array: List, n: int) -> List[List]:
    chunk_size = max(1, len(array) // n)
    chunks = []
    i = 0
    while i < len(array):
        # the n-th chunk takes the whole remainder, so there are never more than n chunks
        if len(chunks) == n - 1:
            chunks.append(array[i:])
            break
        chunks.append(array[i:i + chunk_size])
        i += chunk_size
    return chunks
Here’s my version (inspired by Max’s):
Another attempt at a simple, readable chunker that works.
def chunk(iterable, count):  # returns a *generator* that divides `iterable` into `count` contiguous chunks of similar size
    assert count >= 1
    return (iterable[int(_*len(iterable)/count+0.5):int((_+1)*len(iterable)/count+0.5)] for _ in range(count))

print("Chunk count: ", len(list( chunk(range(105),10))))
print("Chunks: ", list( chunk(range(105),10)))
print("Chunks: ", list(map(list,chunk(range(105),10))))
print("Chunk lengths:", list(map(len, chunk(range(105),10))))

print("Testing...")
for iterable_length in range(100):
    for chunk_count in range(1,100):
        chunks = list(chunk(range(iterable_length),chunk_count))
        assert chunk_count == len(chunks)
        assert iterable_length == sum(map(len,chunks))
        assert all(map(lambda _:abs(len(_)-iterable_length/chunk_count)<=1,chunks))
print("Okay")
Outputs:
Chunk count: 10
Chunks: [range(0, 11), range(11, 21), range(21, 32), range(32, 42), range(42, 53), range(53, 63), range(63, 74), range(74, 84), range(84, 95), range(95, 105)]
Chunks: [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]
Chunk lengths: [11, 10, 11, 10, 11, 10, 11, 10, 11, 10]
Testing...
Okay
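The +0.5 trick rounds each boundary i*len/count half-up. Because round-half-up of i*len/count equals floor((2*i*len + count) / (2*count)), the same boundaries can be computed with integers only, which sidesteps float precision for very long sequences. A sketch under that assumption; `chunk_int` is a hypothetical name and returns a list rather than a generator:

```python
def chunk_int(seq, count):
    """Split seq into `count` contiguous chunks using integer-only round-half-up boundaries."""
    assert count >= 1
    n = len(seq)
    # boundary i == floor(i * n / count + 1/2), computed without floats
    bounds = [(2 * i * n + count) // (2 * count) for i in range(count + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(count)]

print([len(c) for c in chunk_int(list(range(105)), 10)])
# [11, 10, 11, 10, 11, 10, 11, 10, 11, 10]
```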
# lst is the list to split
n = len(lst)
# p is the number of parts to be divided
x = int(n/p)
i = 0
j = x
lstt = []
while i < len(lst):
    lstt.append(lst[i:j])
    i += x
    j += x
print(lstt)
This is the simplest answer if it is known that the list divides into equal parts; otherwise, the remainder ends up in one or more extra parts.
The other solutions seem to be a bit long. Here is a one-liner using a list comprehension and the NumPy function array_split. array_split(list, n) will simply split the list into n parts.
import numpy as np

[x.tolist() for x in np.array_split(range(10), 3)]
def chunkify(target_list, chunk_size):
    return [target_list[i:i+chunk_size] for i in range(0, len(target_list), chunk_size)]
>>> l = [5432, 432, 67, "fdas", True, True, False, (4324,131), 876, "ofsa", 8, 909, b'765']
>>> print(chunkify(l, 3))
[[5432, 432, 67], ['fdas', True, True], [False, (4324, 131), 876], ['ofsa', 8, 909], [b'765']]
Here’s a single function that handles most of the various split cases:
def splitList(lst, into):
    '''Split a list into parts.

    :Parameters:
        into (str) = Split the list into parts defined by the following:
            '<n>parts' - Split the list into n parts.
                ex. '2parts' returns: [[1, 2, 3, 5], [7, 8, 9]] from [1,2,3,5,7,8,9]
            '<n>parts+' - Split the list into n equal parts with any trailing remainder.
                ex. '2parts+' returns: [[1, 2, 3], [5, 7, 8], [9]] from [1,2,3,5,7,8,9]
            '<n>chunks' - Split into sublists of n size.
                ex. '2chunks' returns: [[1,2], [3,5], [7,8], [9]] from [1,2,3,5,7,8,9]
            'contiguous' - The list will be split by contiguous numerical values.
                ex. 'contiguous' returns: [[1,2,3], [5], [7,8,9]] from [1,2,3,5,7,8,9]
            'range' - The values of 'contiguous' will be limited to the high and low end of each range.
                ex. 'range' returns: [[1,3], [5], [7,9]] from [1,2,3,5,7,8,9]
    :Return:
        (list)
    '''
    from string import digits, ascii_letters, punctuation
    mode = into.lower().lstrip(digits)
    digit = into.strip(ascii_letters+punctuation)
    n = int(digit) if digit else None
    if n:
        if mode=='parts':
            n = len(lst)*-1 // n*-1  # ceil
        elif mode=='parts+':
            n = len(lst) // n
        return [lst[i:i+n] for i in range(0, len(lst), n)]
    elif mode=='contiguous' or mode=='range':
        from itertools import groupby
        from operator import itemgetter
        try:
            contiguous = [list(map(itemgetter(1), g)) for k, g in groupby(enumerate(lst), lambda x: int(x[0])-int(x[1]))]
        except ValueError as error:
            print('{} in splitList\n # Error: {} #\n {}'.format(__file__, error, lst))
            return lst
        if mode=='range':
            return [[i[0], i[-1]] if len(i)>1 else (i) for i in contiguous]
        return contiguous

r = splitList([1, '2', 3, 5, '7', 8, 9], into='2parts')
print(r)  # returns: [[1, '2', 3, 5], ['7', 8, 9]]
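One caveat with the '<n>parts' mode: it turns n into a chunk size by ceiling division, so it can return fewer than n parts. A minimal sketch that reproduces just that arithmetic (`parts_via_ceil` is a hypothetical helper, not part of splitList):

```python
def parts_via_ceil(lst, n):
    """Chunk lst with a chunk size of ceil(len(lst) / n), as splitList's '<n>parts' mode does."""
    if not lst:
        return []  # avoid range() with a step of 0
    size = -(-len(lst) // n)  # ceiling division
    return [lst[i:i + size] for i in range(0, len(lst), size)]

print(parts_via_ceil([1, 2, 3, 5, 7, 8, 9], 2))  # [[1, 2, 3, 5], [7, 8, 9]]
print(parts_via_ceil(list(range(9)), 4))         # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]: 3 parts, not 4
```

If exactly n parts are required, index boundaries of the form i*len(lst)//n always produce n parts.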