How to get 'n' elements before Nth index of a list in Python?
Question:
I want to iterate over a large list wherein I need to do some computations using n elements before the Nth index of the large list. I’ve solved it using the following code snippet.
mylist = [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
for i in range(len(mylist)):
    j = i + 3
    data_till_i = mylist[:j]
    current_window = data_till_i[-3:]
    print(current_window)
I get the following from the above code snippet:
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]
[5, 6, 7]
[6, 7, 8]
[7, 8, 9]
[8, 9, 10]
[9, 10, 11]
[10, 11, 12]
[11, 12, 13]
[12, 13, 14]
[12, 13, 14]
[12, 13, 14]
Is there a one-liner or a more efficient way to do exactly the same thing in less computation time? My list is very large (length > 100K), so I’m worried about time complexity.
Thank you.
UPDATE:
My actual list is in the following format:
[('string_attribute',1659675302861,3544.0), ('string_attribute', 1659675304443, 3544.0).........]
Here, string_attribute is an attribute that is the same for every entry and can be excluded from the computation.
Answers:
A list comprehension, for example? (Use numpy arrays for faster iteration.)
import numpy as np
mylist = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14])
chunk_size = 3
# "+ 1" so that the final window [12, 13, 14] is included
splited_list = np.array([mylist[x:x+chunk_size] for x in range(0, len(mylist) - chunk_size + 1)])
You can keep the result as a numpy array, as above, or convert each item to a plain Python list.
What you’re after is called a rolling window operation. If you want to work on the list type specifically, there is a shorter formulation, along the lines of the islice approach proposed here (the version below uses plain slicing):
window_size = 3
for i in range(len(mylist) - window_size + 1):
    print(mylist[i: i + window_size])
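For comparison, here is a minimal sketch of what an islice-based variant could look like. It follows the well-known sliding-window recipe built on itertools.islice and collections.deque; the helper name sliding_windows is hypothetical and not taken from the linked answer. It works on any iterable and never copies more than one window at a time.

```python
from collections import deque
from itertools import islice


def sliding_windows(iterable, window_size):
    """Yield successive overlapping windows as tuples (hypothetical helper)."""
    iterator = iter(iterable)
    # Pre-fill the window with the first `window_size` items.
    window = deque(islice(iterator, window_size), maxlen=window_size)
    if len(window) == window_size:
        yield tuple(window)
    # Each new item pushes the oldest one out of the bounded deque.
    for item in iterator:
        window.append(item)
        yield tuple(window)


mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
windows = list(sliding_windows(mylist, 3))
print(windows[0], windows[-1])
```

Because it is a generator, you can also loop over sliding_windows(mylist, 3) directly without building the full list of windows first.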
If your data is numerical, as in the example, I’d rather propose using numpy, as this will give you much better performance. Using the proposal from here, your example becomes:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
sliding_window_view(np.array(mylist), window_shape = 3)
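For the tuple format mentioned in the question's update, one possible approach is to pull the numeric fields into arrays first and then window those. The sketch below is only an illustration: the third and fourth records are made-up sample data, and taking the mean per window is just a stand-in for whatever computation you actually need. It assumes numpy 1.20 or later, where sliding_window_view is available.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Sample records in the format from the question's update
# (the last two entries are invented for illustration).
records = [
    ("string_attribute", 1659675302861, 3544.0),
    ("string_attribute", 1659675304443, 3544.0),
    ("string_attribute", 1659675306010, 3545.5),
    ("string_attribute", 1659675307522, 3546.0),
]

# Keep only the numeric fields; the constant string is dropped.
timestamps = np.array([r[1] for r in records], dtype=np.int64)
values = np.array([r[2] for r in records], dtype=np.float64)

# Each row is one window of 3 consecutive values (a read-only view).
value_windows = sliding_window_view(values, window_shape=3)

# Placeholder computation: one mean per window.
window_means = value_windows.mean(axis=1)
print(value_windows.shape)
```

The same call on timestamps gives the matching timestamp windows, so row i of both results refers to the same three records.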
To give you a feeling for the timing, we can turn the options above into functions, create a much longer list, and compare the timing using timeit, e.g. in Jupyter:
def rolling_window_using_iterator(list_, window_size):
    result = []
    for i in range(len(list_) - window_size + 1):
        result.append(list_[i: i + window_size])
    return result

def rolling_window_using_numpy(list_, window_size):
    # use the window_size argument rather than a hard-coded 3
    return sliding_window_view(np.array(list_), window_shape = window_size)
long_list = list(range(10000000))
%timeit rolling_window_using_iterator(long_list, 3)
%timeit rolling_window_using_numpy(long_list, 3)
prints (on my machine):
1.8 s ± 22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
422 ms ± 967 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
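Note that the numpy timing above includes converting the list with np.array(list_); sliding_window_view itself only creates a view and does not copy the data. Outside Jupyter, a comparison that does the conversion once up front could be sketched with the standard timeit module as below. The function names and the smaller list size are my own choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window_size = 3
long_list = list(range(100_000))  # smaller than above to keep the run short
long_array = np.array(long_list)  # converted once, outside the timed code


def windows_from_list():
    # Pure-Python slicing: builds one new small list per window.
    return [long_list[i:i + window_size]
            for i in range(len(long_list) - window_size + 1)]


def windows_from_array():
    # Returns a strided view on long_array; no window data is copied.
    return sliding_window_view(long_array, window_shape=window_size)


for func in (windows_from_list, windows_from_array):
    seconds = timeit.timeit(func, number=10) / 10
    print(f"{func.__name__}: {seconds:.6f} s per call")
```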
You can try sliding_window_view:
import numpy as np
n = 3
mylist = [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
window = np.lib.stride_tricks.sliding_window_view(mylist, n)
out = np.append(window, [window[-1] for _ in range(n-1)], axis=0)
print(out)
[[ 1  2  3]
 [ 2  3  4]
 [ 3  4  5]
 [ 4  5  6]
 [ 5  6  7]
 [ 6  7  8]
 [ 7  8  9]
 [ 8  9 10]
 [ 9 10 11]
 [10 11 12]
 [11 12 13]
 [12 13 14]
 [12 13 14]
 [12 13 14]]
For a one-liner, if your Python version is 3.8 or later, you can use the walrus operator:
out = np.append((window := np.lib.stride_tricks.sliding_window_view(mylist, n)),
                [window[-1] for _ in range(n-1)], axis=0)
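If you would rather stay with plain lists, the same padded output (including the repeated last window) can be sketched without numpy by clamping the start index. This is only an illustration, and it assumes the list has at least n elements; last_start is a name introduced here.

```python
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
n = 3

# Last valid start index; later positions are clamped to it,
# which repeats the final window just like the original loop does.
last_start = len(mylist) - n
out = [mylist[min(i, last_start):min(i, last_start) + n]
       for i in range(len(mylist))]

print(out[0], out[-1], len(out))
```

Each slice copies only n elements, whereas the loop in the question copies mylist[:j] on every iteration, so the work per window no longer grows with the position in the list.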
I tried it this way; it iterates over the list in less than a second:
myList = [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
for index, val in enumerate(myList):
    if index >= 3:
        print("{} : {}".format(index, myList[index-3:index]))
The slice myList[index-3:index] takes the elements from position index-3 up to, but not including, position index.
Hope it helps