Python: Finding differences between elements of a list
Question:
Given a list of numbers, how does one find differences between every (i
)-th elements and its (i+1
)-th?
Is it better to use a lambda
expression or maybe a list comprehension?
For example:
Given a list t=[1,3,6,...]
, the goal is to find a list v=[2,3,...]
because 3-1=2
, 6-3=3
, etc.
Answers:
>>> t
[1, 3, 6]
>>> [j-i for i, j in zip(t[:-1], t[1:])] # or use itertools.izip in py2k
[2, 3]
Ok. I think I found the proper solution:
v = [x[0]-x[1] for x in zip(t[1:],t[:-1])]
The other answers are correct but if you’re doing numerical work, you might want to consider numpy. Using numpy, the answer is:
v = numpy.diff(t)
If you don’t want to use numpy
nor zip
, you can use the following solution:
>>> t = [1, 3, 6]
>>> v = [t[i+1]-t[i] for i in range(len(t)-1)]
>>> v
[2, 3]
You can use itertools.tee
and zip
to efficiently build the result:
from itertools import tee
# python2 only:
#from itertools import izip as zip
def differences(seq):
iterable, copied = tee(seq)
next(copied)
for x, y in zip(iterable, copied):
yield y - x
Or using itertools.islice
instead:
from itertools import islice
def differences(seq):
nexts = islice(seq, 1, None)
for x, y in zip(seq, nexts):
yield y - x
You can also avoid using the itertools
module:
def differences(seq):
iterable = iter(seq)
prev = next(iterable)
for element in iterable:
yield element - prev
prev = element
All these solution work in constant space if you don’t need to store all the results and support infinite iterables.
Here are some micro-benchmarks of the solutions:
In [12]: L = range(10**6)
In [13]: from collections import deque
In [15]: %timeit deque(differences_tee(L), maxlen=0)
10 loops, best of 3: 122 ms per loop
In [16]: %timeit deque(differences_islice(L), maxlen=0)
10 loops, best of 3: 127 ms per loop
In [17]: %timeit deque(differences_no_it(L), maxlen=0)
10 loops, best of 3: 89.9 ms per loop
And the other proposed solutions:
In [18]: %timeit [x[1] - x[0] for x in zip(L[1:], L)]
10 loops, best of 3: 163 ms per loop
In [19]: %timeit [L[i+1]-L[i] for i in range(len(L)-1)]
1 loops, best of 3: 395 ms per loop
In [20]: import numpy as np
In [21]: %timeit np.diff(L)
1 loops, best of 3: 479 ms per loop
In [35]: %%timeit
...: res = []
...: for i in range(len(L) - 1):
...: res.append(L[i+1] - L[i])
...:
1 loops, best of 3: 234 ms per loop
Note that:
zip(L[1:], L)
is equivalent to zip(L[1:], L[:-1])
since zip
already terminates on the shortest input, however it avoids a whole copy of L
.
- Accessing the single elements by index is very slow because every index access is a method call in python
-
numpy.diff
is slow because it has to first convert the list
to a ndarray
. Obviously if you start with an ndarray
it will be much faster:
In [22]: arr = np.array(L)
In [23]: %timeit np.diff(arr)
100 loops, best of 3: 3.02 ms per loop
My way
>>>v = [1,2,3,4,5]
>>>[v[i] - v[i-1] for i, value in enumerate(v[1:], 1)]
[1, 1, 1, 1]
A functional approach:
>>> import operator
>>> a = [1,3,5,7,11,13,17,21]
>>> map(operator.sub, a[1:], a[:-1])
[2, 2, 2, 4, 2, 4, 4]
Using generator:
>>> import operator, itertools
>>> g1,g2 = itertools.tee((x*x for x in xrange(5)),2)
>>> list(itertools.imap(operator.sub, itertools.islice(g1,1,None), g2))
[1, 3, 5, 7]
Using indices:
>>> [a[i+1]-a[i] for i in xrange(len(a)-1)]
[2, 2, 2, 4, 2, 4, 4]
I would suggest using
v = np.diff(t)
this is simple and easy to read.
But if you want v
to have the same length as t
then
v = np.diff([t[0]] + t) # for python 3.x
or
v = np.diff(t + [t[-1]])
FYI: this will only work for lists.
for numpy arrays
v = np.diff(np.append(t[0], t))
Using the :=
walrus operator available in Python 3.8+:
>>> t = [1, 3, 6]
>>> prev = t[0]; [-prev + (prev := x) for x in t[1:]]
[2, 3]
I suspect this is what the numpy diff command does anyway, but just for completeness you can simply difference the sub-vectors:
from numpy import array as a
a(x[1:])-a(x[:-1])
In addition, I wanted to add these solutions to generalizations of the question:
Solution with periodic boundaries
Sometimes with numerical integration you will want to difference a list with periodic boundary conditions (so the first element calculates the difference to the last. In this case the numpy.roll function is helpful:
v-np.roll(v,1)
Solutions with zero prepended
Another numpy solution (just for completeness) is to use
numpy.ediff1d(v)
This works as numpy.diff, but only on a vector (it flattens the input array). It offers the ability to prepend or append numbers to the resulting vector. This is useful when handling accumulated fields that is often the case fluxes in meteorological variables (e.g. rain, latent heat etc), as you want a resulting list of the same length as the input variable, with the first entry untouched.
Then you would write
np.ediff1d(v,to_begin=v[0])
Of course, you can also do this with the np.diff command, in this case though you need to prepend zero to the series with the prepend keyword:
np.diff(v,prepend=0.0)
All the above solutions return a vector that is the same length as the input.
Starting in Python 3.10
, with the new pairwise
function it’s possible to slide through pairs of elements and thus map on rolling pairs:
from itertools import pairwise
[y-x for (x, y) in pairwise([1, 3, 6, 7])]
# [2, 3, 1]
The intermediate result being:
pairwise([1, 3, 6, 7])
# [(1, 3), (3, 6), (6, 7)]
You can also convert the difference into an easily readable transition matrix using
v = t.reshape((c,r)).T - t.T
Where c
= number of items in the list and r
= 1 since a list is basically a vector or a 1d array.
Given a list of numbers, how does one find differences between every (i
)-th elements and its (i+1
)-th?
Is it better to use a lambda
expression or maybe a list comprehension?
For example:
Given a list t=[1,3,6,...]
, the goal is to find a list v=[2,3,...]
because 3-1=2
, 6-3=3
, etc.
>>> t
[1, 3, 6]
>>> [j-i for i, j in zip(t[:-1], t[1:])] # or use itertools.izip in py2k
[2, 3]
Ok. I think I found the proper solution:
v = [x[0]-x[1] for x in zip(t[1:],t[:-1])]
The other answers are correct but if you’re doing numerical work, you might want to consider numpy. Using numpy, the answer is:
v = numpy.diff(t)
If you don’t want to use numpy
nor zip
, you can use the following solution:
>>> t = [1, 3, 6]
>>> v = [t[i+1]-t[i] for i in range(len(t)-1)]
>>> v
[2, 3]
You can use itertools.tee
and zip
to efficiently build the result:
from itertools import tee
# python2 only:
#from itertools import izip as zip
def differences(seq):
iterable, copied = tee(seq)
next(copied)
for x, y in zip(iterable, copied):
yield y - x
Or using itertools.islice
instead:
from itertools import islice
def differences(seq):
nexts = islice(seq, 1, None)
for x, y in zip(seq, nexts):
yield y - x
You can also avoid using the itertools
module:
def differences(seq):
iterable = iter(seq)
prev = next(iterable)
for element in iterable:
yield element - prev
prev = element
All these solution work in constant space if you don’t need to store all the results and support infinite iterables.
Here are some micro-benchmarks of the solutions:
In [12]: L = range(10**6)
In [13]: from collections import deque
In [15]: %timeit deque(differences_tee(L), maxlen=0)
10 loops, best of 3: 122 ms per loop
In [16]: %timeit deque(differences_islice(L), maxlen=0)
10 loops, best of 3: 127 ms per loop
In [17]: %timeit deque(differences_no_it(L), maxlen=0)
10 loops, best of 3: 89.9 ms per loop
And the other proposed solutions:
In [18]: %timeit [x[1] - x[0] for x in zip(L[1:], L)]
10 loops, best of 3: 163 ms per loop
In [19]: %timeit [L[i+1]-L[i] for i in range(len(L)-1)]
1 loops, best of 3: 395 ms per loop
In [20]: import numpy as np
In [21]: %timeit np.diff(L)
1 loops, best of 3: 479 ms per loop
In [35]: %%timeit
...: res = []
...: for i in range(len(L) - 1):
...: res.append(L[i+1] - L[i])
...:
1 loops, best of 3: 234 ms per loop
Note that:
zip(L[1:], L)
is equivalent tozip(L[1:], L[:-1])
sincezip
already terminates on the shortest input, however it avoids a whole copy ofL
.- Accessing the single elements by index is very slow because every index access is a method call in python
-
numpy.diff
is slow because it has to first convert thelist
to andarray
. Obviously if you start with anndarray
it will be much faster:In [22]: arr = np.array(L) In [23]: %timeit np.diff(arr) 100 loops, best of 3: 3.02 ms per loop
My way
>>>v = [1,2,3,4,5]
>>>[v[i] - v[i-1] for i, value in enumerate(v[1:], 1)]
[1, 1, 1, 1]
A functional approach:
>>> import operator
>>> a = [1,3,5,7,11,13,17,21]
>>> map(operator.sub, a[1:], a[:-1])
[2, 2, 2, 4, 2, 4, 4]
Using generator:
>>> import operator, itertools
>>> g1,g2 = itertools.tee((x*x for x in xrange(5)),2)
>>> list(itertools.imap(operator.sub, itertools.islice(g1,1,None), g2))
[1, 3, 5, 7]
Using indices:
>>> [a[i+1]-a[i] for i in xrange(len(a)-1)]
[2, 2, 2, 4, 2, 4, 4]
I would suggest using
v = np.diff(t)
this is simple and easy to read.
But if you want v
to have the same length as t
then
v = np.diff([t[0]] + t) # for python 3.x
or
v = np.diff(t + [t[-1]])
FYI: this will only work for lists.
for numpy arrays
v = np.diff(np.append(t[0], t))
Using the :=
walrus operator available in Python 3.8+:
>>> t = [1, 3, 6]
>>> prev = t[0]; [-prev + (prev := x) for x in t[1:]]
[2, 3]
I suspect this is what the numpy diff command does anyway, but just for completeness you can simply difference the sub-vectors:
from numpy import array as a
a(x[1:])-a(x[:-1])
In addition, I wanted to add these solutions to generalizations of the question:
Solution with periodic boundaries
Sometimes with numerical integration you will want to difference a list with periodic boundary conditions (so the first element calculates the difference to the last. In this case the numpy.roll function is helpful:
v-np.roll(v,1)
Solutions with zero prepended
Another numpy solution (just for completeness) is to use
numpy.ediff1d(v)
This works as numpy.diff, but only on a vector (it flattens the input array). It offers the ability to prepend or append numbers to the resulting vector. This is useful when handling accumulated fields that is often the case fluxes in meteorological variables (e.g. rain, latent heat etc), as you want a resulting list of the same length as the input variable, with the first entry untouched.
Then you would write
np.ediff1d(v,to_begin=v[0])
Of course, you can also do this with the np.diff command, in this case though you need to prepend zero to the series with the prepend keyword:
np.diff(v,prepend=0.0)
All the above solutions return a vector that is the same length as the input.
Starting in Python 3.10
, with the new pairwise
function it’s possible to slide through pairs of elements and thus map on rolling pairs:
from itertools import pairwise
[y-x for (x, y) in pairwise([1, 3, 6, 7])]
# [2, 3, 1]
The intermediate result being:
pairwise([1, 3, 6, 7])
# [(1, 3), (3, 6), (6, 7)]
You can also convert the difference into an easily readable transition matrix using
v = t.reshape((c,r)).T - t.T
Where c
= number of items in the list and r
= 1 since a list is basically a vector or a 1d array.