Applying lambda function to datetime
Question:
I am using the following code to find clusters with difference <=1 in a list
from itertools import groupby
from operator import itemgetter
data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
for k, g in groupby(enumerate(data), lambda (i, x): (i-x)):
    print map(itemgetter(1), g)
If, however, I change the data to be an array of datetimes, to find clusters of datetimes that are only 1 hour apart, it fails.
I am trying the following:
>>> data
array([datetime.datetime(2016, 10, 1, 8, 0),
       datetime.datetime(2016, 10, 1, 9, 0),
       datetime.datetime(2016, 10, 1, 10, 0), ...,
       datetime.datetime(2019, 1, 3, 9, 0),
       datetime.datetime(2019, 1, 3, 10, 0),
       datetime.datetime(2019, 1, 3, 11, 0)], dtype=object)
from itertools import groupby
from operator import itemgetter
data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
for k, g in groupby(enumerate(data), lambda (i, x): (i-x).total_seconds()/3600):
    print map(itemgetter(1), g)
The error is:
for k, g in groupby(enumerate(data), lambda (i, x): int((i-x).total_seconds()/3600)):
TypeError: unsupported operand type(s) for -: 'int' and 'datetime.datetime'
There are a lot of solutions on the web, but I want to apply this particular one for learning.
Answers:
If you want to get all subsequences of items such that each item is an hour later than the previous one (not clusters of items that are each within an hour of each other), you need to iterate over pairs (data[i-1], data[i]). Currently, you are just iterating over (i, data[i]), which raises a TypeError when you try to subtract data[i] from i. A working example could look like this:
from itertools import izip

def find_subsequences(data):
    if len(data) <= 1:
        return []
    current_group = [data[0]]
    delta = 3600  # maximum allowed gap between neighbours, in seconds
    results = []
    for current, next in izip(data, data[1:]):
        if abs((next - current).total_seconds()) > delta:
            # Here, `current` is the last item of the previous subsequence
            # and `next` is the first item of the next subsequence.
            if len(current_group) >= 2:
                results.append(current_group)
            current_group = [next]
            continue
        current_group.append(next)
    # Flush the final subsequence, which the loop never closes on its own.
    if len(current_group) >= 2:
        results.append(current_group)
    return results
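If you would rather keep the groupby idiom from the question, one way to adapt it is to subtract i hours from the i-th timestamp instead of subtracting the timestamp from i: within a run of consecutive hourly values, x - i * step stays constant, so it can serve as the grouping key. Below is a minimal Python 3 sketch of that idea (Python 3 no longer accepts lambda (i, x): ..., so the pair is indexed instead). The function name hourly_runs and the step parameter are made up for illustration, and the sketch assumes the data is sorted and that neighbouring timestamps in a run are exactly one hour apart.

```python
from datetime import datetime, timedelta
from itertools import groupby


def hourly_runs(data, step=timedelta(hours=1)):
    """Split sorted datetimes into runs whose neighbours are exactly `step` apart.

    Hypothetical helper: the name and the `step` parameter are illustrative.
    """
    # For consecutive hourly values, x - i * step is the same datetime for
    # every member of the run, so groupby puts them in one group.
    keyed = groupby(enumerate(data), key=lambda pair: pair[1] - pair[0] * step)
    return [[x for _, x in group] for _, group in keyed]


data = [
    datetime(2016, 10, 1, 8, 0),
    datetime(2016, 10, 1, 9, 0),
    datetime(2016, 10, 1, 10, 0),
    datetime(2019, 1, 3, 9, 0),
    datetime(2019, 1, 3, 10, 0),
    datetime(2019, 1, 3, 11, 0),
]

for run in hourly_runs(data):
    print([x.isoformat() for x in run])
```

Unlike the loop above, this version keeps single-item runs and only groups timestamps that are exactly one step apart, so 8:00 and 8:30 would end up in separate groups.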
Let's import datetime, take the ellipsis out of your data, and then apply a lambda with two nested loops that checks, for every pair of dates, whether the elapsed time between them is at most one hour. The resulting boolean matrix identifies the desired clusters easily.
import numpy as np
from datetime import datetime as dt

data = np.array([dt(2016, 10, 1, 8, 0),
                 dt(2016, 10, 1, 9, 0),
                 dt(2016, 10, 1, 10, 0),
                 dt(2019, 1, 3, 9, 0),
                 dt(2019, 1, 3, 10, 0),
                 dt(2019, 1, 3, 11, 0)], dtype=object)
# total_seconds() includes the days part of the delta; .seconds alone would not.
mds = lambda ds: [[abs(da - db).total_seconds() / 3600 <= 1 for da in ds] for db in ds]
Applying the function to data:
md = mds(data)
md will give us:
[[True, True, False, False, False, False],
 [True, True, True, False, False, False],
 [False, True, True, False, False, False],
 [False, False, False, True, True, False],
 [False, False, False, True, True, True],
 [False, False, False, False, True, True]]
Note that the main diagonal is True (the time delta is zero) and that the matrix is symmetric. An element md[i][j] is True when abs(data[i] - data[j]) is less than or equal to one hour, with i and j running from 0 to 5 so that every pair of dates is considered in the matrix.
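To go from the matrix to actual clusters, one option is to read only the entries just above the main diagonal: md[i][i + 1] says whether date i and date i + 1 are within an hour of each other, so a False there marks a boundary between two runs. The sketch below assumes the dates are sorted. The names within_hour and clusters_from_matrix are hypothetical, and the matrix is built with plain lists and total_seconds() so that the snippet runs on Python 3 without NumPy.

```python
from datetime import datetime as dt


def within_hour(ds):
    """Boolean matrix: entry [i][j] is True when ds[i] and ds[j] are <= 1 hour apart."""
    return [[abs(da - db).total_seconds() <= 3600 for da in ds] for db in ds]


def clusters_from_matrix(ds, md):
    """Group sorted dates by walking the first superdiagonal of md (illustrative helper)."""
    if len(ds) == 0:
        return []
    clusters = [[ds[0]]]
    for i in range(len(ds) - 1):
        if md[i][i + 1]:
            clusters[-1].append(ds[i + 1])  # still within an hour of its predecessor
        else:
            clusters.append([ds[i + 1]])    # gap found: start a new cluster
    return clusters


data = [dt(2016, 10, 1, 8, 0), dt(2016, 10, 1, 9, 0), dt(2016, 10, 1, 10, 0),
        dt(2019, 1, 3, 9, 0), dt(2019, 1, 3, 10, 0), dt(2019, 1, 3, 11, 0)]

md = within_hour(data)
for cluster in clusters_from_matrix(data, md):
    print(cluster)
```

Building the full matrix costs quadratic time and memory, so for long series it may be cheaper to compare only neighbouring dates directly.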