Creating numpy linspace out of datetime
Question:
I’m writing a script that plots some data with dates on the x axis (in matplotlib). I need to create a numpy.linspace out of those dates in order to create a spline afterwards. Is it possible to do that?
What I’ve tried:
import datetime
import numpy as np
dates = [
    datetime.datetime(2015, 7, 2, 0, 31, 41),
    datetime.datetime(2015, 7, 2, 1, 35),
    datetime.datetime(2015, 7, 2, 2, 37, 9),
    datetime.datetime(2015, 7, 2, 3, 59, 16),
    datetime.datetime(2015, 7, 2, 5, 2, 23)
]
x = np.linspace(min(dates), max(dates), 500)
It throws this error:
TypeError: unsupported operand type(s) for *: 'datetime.datetime' and 'float'
I’ve also tried converting datetime to np.datetime64, but that doesn’t work either:
dates = [np.datetime64(i) for i in dates]
x = np.linspace(min(dates), max(dates), 500)
Error:
TypeError: ufunc multiply cannot use operands with types dtype('<M8[us]') and dtype('float64')
Answers:
As far as I know, np.linspace does not support datetime objects. But perhaps we can make our own function which roughly simulates it:
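def date_linspace(start, end, steps):
    # Size of one step; the end date itself is not part of the result.
    delta = (end - start) / steps
    # Offsets 0*delta, 1*delta, ..., (steps-1)*delta
    increments = range(0, steps) * np.array([delta]*steps)
    return start + increments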
This should give you an np.array with dates going from start to end in steps steps (not including the end date, which can easily be changed).
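As a minimal sketch of the modification mentioned above, the variant below includes the end date by dividing the span into num - 1 intervals. The name date_linspace_inclusive is hypothetical (it is not part of numpy), and the sketch assumes plain datetime.datetime inputs and num >= 2.

```python
import datetime


def date_linspace_inclusive(start, end, num):
    """Return `num` evenly spaced datetimes from start to end, both included.

    Hypothetical helper for illustration; assumes num >= 2.
    """
    span = end - start
    # Scale the whole span for each point so the last point equals `end` exactly.
    return [start + span * i / (num - 1) for i in range(num)]


start = datetime.datetime(2015, 7, 2, 0, 31, 41)
end = datetime.datetime(2015, 7, 2, 5, 2, 23)
points = date_linspace_inclusive(start, end, 5)
```

The result is a plain list of datetime objects, which matplotlib can plot directly; wrap it in np.array(...) if an array is needed.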
The last error is telling us that np.datetime64 objects cannot be multiplied. Addition is defined: you can add n timesteps to a date and get another date. But it doesn’t make any sense to multiply a date.
In [1238]: x=np.array([1000],dtype='datetime64[s]')
In [1239]: x
Out[1239]: array(['1970-01-01T00:16:40'], dtype='datetime64[s]')
In [1240]: x[0]*3
...
TypeError: ufunc multiply cannot use operands with types dtype('<M8[s]') and dtype('int32')
So the simple way to generate a range of datetime objects is to add a range of timesteps. Here, for example, I’m using 10-second increments:
In [1241]: x[0]+np.arange(0,60,10)
Out[1241]:
array(['1970-01-01T00:16:40', '1970-01-01T00:16:50', '1970-01-01T00:17:00',
       '1970-01-01T00:17:10', '1970-01-01T00:17:20', '1970-01-01T00:17:30'], dtype='datetime64[s]')
The error in linspace is the result of it trying to multiply the start by 1., as seen in the full error stack:
In [1244]: np.linspace(x[0],x[-1],10)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1244-6e50603c0c4e> in <module>()
----> 1 np.linspace(x[0],x[-1],10)
/usr/lib/python3/dist-packages/numpy/core/function_base.py in linspace(start, stop, num, endpoint, retstep, dtype)
88
89 # Convert float/complex array scalars to float, gh-3504
---> 90 start = start * 1.
91 stop = stop * 1.
92
TypeError: ufunc multiply cannot use operands with types dtype('<M8[s]') and dtype('float64')
Despite the comment, it looks like it’s just converting ints to float. Anyway, it wasn’t written with datetime64 objects in mind.
user89161's answer is the way to go if you want to use the linspace syntax; otherwise you can just add increments of your chosen size to the start date.
arange works with these dates:
In [1256]: np.arange(x[0],x[0]+60,10)
Out[1256]:
array(['1970-01-01T00:16:40', '1970-01-01T00:16:50', '1970-01-01T00:17:00',
       '1970-01-01T00:17:10', '1970-01-01T00:17:20', '1970-01-01T00:17:30'], dtype='datetime64[s]')
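The "add increments to the start date" idea can also be given a linspace-style signature. The sketch below is one possible way to do that with numpy alone: it runs np.linspace on plain integer offsets and adds them back to the start. The function name datetime64_linspace is hypothetical, and the sketch assumes the span is small enough for float64 to represent the offsets exactly (very long spans at nanosecond resolution may lose a little precision).

```python
import numpy as np


def datetime64_linspace(start, stop, num=50):
    """Hypothetical helper: `num` evenly spaced datetime64 values, endpoints included."""
    start = np.datetime64(start)
    stop = np.datetime64(stop)
    delta = stop - start                      # timedelta64 in the common time unit
    span = delta.astype('int64')              # the same span as a plain integer
    # linspace only sees numbers; round to whole time units before casting back.
    offsets = np.round(np.linspace(0, span, num)).astype('int64')
    return start + offsets.astype(delta.dtype)


x0 = np.datetime64('1970-01-01T00:16:40')
x1 = np.datetime64('1970-01-01T00:17:30')
result = datetime64_linspace(x0, x1, 6)
```

Unlike the arange call above, the stop value is included here, which matches the default behaviour of np.linspace.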
Update – 2022
As pointed out by @Joooeey and @Ehtesh Choudhury, pandas now has date_range, which makes creating numpy.linspace-like time series much simpler.
import pandas as pd
t = pd.date_range(start='2022-03-10',
                  end='2022-03-15',
                  periods=5)
If it’s important to have this time series as a numpy array, simply use t.values:
>>> t.values
array(['2022-03-10T00:00:00.000000000', '2022-03-11T06:00:00.000000000',
       '2022-03-12T12:00:00.000000000', '2022-03-13T18:00:00.000000000',
       '2022-03-15T00:00:00.000000000'], dtype='datetime64[ns]')
Original answer
Have you considered using pandas? Using an approach from this possible duplicate question, you can make use of np.linspace in the following way:
import numpy as np
import pandas as pd
start = pd.Timestamp('2015-07-01')
end = pd.Timestamp('2015-08-01')
# Timestamp.value is the time in integer nanoseconds since the epoch
t = np.linspace(start.value, end.value, 100)
t = pd.to_datetime(t)
To obtain an np.array of the linear time series:
In [3]: np.asarray(t)
Out[3]:
array(['2015-06-30T17:00:00.000000000-0700',
       '2015-07-01T00:30:54.545454592-0700',
       '2015-07-01T08:01:49.090909184-0700',
       ...
       '2015-07-31T01:58:10.909090816-0700',
       '2015-07-31T09:29:05.454545408-0700',
       '2015-07-31T17:00:00.000000000-0700'], dtype='datetime64[ns]')
As of pandas 0.23 you can use date_range:
import pandas as pd
x = pd.date_range(min(dates), max(dates), periods=500).to_pydatetime()
import numpy # 1.15
start = numpy.datetime64('2001-01-01')
end = numpy.datetime64('2019-01-01')
# Linspace in days:
days = numpy.linspace(start.astype('f8'), end.astype('f8'), dtype='<M8[D]')
# Linspace in milliseconds
MS1D = 24 * 60 * 60 * 1000
daytimes = numpy.linspace(start.astype('f8') * MS1D, end.astype('f8') * MS1D, dtype='<M8[ms]')
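Since the goal in the question is to evaluate a curve on the dense time axis, here is a rough sketch of how the numeric form of the dates could be used for that. It assumes the five dates from the question, uses made-up y values (the name values and its contents are purely illustrative), and uses np.interp, which is linear interpolation, as a stand-in; a spline routine from another library would take the same numeric t in its place.

```python
import numpy as np

dates = np.array(['2015-07-02T00:31:41', '2015-07-02T01:35:00',
                  '2015-07-02T02:37:09', '2015-07-02T03:59:16',
                  '2015-07-02T05:02:23'], dtype='datetime64[s]')
values = np.array([1.0, 3.0, 2.0, 5.0, 4.0])  # hypothetical measurements

# Seconds elapsed since the first sample, as plain floats.
t = (dates - dates[0]) / np.timedelta64(1, 's')

# Dense, evenly spaced numeric axis and the curve evaluated on it.
t_dense = np.linspace(t[0], t[-1], 500)
y_dense = np.interp(t_dense, t, values)  # linear stand-in for a spline

# Convert the numeric axis back to dates (millisecond resolution) for plotting.
offsets_ms = np.round(t_dense * 1000).astype('int64').astype('timedelta64[ms]')
dates_dense = dates[0] + offsets_ms
```

Doing the interpolation on floats and converting back only at the end sidesteps the multiplication error entirely, because linspace never sees a datetime64 value.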