How to change year value in numpy datetime64?

Question:

I have a pandas DataFrame whose columns have dtype=numpy.datetime64.
In the data I want to change

'2011-11-14T00:00:00.000000000'

to:

'2010-11-14T00:00:00.000000000'

or some other year. The timedelta is not known, only the year number to assign.
This displays the year as an int:

Dates_profit.iloc[50][stock].astype('datetime64[Y]').astype(int)+1970

but I can’t assign a value this way.
Does anyone know how to assign a year to a numpy.datetime64?

Asked By: Mark


Answers:

Since you’re using a DataFrame, consider using pandas.Timestamp.replace:

In [1]: import pandas as pd

In [2]: dates = pd.DatetimeIndex([f'200{i}-0{i+1}-0{i+1}' for i in range(5)])

In [3]: df = pd.DataFrame({'Date': dates})

In [4]: df
Out[4]:
        Date
0 2000-01-01
1 2001-02-02
2 2002-03-03
3 2003-04-04
4 2004-05-05

In [5]: df.loc[:, 'Date'] = df['Date'].apply(lambda x: x.replace(year=1999))

In [6]: df
Out[6]:
        Date
0 1999-01-01
1 1999-02-02
2 1999-03-03
3 1999-04-04
4 1999-05-05
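
For the single-cell case in the question, the same replace call can probably be combined with label-based access so that the assignment actually sticks. This is only a minimal sketch: the frame, the 'AAPL' column and the row label are hypothetical stand-ins for Dates_profit, stock and row 50.

```python
import pandas as pd

# Hypothetical stand-in for the question's Dates_profit frame.
dates_profit = pd.DataFrame(
    {'AAPL': pd.to_datetime(['2011-11-14', '2011-11-15'])}
)

row, col = 0, 'AAPL'  # hypothetical row label and column name

# .at reads and writes one cell directly; chained indexing such as
# df.iloc[row][col] works on a temporary copy, so assigning through it
# does not modify the frame.
dates_profit.at[row, col] = dates_profit.at[row, col].replace(year=2010)

print(dates_profit)
```

Only the selected cell changes; the column keeps its datetime64 dtype.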

numpy.datetime64 objects are hard to work with. To update a value, it is normally easier to convert the date to a standard Python datetime object, make the change, and then convert it back to a numpy.datetime64 value:

import numpy as np
from datetime import datetime

dt64 = np.datetime64('2011-11-14T00:00:00.000000000')

# convert to a POSIX timestamp in seconds (datetime64 has no timezone, so no 'Z' suffix):
ts = (dt64 - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')

# standard utctime from timestamp
dt = datetime.utcfromtimestamp(ts)

# get the new updated year
dt = dt.replace(year=2010)

# convert back to numpy.datetime64:
dt64 = np.datetime64(dt)

There might be simpler ways, but this works, at least.
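
One possibly simpler route stays entirely in numpy: split each value into "month within the year" and "offset within the month", then rebuild it on top of the new year. This is a sketch under those assumptions rather than a tested recipe, and set_year is a hypothetical helper name.

```python
import numpy as np

def set_year(values, year):
    """Hypothetical helper: return `values` with the year replaced by `year`."""
    values = np.asarray(values, dtype='datetime64[ns]')
    months = values.astype('datetime64[M]')   # truncated to the month
    years = values.astype('datetime64[Y]')    # truncated to the year
    month_of_year = months - years            # timedelta64[M], 0..11
    within_month = values - months            # timedelta64[ns]
    # New year + same month, then add back the day/time offset.
    target_month = np.datetime64(str(year), 'Y') + month_of_year
    return target_month + within_month

x = np.array(['2011-11-14T00:00:00.000000000'], dtype='datetime64[ns]')
print(set_year(x, 2010))
```

With this approach a Feb 29 moved into a non-leap year rolls over to Mar 1, because the day offset is simply added to the first of February.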

Answered By: JohanL

This vectorised solution gives the same result as iterating with pandas and x.replace(year=n), but on large arrays it is at least 10x faster.

It is important to remember that the year the datetime64 values are moved to should be a leap year. With the Python datetime library, datetime(2012,2,29).replace(year=2011) raises a ValueError. Here, the function ‘replace_year’ will simply move 2012-02-29 to 2011-03-01.
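
For comparison, here is the standard-library failure together with one possible fallback. This is just a sketch: replace_year_or_clamp is a hypothetical helper, and clamping to Feb 28 is a different choice from the roll-over to Mar 1 described above.

```python
from datetime import datetime

def replace_year_or_clamp(dt, year):
    """Hypothetical helper: fall back to Feb 28 if `year` has no Feb 29."""
    try:
        return dt.replace(year=year)
    except ValueError:
        # Raised for Feb 29 when the target year is not a leap year.
        return dt.replace(year=year, day=28)

leap_day = datetime(2012, 2, 29)
clamped = replace_year_or_clamp(leap_day, 2011)  # non-leap target year
kept = replace_year_or_clamp(leap_day, 2016)     # leap target year
print(clamped, kept)
```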

I’m using numpy v 1.13.1.

import numpy as np
import pandas as pd

def replace_year(x, year):
    """ Year must be a leap year for this to work """
    # Add number of days x is from JAN-01 to year-01-01 
    x_year = np.datetime64(str(year)+'-01-01') +  (x - x.astype('M8[Y]'))

    # Due to leap years calculate offset of 1 day for those days in non-leap year
    yr_mn = x.astype('M8[Y]') + np.timedelta64(59,'D')
    leap_day_offset = (yr_mn.astype('M8[M]') - yr_mn.astype('M8[Y]') - 1).astype(int)

    # However, due to days in non-leap years prior March-01,
    # correct for previous step by removing an extra day
    non_leap_yr_beforeMarch1 = (x.astype('M8[D]') - x.astype('M8[Y]')).astype(int) < 59
    non_leap_yr_beforeMarch1 = np.logical_and(non_leap_yr_beforeMarch1, leap_day_offset).astype(int)
    day_offset = np.datetime64('1970') - (leap_day_offset - non_leap_yr_beforeMarch1).astype('M8[D]')

    # Finally, apply the day offset 
    x_year = x_year - day_offset
    return x_year


x = np.arange('2012-01-01', '2014-01-01', dtype='datetime64[h]')
x_datetime = pd.to_datetime(x)

x_year = replace_year(x, 1992)
x_datetime = x_datetime.map(lambda x: x.replace(year=1992))

print(x)
print(x_year)
print(x_datetime)
print(np.all(x_datetime.values == x_year))
Answered By: KieranL