How to change 00YY year format to 2022 in Python?

Question:

A Snowflake CSV upload turned the date format into "0022-10-02". Is there a good way to change the "0022" year to "2022", as the NEW_DATE column shows?

ORDER_DATE NEW_DATE
0022-10-02 2022-10-02
0022-10-02 2022-10-03

I’ve tried:

df["ORDER_DATE"] = pd.to_datetime(df["ORDER_DATE"], format='%YYYY%mm%dd', errors='ignore')
def swap(x): return re.sub("^00", "20", x) if type(x) is str else x
df.applymap(swap)
print(df)

This still shows "0022" in the ORDER_DATE column.

I’ve also tried:

df["ORDER_DATE"] = pd.DataFrame({"ORDER_DATE": ["10-02-22", "10-02-22"] })
df.apply(lambda x:x.replace("^00", "20", regex=True))
print(df)

Output changes the dates in the ORDER_DATE column but it’s not ideal, because I’d like to be able to automate the date conversion process from Snowflake.

Asked By: mikestacker487

Answers:

There are a few things that need fixing. First, it’s best to fix the date strings before trying to convert them. Second, df.applymap() returns a modified DataFrame rather than changing df in place, and you aren’t saving the result. Third, the format string for the date isn’t quite right (the directives are %Y, %m and %d, so this layout would be '%Y-%m-%d'), but you can let to_datetime() figure it out with the yearfirst=True argument.

This should work to create a NEW_DATE column with a corrected date:

import re
import pandas as pd

def swap(x): return re.sub("^00", "20", x) if isinstance(x, str) else x  # only touch strings
df['NEW_DATE'] = df['ORDER_DATE'].apply(swap)
df['NEW_DATE'] = pd.to_datetime(df['NEW_DATE'], yearfirst=True)

As an example to show that the new column has the correct values and types:

>>> df = pd.DataFrame({"ORDER_DATE": ["0022-10-02", "0022-11-03"]})
>>> df
   ORDER_DATE
0  0022-10-02
1  0022-11-03
>>> df.dtypes
ORDER_DATE    object
dtype: object

>>> df['NEW_DATE'] = df['ORDER_DATE'].apply(swap)
>>> df['NEW_DATE'] = pd.to_datetime(df['NEW_DATE'], yearfirst=True)
>>> df
   ORDER_DATE   NEW_DATE
0  0022-10-02 2022-10-02
1  0022-11-03 2022-11-03
>>> df.dtypes
ORDER_DATE            object
NEW_DATE      datetime64[ns]
dtype: object
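
If you would rather spell the format out than rely on yearfirst=True, the directives can be checked on a single string with only the standard library. This is a minimal sketch, not part of the code above: fix_year is a hypothetical helper name, and it assumes every "00YY" value really belongs to the 2000s. pandas' format= argument takes the same strftime-style directives.

```python
import re
from datetime import date, datetime


def fix_year(text: str) -> date:
    """Hypothetical helper: parse '0022-10-02' as the date 2022-10-02."""
    # Replace only a leading "00", so an already correct year is left alone.
    # Assumption: every "00YY" year is meant to be "20YY".
    corrected = re.sub(r"^00", "20", text)
    # %Y = four-digit year, %m = two-digit month, %d = two-digit day.
    return datetime.strptime(corrected, "%Y-%m-%d").date()


fixed = [fix_year(s) for s in ["0022-10-02", "0022-11-03", "2022-12-04"]]
print(fixed)
```

The same '%Y-%m-%d' string could be passed as format= to pd.to_datetime() once the column has been corrected.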
Answered By: sj95126

You can try str.replace

df['NEW_DATE'] = df['ORDER_DATE'].str.replace(r'^(00)(22.*)$', r'20\2', regex=True)
print(df)

   ORDER_DATE    NEW_DATE
0  0022-10-02  2022-10-02
1  0022-10-02  2022-10-02
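
The pattern above only matches years ending in 22. If the upload may contain other years, a slightly more general version could look like the sketch below. It assumes every year that starts with "00" belongs to the 2000s, and the sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample data: two mangled years and one that is already correct.
df = pd.DataFrame({"ORDER_DATE": ["0022-10-02", "0019-11-03", "2021-01-15"]})

# Capture the two digits that follow the leading "00" and put "20" in front.
# Rows that do not start with "00" are left unchanged.
df["NEW_DATE"] = df["ORDER_DATE"].str.replace(r"^00(\d{2})", r"20\1", regex=True)

# Convert the corrected strings to real datetimes with an explicit format.
df["NEW_DATE"] = pd.to_datetime(df["NEW_DATE"], format="%Y-%m-%d")
print(df)
```

Because the replacement is vectorized, it can run on the whole column right after loading the data, with no per-row Python function.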
Answered By: Ynjxsjmh