How to change 00YY year format to 2022 in Python?
Question:
A Snowflake CSV upload turned the dates into the format "0022-10-02". What is the best way to change "0022" to "2022", as the NEW_DATE column shows?
ORDER_DATE | NEW_DATE |
---|---|
0022-10-02 | 2022-10-02 |
0022-10-02 | 2022-10-03 |
I’ve tried:
df["ORDER_DATE"] = pd.to_datetime(df["ORDER_DATE"], format='%YYYY%mm%dd', errors='ignore')
def swap(x): return re.sub("^00", "20", x) if type(x) is str else x
df.applymap(swap)
print(df)
This still shows "0022" in the ORDER_DATE column.
I’ve also tried:
df["ORDER_DATE"] = pd.DataFrame({"ORDER_DATE": ["10-02-22", "10-02-22"] })
df.apply(lambda x:x.replace("^00", "20", regex=True))
print(df)
The output changes the dates in the ORDER_DATE column, but it’s not ideal, because I’d like to automate the date conversion for data coming from Snowflake.
Answers:
There are a few things that need fixing. First, it’s best to fix the date strings before trying to convert them. Second, df.applymap() returns a modified DataFrame rather than changing df in place, and you aren’t saving the result. Third, the format string for the date isn’t quite right (the directives are %Y, %m, and %d, one letter each, with the separators written literally, e.g. '%Y-%m-%d'), but you can let to_datetime() figure it out with the yearfirst=True argument.
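The second point is easy to miss, so here is a minimal sketch of it. It uses Series.apply instead of applymap (an assumption made only to keep the example small); in both cases the call returns a new object and leaves df untouched unless the result is assigned back. The sample values are made up for illustration.

```python
import re
import pandas as pd

def swap(x):
    # Replace a leading "00" with "20"; leave non-strings (e.g. NaN) alone.
    return re.sub(r"^00", "20", x) if isinstance(x, str) else x

df = pd.DataFrame({"ORDER_DATE": ["0022-10-02", "0022-11-03"]})

# The result is computed but thrown away, so df is unchanged.
df["ORDER_DATE"].apply(swap)
before = df["ORDER_DATE"].tolist()

# Assigning the result back is what actually updates the column.
df["ORDER_DATE"] = df["ORDER_DATE"].apply(swap)
after = df["ORDER_DATE"].tolist()

print(before)
print(after)
```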
The following should create a NEW_DATE column with the corrected date:
import re
import pandas as pd
def swap(x): return re.sub("^00", "20", x) if type(x) is str else x
df['NEW_DATE'] = df['ORDER_DATE'].apply(swap)
df['NEW_DATE'] = pd.to_datetime(df['NEW_DATE'], yearfirst=True)
As an example to show that the new column has the correct values and types:
>>> df = pd.DataFrame({"ORDER_DATE": ["0022-10-02", "0022-11-03"]})
>>> df
ORDER_DATE
0 0022-10-02
1 0022-11-03
>>> df.dtypes
ORDER_DATE object
dtype: object
>>> df['NEW_DATE'] = df['ORDER_DATE'].apply(swap)
>>> df['NEW_DATE'] = pd.to_datetime(df['NEW_DATE'], yearfirst=True)
>>> df
ORDER_DATE NEW_DATE
0 0022-10-02 2022-10-02
1 0022-11-03 2022-11-03
>>> df.dtypes
ORDER_DATE object
NEW_DATE datetime64[ns]
dtype: object
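Since the goal is to automate the conversion, the two steps above could be wrapped in a small helper. This is only a sketch: the function name add_corrected_date and its parameters are hypothetical, and it assumes every affected value is a string that starts with "00". errors="coerce" is used so that values which still cannot be parsed become NaT instead of raising.

```python
import re
import pandas as pd

def swap(x):
    # Replace a leading "00" with "20"; leave non-strings (e.g. None/NaN) alone.
    return re.sub(r"^00", "20", x) if isinstance(x, str) else x

def add_corrected_date(df, source="ORDER_DATE", target="NEW_DATE"):
    """Hypothetical helper: return a copy of df with a corrected datetime column."""
    out = df.copy()                      # do not modify the caller's DataFrame
    fixed = out[source].apply(swap)      # "0022-10-02" -> "2022-10-02"
    out[target] = pd.to_datetime(fixed, yearfirst=True, errors="coerce")
    return out

raw = pd.DataFrame({"ORDER_DATE": ["0022-10-02", "0022-11-03", None]})
result = add_corrected_date(raw)
print(result)
```

Returning a copy keeps the original ORDER_DATE strings available for checking, which can help when the load from Snowflake is run repeatedly.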
You can also try str.replace with a backreference to the second group:
df['NEW_DATE'] = df['ORDER_DATE'].str.replace(r'^(00)(22.*)$', r'20\2', regex=True)
print(df)
ORDER_DATE NEW_DATE
0 0022-10-02 2022-10-02
1 0022-10-02 2022-10-02
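The pattern above only matches the year 0022. If other two-digit years can show up the same way (an assumption, since the question only shows 0022), a slightly more general sketch captures any two digits after the leading "00" and puts them back with a backreference. Values that already start with "20" do not match and pass through unchanged. The sample dates are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"ORDER_DATE": ["0022-10-02", "0021-12-31", "2023-01-15"]})

# ^00      a literal "00" at the start of the string
# (\d{2})  the two-digit year, captured as group 1
# -        the hyphen that follows the year, so only YYYY-... strings match
df["NEW_DATE"] = df["ORDER_DATE"].str.replace(
    r"^00(\d{2})-", r"20\1-", regex=True
)

print(df["NEW_DATE"].tolist())
# ['2022-10-02', '2021-12-31', '2023-01-15']
```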