Pandas timestamp column having issues loading into Redshift

Question:

I have a dataframe that looks like this

import pandas as pd

d = {'Timestamp': ['Nov 16 10:39:54', 'Nov 16 10:39:54', 'Nov 16 10:39:54', 'Nov 16 10:39:54', 'Nov 16 10:40:17']}
df_sample = pd.DataFrame(data=d)
df_sample.head()

Redshift throws an error when I try to load this into a table:

ProgrammingError: {'S': 'ERROR', 'C': '42601', 'M': 'syntax error at or near "Full"', 'P': '88', 'F': '/home/ec2-user/padb/src/pg/src/backend/parser/parser_scan.l', 'L': '732', 'R': 'yyerror'}

It could be a different column, but nonetheless, how would I convert this to a more normal datetime?

Asked By: Wolfy

||

Answers:

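You want pd.to_datetime with an explicit format. The strings carry no year, so prepend one (2022 here) before parsing: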

df_sample["iso8601"] = pd.to_datetime(
    "2022 " + df_sample.Timestamp, format="%Y %b %d %H:%M:%S"
)
print(df_sample.tail(3).set_index("iso8601"))

output

                           Timestamp
iso8601                             
2022-11-16 10:39:54  Nov 16 10:39:54
2022-11-16 10:39:54  Nov 16 10:39:54
2022-11-16 10:40:17  Nov 16 10:40:17
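
If the load step sends text rather than datetime objects, the parsed column can be rendered back to plain "YYYY-MM-DD HH:MM:SS" strings. This is only a sketch: the year 2022 is the same assumption as above, and the column name ts_text is made up for illustration.

```python
import pandas as pd

df_sample = pd.DataFrame({"Timestamp": ["Nov 16 10:39:54", "Nov 16 10:40:17"]})

# Assumption: the year is 2022; the source strings carry no year.
parsed = pd.to_datetime("2022 " + df_sample["Timestamp"], format="%Y %b %d %H:%M:%S")

# Render as plain "YYYY-MM-DD HH:MM:SS" text (ts_text is a hypothetical column name).
df_sample["ts_text"] = parsed.dt.strftime("%Y-%m-%d %H:%M:%S")
print(df_sample)
```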

Take care to treat these as UTC timestamps,
rather than times in some local timezone,
as there is no zone information bundled
along with that data.
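
One way to make the UTC interpretation explicit is to localize the parsed values, roughly as sketched here. It assumes the original times really were recorded in UTC, which the data itself cannot confirm; the column name iso8601_utc is made up for illustration.

```python
import pandas as pd

df_sample = pd.DataFrame({"Timestamp": ["Nov 16 10:39:54", "Nov 16 10:40:17"]})

# Assumption: the year is 2022; the source strings carry no year.
naive = pd.to_datetime("2022 " + df_sample["Timestamp"], format="%Y %b %d %H:%M:%S")

# Assumption: the times were recorded in UTC. tz_localize attaches the zone
# without shifting the clock values (iso8601_utc is a hypothetical name).
df_sample["iso8601_utc"] = naive.dt.tz_localize("UTC")
print(df_sample["iso8601_utc"].dtype)
```

If the times were actually recorded in a local zone, localize to that zone first and then call dt.tz_convert("UTC").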

Answered By: J_H