pyarrow: how to save a cast column's values in the same table?

Question:

I'm a beginner in pyarrow and I'm trying to cast my timestamps, which have an AM/PM suffix.

I have a column ['Datetime'] with values like these:

   "2021/07/25 12:00:00 AM",
   "2022/06/28 11:58:00 PM",
   "2022/03/11 10:30:00 AM",

and I want to convert them to these:

2021-07-25 00:00:00,
2022-06-28 23:58:00,
2022-03-11 10:30:00,

Ideally, I'd like to do this transformation inside pyarrow.csv.read_csv, with something like this:

import pyarrow as pa
import pyarrow.csv as csv

table = csv.read_csv(
    'my_data.csv',
    convert_options=csv.ConvertOptions(
        column_types={
            'Datetime': pa.timestamp('s'),
        }
    ),
)

and then write the table to Parquet.

I already know how to convert the column separately from the table:

import pyarrow.compute as pc
pc.strptime(table.column("Incident Datetime"), format='%Y/%m/%d %I:%M:%S %p', unit='s')

But I don't understand how to apply this change to my table.

Asked By: Illia Kaltovich


Answers:

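Right now you can do this with the set_column method. Arrow tables are immutable, so set_column does not change the table in place: it returns a new table with the column replaced, which you assign back to your variable. See the cookbook here: https://arrow.apache.org/cookbook/py/data.html#replacing-a-column-in-an-existing-table.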

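import pyarrow.compute as pc

# %I (12-hour clock) is the directive that pairs with the %p AM/PM marker.
new_incident_datetime = pc.strptime(table.column("Incident Datetime"), format='%Y/%m/%d %I:%M:%S %p', unit='s')
column_idx = table.schema.get_field_index("Incident Datetime")  # Or hard-code your column index.

# set_column returns a new table, so reassign it.
table = table.set_column(
  column_idx,
  "Incident Datetime",
  new_incident_datetime
)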
Answered By: Rok