change multiple columns in pandas dataframe to datetime

Question:

I have a dataframe of 13 columns and 55,000 rows. I am trying to convert 5 of those columns to datetime; right now they are returning the type 'object' and I need to transform this data for machine learning. I know that if I do

data['birth_date'] = pd.to_datetime(data['birth_date'], errors='coerce')

it will return a datetime column, but I want to do it for 4 other columns as well. Is there one line that I can write to call all of them? I don't think I can index like

data[:,7:12]

thanks!

Asked By: kwashington122


Answers:

You can use apply to iterate through each column using pd.to_datetime

data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce')

As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:

cols = data.columns[7:12]
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')
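As a quick check, here is a minimal self-contained sketch of this approach on a made-up two-column frame (the column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical sample frame; the date columns arrive as strings (dtype 'object').
data = pd.DataFrame({
    "name": ["a", "b"],
    "birth_date": ["1990-01-15", "not a date"],
    "death_date": ["2020-06-01", "2021-03-09"],
})

cols = ["birth_date", "death_date"]
data[cols] = data[cols].apply(pd.to_datetime, errors="coerce")

# Both columns are now datetime64[ns]; the unparseable value became NaT.
print(data.dtypes)
```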
Answered By: Ted Petrou

If performance is a concern, I would advise using the following function to convert those columns to datetime:

import pandas as pd

def lookup(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    dates = {date: pd.to_datetime(date) for date in s.unique()}
    return s.apply(lambda v: dates[v])
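A quick self-contained check of the lookup idea (the function is restated so the snippet runs on its own; the sample dates are made up):

```python
import pandas as pd

def lookup(s):
    # Parse each unique date string once, then map every row through the dict.
    dates = {date: pd.to_datetime(date) for date in s.unique()}
    return s.apply(lambda v: dates[v])

# 9,999 rows but only 3 distinct date strings, so only 3 calls to pd.to_datetime.
s = pd.Series(["2017-01-01", "2017-06-15", "2017-12-31"] * 3333)
parsed = lookup(s)
```

The result matches a plain `pd.to_datetime(s)` call, but with far fewer parses when values repeat heavily.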

to_datetime: 5799 ms
dateutil:    5162 ms
strptime:    1651 ms
manual:       242 ms
lookup:        32 ms

Source:
https://github.com/sanand0/benchmarks/tree/master/date-parse

Answered By: SerialDev

First you need to extract all the columns you're interested in from data. Then you can use pandas applymap to apply to_datetime to each element in the extracted frame. I assume you know the index of the columns you want to extract; in the code below, the names of the third through fifteenth columns are extracted. You can alternatively define a list, add the names of the columns to it, and use that instead. You may also need to pass the date/time format of the DateTime entries.

import pandas as pd

cols_2_extract = data.columns[2:15]

data[cols_2_extract] = data[cols_2_extract].applymap(lambda x: pd.to_datetime(x, format='%d %m %Y'))
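A hedged, runnable sketch of the same elementwise approach (the frame and the 'day month year' strings are invented; note that applymap was renamed to DataFrame.map in pandas 2.1, so the snippet picks whichever is available):

```python
import pandas as pd

# Made-up frame with two date columns in 'day month year' form.
df = pd.DataFrame({
    "start": ["01 02 2017", "15 11 2018"],
    "end": ["03 04 2017", "20 12 2018"],
})

cols = df.columns[0:2]
sub = df[cols]
# DataFrame.map exists from pandas 2.1 on; fall back to applymap otherwise.
elementwise = sub.map if hasattr(sub, "map") else sub.applymap
# %m is month; %M would be minutes.
df[cols] = elementwise(lambda x: pd.to_datetime(x, format="%d %m %Y"))
```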
Answered By: sgDysregulation
my_df[['column1','column2']] = my_df[['column1','column2']].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f')

Note: of course the format can be changed as required.

Answered By: mel el

If you'd rather convert at load time, you could do something like this:

date_columns = ['c1','c2', 'c3', 'c4', 'c5']
data = pd.read_csv('file_to_read.csv', parse_dates=date_columns)
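To see this working without a real file, here is a minimal sketch using an in-memory CSV (the column names and rows are made up for illustration):

```python
import io
import pandas as pd

# In-memory stand-in for 'file_to_read.csv'; c1 and c2 hold date strings.
csv_text = "c1,c2,value\n2015-03-01,2015-04-01,10\n2016-07-15,2016-08-15,20\n"

data = pd.read_csv(io.StringIO(csv_text), parse_dates=["c1", "c2"])
print(data.dtypes)
```

The parsed columns come back as datetime64[ns] directly, so no post-processing step is needed.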
Answered By: smishra
data.iloc[:, 7:12]=data.iloc[:, 7:12].astype('datetime64[ns]')
Answered By: Manuj Arora

Slightly different from the accepted answer, loc also works:

dx.loc[:,['birth_date','death_date']] = dx.loc[:,['birth_date','death_date']].apply(pd.to_datetime, errors='coerce')
Answered By: ChrisDanger

read_csv()

Adding to @smishra's answer: when importing a .csv you can infer dates using infer_datetime_format as discussed here. This can only be used if the series has a consistent date format, but it will speed up the import of dates.

read_excel()

There is also the read_excel() function, which can be used to import and parse dates. You can pass the parse_dates parameter a list of column names or numbers.

parse_dates = [7,8,9,10,11]
data = pd.read_excel('file_to_read.xlsx', sheet_name='Sheet1', parse_dates=parse_dates)
Answered By: Cam