Combine Date and Time columns using pandas
Question:
I have a pandas dataframe with the following columns:
data = {'Date': ['01-06-2013', '02-06-2013', '02-06-2013', '02-06-2013', '02-06-2013', '03-06-2013', '03-06-2013', '03-06-2013', '03-06-2013', '04-06-2013'],
'Time': ['23:00:00', '01:00:00', '21:00:00', '22:00:00', '23:00:00', '01:00:00', '21:00:00', '22:00:00', '23:00:00', '01:00:00']}
df = pd.DataFrame(data)
Date Time
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
How do I combine data[‘Date’] & data[‘Time’] to get the following? Is there a way of doing it using pd.to_datetime
?
Date
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
Answers:
It’s worth mentioning that you may have been able to read this in directly e.g. if you were using read_csv
using parse_dates=[['Date', 'Time']]
.
Assuming these are just strings you could simply add them together (with a space), allowing you to use to_datetime
, which works without specifying the format=
parameter
In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object
In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]
Alternatively, without the + ' '
, but the format=
parameter must be used. Additionally, pandas is good at inferring the format to be converted to a datetime
, however, specifying the exact format is faster.
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
Note: surprisingly (for me), this works fine with NaNs being converted to NaT, but it is worth worrying that the conversion (perhaps using the raise
argument).
%%timeit
# sample dataframe with 10000000 rows using df from the OP
df = pd.concat([df for _ in range(1000000)]).reset_index(drop=True)
%%timeit
pd.to_datetime(df['Date'] + ' ' + df['Time'])
[result]:
1.73 s ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
[result]:
1.33 s ± 9.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The accepted answer works for columns that are of datatype string
. For completeness: I come across this question when searching how to do this when the columns are of datatypes: date and time.
df.apply(lambda r : pd.datetime.combine(r['date_column_name'],r['time_column_name']),1)
I don’t have enough reputation to comment on jka.ne so:
I had to amend jka.ne’s line for it to work:
df.apply(lambda r : pd.datetime.combine(r['date_column_name'],r['time_column_name']).time(),1)
This might help others.
Also, I have tested a different approach, using replace
instead of combine
:
def combine_date_time(df, datecol, timecol):
return df.apply(lambda row: row[datecol].replace(
hour=row[timecol].hour,
minute=row[timecol].minute),
axis=1)
which in the OP’s case would be:
combine_date_time(df, 'Date', 'Time')
I have timed both approaches for a relatively large dataset (>500.000 rows), and they both have similar runtimes, but using combine
is faster (59s for replace
vs 50s for combine
).
You can use this to merge date and time into the same column of dataframe.
import pandas as pd
data_file = 'data.csv' #path of your file
Reading .csv file with merged columns Date_Time:
data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']])
You can use this line to keep both other columns also.
data.set_index(['Date', 'Time'], drop=False)
Cast the columns if the types are different (datetime
and timestamp
or str
) and use to_datetime
:
df.loc[:,'Date'] = pd.to_datetime(df.Date.astype(str)+' '+df.Time.astype(str))
Result :
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
Best,
The answer really depends on what your column types are. In my case, I had datetime
and timedelta
.
> df[['Date','Time']].dtypes
Date datetime64[ns]
Time timedelta64[ns]
If this is your case, then you just need to add the columns:
> df['Date'] + df['Time']
You can also convert to datetime
without string concatenation, by combining to_datetime
and to_timedelta
, which create datetime
and timedeltea
objects, respectively. Combined with pd.DataFrame.pop
, you can remove the source Series simultaneously:
df['DateTime'] = pd.to_datetime(df.pop('Date')) + pd.to_timedelta(df.pop('Time'))
print(df)
DateTime
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
print(df.dtypes)
DateTime datetime64[ns]
dtype: object
First make sure to have the right data types:
df["Date"] = pd.to_datetime(df["Date"])
df["Time"] = pd.to_timedelta(df["Time"])
Then you easily combine them:
df["DateTime"] = df["Date"] + df["Time"]
Use the combine
function:
datetime.datetime.combine(date, time)
My dataset had 1second resolution data for a few days and parsing by the suggested methods here was very slow. Instead I used:
dates = pandas.to_datetime(df.Date, cache=True)
times = pandas.to_timedelta(df.Time)
datetimes = dates + times
Note the use of cache=True
makes parsing the dates very efficient since there are only a couple unique dates in my files, which is not true for a combined date and time column.
DATA:
<TICKER>,<PER>,<DATE>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>
SPFB.RTS,1,20190103,100100,106580.0000000,107260.0000000,106570.0000000,107230.0000000,3726
CODE:
data.columns = ['ticker', 'per', 'date', 'time', 'open', 'high', 'low', 'close', 'vol']
data.datetime = pd.to_datetime(data.date.astype(str) + ' ' + data.time.astype(str), format='%Y%m%d %H%M%S')
Here is a one liner, to do it. You simply concatenate the two string in each of the column with a " " space in between.
Say df is your dataframe and columns are ‘Time’ and ‘Date’. And your new column is DateAndTime.
df['DateAndTime'] = df['Date'].str.cat(df['Time'],sep=" ")
And if you also wanna handle entries like datetime objects, you can do this. You can tweak the formatting as per your needs.
df['DateAndTime'] = pd.to_datetime(df['DateAndTime'], format="%m/%d/%Y %I:%M:%S %p")
Cheers!! Happy Data Crunching.
I think the best solution is to parse dates within read_csv
(or other read_ functions) directly. It is not obvious how to manage two columns in date_parser but here it is:
date_parser = lambda x,y: datetime.strptime(f"{x}T{y}", "%d-%m-%YT%H:%M:%S")
date = pd.read_csv('data.csv', parse_dates={'date': ['Date', 'Time']}, date_parser=date_parser)
I have a pandas dataframe with the following columns:
data = {'Date': ['01-06-2013', '02-06-2013', '02-06-2013', '02-06-2013', '02-06-2013', '03-06-2013', '03-06-2013', '03-06-2013', '03-06-2013', '04-06-2013'],
'Time': ['23:00:00', '01:00:00', '21:00:00', '22:00:00', '23:00:00', '01:00:00', '21:00:00', '22:00:00', '23:00:00', '01:00:00']}
df = pd.DataFrame(data)
Date Time
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
How do I combine data[‘Date’] & data[‘Time’] to get the following? Is there a way of doing it using pd.to_datetime
?
Date
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
It’s worth mentioning that you may have been able to read this in directly e.g. if you were using read_csv
using parse_dates=[['Date', 'Time']]
.
Assuming these are just strings you could simply add them together (with a space), allowing you to use to_datetime
, which works without specifying the format=
parameter
In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object
In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]
Alternatively, without the + ' '
, but the format=
parameter must be used. Additionally, pandas is good at inferring the format to be converted to a datetime
, however, specifying the exact format is faster.
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
Note: surprisingly (for me), this works fine with NaNs being converted to NaT, but it is worth worrying that the conversion (perhaps using the raise
argument).
%%timeit
# sample dataframe with 10000000 rows using df from the OP
df = pd.concat([df for _ in range(1000000)]).reset_index(drop=True)
%%timeit
pd.to_datetime(df['Date'] + ' ' + df['Time'])
[result]:
1.73 s ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
[result]:
1.33 s ± 9.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The accepted answer works for columns that are of datatype string
. For completeness: I come across this question when searching how to do this when the columns are of datatypes: date and time.
df.apply(lambda r : pd.datetime.combine(r['date_column_name'],r['time_column_name']),1)
I don’t have enough reputation to comment on jka.ne so:
I had to amend jka.ne’s line for it to work:
df.apply(lambda r : pd.datetime.combine(r['date_column_name'],r['time_column_name']).time(),1)
This might help others.
Also, I have tested a different approach, using replace
instead of combine
:
def combine_date_time(df, datecol, timecol):
return df.apply(lambda row: row[datecol].replace(
hour=row[timecol].hour,
minute=row[timecol].minute),
axis=1)
which in the OP’s case would be:
combine_date_time(df, 'Date', 'Time')
I have timed both approaches for a relatively large dataset (>500.000 rows), and they both have similar runtimes, but using combine
is faster (59s for replace
vs 50s for combine
).
You can use this to merge date and time into the same column of dataframe.
import pandas as pd
data_file = 'data.csv' #path of your file
Reading .csv file with merged columns Date_Time:
data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']])
You can use this line to keep both other columns also.
data.set_index(['Date', 'Time'], drop=False)
Cast the columns if the types are different (datetime
and timestamp
or str
) and use to_datetime
:
df.loc[:,'Date'] = pd.to_datetime(df.Date.astype(str)+' '+df.Time.astype(str))
Result :
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
Best,
The answer really depends on what your column types are. In my case, I had datetime
and timedelta
.
> df[['Date','Time']].dtypes
Date datetime64[ns]
Time timedelta64[ns]
If this is your case, then you just need to add the columns:
> df['Date'] + df['Time']
You can also convert to datetime
without string concatenation, by combining to_datetime
and to_timedelta
, which create datetime
and timedeltea
objects, respectively. Combined with pd.DataFrame.pop
, you can remove the source Series simultaneously:
df['DateTime'] = pd.to_datetime(df.pop('Date')) + pd.to_timedelta(df.pop('Time'))
print(df)
DateTime
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
print(df.dtypes)
DateTime datetime64[ns]
dtype: object
First make sure to have the right data types:
df["Date"] = pd.to_datetime(df["Date"])
df["Time"] = pd.to_timedelta(df["Time"])
Then you easily combine them:
df["DateTime"] = df["Date"] + df["Time"]
Use the combine
function:
datetime.datetime.combine(date, time)
My dataset had 1second resolution data for a few days and parsing by the suggested methods here was very slow. Instead I used:
dates = pandas.to_datetime(df.Date, cache=True)
times = pandas.to_timedelta(df.Time)
datetimes = dates + times
Note the use of cache=True
makes parsing the dates very efficient since there are only a couple unique dates in my files, which is not true for a combined date and time column.
DATA:
<TICKER>,<PER>,<DATE>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>
SPFB.RTS,1,20190103,100100,106580.0000000,107260.0000000,106570.0000000,107230.0000000,3726
CODE:
data.columns = ['ticker', 'per', 'date', 'time', 'open', 'high', 'low', 'close', 'vol']
data.datetime = pd.to_datetime(data.date.astype(str) + ' ' + data.time.astype(str), format='%Y%m%d %H%M%S')
Here is a one liner, to do it. You simply concatenate the two string in each of the column with a " " space in between.
Say df is your dataframe and columns are ‘Time’ and ‘Date’. And your new column is DateAndTime.
df['DateAndTime'] = df['Date'].str.cat(df['Time'],sep=" ")
And if you also wanna handle entries like datetime objects, you can do this. You can tweak the formatting as per your needs.
df['DateAndTime'] = pd.to_datetime(df['DateAndTime'], format="%m/%d/%Y %I:%M:%S %p")
Cheers!! Happy Data Crunching.
I think the best solution is to parse dates within read_csv
(or other read_ functions) directly. It is not obvious how to manage two columns in date_parser but here it is:
date_parser = lambda x,y: datetime.strptime(f"{x}T{y}", "%d-%m-%YT%H:%M:%S")
date = pd.read_csv('data.csv', parse_dates={'date': ['Date', 'Time']}, date_parser=date_parser)