Convert integer in format HHMMSS to datetime in Pandas
Question:
I have a long dataframe (35 million rows) of data that includes a Time column. The time in the original CSV is in the format HHMMSS, which pandas reads as an integer. Because it is an integer, the leading zero drops off whenever the hour is less than 10 (ex: 090000 (9am) becomes 90000 in pandas).
I am trying to convert these integers into an actual datetime value that I can extract .time() from, so as to be able to count in custom intervals of minutes, seconds, hours etc.
How do I convert an integer (such as 90000 or 100000) into their respective times (9am, 10am)?
Answers:
You can use to_datetime after converting the integers to strings and padding them with leading zeros via zfill:
import pandas as pd

df = pd.DataFrame({'time': [90000, 0, 123456]})
df['time2'] = pd.to_datetime(df['time'].astype(str).str.zfill(6), format='%H%M%S').dt.time
Or, as string:
df['time2'] = pd.to_datetime(df['time'].astype(str).str.zfill(6), format='%H%M%S').dt.strftime('%H:%M:%S')
Output:
     time     time2
0   90000  09:00:00
1       0  00:00:00
2  123456  12:34:56
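With 35 million rows, it may be worth avoiding the string round trip altogether. The following is a minimal sketch of an alternative that splits the integer arithmetically; it is not part of the answer above, the column name `delta` is made up for illustration, and it assumes every value is a valid HHMMSS integer. The result is a Timedelta column (time since midnight) rather than datetime.time objects.

```python
import pandas as pd

df = pd.DataFrame({'time': [90000, 0, 123456]})

# Split the HHMMSS integer into its parts with integer arithmetic.
hours = df['time'] // 10000
minutes = (df['time'] // 100) % 100
seconds = df['time'] % 100

# Combine the parts into seconds since midnight, then convert to Timedelta.
# 'delta' is a hypothetical column name chosen for this sketch.
df['delta'] = pd.to_timedelta(hours * 3600 + minutes * 60 + seconds, unit='s')
```

Whether this is actually faster than the to_datetime approach depends on the data and the pandas version, so it is worth timing both on a sample.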
You can convert the values to time (Timedelta) directly when loading, through the converters parameter. This parameter tells read_csv how to convert specific columns. Unfortunately, there’s no equivalent to strptime for timedelta.
One quick & dirty way to parse this format into a timedelta is to use a regular expression or just split the string into parts, e.g.:
import pandas as pd

def parse_timedelta(time_str):
    # Restore the leading zeros dropped from values such as "90000".
    s = time_str.zfill(6)
    return pd.Timedelta(
        hours=int(s[0:2]),
        minutes=int(s[2:4]),
        seconds=int(s[4:6]))
This can be used to convert fields in HHMMSS format to timedelta:
from io import StringIO

csv = "Time\n90000\n000000\n123456"
df = pd.read_csv(StringIO(csv), converters={'Time': parse_timedelta})
>>> df
             Time
0 0 days 09:00:00
1 0 days 00:00:00
2 0 days 12:34:56
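Once the column holds Timedelta values, counting rows in custom intervals (the goal stated in the question) can be done by flooring each value to the start of its bucket. This is only a sketch under assumptions: the sample values and the 15-minute bucket width are made up for illustration, and the frame is built inline rather than read from the CSV.

```python
import pandas as pd

# Hypothetical sample data standing in for a parsed Time column.
df = pd.DataFrame({
    'Time': pd.to_timedelta(['09:00:00', '09:07:30', '09:16:00', '12:34:56'])
})

# Floor each value to the start of its 15-minute bucket,
# then count how many rows fall into each bucket.
counts = df['Time'].dt.floor('15min').value_counts().sort_index()
```

Changing '15min' to another frequency string such as '30s' or '1h' gives buckets of a different width.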