In my DataFrame I have a Time column, and I need to convert its HH:MM:SS.SS values to seconds. How can I do that in Python?
Question:
              Time  volts
0     15:15:10.951    368
1     15:15:11.950    373
2     15:15:12.950    368
3     15:15:13.949    316
4     15:15:14.949    368
...            ...    ...
2141  15:50:54.087    337
2142  15:50:55.069    343
2143  15:50:56.085    344
2144  15:50:57.081    339
2145  15:50:58.090    347
def time_convert(x):
    h,m,s = map(int,x.split(':'))
    return int(h) * 3600 + int(m) * 60 + int(s)
The output I get:
ValueError                                Traceback (most recent call last)
<ipython-input-17-68cf4416cc88> in <module>
----> 1 df['Time'] = df['Time'].apply(time_convert)

<ipython-input-12-42bee45f8bd8> in time_convert(x)
      1 def time_convert(x):
----> 2     h,m,s = map(int,x.split(':'))
      3     return int(h) * 3600 + int(m) * 60 + int(s)
      4
      5

ValueError: invalid literal for int() with base 10: '10.951'
I expected the values to be converted to seconds. The solutions I found only cover converting HH:MM:SS to seconds; I haven't found any that handle fractional seconds (SS.SS).
Answers:
I can only guess what is in your DataFrame. In any case, iterating over rows (even with apply) is generally a bad idea, because it is very slow.
As for why it doesn't work, the problem lies in your conversion function:
def time_convert(x):
    h,m,s = map(int,x.split(':'))
    return int(h) * 3600 + int(m) * 60 + int(s)
You are converting to int twice here: once when mapping int over x.split(':'), and again when converting each of h, m, s.
So this simpler version:
def time_convert(x):
    h,m,s = x.split(':')
    return int(h) * 3600 + int(m) * 60 + int(s)
does the same, and it still doesn't work, because s ('10.951' in your data) is not an integer string, so int() cannot convert it. Convert the seconds with float instead:
def time_convert(x):
    h,m,s = x.split(':')
    return int(h) * 3600 + int(m) * 60 + float(s)
With that change, your code works. It is not the most efficient approach, but it works.
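A self-contained version of the corrected apply approach, as a minimal sketch. The small sample DataFrame (first three rows from the question) is an assumption standing in for your real data:

```python
import pandas as pd

def time_convert(x: str) -> float:
    """Convert an 'HH:MM:SS.fff' string to seconds since midnight."""
    h, m, s = x.split(":")
    # Hours and minutes are whole numbers; seconds may carry a fractional part
    return int(h) * 3600 + int(m) * 60 + float(s)

# Sample data (assumed): the first three rows shown in the question
df = pd.DataFrame({
    "Time": ["15:15:10.951", "15:15:11.950", "15:15:12.950"],
    "volts": [368, 373, 368],
})

# Row-by-row conversion; fine for small frames, slow for large ones
df["Time"] = df["Time"].apply(time_convert)
print(df)
```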
More efficient way
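df.Time.str[:2].astype(int)
is, for example, a Series holding the first two characters of each Time value (the hours) converted to int.
df.Time.str[3:5].astype(int)
does the same for the 4th and 5th characters (the minutes), and
df.Time.str[6:].astype(float)
converts everything from the 7th character onward (the seconds, including the fractional part) to float.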
You can do arithmetic on whole Series at once, so
3600*df.Time.str[:2].astype(int) + 60*df.Time.str[3:5].astype(int) + df.Time.str[6:].astype(float)
is the Series of values you wanted.
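To see what each slice holds, here is a sketch on two rows from the question. It assumes every Time value is zero-padded HH:MM:SS.fff, because the slices rely on fixed character positions; the intermediate names hours, minutes and seconds are only for illustration:

```python
import pandas as pd

# Sample rows from the question (assumed zero-padded 'HH:MM:SS.fff')
df = pd.DataFrame({"Time": ["15:15:10.951", "15:50:58.090"], "volts": [368, 347]})

hours = df.Time.str[:2].astype(int)      # positions 0-1 -> hours
minutes = df.Time.str[3:5].astype(int)   # positions 3-4 -> minutes
seconds = df.Time.str[6:].astype(float)  # position 6 onward -> seconds, fraction kept

# Vectorized arithmetic over whole Series, no Python-level loop per row
total = 3600 * hours + 60 * minutes + seconds
print(total)
```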
Hence, a much faster version of what you wanted:
df['Time'] = 3600*df.Time.str[:2].astype(int) + 60*df.Time.str[3:5].astype(int) + df.Time.str[6:].astype(float)
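An alternative that is not part of the answer above: pandas can parse these strings as durations with pd.to_timedelta, and .dt.total_seconds() then returns each one as a float number of seconds (here, seconds since midnight). A minimal sketch on the same two sample rows; unlike the slicing version, it does not pick fields out by fixed character positions:

```python
import pandas as pd

# Sample rows from the question (assumed)
df = pd.DataFrame({"Time": ["15:15:10.951", "15:50:58.090"], "volts": [368, 347]})

# Parse 'HH:MM:SS.fff' strings into a timedelta Series, then convert to float seconds
df["Time"] = pd.to_timedelta(df["Time"]).dt.total_seconds()
print(df)
```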