Convert Integer into Datetime in Python
Question:
I'm working on a DataFrame that stores dates as integers, and I want to convert them to datetime so I can keep manipulating them.
Basically, I have a DataFrame with a column like the one below:
DATE
18010101
18010101
18010101
18010101
18010101
... ... ... ... ... ...
20123124
20123124
20123124
20123124
20123124
The order of the fields in this dataset is (year, month, day, hour).
I've already tried something like this:
df["Year"] = df.DATE[0:2]
df["Month"] = df.DATE[2:4]
but it converts the values into floats and, for the first row, the result looks like this:
DATE Year
18010101 18010101.0
when it was supposed to be
Year = 2018 or 18
I would really appreciate your help.
Answers:
Use Series.str.extract with a regex:
import pandas as pd

df = pd.DataFrame([[18010101], [18010101], [18010101], [18010101], [18010101],
[20123124], [20123124], [20123124], [20123124], [20123124]],
columns=['DATE'])
# Each (\d{2}) group captures two digits: year, month, day, hour
df[['year', 'month', 'day', 'hour']] = df['DATE'].astype(str).str.extract(r'(\d{2})(\d{2})(\d{2})(\d{2})')
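DATE | year | month | day | hour |
---|---|---|---|---|
18010101 | 18 | 01 | 01 | 01 |
18010101 | 18 | 01 | 01 | 01 |
18010101 | 18 | 01 | 01 | 01 |
18010101 | 18 | 01 | 01 | 01 |
18010101 | 18 | 01 | 01 | 01 |
20123124 | 20 | 12 | 31 | 24 |
20123124 | 20 | 12 | 31 | 24 |
20123124 | 20 | 12 | 31 | 24 |
20123124 | 20 | 12 | 31 | 24 |
20123124 | 20 | 12 | 31 | 24 |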
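Note that the extracted columns hold strings ("18", "01", ...), not numbers. If you want integers and a four-digit year, something like the sketch below might help. It is a variant of the extract call that uses named groups (my choice, so the result already carries column names), and it assumes every year in the data is in the 2000s, which the question does not confirm.

```python
import pandas as pd

df = pd.DataFrame({"DATE": [18010101, 20123124]})

# Named groups give the extracted DataFrame its column names directly.
pattern = r"(?P<year>\d{2})(?P<month>\d{2})(?P<day>\d{2})(?P<hour>\d{2})"

# Extract the four two-digit fields as strings, then cast them to integers.
parts = df["DATE"].astype(str).str.extract(pattern).astype(int)

# Assumption: all years belong to the 2000s, so 18 means 2018.
parts["year"] += 2000

df = df.join(parts)
print(df)
```

Casting with astype(int) will fail if any row does not match the pattern, so it is worth checking for missing values first on real data.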
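Earlier, you tried to slice an int column as if it were a string. Slicing the column directly (df.DATE[0:2]) selects rows, not the characters of each value.
Instead, first convert it to a string (with .astype(str)) and then slice it through the .str accessor. See the lines below.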
date_str = df.DATE.astype(str)
df["Year"] = date_str.str[0:2]
df["Month"] = date_str.str[2:4]
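To see why the original attempt produced floats, here is a small sketch on made-up rows (the column names Year_wrong, Year and Month are only for illustration). Slicing the Series picks leading rows, and assigning that shorter Series back aligns on the index, so the rows that are left out become NaN and the column turns into float.

```python
import pandas as pd

df = pd.DataFrame({"DATE": [18010101, 18010102, 20123123, 20123124]})

# Slicing the Series selects leading rows of the column, not characters.
rows = df["DATE"][0:2]

# Assignment aligns on the index: rows missing from `rows` become NaN,
# and NaN forces the new column to float dtype.
df["Year_wrong"] = rows

# The .str accessor slices the characters of every element instead.
date_str = df["DATE"].astype(str)
df["Year"] = date_str.str[0:2]
df["Month"] = date_str.str[2:4]

print(df)
```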
Are your hour stamps actually from 1 to 24? The normal convention for 24-hour time is 0-23, in which case you can use:
df = pd.DataFrame([[18010100], [18010100], [18010100], [18010100], [18010100],
[20123123], [20123123], [20123123], [20123123], [20123123]],
columns=['DATE'])
pd.to_datetime(df['DATE'], format='%y%m%d%H')
Or, if the hours really do run from 1 to 24, simply subtract 1 from each number beforehand, like so:
df = pd.DataFrame([[18010101], [18010101], [18010101], [18010101], [18010101],
[20123124], [20123124], [20123124], [20123124], [20123124]],
columns=['DATE'])
pd.to_datetime(df['DATE']-1, format='%y%m%d%H')
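A possible alternative, if you would rather not edit the raw numbers: parse the date part on its own and add the hour as a timedelta. This is only a sketch; which of the two columns you want depends on what hour 24 means in your data, which is an assumption either way. The zfill(8) call is also my addition, to guard against a leading zero being lost when the value is stored as an integer.

```python
import pandas as pd

df = pd.DataFrame({"DATE": [18010101, 20123124]})

# Pad to 8 characters in case a leading zero was dropped by the int storage.
s = df["DATE"].astype(str).str.zfill(8)

# Parse year, month and day only; %y maps 18 to 2018.
day = pd.to_datetime(s.str[:6], format="%y%m%d")

# Treat the last two digits as a number of hours.
hours = pd.to_timedelta(s.str[6:].astype(int), unit="h")

# Assumption A: hour N labels the N-th hour of the day (1 -> 00:00, 24 -> 23:00).
# This gives the same timestamps as subtracting 1 before parsing.
df["hour_start"] = day + hours - pd.Timedelta(hours=1)

# Assumption B: hour N labels the end of the interval (24 -> next day 00:00).
df["hour_end"] = day + hours

print(df)
```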