Combine year, month and day in Python to create a date
Question:
I have a dataframe that consists of separate columns for year, month and day. I tried to combine these individual columns into one date using:
df['myDt']=pd.to_datetime(df[['year','month','day']])
only to get the following error: "to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing". Not sure what this means... I'm already supplying the relevant columns.
On checking the datatypes, I found that the Year, Month and Day columns are int64. Would that be causing an issue?
Thanks,
Chet
Thank you all for posting. As suggested, I’m posting the sample data set first:
            Value      mm  yy    dd
Date
2018-11-30  88.550067  11  2018  1
2018-12-31  88.906290  12  2018  1
2019-01-31  88.723000   1  2019  1
2019-02-28  89.509179   2  2019  1
2019-03-31  90.049161   3  2019  1
2019-04-30  90.523100   4  2019  1
2019-05-31  90.102484   5  2019  1
2019-06-30  91.179400   6  2019  1
2019-07-31  90.963570   7  2019  1
2019-08-31  92.159170   8  2019  1
The data source is: https://www.quandl.com/data/EIA/STEO_NGPRPUS_M
I imported the data as follows:
1. import quandl (used conda install first)
2. Used Quandl’s Python code:
data = quandl.get("EIA/STEO_NGPRPUS_M", authtoken="TOKEN", start_date="2005-01-01", end_date="2005-12-31")
4. Just to note, the original data comes with only the Value column and a DateTime index. I extracted and created the mm, yy and dd columns (month, year, and dd is a column vector set to 1).
All I’m trying to do is create another column called “first of the month” – so for each day of each month, the column will just show “MM/YY/1”. I’m going to try out all the suggestions below shortly and get back to you guys. Thanks!!
Answers:
Solution
You could use datetime.datetime along with .apply().
import datetime
d = datetime.datetime(2020, 5, 17)
date = d.date()
For pandas.to_datetime(df)
It looks like your code is fine. See the pandas.to_datetime documentation and "How to convert columns into one datetime column in pandas?".
import pandas as pd
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
pd.to_datetime(df[["year", "month", "day"]])
Output:
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
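The sample data in the question uses the column names yy, mm and dd, so one plausible source of the error is a name mismatch rather than the int64 dtype. The snippet below is only a sketch under that assumption (the small frame is made up for illustration): it passes a frame whose columns are not named year, month and day, captures the resulting ValueError, and then shows that int64 columns with the expected names assemble without trouble.

```python
import pandas as pd

# Illustrative frame using the question's column names (values are made up)
df = pd.DataFrame({'yy': [2018, 2019], 'mm': [11, 1], 'dd': [1, 1]})

# to_datetime looks for columns named year/month/day; yy/mm/dd are not recognized
try:
    pd.to_datetime(df[['yy', 'mm', 'dd']])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# int64 columns are fine as long as the names match
ok = pd.to_datetime(pd.DataFrame({'year': [2018], 'month': [11], 'day': [1]}))
print(ok)
```

If the message printed here matches the one in the question, the column names are the thing to fix, not the dtypes.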
What if your YEAR, MONTH and DAY columns have different headers?
Let's say your YEAR, MONTH and DAY columns are labeled yy, mm and dd respectively, and you prefer to keep your column names unchanged. In that case you could do it as follows.
import pandas as pd
df = pd.DataFrame({'yy': [2015, 2016],
                   'mm': [2, 3],
                   'dd': [4, 5]})
df2 = df[["yy", "mm", "dd"]].copy()
df2.columns = ["year", "month", "day"]
pd.to_datetime(df2)
Output:
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
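If you would rather not create and relabel a copy, two shorter variants of the same idea may also work. This is a sketch, not the only way to do it: the first renames the columns on the fly with DataFrame.rename, the second passes a dict whose keys are the names to_datetime expects. The variable names dates_renamed and dates_from_dict are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'yy': [2015, 2016],
                   'mm': [2, 3],
                   'dd': [4, 5]})

# Variant 1: rename returns a new frame, so df keeps its original headers
dates_renamed = pd.to_datetime(
    df[['yy', 'mm', 'dd']].rename(columns={'yy': 'year', 'mm': 'month', 'dd': 'day'})
)

# Variant 2: to_datetime also accepts a dict-like keyed by year/month/day
dates_from_dict = pd.to_datetime({'year': df['yy'], 'month': df['mm'], 'day': df['dd']})

print(dates_renamed)
print(dates_from_dict)
```

Either result can be assigned straight to a new column, for example df['myDt'] = dates_from_dict.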
You should use the apply method as follows:
from datetime import datetime
df['myDt'] = df.apply(lambda row: datetime.strptime(f"{int(row.year)}-{int(row.month)}-{int(row.day)}", '%Y-%m-%d'), axis=1)
Running Example:
>>> d = {'year': list(range(2015, 2020)), 'month': list(range(5, 10)), 'day': list(range(20, 25))}
>>> df = pd.DataFrame(d)
>>> df['myDt'] = df.apply(lambda row: datetime.strptime(f"{int(row.year)}-{int(row.month)}-{int(row.day)}", '%Y-%m-%d'), axis=1)
>>> df
   year  month  day       myDt
0  2015      5   20 2015-05-20
1  2016      6   21 2016-06-21
2  2017      7   22 2017-07-22
3  2018      8   23 2018-08-23
4  2019      9   24 2019-09-24
Here is a two-liner:
df['dateInt'] = df['year'].astype(str) + df['month'].astype(str).str.zfill(2) + df['day'].astype(str).str.zfill(2)
df['Date'] = pd.to_datetime(df['dateInt'], format='%Y%m%d')
Output
   year  month  day   dateInt       Date
0  2015      5   20  20150520 2015-05-20
1  2016      6   21  20160621 2016-06-21
2  2017      7   22  20170722 2017-07-22
3  2018      8   23  20180823 2018-08-23
4  2019      9   24  20190924 2019-09-24
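The same YYYYMMDD key can also be built arithmetically instead of with string padding. This is a sketch of that variation, assuming four-digit years and valid month and day values; the name date_int is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'year': [2015, 2016], 'month': [5, 6], 'day': [20, 21]})

# year * 10000 + month * 100 + day gives an integer such as 20150520
date_int = df['year'] * 10000 + df['month'] * 100 + df['day']

# Convert the integers to strings so the explicit format can parse them
df['Date'] = pd.to_datetime(date_int.astype(str), format='%Y%m%d')
print(df)
```

Because the multiplication fixes each field's position, no zfill call is needed for single-digit months or days.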
# Add and calculate a new Calculated_Date column (a string such as '2015-5-20')
df['Calculated_Date'] = df[['year', 'month', 'day']].apply(lambda x: '{}-{}-{}'.format(x['year'], x['month'], x['day']), axis=1)
df['Calculated_Date'].head()
# Parse the Calculated_Date column into datetime objects (optional; only if you need real dates rather than strings)
df['Calculated_Date'] = pd.to_datetime(df['Calculated_Date'])
df['Calculated_Date'].head()
Improving on the answer from @lmiguelvargasf: sometimes you want to store the result in datetime format. Furthermore, using apply is (IMHO) better if the dataframe has other columns with values (something like sales, for example).
import datetime
df['dt'] = df.apply(lambda row: datetime.datetime(int(row.yy),
                                                  int(row.mm),
                                                  int(row.dd)), axis=1)
df.head()
Note: my example only works if the yy value is a four-digit year, such as 2022. If your yy value is 21, you need to modify it, for example with 2000 + int(row.yy).
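The two-digit-year adjustment from the note can be folded into a small helper. This is only a sketch: row_to_datetime and its century parameter are hypothetical names, and treating every value below 100 as belonging to the 2000s is an assumption you may need to change for your data.

```python
import datetime
import pandas as pd

def row_to_datetime(row, century=2000):
    """Hypothetical helper: build a datetime from a row's yy, mm and dd values."""
    year = int(row.yy)
    if year < 100:
        # Assumption: two-digit years belong to the given century (21 -> 2021)
        year += century
    return datetime.datetime(year, int(row.mm), int(row.dd))

# Made-up frame with one two-digit year, one four-digit year and an extra column
df = pd.DataFrame({'yy': [21, 2022], 'mm': [1, 12], 'dd': [1, 31], 'sales': [10.5, 20.0]})
df['dt'] = df.apply(row_to_datetime, axis=1)
print(df)
```

The int() calls matter here because a float column such as sales makes apply hand each row over as floats.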