Add a date column in pandas df using constant value in str
Question:
I have a table in a pandas DataFrame df:
product_id_x product_id_y count
0 2727846 7872456 1
1 29234 2932348 2
2 29346 9137500 1
3 29453 91365738 1
4 2933666 91323494 1
I want to add a new column 'dates' whose value I have defined as a string:
dateSelect = "'2016-11-06'"
so I added a new constant column:
df['dates'] = dateSelect
but I get this result:
product_id_x product_id_y count dates
0 2727846 7872456 1 '2016-11-06'
1 29234 2932348 2 '2016-11-06'
2 29346 9137500 1 '2016-11-06'
3 29453 91365738 1 '2016-11-06'
4 2933666 91323494 1 '2016-11-06'
The values in the dates column come out wrapped in quotes, and each value is a plain string:
type(df['dates'][0])  # str
But I want it in a date format, because I am going to store this table in my MySQL database and I want the column type there to be DATE:
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqldb://name:[email protected]/dbname', echo=False)
df.to_sql(name='tablename', con=engine, if_exists = 'append', index=False)
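A minimal, self-contained reproduction of the problem, using only the first three rows of the table above, shows that the quotes are characters inside the string itself rather than something pandas adds:

```python
import pandas as pd

# First three rows of the table above
df = pd.DataFrame({
    "product_id_x": [2727846, 29234, 29346],
    "product_id_y": [7872456, 2932348, 9137500],
    "count": [1, 2, 1],
})

dateSelect = "'2016-11-06'"   # the single quotes are part of the string
print(len(dateSelect))        # 12: ten date characters plus two quote characters

df["dates"] = dateSelect
print(df["dates"].iloc[0])    # '2016-11-06'  (quotes included)
print(pd.api.types.is_datetime64_any_dtype(df["dates"]))  # False
```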
Answers:
I think you can first use replace to strip the ' characters (replacing them with an empty string) and then call to_datetime:
import pandas as pd
dateSelect = pd.to_datetime("'2016-11-06'".replace("'",""))
print (dateSelect)
2016-11-06 00:00:00
print (type(dateSelect))
<class 'pandas.tslib.Timestamp'>
df['dates'] = pd.to_datetime("'2016-11-06'".replace("'",""))
print (df)
product_id_x product_id_y count dates
0 2727846 7872456 1 2016-11-06
1 29234 2932348 2 2016-11-06
2 29346 9137500 1 2016-11-06
3 29453 91365738 1 2016-11-06
4 2933666 91323494 1 2016-11-06
print (df.dtypes)
product_id_x int64
product_id_y int64
count int64
dates datetime64[ns]
dtype: object
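Since the end goal is a DATE column in the database, the dtype argument of to_sql is what controls the declared column type. The sketch below uses the standard-library sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption, so it runs without a server) and checks the declared type. With a SQLAlchemy MySQL engine as in the question, the equivalent would be dtype={'dates': sqlalchemy.types.Date()}; without a dtype, pandas maps a datetime64 column to SQLAlchemy's DateTime type rather than Date.

```python
import sqlite3

import pandas as pd

# Small frame with a datetime column built as in the answer above
df = pd.DataFrame({"product_id_x": [2727846, 29234], "count": [1, 2]})
df["dates"] = pd.to_datetime("'2016-11-06'".replace("'", ""))

# In-memory SQLite database standing in for the MySQL server in the question
con = sqlite3.connect(":memory:")

# With a plain sqlite3 connection, dtype values are SQL type strings
df.to_sql("tablename", con, index=False, dtype={"dates": "DATE"})

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
col_types = {row[1]: row[2] for row in con.execute("PRAGMA table_info(tablename)")}
print(col_types["dates"])  # DATE
con.close()
```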
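Don't put the extra single quotes inside the double-quoted literal; that is what makes the quote characters part of the string:
dateSelect = '2016-11-06'
df['dates'] = dateSelect
Note that this removes the quotes but still gives a column of plain strings rather than dates, so convert it with pd.to_datetime if you want a date type in MySQL.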
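Assigning the unquoted string removes the quote characters, but the column still holds strings rather than dates. A minimal sketch, using a small hypothetical frame, that checks this and then converts the column with pd.to_datetime:

```python
import pandas as pd

df = pd.DataFrame({"count": [1, 2, 1]})

dateSelect = '2016-11-06'          # no extra quotes inside the literal
df['dates'] = dateSelect           # still a column of plain strings
print(df['dates'].iloc[0])         # 2016-11-06

# Convert the string column to a real datetime column
df['dates'] = pd.to_datetime(df['dates'], format='%Y-%m-%d')
print(pd.api.types.is_datetime64_any_dtype(df['dates']))  # True
```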
Ahh! @jezrael got there first…
import timeit

# Method 1: standard-library datetime
print(timeit.timeit("""
import pandas as pd
import datetime as dt
df = pd.read_csv('date_time_pandas.csv')
dateSelect_str = "2016-11-06"
dateSelect = dt.datetime.strptime(dateSelect_str, "%Y-%m-%d")
df['dates'] = dateSelect
""", number=100))

# Method 2: pandas to_datetime
print(timeit.timeit("""
import pandas as pd
df = pd.read_csv('date_time_pandas.csv')
dateSelect_str = "2016-11-06"
dateSelect = pd.to_datetime(dateSelect_str, format='%Y-%m-%d')
df['dates'] = dateSelect
""", number=100))
gives, on average, this output (total seconds for 100 runs, including the CSV read in each run):
0.228258825751
0.167258402887
Conclusion: in this benchmark, pd.to_datetime is the faster option.
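The benchmark above times read_csv together with the date conversion. A sketch that isolates only the constant-column assignment, using a hypothetical 1,000-row in-memory frame in place of date_time_pandas.csv; absolute numbers, and possibly the ranking, will vary by machine and pandas version:

```python
import datetime as dt
import timeit

import pandas as pd

# Hypothetical in-memory frame standing in for date_time_pandas.csv
df = pd.DataFrame({"count": range(1000)})
dateSelect_str = "2016-11-06"

def with_strptime():
    # Standard-library parse, then broadcast the scalar to every row
    df["dates"] = dt.datetime.strptime(dateSelect_str, "%Y-%m-%d")

def with_to_datetime():
    # pandas parse, then broadcast the scalar to every row
    df["dates"] = pd.to_datetime(dateSelect_str, format="%Y-%m-%d")

t_strptime = timeit.timeit(with_strptime, number=100)
t_to_datetime = timeit.timeit(with_to_datetime, number=100)
print(f"strptime:    {t_strptime:.4f} s")
print(f"to_datetime: {t_to_datetime:.4f} s")
```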
The most direct route is to assign a pd.Timestamp:
df['dates'] = pd.Timestamp('2016-11-06')
df
product_id_x product_id_y count dates
0 2727846 7872456 1 2016-11-06
1 29234 2932348 2 2016-11-06
2 29346 9137500 1 2016-11-06
3 29453 91365738 1 2016-11-06
4 2933666 91323494 1 2016-11-06
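A short sketch, on a small hypothetical frame, confirming that the scalar Timestamp is broadcast to every row as a datetime column and holds the same value as the pd.to_datetime route from the first answer:

```python
import pandas as pd

df = pd.DataFrame({"count": [1, 2, 1]})

# A scalar Timestamp is broadcast to every row
df["dates"] = pd.Timestamp("2016-11-06")

print(pd.api.types.is_datetime64_any_dtype(df["dates"]))  # True
print(df["dates"].nunique())                              # 1
# Same value as parsing the string with pd.to_datetime
print(df["dates"].iloc[0] == pd.to_datetime("2016-11-06"))  # True
```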
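For a range of dates (one per row) instead of a constant, the pd.date_range function works well. Note that "1/08/2020" is read month-first, i.e. as 8 January 2020: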
dataFrame["Date Column"] = pd.date_range("1/08/2020", periods=len(dataFrame))
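A self-contained sketch of the date_range approach on a hypothetical five-row frame. It passes the same start date as an ISO string ("2020-01-08") so there is no month/day ambiguity, and sets freq="D" explicitly (daily is also the default):

```python
import pandas as pd

dataFrame = pd.DataFrame({"count": [1, 2, 1, 1, 1]})

# One consecutive day per row, starting at the given date
dataFrame["Date Column"] = pd.date_range("2020-01-08", periods=len(dataFrame), freq="D")
print(dataFrame["Date Column"].iloc[0])   # 2020-01-08 00:00:00
print(dataFrame["Date Column"].iloc[-1])  # 2020-01-12 00:00:00
```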
All the best!