Add years to a date column based on condition in pandas
Question:
I have a dataframe with one of the column as date (DATECOL). My target is to increment the years by a "number" but the number is a variable and will be different. This number will be decided on the basis of the values of other columns.
For eg. in SQL this can be implemented as
SELECT CASE
WHEN COL1 = "AB" THEN DATEADD(year, 2, DATECOL)
WHEN COL1 = "XY" AND COL2 = "PQR" THEN DATEADD(year, 2, DATECOL)
WHEN COL1 = "XY" AND COL2 != "PQR" THEN DATEADD(year, 3, DATECOL)
END AS NEWCOL
FROM DATAFRAME
Can someone please help me implement this kind of logic in pandas.
Answers:
You have to first ensure your DATECOL
column is of type datetime. Then you can do this:
import pandas as pd
df = pd.DataFrame({'COL1': ['AB', 'XY', 'XY'], 'COL2': ['PQR', 'bla', 'PQR'], 'DATECOL': ['20101001', '20111001', '20121001']})
df['DATECOL'] = pd.to_datetime(df['DATECOL'], format='%Y%m%d') #change format as per your need
c1 = df['COL1']=='AB'
c2 = (df['COL1']=='XY') & (df['COL2']=='PQR')
c3 = (df['COL1']=='XY') & (df['COL2']!='PQR')
if len(df[c1]) > 0:
df.loc[c1, 'DATECOL'] += pd.offsets.DateOffset(years=2)
elif len(df[c2]) > 0:
df.loc[c2, 'DATECOL'] += pd.offsets.DateOffset(years=2)
elif len(df[c3]) > 0:
df.loc[c3, 'DATECOL'] += pd.offsets.DateOffset(years=3)
What about using np.where with conditions as Akanksha Atrey mentioned?
np.where((first_condition) == True, add_yars, np.where(second_condition) == True, ...)
Consider numpy.select
for multiple logical conditions and corresponding values:
conditions = [(df['COL1'] == 'AB'),
(df['COL1'] == 'XY') & (df['COL2'] == 'PQR'),
(df['COL1'] == 'XY') & (df['COL2'] != 'PQR')]
choices = [df['DATECOL'] + pd.DateOffset(years=2),
df['DATECOL'] + pd.DateOffset(years=2),
df['DATECOL'] + pd.DateOffset(years=3)]
df['NEWCOL'] = np.select(conditions, choices, default=np.datetime64('NaT'))
I have a dataframe with one of the column as date (DATECOL). My target is to increment the years by a "number" but the number is a variable and will be different. This number will be decided on the basis of the values of other columns.
For eg. in SQL this can be implemented as
SELECT CASE
WHEN COL1 = "AB" THEN DATEADD(year, 2, DATECOL)
WHEN COL1 = "XY" AND COL2 = "PQR" THEN DATEADD(year, 2, DATECOL)
WHEN COL1 = "XY" AND COL2 != "PQR" THEN DATEADD(year, 3, DATECOL)
END AS NEWCOL
FROM DATAFRAME
Can someone please help me implement this kind of logic in pandas.
You have to first ensure your DATECOL
column is of type datetime. Then you can do this:
import pandas as pd
df = pd.DataFrame({'COL1': ['AB', 'XY', 'XY'], 'COL2': ['PQR', 'bla', 'PQR'], 'DATECOL': ['20101001', '20111001', '20121001']})
df['DATECOL'] = pd.to_datetime(df['DATECOL'], format='%Y%m%d') #change format as per your need
c1 = df['COL1']=='AB'
c2 = (df['COL1']=='XY') & (df['COL2']=='PQR')
c3 = (df['COL1']=='XY') & (df['COL2']!='PQR')
if len(df[c1]) > 0:
df.loc[c1, 'DATECOL'] += pd.offsets.DateOffset(years=2)
elif len(df[c2]) > 0:
df.loc[c2, 'DATECOL'] += pd.offsets.DateOffset(years=2)
elif len(df[c3]) > 0:
df.loc[c3, 'DATECOL'] += pd.offsets.DateOffset(years=3)
What about using np.where with conditions as Akanksha Atrey mentioned?
np.where((first_condition) == True, add_yars, np.where(second_condition) == True, ...)
Consider numpy.select
for multiple logical conditions and corresponding values:
conditions = [(df['COL1'] == 'AB'),
(df['COL1'] == 'XY') & (df['COL2'] == 'PQR'),
(df['COL1'] == 'XY') & (df['COL2'] != 'PQR')]
choices = [df['DATECOL'] + pd.DateOffset(years=2),
df['DATECOL'] + pd.DateOffset(years=2),
df['DATECOL'] + pd.DateOffset(years=3)]
df['NEWCOL'] = np.select(conditions, choices, default=np.datetime64('NaT'))