How to use the value of one column as part of a string to fill NaNs in another column?

Question:

Let’s say I have the following df:

   year  date_until
1  2010  -
2  2011  30.06.13
3  2011  NaN
4  2015  30.06.18
5  2020  -

I’d like to fill every '-' and NaN in the date_until column with 30/06/{year + 1}. I tried the following, but it inserts the whole year column instead of the value from the corresponding row:

df['date_until'] = df['date_until'].str.replace('-', f'30/06/{df["year"]+1}')

My final goal is to calculate the difference between year and the year of date_until, so the step above may not even be necessary.

Asked By: TiTo

||

Answers:

We can use pd.to_datetime with errors='coerce' here, which turns the entries that cannot be parsed ('-' and NaN) into NaT. Then use dt.year to calculate the difference:

df['date_until'] = pd.to_datetime(df['date_until'], format='%d.%m.%y', errors='coerce')
df['diff_year'] = df['date_until'].dt.year - df['year']

   year date_until  diff_year
0  2010        NaT        NaN
1  2011 2013-06-30        2.0
2  2011        NaT        NaN
3  2015 2018-06-30        3.0
4  2020        NaT        NaN
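
If the rows with '-' or NaN should count as 30.06.{year + 1}, as the question asks, one possible extension is to fill the NaT values before taking the difference. This is only a sketch and not part of the approach above: the sample frame is rebuilt from the question's data, and the names parsed and fallback are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'year': [2010, 2011, 2011, 2015, 2020],
    'date_until': ['-', '30.06.13', np.nan, '30.06.18', '-'],
})

# '-' and NaN cannot be parsed, so errors='coerce' turns them into NaT
parsed = pd.to_datetime(df['date_until'], format='%d.%m.%y', errors='coerce')

# Assumed default for those rows: 30 June of the year after `year`
fallback = pd.to_datetime((df['year'] + 1).astype(str) + '-06-30')

# Keep the parsed date where there is one, otherwise use the fallback
df['date_until'] = parsed.fillna(fallback)
df['diff_year'] = df['date_until'].dt.year - df['year']
```

With this fill, every row that had no usable date ends up with a difference of 1 by construction, so whether that default is meaningful depends on the data.
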
Answered By: Erfan

For everybody who is trying to replace values just like I wanted to in the first place, here is how you could solve it:

col = df.columns.get_loc('date_until')  # column position, so the write goes through df.iloc directly
for i in range(len(df)):
    value = df.iloc[i, col]
    if pd.isna(value) or value == '-':
        df.iloc[i, col] = f'30.06.{df["year"].iloc[i] + 1}'

But @Erfan’s approach is much cleaner.
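
The same replacement can probably also be written without a loop. The following is a minimal sketch, assuming the sample data from the question; missing and replacement are names made up for the example. It builds one replacement string per row from that row's year and swaps it in with Series.mask.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'year': [2010, 2011, 2011, 2015, 2020],
    'date_until': ['-', '30.06.13', np.nan, '30.06.18', '-'],
})

# Rows that hold either NaN or the '-' placeholder
missing = df['date_until'].isna() | df['date_until'].eq('-')

# One replacement string per row, built from that row's year
replacement = '30.06.' + (df['year'] + 1).astype(str)

# Series.mask replaces the values where the condition is True
df['date_until'] = df['date_until'].mask(missing, replacement)
```

As in the loop, the filled values carry a four-digit year while the existing ones use two digits, so parsing the column later with a single format such as '%d.%m.%y' would likely need extra handling.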

Answered By: TiTo