Fill NA values over varied data frame column slices in Pandas

Question:

I have a Pandas data frame similar to the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End' : ['2022-03','2022-05','2022-06'],
    '2022-01' : [1,2,np.nan],
    '2022-02' : [np.nan,3,4],
    '2022-03' : [np.nan,1,3],
    '2022-04' : [np.nan,np.nan,2],
    '2022-05' : [np.nan,np.nan,np.nan],
    '2022-06' : [np.nan,np.nan,np.nan]
})

I would like to fill the NaN values in each row so that every column up to and including the month listed in End is set to 0, while the columns after it remain NaN.

The desired output would be:

pd.DataFrame({
    'End' : ['2022-03','2022-05','2022-06'],
    '2022-01' : [1,2,0],
    '2022-02' : [0,3,4],
    '2022-03' : [0,1,3],
    '2022-04' : [np.nan,0,2],
    '2022-05' : [np.nan,0,0],
    '2022-06' : [np.nan,np.nan,0]
})
Asked By: r0bt


Answers:

Use broadcasting to compare the months, then you can mask with where:

df.iloc[:,1:] = df.iloc[:,1:].fillna(0).where(df['End'].to_numpy()[:,None] >= [df.columns[1:]])

Or, more safely, if cells after End may hold non-NaN values that must be kept (the where version above would turn them into NaN):

df.iloc[:,1:] = np.where(df['End'].to_numpy()[:,None] >= [df.columns[1:]],
                         df.iloc[:,1:].fillna(0), df.iloc[:,1:])

Output:

       End  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06
0  2022-03      1.0      0.0      0.0      NaN      NaN      NaN
1  2022-05      2.0      3.0      1.0      0.0      0.0      NaN
2  2022-06      0.0      4.0      3.0      2.0      0.0      0.0

Note: It might be better to set End as the index.
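
The note about using End as the index could look roughly like the sketch below. It is only an illustration built on the question's data: the names indexed and in_range are made up here, and it assumes the "YYYY-MM" labels compare correctly as plain strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'End': ['2022-03', '2022-05', '2022-06'],
    '2022-01': [1, 2, np.nan],
    '2022-02': [np.nan, 3, 4],
    '2022-03': [np.nan, 1, 3],
    '2022-04': [np.nan, np.nan, 2],
    '2022-05': [np.nan, np.nan, np.nan],
    '2022-06': [np.nan, np.nan, np.nan],
})

# With End as the index, every remaining column is a month column,
# so no positional slicing (iloc[:, 1:]) is needed.
indexed = df.set_index('End')

# Shape (n_rows, 1) compared with shape (n_cols,) broadcasts to (n_rows, n_cols).
# Assumption: "YYYY-MM" strings sort chronologically, so string comparison suffices.
in_range = indexed.index.to_numpy()[:, None] >= indexed.columns.to_numpy()

# Replace only the cells that are NaN and fall at or before End;
# everything else, including non-NaN values after End, is left untouched.
out = indexed.mask(in_range & indexed.isna().to_numpy(), 0).reset_index()
print(out)
```

Because the index carries End, the same comparison keeps working if more month columns are added later.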

Answered By: Quang Hoang

Use numpy broadcasting to compare End against the column names, then combine mask and fillna:

mask = df['End'].to_numpy()[:, None] >= df.columns.to_numpy()

out = df.fillna(df.mask(mask, 0))

print(out)

Output:

       End  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06
0  2022-03      1.0      0.0      0.0      NaN      NaN      NaN
1  2022-05      2.0      3.0      1.0      0.0      0.0      NaN
2  2022-06      0.0      4.0      3.0      2.0      0.0      0.0

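Intermediate mask (the first column is End itself, which is always False because the '2022-..' strings sort before 'End'):

array([[False,  True,  True,  True, False, False, False],
       [False,  True,  True,  True,  True,  True, False],
       [False,  True,  True,  True,  True,  True,  True]])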
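
The comparison above also includes the End column itself, and it works because the date strings happen to sort before the label 'End'. If you would rather not rely on that, a variant that leaves the End column out explicitly might look like this. The helper name fill_until_end and the small sample frame are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def fill_until_end(df, end_col='End'):
    """Hypothetical helper: set NaN to 0 in month columns at or before end_col."""
    # Compare only the month columns, never the end column itself.
    month_cols = df.columns.drop(end_col)
    mask = df[end_col].to_numpy()[:, None] >= month_cols.to_numpy()
    out = df.copy()
    # Fill a cell only if it is in range AND currently NaN.
    out[month_cols] = df[month_cols].mask(mask & df[month_cols].isna().to_numpy(), 0)
    return out

# Illustrative data (not from the question): the 5 sits after End and must survive.
df = pd.DataFrame({
    'End': ['2022-02', '2022-03'],
    '2022-01': [np.nan, 1],
    '2022-02': [np.nan, np.nan],
    '2022-03': [5, np.nan],
    '2022-04': [np.nan, np.nan],
})

out = fill_until_end(df)
print(out)
```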
Answered By: mozway

Probably not the most elegant solution, but this can be done using melt and pivot:

melt_df = df.melt(id_vars=["End"])
melt_df.loc[(melt_df["End"] >= melt_df["variable"]) & (melt_df["value"].isnull()), "value"] = 0

The long format makes the condition easy to check. Then pivot back to get the original wide format:

final_df = melt_df.pivot(index="End", columns="variable", values="value").reset_index()
final_df.columns.name = None

       End  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06
0  2022-03      1.0      0.0      0.0      NaN      NaN      NaN
1  2022-05      2.0      3.0      1.0      0.0      0.0      NaN
2  2022-06      0.0      4.0      3.0      2.0      0.0      0.0
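
One caveat with pivoting on End: pivot raises an error if two rows share the same End value. A possible workaround, sketched below under that assumption, is to carry the original row label through the melt as an extra id column. The sample frame with duplicate End values is made up for illustration, and passing a list to index in pivot needs a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the question): both rows share the same End.
df = pd.DataFrame({
    'End': ['2022-02', '2022-02'],
    '2022-01': [np.nan, 1],
    '2022-02': [2, np.nan],
    '2022-03': [np.nan, np.nan],
})

# Keep the original row label as an id so rows with equal End stay distinct.
long_df = df.reset_index().melt(id_vars=['index', 'End'])
to_fill = (long_df['End'] >= long_df['variable']) & long_df['value'].isna()
long_df.loc[to_fill, 'value'] = 0

# Pivot on (row label, End), then move End back into the columns.
wide = (
    long_df.pivot(index=['index', 'End'], columns='variable', values='value')
    .reset_index(level='End')
    .rename_axis(index=None, columns=None)
)
print(wide)
```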
Answered By: TYZ