How to shift a dataframe element-wise to fill NaNs?

Question:

I have a DataFrame like this:

>>> df = pd.DataFrame({'a': list('ABCD'), 'b': ['E',np.nan,np.nan,'F']})
   a    b
0  A    E
1  B  NaN
2  C  NaN
3  D    F

I want to fill each NaN with the value from the previous column in the next row, and then drop that next row. In other words, I want to combine the two rows containing NaNs into a single row without NaNs, like this:

   a  b
0  A  E
1  B  C
2  D  F

I have tried various flavors of df.fillna(method="<bfill/ffill>") but this didn’t give me the expected output.
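For reference, a minimal sketch of what the two fill directions produce on this frame. It uses DataFrame.ffill() and DataFrame.bfill() in place of fillna(method=...), and the variable names forward and backward are only illustrative. Both results keep all four rows and only copy values within column b, so neither gives the combined rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('ABCD'), 'b': ['E', np.nan, np.nan, 'F']})

# Forward fill copies the last seen value of b downwards: E, E, E, F
forward = df.ffill()

# Backward fill copies the next seen value of b upwards: E, F, F, F
backward = df.bfill()

print(forward)
print(backward)
```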

I haven’t found many other questions about this problem. Here’s one. The DataFrame is actually built from a list of DataFrames with .concat(), which you may also notice from the indexes. I mention this because it may be easier to do this on a single row than on multiple rows.

I have found some suggestions to use shift and combine_first, but none of them worked for me. You may try these too.

I also found this. It is a whole article about filling NaN values, but I haven’t found a problem/answer like mine in it.

Asked By: Xitiz

||

Answers:

OK, I misunderstood what you wanted to do the first time. The dummy example was a bit ambiguous.

Here is another:

>>> df = pd.DataFrame({'a': list('ABCD'), 'b': ['E',np.nan,np.nan,'F']})
   a    b
0  A    E
1  B  NaN
2  C  NaN
3  D    F

To my knowledge, this operation does not exist in pandas, so we will use numpy to do the work.

First, convert the dataframe to a numpy array and flatten it to one dimension. Then drop the NaNs using pandas.isna, which works on a wider range of types than numpy.isnan, and reshape the array to the original number of columns before converting it back to a dataframe:

array = df.to_numpy().flatten()
pd.DataFrame(array[~pd.isna(array)].reshape(-1,df.shape[1]), columns=df.columns)

output:

   a  b
0  A  E
1  B  C
2  D  F
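The same steps can be wrapped in a small self-contained function. This is only a sketch: the name compact_rows and the explicit size check are additions for illustration, not part of the snippet above. The check fails early when the remaining values cannot fill complete rows, a case where reshape would otherwise raise a less descriptive error:

```python
import numpy as np
import pandas as pd


def compact_rows(df):
    """Drop NaNs in row-major order and refill rows of the original width.

    Hypothetical helper name, used here only for illustration.
    """
    flat = df.to_numpy().flatten()      # row-major order: A, E, B, NaN, C, NaN, D, F
    kept = flat[~pd.isna(flat)]         # pd.isna also works on object arrays
    n_cols = df.shape[1]
    if kept.size % n_cols != 0:
        # The remaining values would not fill complete rows
        raise ValueError(
            f"{kept.size} non-NaN values cannot be arranged in rows of {n_cols}"
        )
    return pd.DataFrame(kept.reshape(-1, n_cols), columns=df.columns)


df = pd.DataFrame({'a': list('ABCD'), 'b': ['E', np.nan, np.nan, 'F']})
result = compact_rows(df)
print(result)
```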

It also works for more complex examples, as long as the NaN pattern is the same across all columns that contain NaNs:

In:
   a    b   c    d
0  A    H  A2   H2
1  B  NaN  B2  NaN
2  C  NaN  C2  NaN
3  D    I  D2   I2
4  E  NaN  E2  NaN
5  F  NaN  F2  NaN
6  G    J  G2   J2

Out:
   a   b   c   d
0  A   H  A2  H2
1  B  B2   C  C2
2  D   I  D2  I2
3  E  E2   F  F2
4  G   J  G2  J2

In:
   a    b    c
0  A    F    H
1  B  NaN  NaN
2  C  NaN  NaN
3  D  NaN  NaN
4  E    G    I

Out:
   a  b  c
0  A  F  H
1  B  C  D
2  E  G  I

If the columns with NaNs do not share the same pattern, for example:

   a    b   c    d
0  A    H  A2  NaN
1  B  NaN  B2  NaN
2  C  NaN  C2   H2
3  D    I  D2   I2
4  E  NaN  E2  NaN
5  F  NaN  F2  NaN
6  G    J  G2   J2

You can apply the operation per group of two columns:

def elementwise_shift(df):
    array = df.to_numpy().flatten()
    return pd.DataFrame(array[~pd.isna(array)].reshape(-1,df.shape[1]), columns=df.columns)

(df.groupby(np.repeat(np.arange(df.shape[1]/2), 2), axis=1)
   .apply(elementwise_shift)
)

output:

   a  b   c   d
0  A  H  A2  B2
1  B  C  C2  H2
2  D  I  D2  I2
3  E  F  E2  F2
4  G  J  G2  J2
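
Recent pandas releases deprecate groupby(..., axis=1), so here is a sketch of the same per-pair idea without it. It slices the frame into consecutive two-column blocks with iloc and joins the results with pd.concat. The variable names pairs and result are illustrative, and the sketch assumes an even number of columns and that every pair yields the same number of rows:

```python
import numpy as np
import pandas as pd


def elementwise_shift(df):
    # Flatten row by row, drop NaNs, and rebuild rows of the same width
    array = df.to_numpy().flatten()
    return pd.DataFrame(array[~pd.isna(array)].reshape(-1, df.shape[1]),
                        columns=df.columns)


df = pd.DataFrame({
    'a': list('ABCDEFG'),
    'b': ['H', np.nan, np.nan, 'I', np.nan, np.nan, 'J'],
    'c': ['A2', 'B2', 'C2', 'D2', 'E2', 'F2', 'G2'],
    'd': [np.nan, np.nan, 'H2', 'I2', np.nan, np.nan, 'J2'],
})

# Consecutive two-column blocks: (a, b), (c, d)
pairs = [df.iloc[:, i:i + 2] for i in range(0, df.shape[1], 2)]

# Process each block separately, then place the results side by side
result = pd.concat([elementwise_shift(pair) for pair in pairs], axis=1)
print(result)
```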
Answered By: mozway

You can do this in two steps with a placeholder column. First, fill all the NaNs in column b with the a values from the next row. Then filter the rows. In this example I use ffill with a limit of 1 so that only the first NaN of each run gets filled, which lets me drop the rows where b is still NaN afterwards. There’s probably a better method.

import pandas as pd
import numpy as np
df=pd.DataFrame({"a":[1,2,3,3,4],"b":[1,2,np.nan,np.nan,4]})

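# Fill each NaN in b with the value of a from the next row:
df['new_b'] = df['b'].fillna(df['a'].shift(-1))
# Keep rows where b is present or is the first NaN of a run (ffill only reaches one row):
df = df[df['b'].ffill(limit=1).notna()].copy() # .copy() so the result is independent of the original frame
df = df.drop('b', axis=1).rename(columns={'new_b': 'b'})

print(df)
# output:
#    a    b
# 0  1  1.0
# 1  2  2.0
# 2  3  3.0
# 4  4  4.0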
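
As a rough sketch, the same two steps can be applied to the frame from the question. The function name merge_nan_pairs and its src/dst parameters are hypothetical, and the approach assumes NaNs in the destination column always come in runs of exactly two rows; a lone NaN would be filled without dropping the following row:

```python
import numpy as np
import pandas as pd


def merge_nan_pairs(df, src='a', dst='b'):
    """Fill NaNs in `dst` from the next row of `src`, then drop the donor rows.

    Hypothetical helper; assumes NaNs in `dst` occur in runs of exactly two.
    """
    # Candidate values: each NaN in dst takes src from the following row
    filled = df[dst].fillna(df[src].shift(-1))
    # ffill(limit=1) fills only the first NaN of a run, so the second stays NaN
    keep = df[dst].ffill(limit=1).notna()
    return df.loc[keep].assign(**{dst: filled[keep]}).reset_index(drop=True)


df = pd.DataFrame({'a': list('ABCD'), 'b': ['E', np.nan, np.nan, 'F']})
result = merge_nan_pairs(df)
print(result)
```

Resetting the index at the end gives the 0, 1, 2 labels shown in the question's expected output.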
Answered By: Swier