Move the NaNs from the end to the beginning of each column while maintaining the order of the numbers in pandas dataframe

Question:

I don’t know how to move the NaNs in each column to the beginning of that column, so that the actual values shift down to the bottom of the column while keeping their order.

    import pandas as pd
    import numpy as np 

    q = pd.Series(dtype=int)
    w = pd.Series(data=np.array([6,23,4]))
    e = pd.Series(data=np.array([2,3,7,89,12,45]))
    r = pd.Series(data=np.array([98,25,84,12]))
    t = pd.Series(data=np.array([36,41,74,29,86,77,55]))
    
    q = pd.concat([q, w], axis=1)
    q = pd.concat([q, e], axis=1)
    q = pd.concat([q, r], axis=1)
    q = pd.concat([q, t], axis=1)
    
    print(q)
Asked By: Filippo Di Santi


Answers:

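Would something like this work for you? Pad each list on the left with None up to the length of the longest one, then build the DataFrame from the padded lists: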

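import pandas as pd

columns = [
    [6, 23, 4],
    [2, 3, 7, 89, 12, 45],
    [98, 25, 84, 12],
    [36, 41, 74, 29, 86, 77, 55]
]

# pad every list on the left with None up to the longest length
max_length = max(len(column) for column in columns)
data = [
    [None] * (max_length - len(column)) + column
    for column in columns
]
# each padded list is a row at this point, so transpose to get columns
df = pd.DataFrame(data, dtype=int).T
print(df)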

Result (shown with the columns labelled w, e, r, t; the code as written leaves them as 0 to 3):

      w     e     r   t
0   NaN   NaN   NaN  36
1   NaN   2.0   NaN  41
2   NaN   3.0   NaN  74
3   NaN   7.0  98.0  29
4   6.0  89.0  25.0  86
5  23.0  12.0  84.0  77
6   4.0  45.0  12.0  55
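
A minimal sketch of the same padding idea that keeps the column names and the integer dtype. It assumes the names w, e, r, t from the question's variables and swaps in pandas' nullable "Int64" dtype, which is not part of the answer above, so missing entries show as <NA> rather than NaN:

```python
import pandas as pd

# Column names are an assumption, borrowed from the question's variable names.
columns = {
    "w": [6, 23, 4],
    "e": [2, 3, 7, 89, 12, 45],
    "r": [98, 25, 84, 12],
    "t": [36, 41, 74, 29, 86, 77, 55],
}

max_length = max(len(values) for values in columns.values())

# Left-pad each list with pd.NA so the numbers end up at the bottom.
# The nullable "Int64" dtype lets the values stay integers next to missing entries.
padded = {
    name: pd.array([pd.NA] * (max_length - len(values)) + values, dtype="Int64")
    for name, values in columns.items()
}

df = pd.DataFrame(padded)
print(df)
```

Building from a dict of columns avoids the transpose, and each column keeps its own dtype.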
Answered By: Michal Racko

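Use Series.sort_values with the key parameter on each column. Sorting on pd.notna puts the missing values (False) before the numbers (True), and kind="stable" guarantees that the numbers keep their original order:

df = pd.concat([q, w, e, r, t], axis=1).apply(lambda x: x.sort_values(key=pd.notna, kind="stable").to_numpy())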

print(df)
    0     1     2     3   4
0 NaN   NaN   NaN   NaN  36
1 NaN   NaN   2.0   NaN  41
2 NaN   NaN   3.0   NaN  74
3 NaN   NaN   7.0  98.0  29
4 NaN   6.0  89.0  25.0  86
5 NaN  23.0  12.0  84.0  77
6 NaN   4.0  45.0  12.0  55
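
The same "sort on a not-NaN flag" idea can be sketched with NumPy directly, which also covers NaNs that sit in the middle of a column. The helper name nans_first and the small example frame are hypothetical, made up for illustration:

```python
import numpy as np
import pandas as pd


def nans_first(column: pd.Series) -> np.ndarray:
    """Return the column's values with NaNs first and the rest in original order."""
    values = column.to_numpy()
    not_nan = pd.notna(values)
    # A stable argsort on the boolean flag puts False (NaN) before True
    # without reordering the entries inside either group.
    order = np.argsort(not_nan, kind="stable")
    return values[order]


# Hypothetical example data with NaNs in different positions.
df = pd.DataFrame({
    "a": [1.0, np.nan, 2.0, np.nan],
    "b": [np.nan, 5.0, 6.0, 7.0],
})

result = pd.DataFrame(
    {name: nans_first(df[name]) for name in df.columns},
    index=df.index,
)
print(result)
```

The stable sort is what preserves the order of the numbers; an unstable sort on a two-valued key gives no such guarantee.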

Another idea is to reverse each array and, after concat, use DataFrame.sort_index with descending order:

q = pd.Series(dtype=int)
w = pd.Series(data=np.array([6,23,4])[::-1])
e = pd.Series(data=np.array([2,3,7,89,12,45])[::-1])
r = pd.Series(data=np.array([98,25,84,12])[::-1])
t = pd.Series(data=np.array([36,41,74,29,86,77,55])[::-1])


df = pd.concat([q, w,e,r,t], axis=1).sort_index(ascending=False, ignore_index=True)
print(df)
    0     1     2     3   4
0 NaN   NaN   NaN   NaN  36
1 NaN   NaN   2.0   NaN  41
2 NaN   NaN   3.0   NaN  74
3 NaN   NaN   7.0  98.0  29
4 NaN   6.0  89.0  25.0  86
5 NaN  23.0  12.0  84.0  77
6 NaN   4.0  45.0  12.0  55
Answered By: jezrael

My solution (maybe not the best):

I would use the shift function and apply it to each column. This works here because all the NaNs sit at the end of each column.

First, the columns need proper names, because the lambda below looks up each column's shift by its name.

col_names = ["A", "B", "C", "D", "E"]

q.columns = col_names  # rename the columns of the question's DataFrame

cols_nan = q.isna().sum()  # number of NaN values per column, as a Series indexed by column name

# shift each column down by its own NaN count
df = q.apply(lambda x: x.shift(periods=int(cols_nan[x.name])))

Output:

    A     B     C     D   E
0 NaN   NaN   NaN   NaN  36
1 NaN   NaN   2.0   NaN  41
2 NaN   NaN   3.0   NaN  74
3 NaN   NaN   7.0  98.0  29
4 NaN   6.0  89.0  25.0  86
5 NaN  23.0  12.0  84.0  77
6 NaN   4.0  45.0  12.0  55
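
A self-contained sketch of the shift approach, with a guard for the case it does not cover: shifting by the NaN count would push real values off the end if a column had NaNs anywhere other than at the bottom. The helper name shift_trailing_nans_to_top and the column labels are assumptions for illustration:

```python
import pandas as pd

# Rebuild the question's data; the labels w, e, r, t are an assumption.
frame = pd.concat(
    [
        pd.Series([6, 23, 4]),
        pd.Series([2, 3, 7, 89, 12, 45]),
        pd.Series([98, 25, 84, 12]),
        pd.Series([36, 41, 74, 29, 86, 77, 55]),
    ],
    axis=1,
    keys=["w", "e", "r", "t"],
)


def shift_trailing_nans_to_top(col: pd.Series) -> pd.Series:
    """Shift a column down by its NaN count; valid only for trailing NaNs."""
    is_nan = col.isna()
    # Trailing-only NaNs means the flag never goes from True back to False.
    if not is_nan.astype(int).is_monotonic_increasing:
        raise ValueError(f"column {col.name!r} has NaNs that are not at the end")
    return col.shift(int(is_nan.sum()))


shifted = frame.apply(shift_trailing_nans_to_top)
print(shifted)
```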
Answered By: NoobVB