Move the NaNs from the end to the beginning of each column while maintaining the order of the numbers in pandas dataframe
Question:
I don't know how to move the NaN values in each column to the top of that column, so that the actual numbers end up at the bottom of the column in their original order.
import pandas as pd
import numpy as np
q = pd.Series(dtype=int)
w = pd.Series(data=np.array([6,23,4]))
e = pd.Series(data=np.array([2,3,7,89,12,45]))
r = pd.Series(data=np.array([98,25,84,12]))
t = pd.Series(data=np.array([36,41,74,29,86,77,55]))
q = pd.concat([q, w], axis=1)
q = pd.concat([q, e], axis=1)
q = pd.concat([q, r], axis=1)
q = pd.concat([q, t], axis=1)
print(q)
Answers:
Would this make sense to you?
columns = [
[6, 23, 4],
[2, 3, 7, 89, 12, 45],
[98, 25, 84, 12],
[36, 41, 74, 29, 86, 77, 55]
]
max_length = max(len(column) for column in columns)
data = [
[None for _ in range(max_length - len(column))] + column
for column in columns
]
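# name the columns after the Series in the question; padded columns become float because of NaN
df = pd.DataFrame(dict(zip(["w", "e", "r", "t"], data)))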
print(df)
Result:
      w     e     r   t
0   NaN   NaN   NaN  36
1   NaN   2.0   NaN  41
2   NaN   3.0   NaN  74
3   NaN   7.0  98.0  29
4   6.0  89.0  25.0  86
5  23.0  12.0  84.0  77
6   4.0  45.0  12.0  55
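The padded columns above come out as floats because NaN cannot live in a plain integer column. If keeping integers matters, one option is pandas' nullable "Int64" extension dtype. The following is only a sketch under that assumption; the dict layout and the variable names are illustrative, and missing entries are then displayed as <NA> instead of NaN.

```python
import pandas as pd

# Same data as in the question, keyed by the Series names.
columns = {
    "w": [6, 23, 4],
    "e": [2, 3, 7, 89, 12, 45],
    "r": [98, 25, 84, 12],
    "t": [36, 41, 74, 29, 86, 77, 55],
}
max_length = max(len(values) for values in columns.values())

# Pad each list with None at the front, then store it as a nullable integer array.
df = pd.DataFrame({
    name: pd.array([None] * (max_length - len(values)) + values, dtype="Int64")
    for name, values in columns.items()
})
print(df)
```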
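Use sort_values with the key parameter on each column. A stable sort kind is needed so that the numbers keep their original order:
# q here is the original empty Series, before the concat calls in the question reassign it
df = pd.concat([q, w, e, r, t], axis=1).apply(lambda x: x.sort_values(key=pd.notna, kind="stable").to_numpy())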
print(df)
     0     1     2     3   4
0  NaN   NaN   NaN   NaN  36
1  NaN   NaN   2.0   NaN  41
2  NaN   NaN   3.0   NaN  74
3  NaN   NaN   7.0  98.0  29
4  NaN   6.0  89.0  25.0  86
5  NaN  23.0  12.0  84.0  77
6  NaN   4.0  45.0  12.0  55
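Sorting by pd.notna only preserves the order of the numbers if rows with equal keys are not reshuffled, and the default sort kind (quicksort) does not promise that. A small sketch of the idea on a single made-up Series, with the NaNs deliberately scattered and a stable sort requested explicitly:

```python
import numpy as np
import pandas as pd

# Illustrative column with NaNs in the middle rather than at the end.
s = pd.Series([5.0, np.nan, 1.0, np.nan, 9.0, 3.0])

# key=pd.notna maps NaN -> False and numbers -> True, and False sorts before True.
# kind="stable" keeps entries with equal keys in their original relative order.
moved = s.sort_values(key=pd.notna, kind="stable").to_numpy()
print(moved)
```

The numbers are never compared with each other here, only the True/False keys, which is why their order survives.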
Another idea is to reverse each array before the concat, then use DataFrame.sort_index:
q = pd.Series(dtype=int)
w = pd.Series(data=np.array([6,23,4])[::-1])
e = pd.Series(data=np.array([2,3,7,89,12,45])[::-1])
r = pd.Series(data=np.array([98,25,84,12])[::-1])
t = pd.Series(data=np.array([36,41,74,29,86,77,55])[::-1])
df = pd.concat([q, w,e,r,t], axis=1).sort_index(ascending=False, ignore_index=True)
print(df)
     0     1     2     3   4
0  NaN   NaN   NaN   NaN  36
1  NaN   NaN   2.0   NaN  41
2  NaN   NaN   3.0   NaN  74
3  NaN   NaN   7.0  98.0  29
4  NaN   6.0  89.0  25.0  86
5  NaN  23.0  12.0  84.0  77
6  NaN   4.0  45.0  12.0  55
My solution (maybe not the best):
I would use the shift function and apply it to each column.
But first we need to name the columns properly, because the shift amount for each column is later looked up by its name.
col_names = ["A", "B", "C", "D", "E"]
q.columns = col_names  # rename the columns of the question's DataFrame from the list
cols_nan = q.isna().sum()  # count the NaN values per column (a Series indexed by column name)
# shift each column down by its own NaN count; this assumes all NaNs sit at the end of the column
df = q.apply(lambda x: x.shift(periods=int(cols_nan[x.name])))
out:
     A     B     C     D   E
0  NaN   NaN   NaN   NaN  36
1  NaN   NaN   2.0   NaN  41
2  NaN   NaN   3.0   NaN  74
3  NaN   NaN   7.0  98.0  29
4  NaN   6.0  89.0  25.0  86
5  NaN  23.0  12.0  84.0  77
6  NaN   4.0  45.0  12.0  55
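Shifting by the NaN count works here because every NaN sits at the end of its column; a NaN in the middle would push real values off the bottom. If that cannot be guaranteed, a possible variation is to drop the NaNs and pad the top instead. This is only a sketch: the helper name nans_to_top and the small example frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def nans_to_top(column):
    """Return the column with its NaNs moved to the top and the numbers kept in order."""
    values = column.dropna().to_numpy()
    padding = np.full(len(column) - len(values), np.nan)
    return pd.Series(np.concatenate([padding, values]), index=column.index)


# Illustrative frame with NaNs in the middle and at the start.
frame = pd.DataFrame({
    "A": [1.0, np.nan, 2.0, np.nan],
    "B": [np.nan, 7.0, 8.0, 9.0],
})
result = frame.apply(nans_to_top)
print(result)
```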