Collapse Pandas rows to eliminate NaN entries
Question:
Let’s consider the following DataFrame
Name | A | B | C | D |
---|---|---|---|---|
tom | 10.0 | NaN | NaN | NaN |
tom | NaN | 15.0 | NaN | NaN |
tom | NaN | NaN | 20.0 | NaN |
tom | NaN | NaN | NaN | 25.0 |
tom | 30.0 | NaN | NaN | NaN |
tom | NaN | NaN | NaN | 40.0 |
john | 1.0 | NaN | NaN | NaN |
john | NaN | 2.0 | NaN | NaN |
john | NaN | NaN | 3.0 | NaN |
john | NaN | NaN | NaN | 4.0 |
john | 5.0 | NaN | NaN | NaN |
john | NaN | 6.0 | NaN | NaN |
john | NaN | NaN | 7.0 | NaN |
john | NaN | NaN | NaN | 8.0 |
I want to collapse it to limit the number of NaN values in the DataFrame. The collapse can be sequential, i.e. combining neighboring rows where possible, but all I care about is that the values in columns A-D still correspond to the same Name after the collapse.
My perfect outcome would be
Name | A | B | C | D |
---|---|---|---|---|
tom | 10.0 | 15.0 | 20.0 | 25.0 |
tom | 30.0 | NaN | NaN | 40.0 |
john | 1.0 | 2.0 | 3.0 | 4.0 |
john | 5.0 | 6.0 | 7.0 | 8.0 |
From what I understand, Pandas groupby('Name')
will not do the trick, because it will leave one entry for each name.
If that is of any help, I use a dictionary to create the DataFrame. The dictionary looks like this:

```
{
    "a": {
        "tom": [10.0, 30.0],
        "john": [1.0, 5.0]
    },
    "b": {
        "tom": [15.0],
        "john": [2.0, 6.0]
    },
    .....
}
```
So, basically, I am taking every number in the dictionary, creating a row with just that number, and then combining all of the rows. Is there a simple way to collapse the resulting DataFrame, or to build a more compact DataFrame from such a dictionary?
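One way to skip the collapse entirely is to build the compact DataFrame straight from the dictionary: for each name, zip the per-column lists together with `itertools.zip_longest`, padding shorter lists with NaN. A sketch follows; the elided `"c"`/`"d"` entries are filled in here from the tables above purely for illustration.

```python
from itertools import zip_longest

import pandas as pd

# The questioner's dictionary; the "c" and "d" entries are reconstructed
# from the example tables above (an assumption for illustration).
data = {
    "a": {"tom": [10.0, 30.0], "john": [1.0, 5.0]},
    "b": {"tom": [15.0], "john": [2.0, 6.0]},
    "c": {"tom": [20.0], "john": [3.0, 7.0]},
    "d": {"tom": [25.0, 40.0], "john": [4.0, 8.0]},
}

rows = []
for name in ["tom", "john"]:
    # One list of values per column for this name; zip_longest pads the
    # shorter lists with NaN, so each iteration yields one packed row.
    columns = [data[col].get(name, []) for col in data]
    for values in zip_longest(*columns, fillvalue=float("nan")):
        rows.append([name, *values])

df = pd.DataFrame(rows, columns=["Name", *(c.upper() for c in data)])
print(df)
```

This produces the "perfect outcome" table directly, with no intermediate one-value-per-row DataFrame to collapse afterwards.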
Answers:
You can use .groupby + .transform to "move" the non-NaN values up within each group, then drop the rows that contain only NaN values:
```python
import pandas as pd

print(
    df.set_index("Name")
    .groupby(level=0)
    .transform(lambda x: sorted(x, key=lambda k: pd.isna(k)))
    .dropna(axis=0, how="all")
    .reset_index()
)
```
Prints:

```
   Name     A     B     C     D
0   tom  10.0  15.0  20.0  25.0
1   tom  30.0   NaN   NaN  40.0
2  john   1.0   2.0   3.0   4.0
3  john   5.0   6.0   7.0   8.0
```
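If you'd rather avoid sorting inside `.transform`, an alternative sketch is to drop NaNs column-wise per group and let `pd.concat` re-align the surviving values on a fresh index. The DataFrame is rebuilt here from the question's table so the snippet is self-contained.

```python
import pandas as pd

# Rebuild the questioner's DataFrame from the table above.
df = pd.DataFrame({
    "Name": ["tom"] * 6 + ["john"] * 8,
    "A": [10.0, None, None, None, 30.0, None,
          1.0, None, None, None, 5.0, None, None, None],
    "B": [None, 15.0, None, None, None, None,
          None, 2.0, None, None, None, 6.0, None, None],
    "C": [None, None, 20.0, None, None, None,
          None, None, 3.0, None, None, None, 7.0, None],
    "D": [None, None, None, 25.0, None, 40.0,
          None, None, None, 4.0, None, None, None, 8.0],
})

parts = []
for name, group in df.groupby("Name", sort=False):
    # Drop NaNs per column, then reset to a fresh RangeIndex so the
    # surviving values pack upwards; concat pads short columns with NaN.
    packed = pd.concat(
        [group[col].dropna().reset_index(drop=True)
         for col in ["A", "B", "C", "D"]],
        axis=1,
    )
    packed.insert(0, "Name", name)
    parts.append(packed)

out = pd.concat(parts, ignore_index=True)
print(out)
```

This gives the same four packed rows as the `.transform` approach, and makes the "move the values up" step explicit: each column is compacted independently and the columns are then re-aligned side by side.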