How to method-chain `ffill(axis=1)` in a dataframe

Question:

I would like to fill column `b` of a dataframe with the value from column `a` wherever `b` is NaN, and I would like to do it inside a method chain, but I cannot figure out how.

The following works

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]}
)
df["b"] = df[["a", "b"]].ffill(axis=1)["b"]
print(df.to_markdown())

|    |   a |   b | c   |
|---:|----:|----:|:----|
|  0 |   1 |  10 | a   |
|  1 |   2 |   2 | b   |
|  2 |   3 |   3 | c   |
|  3 |   4 |  40 | d   |

but is not method-chained. Thanks a lot for the help!

Asked By: divingTobi


Answers:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]})
# Fill NaN in b with the value from a in the same row
df['b'] = df.b.fillna(df.a)

|    |   a |   b | c   |
|---:|----:|----:|:----|
|  0 |   1 |  10 | a   |
|  1 |   2 |   2 | b   |
|  2 |   3 |   3 | c   |
|  3 |   4 |  40 | d   |
Answered By: crashMOGWAI

One solution I have found uses the pyjanitor library:

import numpy as np
import pandas as pd
import janitor  # pyjanitor is imported as `janitor`; this registers case_when on DataFrame

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]}
)
df.case_when(
    lambda x: x["b"].isna(), lambda x: x["a"], lambda x: x["b"], column_name="b"
)

Here, `case_when(...)` can be integrated into a chain of manipulations, and the whole dataframe stays in the chain.

I wonder how this could be accomplished without pyjanitor.
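A pandas-only analogue of the conditional above might look like the following. This is a minimal sketch, assuming `Series.mask` inside `assign` is an acceptable stand-in for `case_when`; the name `result` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]}
)

result = (
    df
    # Where b is NaN, take the value from a; otherwise keep b.
    # The lambda receives the dataframe at this point in the chain.
    .assign(b=lambda x: x["b"].mask(x["b"].isna(), x["a"]))
)
print(result)
```

Because `assign` returns a new dataframe, further steps can be appended to the chain and the original `df` is left unchanged.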

Answered By: divingTobi

This replaces NA in column df.b with values from df.a using fillna instead of ffill:

import numpy as np
import pandas as pd

df = (
    pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]})
    # Use the lambda argument x, not df: df is not defined until the chain has finished
    .assign(b=lambda x: x.b.fillna(x.a))
)
display(df)
df.dtypes
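
If the row-wise `ffill(axis=1)` from the question is preferred over `fillna`, the same `assign` pattern should work with it as well. The following is a sketch under that assumption; `result` is an illustrative name, and the approach relies on `a` being positioned to the left of `b` in the selected columns.

```python
import numpy as np
import pandas as pd

result = (
    pd.DataFrame(
        {"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]}
    )
    # Select a and b (in that order), forward-fill along each row,
    # and keep only the filled b column.
    .assign(b=lambda x: x[["a", "b"]].ffill(axis=1)["b"])
)
print(result)
```

Restricting the selection to `["a", "b"]` keeps the string column `c` out of the row-wise fill.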


Answered By: packoman