Replace data in the whole DataFrame based on a condition

Question:

I want to replace every element in a pandas DataFrame (all columns and all records) with an empty string if it contains a question mark. I am curious what the best solution for this is.

What I thought of is to write a loop like this:

def modify_dataframe_line_by_line(df) -> None:

    for index, record in df.iterrows():
        for colname in df.columns.tolist():
            if "?" in record[colname]:
                record[colname] = ""

It works, but I assume this will be slow as hell with larger datasets.

I also tried this one but it does not work:

def df_loc_replace(df) -> None:

    for colname in df.columns.tolist():
        df.loc["?" in df[colname], colname] = ""

I also tried df.replace(), but I did not find an option to add a condition to it (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html).

What is the best solution to this?

Asked By: Looz


Answers:

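Try this. Note that str.contains() treats its pattern as a regular expression by default, and a bare '?' is not a valid regex, so pass regex=False to match the literal character. This assumes every column holds strings:

import numpy as np
for colname in df.columns:
    # True where the cell contains a literal "?"; missing values count as no match
    has_qmark = df[colname].str.contains('?', regex=False, na=False)
    df[colname] = np.where(has_qmark, '', df[colname])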
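
If the DataFrame may also contain non-string columns, one possible variant is to build a boolean mask for the whole frame and apply it with DataFrame.mask(). This is only a sketch: the helper name blank_cells_containing and its needle parameter are made up for illustration, and casting every cell to str just to test it is an assumption that suits the "?" case but may not suit other search strings.

```python
import pandas as pd


def blank_cells_containing(df: pd.DataFrame, needle: str = "?") -> pd.DataFrame:
    """Return a copy of df with '' in every cell whose text contains `needle`.

    Hypothetical helper, not part of pandas.
    """
    # Cast each column to str only for the test, so numeric columns do not
    # raise on the .str accessor. regex=False matches `needle` literally,
    # and na=False treats missing values as "no match".
    contains = df.apply(
        lambda col: col.astype(str)
        .str.contains(needle, regex=False, na=False)
        .astype(bool)
    )
    # mask() replaces values where the condition is True and returns a new frame.
    return df.mask(contains, "")


# Example usage with made-up data
example = pd.DataFrame({"a": ["x?", "y"], "b": [1, 2], "c": ["what?", "ok"]})
cleaned = blank_cells_containing(example)
```

The original frame is left untouched, so assign the result back (df = blank_cells_containing(df)) if you want to overwrite it. For string columns, df.replace() with regex=True and an escaped pattern such as r'.*\?.*' may be another option worth trying.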
Answered By: gtomer