Replace data in the whole Dataframe with a condition
Question:
I want to replace every element in a pandas DataFrame (all columns and all rows) with an empty string if it contains a question mark. I am curious what the best solution for this is.
What I thought of is to write a loop like this:
def modify_dataframe_line_by_line(df) -> None:
    for index, record in df.iterrows():
        for colname in df.columns.tolist():
            if "?" in record[colname]:
                record[colname] = ""
It works, but I assume this will be slow as hell with larger datasets.
I also tried this one but it does not work:
def df_loc_replace(df) -> None:
    for colname in df.columns.tolist():
        df.loc["?" in df[colname], colname] = ""
I also tried df.replace(), but I did not find an option to add conditions to it (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html)
What is the best solution to this?
Answers:
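Try this. Note that str.contains treats its pattern as a regular expression by default, so a bare '?' raises an error; pass regex=False to match it literally. This assumes every column holds strings:
import numpy as np
for colname in df.columns.tolist():
    # na=False treats missing values as "no match", so they are left unchanged
    has_question_mark = df[colname].str.contains('?', regex=False, na=False)
    df[colname] = np.where(has_question_mark, '', df[colname])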
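The per-column loop can also be avoided. The sketch below shows two whole-frame alternatives on a small, made-up DataFrame (the column names and values are purely illustrative). The first uses df.replace with regex=True, which is the "condition" the question was looking for in replace. The second builds an elementwise boolean mask and passes it to df.mask. As a side note on the df.loc attempt in the question: "?" in df[colname] tests the Series index labels and yields a single boolean, not an elementwise mask, which is why it cannot select the matching rows.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "a": ["foo", "b?r", "baz"],
    "b": ["what?", "ok", "?"],
})

# Option 1: regex replace over the whole frame.
# The pattern matches an entire string that contains a literal '?';
# (?s) lets '.' also match newlines, so multi-line cells are fully cleared.
cleaned_replace = df.replace(r"(?s).*\?.*", "", regex=True)

# Option 2: build an elementwise boolean mask, then blank the matching cells.
# astype(str) is an assumption made here so that non-string cells do not break .str
mask = df.apply(lambda col: col.astype(str).str.contains("?", regex=False))
cleaned_mask = df.mask(mask, "")

print(cleaned_replace)
print(cleaned_mask)
```

Both variants return a new DataFrame and leave df untouched; assign the result back to df if an in-place effect is wanted.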