Pandas: Replace column values to empty if not present in pre-defined list
Question:
I have a list, X, that contains the set of legal values for a column. Say I have a column A. I want to replace (set to an empty string) the elements of df['A'] whose value is not in X. How can I do that efficiently in Pandas?
I know there is isin(), but that only checks whether the values are present and returns a Series of True/False.
Answers:
You can use the standard Pandas indexing here:
df.loc[~df.A.isin(X), 'A'] = ''
~df.A.isin(X) inverts the boolean Series returned by df.A.isin(X) (i.e. False -> True and True -> False), so the assignment only touches the rows whose value is not in X.
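As a minimal end-to-end sketch of the indexing approach, assuming a small made-up DataFrame (the values in X, the data in column A, and the extra column B are hypothetical and only there for illustration):

```python
import pandas as pd

# Hypothetical list of legal values and sample data (not from the question).
X = ["a", "b", "c"]
df = pd.DataFrame({"A": ["a", "q", "b", "p"], "B": [1, 2, 3, 4]})

# isin() marks the rows whose value is in X; ~ flips the mask so that
# True now marks the rows that should be blanked out.
mask = ~df["A"].isin(X)

# Assign the empty string only to column A of the masked rows.
df.loc[mask, "A"] = ""

print(df["A"].tolist())  # ['a', '', 'b', '']
```

Because the assignment goes through df.loc, it modifies df in place and leaves the other columns untouched.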
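You can do it with apply:
import pandas as pd
x = ['a', 'b', 'c']  # legal values
data = {'foo': ['a', 'a', 'q', 'p']}
df = pd.DataFrame.from_dict(data)
# apply returns a new Series and leaves df unchanged;
# assign it back to df['foo'] if the column itself should be updated
df_new = df['foo'].apply(lambda i: i if i in x else '')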
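If a new Series is what you want but you would rather avoid a Python-level lambda, one possible alternative (not mentioned in the answers above, so treat it as a sketch) is Series.where, which keeps the values where a condition is True and substitutes a replacement elsewhere. The name cleaned is made up for this example:

```python
import pandas as pd

x = ["a", "b", "c"]  # legal values, same as in the apply example
df = pd.DataFrame({"foo": ["a", "a", "q", "p"]})

# where() keeps entries for which the condition is True and
# replaces all other entries with the second argument.
cleaned = df["foo"].where(df["foo"].isin(x), "")

print(cleaned.tolist())  # ['a', 'a', '', '']
```

Like apply, this returns a new Series and does not modify df. On large columns a vectorized isin() check is usually faster than calling a lambda per element, though that is worth measuring on your own data.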