How to delete rows from df1 where the column2 value doesn't appear in df2col1

Question:

df1 for example is

col1 col2 col3
abcdef ghijkl mnopqr
abcdef1 ghijkl1 mnopqr1

df2 is

col1
ghijkl1

essentially I want to delete rows from df1 where the column2 value doesn’t appear in df2col1

the final df1 would be:

col1 col2 col3
abcdef1 ghijkl1 mnopqr1
Asked By: plerpsandplerps

||

Answers:

Make sure you don’t have whitespace at the start or end of tha values inside your DataFrame or column names, as the guys pointed out above. i will try to show how to do it in this answer.

# striping whitespace from col names for both DataFrames

df1.rename(columns=lambda x: x.strip(), inplace = True)
df2.rename(columns=lambda x: x.strip(), inplace = True)

# striping whitespace from values in both DataFrames

df1 = df1.apply(lambda x: x.str.strip())
df2 = df2.apply(lambda x: x.str.strip())

# droping rows from df1 where the col2 value doesn't appear in df2 col1

mask = df1["col2"].isin(df2["col1"]) # returns a series of boolean mask(True or False, for the condition we are looking for)

new_df = df1[mask] # Returns what you are asking for :D

Hope this helps.

Answered By: Echo
indexes = df1[!df1.col2.isin(df2.col2)].index
df1 = df1.drop(indexes)
Answered By: the_ordinary_guy
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.