Comparing Pandas Dataframe column with List
Question:
I am trying to compare a pandas DataFrame with a list. I have extracted IDs into a list called list_x.
Since several rows share the same ID, the list contains duplicates, e.g. list_x = [1, 1, 1, 1, 2, 3, ...].
I am trying to drop all DataFrame rows whose ID also appears in the list.
What I have been trying are variations of:
for j in range(len(dataframe)-1):
    if dataframe.loc(j,"ID") in list_x:
        dataframe.drop([j], inplace = True)
or variations of:
for j in range(len(dataframe)-1):
    for k in range(len(list_x)-1):
        if dataframe.loc(j,"ID") in list_x[k]:
            dataframe.drop([j], inplace = True)
I get an error, which I think comes from comparing the list's index with the DataFrame rather than the actual list entry.
Any help would be appreciated.
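For reference, a minimal sketch of what actually goes wrong in the two loops above, using a small hypothetical frame (the same data as the answer below). The first loop fails because `.loc` is called with parentheses instead of being indexed with square brackets; the second additionally tests membership with `in` against a single integer (`list_x[k]`), which is not a container. The corrected loop is only an illustration: it collects the matching index labels first and drops them in one call, assuming the index labels are unique.

```python
import pandas as pd

# Hypothetical example data, mirroring the frame used in the answer
dataframe = pd.DataFrame({'ID': [1, 3, 5, 6, 7],
                          'value': ['red', 'blue', 'green', 'orange', 'purple']})
list_x = [1, 1, 2, 3, 5]

# Bug 1: .loc must be indexed with [], not called with ()
try:
    dataframe.loc(0, "ID")
except TypeError as err:
    print("loc(...) fails:", err)

# Bug 2: `in` needs a container on the right-hand side, not a single int
try:
    dataframe.loc[0, "ID"] in list_x[0]
except TypeError as err:
    print("in list_x[k] fails:", err)

# Corrected loop: iterate over the actual index labels (range(len(...) - 1)
# would skip the last row), collect the matches, then drop them once
to_drop = [label for label in dataframe.index
           if dataframe.loc[label, "ID"] in list_x]
dataframe = dataframe.drop(to_drop)
print(dataframe)
```

This works, but it does one Python-level lookup per row; the vectorized `isin` approach in the answer does the same job in a single call.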
Answers:
You want the DataFrame without the rows whose IDs appear in list_x.
You can get it with a boolean mask built by isin:
import pandas as pd

# your df (2 columns: ID and value)
df = pd.DataFrame({'ID': [1, 3, 5, 6, 7], 'value': ['red', 'blue', 'green', 'orange', 'purple']})
# the list of IDs you don't want in your dataframe
list_x = [1, 1, 2, 3, 5]
# the output: ~ negates the mask, keeping only rows whose ID is NOT in list_x
df = df[~df.ID.isin(list_x)]
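As a quick check, the sketch below reuses the same example data and shows that the duplicate IDs in list_x are harmless: isin only tests membership, so repeated values change neither the mask nor the result. The names mask and result are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 3, 5, 6, 7],
                   'value': ['red', 'blue', 'green', 'orange', 'purple']})
list_x = [1, 1, 2, 3, 5]  # duplicates are fine

# True where the row's ID appears anywhere in list_x
mask = df['ID'].isin(list_x)
print(mask.tolist())  # [True, True, True, False, False]

# ~ inverts the mask, keeping only rows whose ID is NOT in list_x
result = df[~mask]
print(result['ID'].tolist())  # [6, 7]

# Deduplicating list_x first (here via a set) gives the same result
assert result.equals(df[~df['ID'].isin(set(list_x))])
```

Note that the surviving rows keep their original index labels (3 and 4 here); call `reset_index(drop=True)` on the result if you need a fresh 0-based index.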