Filter pandas dataframe by list
Question:
I have a dataframe that has a column called "Hybridization REF". I would like to filter it so that I only get the data for the items whose label matches one of the items in my list.
Basically, I’d like to do the following:
dataframe[dataframe["Hybridization REF"].apply(lambda: x in list)]
but that syntax is not correct.
Answers:
Suppose df is your dataframe and lst is your list of labels.
df.loc[ df.index.isin(lst), : ]
This selects all rows whose index label matches any value in lst. Note that it filters on the index, so it only applies if the labels are stored in the index rather than in a column. I hope this helps.
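A minimal sketch of the index-based approach, assuming the labels live in the "Hybridization REF" column and are first moved into the index with set_index. The sample values and the names lst and indexed are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({
    "Hybridization REF": ["gene1", "gene2", "gene3", "gene4"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
lst = ["gene2", "gene4", "gene9"]  # "gene9" does not appear in the data

# Move the label column into the index so df.index.isin applies to it.
indexed = df.set_index("Hybridization REF")

# isin returns a boolean array; labels missing from the data are simply ignored.
result = indexed.loc[indexed.index.isin(lst), :]
print(result)
```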
Assuming this is a pandas DataFrame (NumPy has no dataframe type), here is the solution:
df[df['Hybridization REF'].isin(list)]
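A small, self-contained sketch of the column-based isin filter. The data is hypothetical, and the list is called labels (a hypothetical name) instead of list so it does not shadow the built-in.

```python
import pandas as pd

# Hypothetical data; the column name matches the question.
df = pd.DataFrame({
    "Hybridization REF": ["gene1", "gene2", "gene3", "gene2"],
    "value": [10, 20, 30, 40],
})
labels = ["gene2", "gene3"]  # named labels to avoid shadowing the built-in list

# Series.isin builds a boolean mask with one entry per row.
mask = df["Hybridization REF"].isin(labels)
filtered = df[mask]
print(filtered)
```

The result keeps the dataframe's own row order and original index labels, not the order of the list, and every matching row is kept, including repeated labels.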
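Update: you can also use reindex (collist and rowlist are defined in the example below). For columns:
df.reindex(collist, axis=1)
for rows:
df.reindex(rowlist, axis=0)
and for both at once:
df.reindex(index=rowlist, columns=collist)
Unlike .loc, reindex does not raise a KeyError for labels that are missing from the dataframe; it returns them as rows or columns filled with NaN.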
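To illustrate how reindex differs from .loc when a label is missing, here is a sketch on a small made-up frame. It assumes pandas 1.0 or later, where .loc with a list containing missing labels raises KeyError; the flag loc_raised is a hypothetical name used only for the demonstration.

```python
import numpy as np
import pandas as pd

# Small deterministic frame in the same style as the example below.
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=list("ABC"), index=list("ab"))

collist = ["A", "C", "Z"]  # "Z" does not exist in df
rowlist = ["b", "x"]       # "x" does not exist in df

# reindex returns the requested labels in the requested order;
# missing labels become all-NaN columns/rows instead of raising.
cols = df.reindex(collist, axis=1)
both = df.reindex(index=rowlist, columns=collist)
print(both)

# The equivalent .loc selection rejects missing labels.
try:
    df.loc[rowlist, collist]
    loc_raised = False
except KeyError:
    loc_raised = True
```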
You can use .loc or column filtering:
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.rand(5, 5), columns=list('ABCDE'), index=list('abcde'))  # random values, so yours will differ
df
          A         B         C         D         E
a  0.460537  0.174788  0.167554  0.298469  0.630961
b  0.728094  0.275326  0.405864  0.302588  0.624046
c  0.953253  0.682038  0.802147  0.105888  0.089966
d  0.122748  0.954955  0.766184  0.410876  0.527166
e  0.227185  0.449025  0.703912  0.617826  0.037297
collist = ['B','D','E']
rowlist = ['a','c']
Get columns in list:
df[collist]
Output:
          B         D         E
a  0.174788  0.298469  0.630961
b  0.275326  0.302588  0.624046
c  0.682038  0.105888  0.089966
d  0.954955  0.410876  0.527166
e  0.449025  0.617826  0.037297
Get rows in list:
df.loc[rowlist]
Output:
          A         B         C         D         E
a  0.460537  0.174788  0.167554  0.298469  0.630961
c  0.953253  0.682038  0.802147  0.105888  0.089966
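As a sketch of combining both selections, .loc accepts a row list and a column list in one call. This uses np.arange instead of random numbers, an assumption made only so the result is reproducible; the name sub is hypothetical.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame above.
df = pd.DataFrame(np.arange(25).reshape(5, 5), columns=list("ABCDE"), index=list("abcde"))
collist = ["B", "D", "E"]
rowlist = ["a", "c"]

# .loc takes a row selector and a column selector, so both lists apply at once.
sub = df.loc[rowlist, collist]
print(sub)
```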
You can try the following:
df.loc[ df.index.intersection(lst), : ]
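This way you only get the labels in lst that actually exist in the index (the intersection), so labels missing from the dataframe are skipped instead of raising a KeyError.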
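A minimal sketch of the intersection approach on hypothetical data, where one requested label does not exist in the index. No particular row order is assumed for the result; the names common and subset are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]}, index=["x", "y", "z"])
lst = ["z", "x", "missing"]  # hypothetical labels; "missing" is not in the index

# Index.intersection drops labels that are not in df.index,
# so .loc only receives labels that exist and cannot raise KeyError.
common = df.index.intersection(lst)
subset = df.loc[common, :]
print(subset)
```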
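Another alternative is to use query:
df.query('`Hybridization REF` == @list')
The backticks before and after Hybridization REF are needed because the column name contains a space (backtick quoting requires pandas 0.25 or later). The @ prefix lets you refer to the local variable list, and comparing a column to a list with == inside query behaves like isin.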
Keep in mind that list is also the name of Python's built-in list type, so a variable called list shadows it. It is a good idea to rename this variable (for example, to lst).
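A self-contained sketch of the query approach, using hypothetical data and a variable renamed to labels (a hypothetical name). It assumes pandas 0.25 or later for backtick-quoted column names.

```python
import pandas as pd

# Hypothetical data; the space in the column name is what requires backticks.
df = pd.DataFrame({
    "Hybridization REF": ["gene1", "gene2", "gene3"],
    "value": [1, 2, 3],
})
labels = ["gene1", "gene3"]  # renamed from `list` to avoid shadowing the built-in

# Backticks quote a column name that is not a valid Python identifier.
# @labels refers to the local Python variable; == against a list acts like isin.
result = df.query("`Hybridization REF` == @labels")
print(result)
```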
Your original code works once the lambda gets its missing parameter x:
dataframe[dataframe["Hybridization REF"].apply(lambda x: x in list)]
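A sketch on made-up data comparing the corrected apply call with isin; both produce the same boolean mask. Converting the list to a set (label_set, a hypothetical name) is an optional tweak that makes each membership test inside the lambda faster for long lists.

```python
import pandas as pd

df = pd.DataFrame({
    "Hybridization REF": ["gene1", "gene2", "gene3", "gene4"],
    "value": [1, 2, 3, 4],
})
labels = ["gene2", "gene4"]  # hypothetical; renamed from `list`

# The corrected lambda takes one argument, x, and tests membership.
via_apply = df[df["Hybridization REF"].apply(lambda x: x in labels)]

# isin builds the same mask without calling a Python function per row.
via_isin = df[df["Hybridization REF"].isin(labels)]

# A set gives average O(1) membership tests inside the lambda.
label_set = set(labels)
via_set = df[df["Hybridization REF"].apply(lambda x: x in label_set)]
```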
For future reference, if you only need to match part of a string, you can also use:
new_df = df.loc[df.index.str.contains('sub_string_you_need'), :]
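One detail worth knowing: str.contains treats the pattern as a regular expression by default. The sketch below uses hypothetical index labels (and hypothetical names as_regex and as_literal) to show the difference that regex=False makes.

```python
import pandas as pd

# Hypothetical data; the index must hold strings for .str to work.
df = pd.DataFrame({"value": [1, 2, 3]}, index=["gene.1", "gene21", "control"])

# By default the pattern is a regular expression, so "." matches any character.
as_regex = df.loc[df.index.str.contains("e.1"), :]

# regex=False treats the pattern as a literal substring.
as_literal = df.loc[df.index.str.contains("e.1", regex=False), :]
print(as_regex)
print(as_literal)
```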