Check if certain value is contained in a dataframe column in pandas

Question:

I am trying to check if a certain value is contained in a python column. I’m using df.date.isin(['07311954']), which I do not doubt to be a good tool. The problem is that I have over 350K rows and the output won’t show
all of them so that I can see if the value is actually contained. Put simply, I just want to know (Y/N) whether or not a specific value is contained in a column. My code follows:

import numpy as np
import pandas as pd
import glob


df = (pd.read_csv('/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas214.txt',
    sep='|', header=None, low_memory=False, names=['1', '2', '3', '4', '5', '6', '7', 
    '8', '9', '10', '11', '12', '13', 'date', '15', '16', '17', '18', '19', '20', 
    '21', '22']))

df.date.isin(['07311954'])
Asked By: Collective Action

||

Answers:

I think you need str.contains, if you need rows where values of column date contains string 07311954:

print df[df['date'].astype(str).str.contains('07311954')]

Or if type of date column is string:

print df[df['date'].str.contains('07311954')]

If you want check last 4 digits for string 1954 in column date:

print df[df['date'].astype(str).str[-4:].str.contains('1954')]

Sample:

print df['date']
0    8152007
1    9262007
2    7311954
3    2252011
4    2012011
5    2012011
6    2222011
7    2282011
Name: date, dtype: int64

print df['date'].astype(str).str[-4:].str.contains('1954')
0    False
1    False
2     True
3    False
4    False
5    False
6    False
7    False
Name: date, dtype: bool

print df[df['date'].astype(str).str[-4:].str.contains('1954')]
     cmte_id trans_typ entity_typ state  employer  occupation     date  
2  C00119040       24K        CCM    MD       NaN         NaN  7311954   

   amount     fec_id    cand_id  
2    1000  C00140715  H2MD05155  
Answered By: jezrael

You can use any:

print any(df.column == 07311954)
True       #true if it contains the number, false otherwise

If you rather want to see how many times ‘07311954’ occurs in a column you can use:

df.column[df.column == 07311954].count()
Answered By: Deusdeorum

You can simply use this:

'07311954' in df.date.values which returns True or False


Here is the further explanation:

In pandas, using in check directly with DataFrame and Series (e.g. val in df or val in series ) will check whether the val is contained in the Index.

BUT you can still use in check for their values too (instead of Index)! Just using val in df.col_name.values
or val in series.values. In this way, you are actually checking the val with a Numpy array.

And .isin(vals) is the other way around, it checks whether the DataFrame/Series values are in the vals. Here vals must be set or list-like. So this is not the natural way to go for the question.

Answered By: YaOzI
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.