Using regex in contains() to select rows from a pandas DataFrame regardless of string case (upper or lower)

Question:

I want to extract rows from a pandas DataFrame based on the values of a column, using a regex in the str.contains() method.

I am using the following line to keep only the rows whose ‘COMPTYPE’ column contains any of the strings listed in contains():

df = df[df['COMPTYPE'].astype(str).str.contains('MCCB|ACB|VCB|CONTACTOR', regex=True)]

It works fine; however, it does not select rows whose ‘COMPTYPE’ value is MccB, Vcb, Contactor, acb, etc.
How can I use this command so that it selects rows irrespective of the case of the string values?

Input:

BOARDIBNO SUBCOMP_IBNO COMPTYPE
1044444001 9044444001 ACB
1044444001 9044444002 Relay
1044444001 9044444003 Meters
1044444001 9044444004 MCCB/MPCB
1044444001 9044444005 vcb
1044444001 9044444006 MCCB/MPCB
1044444001 9044444007 acb
1044444001 9044444008 mccb
1044444001 9044444009 MCCB/MPCB
1044444001 9044444010 Power Contactor
1044444001 9044444011 Power Contactor
1044444001 9044444012 Control Contactor
1044444001 9044444013 VCB

Expected output:

BOARDIBNO SUBCOMP_IBNO COMPTYPE
1044444001 9044444001 ACB
1044444001 9044444004 MCCB/MPCB
1044444001 9044444005 vcb
1044444001 9044444006 MCCB/MPCB
1044444001 9044444007 acb
1044444001 9044444008 mccb
1044444001 9044444009 MCCB/MPCB
1044444001 9044444010 Power Contactor
1044444001 9044444011 Power Contactor
1044444001 9044444012 Control Contactor
1044444001 9044444013 VCB

However, I am getting the following output:

BOARDIBNO SUBCOMP_IBNO COMPTYPE
1044444001 9044444001 ACB
1044444001 9044444004 MCCB/MPCB
1044444001 9044444005 MCCB/MPCB
1044444001 9044444006 MCCB/MPCB
1044444001 9044444010 VCB

How can I do this? Please help!

Asked By: Shraddha Jadhav

Answers:

Just pass flags=re.IGNORECASE as a parameter to str.contains, or use case=False as suggested by @JoanLara:

import re

# Match any of the listed component types, ignoring case
out = df[df['COMPTYPE'].astype(str)
         .str.contains('MCCB|ACB|VCB|CONTACTOR', regex=True, flags=re.IGNORECASE)]

print(out)

# Output
     BOARDIBNO  SUBCOMP_IBNO           COMPTYPE
0   1044444001    9044444001                ACB
3   1044444001    9044444004          MCCB/MPCB
4   1044444001    9044444005                vcb
5   1044444001    9044444006          MCCB/MPCB
6   1044444001    9044444007                acb
7   1044444001    9044444008               mccb
8   1044444001    9044444009          MCCB/MPCB
9   1044444001    9044444010    Power Contactor
10  1044444001    9044444011    Power Contactor
11  1044444001    9044444012  Control Contactor
12  1044444001    9044444013                VCB
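
The case=False variant mentioned above is not shown in the answer, so here is a minimal sketch of it. The small DataFrame is a made-up subset of the question's data, and the names mask and out are only illustrative.

```python
import pandas as pd

# Hypothetical sample mirroring a few COMPTYPE values from the question
df = pd.DataFrame({
    "SUBCOMP_IBNO": [9044444001, 9044444002, 9044444005, 9044444010],
    "COMPTYPE": ["ACB", "Relay", "vcb", "Power Contactor"],
})

# case=False makes the regex match case-insensitively, so no `import re` is needed
mask = df["COMPTYPE"].astype(str).str.contains(
    "MCCB|ACB|VCB|CONTACTOR", case=False, regex=True
)
out = df[mask]
print(out)
```

With this sample, the 'Relay' row should be dropped and the other three rows kept.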

Or convert the column to upper case before matching:

out = df[df['COMPTYPE'].astype(str).str.upper()
         .str.contains('MCCB|ACB|VCB|CONTACTOR', regex=True)]

print(out)

# Output
     BOARDIBNO  SUBCOMP_IBNO           COMPTYPE
0   1044444001    9044444001                ACB
3   1044444001    9044444004          MCCB/MPCB
4   1044444001    9044444005                vcb
5   1044444001    9044444006          MCCB/MPCB
6   1044444001    9044444007                acb
7   1044444001    9044444008               mccb
8   1044444001    9044444009          MCCB/MPCB
9   1044444001    9044444010    Power Contactor
10  1044444001    9044444011    Power Contactor
11  1044444001    9044444012  Control Contactor
12  1044444001    9044444013                VCB
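
A further option, not part of the original answer, is to put the inline flag (?i) at the start of the pattern itself. The sketch below uses a made-up sample and illustrative variable names, and compares the result with the upper-casing approach.

```python
import pandas as pd

# Hypothetical sample with mixed-case COMPTYPE values
df = pd.DataFrame(
    {"COMPTYPE": ["ACB", "Relay", "Meters", "mccb", "Control Contactor"]}
)

# (?i) at the very start of the pattern turns on case-insensitive matching
pattern = "(?i)MCCB|ACB|VCB|CONTACTOR"
out_inline = df[df["COMPTYPE"].astype(str).str.contains(pattern, regex=True)]

# Upper-casing approach from the answer, for comparison
out_upper = df[
    df["COMPTYPE"].astype(str).str.upper()
    .str.contains("MCCB|ACB|VCB|CONTACTOR", regex=True)
]

print(out_inline)
```

Both selections should contain the same rows here. Note that the upper-casing approach only changes the values used for matching; the rows returned keep their original casing.
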
Answered By: Corralien