Using regex in contains() to select rows from a pandas data frame having some string value (Capital or small)

Question

I want to extract rows from a pandas data frame based on the values of a column, using a regex in the str.contains() method.

I am using the following line to keep a row if its ‘COMPTYPE’ column contains any of the strings listed in contains():

df = df[df['COMPTYPE'].astype(str).str.contains('MCCB|ACB|VCB|CONTACTOR', regex=True)]

It works, but it does not select rows whose ‘COMPTYPE’ value is written in a different case, such as MccB, Vcb, Contactor or acb.
How can I make this match rows regardless of the case of the string values?

Input:

BOARDIBNO	SUBCOMP_IBNO	COMPTYPE
1044444001	9044444001	ACB
1044444001	9044444002	Relay
1044444001	9044444003	Meters
1044444001	9044444004	MCCB/MPCB
1044444001	9044444005	vcb
1044444001	9044444006	MCCB/MPCB
1044444001	9044444007	acb
1044444001	9044444008	mccb
1044444001	9044444009	MCCB/MPCB
1044444001	9044444010	Power Contactor
1044444001	9044444011	Power Contactor
1044444001	9044444012	Control Contactor
1044444001	9044444013	VCB

Expected output:

BOARDIBNO	SUBCOMP_IBNO	COMPTYPE
1044444001	9044444001	ACB
1044444001	9044444004	MCCB/MPCB
1044444001	9044444005	vcb
1044444001	9044444006	MCCB/MPCB
1044444001	9044444007	acb
1044444001	9044444008	mccb
1044444001	9044444009	MCCB/MPCB
1044444001	9044444010	Power Contactor
1044444001	9044444011	Power Contactor
1044444001	9044444012	Control Contactor
1044444001	9044444013	VCB

However, I am getting the following output:

BOARDIBNO	SUBCOMP_IBNO	COMPTYPE
1044444001	9044444001	ACB
1044444001	9044444004	MCCB/MPCB
1044444001	9044444005	MCCB/MPCB
1044444001	9044444006	MCCB/MPCB
1044444001	9044444010	VCB

How to do it? Please help!

Asked By: Shraddha Jadhav

Answer 1

Pass flags=re.IGNORECASE to str.contains, or use case=False as suggested by @JoanLara:

import re
# Keep rows whose COMPTYPE matches any alternative, ignoring case
out = df[df['COMPTYPE'].astype(str)
         .str.contains('MCCB|ACB|VCB|CONTACTOR', regex=True, flags=re.IGNORECASE)]

print(out)

# Output
     BOARDIBNO  SUBCOMP_IBNO           COMPTYPE
0   1044444001    9044444001                ACB
3   1044444001    9044444004          MCCB/MPCB
4   1044444001    9044444005                vcb
5   1044444001    9044444006          MCCB/MPCB
6   1044444001    9044444007                acb
7   1044444001    9044444008               mccb
8   1044444001    9044444009          MCCB/MPCB
9   1044444001    9044444010    Power Contactor
10  1044444001    9044444011    Power Contactor
11  1044444001    9044444012  Control Contactor
12  1044444001    9044444013                VCB
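
The case=False variant mentioned above is not shown, so here is a minimal sketch of it. The small data frame is a cut-down stand-in for the one in the question, not the real data, and the names mask and out are only illustrative.

```python
import pandas as pd

# Cut-down stand-in for the data frame in the question (assumed sample rows)
df = pd.DataFrame({
    "SUBCOMP_IBNO": [9044444001, 9044444002, 9044444005, 9044444010],
    "COMPTYPE": ["ACB", "Relay", "vcb", "Power Contactor"],
})

# case=False makes the regex match regardless of upper/lower case
mask = df["COMPTYPE"].astype(str).str.contains(
    "MCCB|ACB|VCB|CONTACTOR", case=False, regex=True
)
out = df[mask]
print(out)
```

With case=False there is no need to import re; it should behave the same as passing flags=re.IGNORECASE for a pattern like this one.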

Or convert the column to upper case before matching:

# Upper-case the values first so the upper-case pattern matches every variant
out = df[df['COMPTYPE'].astype(str).str.upper()
         .str.contains('MCCB|ACB|VCB|CONTACTOR', regex=True)]

print(out)

# Output
     BOARDIBNO  SUBCOMP_IBNO           COMPTYPE
0   1044444001    9044444001                ACB
3   1044444001    9044444004          MCCB/MPCB
4   1044444001    9044444005                vcb
5   1044444001    9044444006          MCCB/MPCB
6   1044444001    9044444007                acb
7   1044444001    9044444008               mccb
8   1044444001    9044444009          MCCB/MPCB
9   1044444001    9044444010    Power Contactor
10  1044444001    9044444011    Power Contactor
11  1044444001    9044444012  Control Contactor
12  1044444001    9044444013                VCB
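
If the list of component types is likely to grow, one option is to build the pattern from a Python list instead of typing it by hand. This is only a sketch: the keywords list and the sample rows are assumptions for illustration, and na=False is used here in place of astype(str) so that missing values simply count as "no match".

```python
import re
import pandas as pd

# Hypothetical list of component types to keep
keywords = ["MCCB", "ACB", "VCB", "CONTACTOR"]

# re.escape guards against regex metacharacters in a keyword
pattern = "|".join(re.escape(k) for k in keywords)

# Assumed sample rows, including one missing value
df = pd.DataFrame(
    {"COMPTYPE": ["MCCB/MPCB", "Meters", "acb", None, "Control Contactor"]}
)

# case=False ignores case; na=False treats missing values as non-matching
mask = df["COMPTYPE"].str.contains(pattern, case=False, na=False)
out = df[mask]
print(out)
```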

Answered By: Corralien
