How to check the string format of an entire column in Python using regex

Question:

I have a column of account names that look like GH85036, LG95639, etc. I want to check the format of the entire column so I can edit the values that don't follow it. This is my first time using regex.

So far I have:

for i in Reports['Account Name']:
    match = re.findall(r'[A-Z]{2}[0-9][0-9][0-9][0-9][0-9]', Reports['Account Name']) is None

The error message I get:

<ipython-input-77-86f17b9d34ff> in <module>()
      1 for i in Reports['Account Name']:
----> 2     match = re.findall(r'[A-Z]{2}[0-9][0-9][0-9][0-9][0-9]', Reports['Account Name']) is None

C:\Program Files\Anaconda3\lib\re.py in findall(pattern, string, flags)
    221 
    222     Empty matches are included in the result."""
--> 223     return _compile(pattern, flags).findall(string)
    224 
    225 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object
Asked By: simplemind
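For reference, the loop above fails because re.findall receives the whole Series Reports['Account Name'] instead of the loop variable i, and re.findall expects a single string. Even with that fixed, findall returns a list (empty when nothing matches), never None, so the "is None" test would always be False. The minimal loop-based sketch below uses re.fullmatch instead, which returns None when the whole string does not fit the pattern. It uses a plain list as a hypothetical stand-in for the column; iterating over Reports['Account Name'] yields its values the same way.

```python
import re

# Hypothetical sample values standing in for Reports['Account Name'].
account_names = ["GH85036", "LG95639", "gh1234", "AB123456"]

# Two capital letters followed by exactly five digits.
pattern = re.compile(r"[A-Z]{2}[0-9]{5}")

bad_names = []
for name in account_names:
    # Pass the individual value (a string), not the whole column.
    # fullmatch returns None unless the entire string fits the pattern;
    # str() guards against non-string values such as NaN.
    if pattern.fullmatch(str(name)) is None:
        bad_names.append(name)

print(bad_names)  # ['gh1234', 'AB123456']
```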


Answers:

Assuming a valid account name consists of two capital letters followed by five digits, we can use str.contains on the entire column to flag any non-matching values:

Reports[~Reports["Account Name"].str.contains(r'^[A-Z]{2}[0-9]{5}$', regex=True)]
Answered By: Tim Biegeleisen
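A self-contained sketch of the answer's approach, run against a small hypothetical Reports DataFrame. It adds na=False (not part of the answer) so that missing values count as non-matching. Without it, str.contains returns NaN for those rows and the ~ inversion can fail.

```python
import pandas as pd

# Hypothetical sample data; only the "Account Name" column matters here.
Reports = pd.DataFrame(
    {"Account Name": ["GH85036", "LG95639", "gh1234", "AB123456", None]}
)

# ^ and $ anchor the pattern so the whole value must match, not just a substring.
# na=False treats missing values as non-matching, giving a plain boolean mask.
valid = Reports["Account Name"].str.contains(r"^[A-Z]{2}[0-9]{5}$", regex=True, na=False)

# Rows whose account name does not follow the format (including missing ones).
bad_rows = Reports[~valid]
print(bad_rows)
```

On pandas 1.1 or later, Reports["Account Name"].str.fullmatch(r"[A-Z]{2}[0-9]{5}", na=False) should produce the same mask without needing the ^ and $ anchors.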