How to extract value from re?

Question:

import re

cc = 'test 5555555555555555/03/22/284 test'

cc = re.findall('[0-9]{15,16}/[0-9]{2,4}/[0-9]{2,4}/[0-9]{3,4}', cc)

print(cc)

['5555555555555555/03/22/284']

This code works fine, but if I put 5555555555555555|03|22|284 in the cc variable, the output is:
[]

I want a single pattern that handles both cases: if the string contains ‘|’, the output should be 5555555555555555|03|22|284, and if it contains ‘/’, the output should be 5555555555555555/03/22/284.

Asked By: Gunnu Mittal


Answers:

Just replace all the /s in your regex (which incidentally don’t need to be backslashed) with [/|], which matches either a / or a |. Or if you want to accept backslashes, too, as in your comment on Zain’s answer, use [/|\\]; the backslash has to be doubled because a single \ before ] would escape the bracket instead of closing the class. (You should always use raw strings r'...' for regexes, since regexes have their own interpretation of backslashes; in a regular string, [/|\\] would have to be written [/|\\\\].)

match = re.findall(
        r'[0-9]{15,16}[/|\\][0-9]{2,4}[/|\\][0-9]{2,4}[/|\\][0-9]{3,4}',
        cc)

Any other characters you want to include, like colons, can likewise be added between the square brackets.
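As a quick check of the character-class approach, here is a minimal sketch that runs one pattern against the question's sample written with three different delimiters. The names DELIM, PATTERN, samples and results are made up for illustration and are not part of the original code.

```python
import re

# Hypothetical names for illustration.
# The class matches exactly one '/', '|' or '\' (the backslash is doubled).
DELIM = r'[/|\\]'
PATTERN = re.compile(
    r'[0-9]{15,16}' + DELIM + r'[0-9]{2,4}' + DELIM + r'[0-9]{2,4}' + DELIM + r'[0-9]{3,4}')

samples = [
    'test 5555555555555555/03/22/284 test',
    'test 5555555555555555|03|22|284 test',
    r'test 5555555555555555\03\22\284 test',  # raw string keeps the backslashes literal
]

# One findall() result (a list of matched strings) per sample.
results = [PATTERN.findall(s) for s in samples]
for found in results:
    print(found)
```

Each sample should yield a one-element list containing the full card string with its original delimiters.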

If you want to accept repeated characters – and treat them as a single delimiter – you can add + to accept "1 or more" of any of the characters:

match = re.findall(
        r'[0-9]{15,16}[:/|\\]+[0-9]{2,4}[:/|\\]+[0-9]{2,4}[:/|\\]+[0-9]{3,4}',
        cc)

But that will accept, for example, 555555555555555:/|\03::|::22\//284 as valid. If you want to be pickier, you can replace the character class with a set of alternatives, each of which can be any length. Just separate the options with | (note that outside of square brackets, a literal | needs a backslash, as does a literal backslash) and put (?:) around the whole thing: (?:/|\||\\|:|...) or whatever, in place of the square-bracketed expressions up there.
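A minimal sketch of the alternation approach, assuming the accepted delimiters are /, |, \, : and the two-character :: (the :: option is an invented example of a longer delimiter; SEP, pattern, accepted and rejected are illustrative names):

```python
import re

# Alternation inside a non-capturing group; each option may be any length.
# '::' is listed before ':' so the longer delimiter is tried first.
SEP = r'(?:/|\||\\|::|:)'
pattern = re.compile(
    r'[0-9]{15,16}' + SEP + r'[0-9]{2,4}' + SEP + r'[0-9]{2,4}' + SEP + r'[0-9]{3,4}')

# A consistent multi-character delimiter is accepted...
accepted = pattern.findall('test 5555555555555555::03::22::284 test')
# ...but a jumble of mixed delimiter characters is not.
rejected = pattern.findall('test 5555555555555555:/|03::|::22//284 test')

print(accepted)
print(rejected)
```

Unlike the [...]+ version, each separator position here must be exactly one of the listed options, so runs of mixed delimiter characters no longer match.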

I don’t recommend assigning the result of findall back to the original cc variable; for one thing, it’s a list, not a string. (You can get the string with e.g. new_cc = match[0].)

Better to create a new variable so (1) you still have the original value in case you need it and (2) when you use the new value in later code, it’s clear that it’s different.
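For illustration, a small sketch of keeping the input and the result in separate variables. The names matches and new_cc are placeholders, and falling back to None when nothing matches is an assumption about what the caller wants:

```python
import re

cc = 'test 5555555555555555|03|22|284 test'

# Keep the input in cc and store the findall() result under a new name.
matches = re.findall(r'[0-9]{15,16}[/|][0-9]{2,4}[/|][0-9]{2,4}[/|][0-9]{3,4}', cc)

# findall() returns a list, which is empty when nothing matched,
# so check it before indexing.
new_cc = matches[0] if matches else None

print(new_cc)  # the matched string (or None)
print(cc)      # the original input is still available
```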

In fact, if you’re going to the trouble of matching this pattern, you might as well go ahead and extract all the components of it at the same time. Just put () around the bits you want to keep, and they’ll be put in a tuple as the result of that match:

import re
pat = re.compile(r'([0-9]{15,16})[:/|\\]+([0-9]{2,4})[:/|\\]+([0-9]{2,4})[:/|\\]+([0-9]{3,4})')
cc = 'test 5555555555555555/03/22/284 test'
match, = pat.findall(cc)
print(match)

Which outputs this:

('5555555555555555', '03', '22', '284')
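Note that the unpacking in match, = pat.findall(cc) raises ValueError unless the input contains exactly one match. If the input may contain zero or several matches, one possible variation is to iterate with finditer instead. This is only a sketch; extract_parts is a hypothetical helper name, not something from the code above.

```python
import re

pat = re.compile(r'([0-9]{15,16})[:/|\\]+([0-9]{2,4})[:/|\\]+([0-9]{2,4})[:/|\\]+([0-9]{3,4})')

def extract_parts(text):
    """Return one tuple of captured groups per match; an empty list if none."""
    return [m.groups() for m in pat.finditer(text)]

print(extract_parts('test 5555555555555555/03/22/284 test'))
print(extract_parts('a 5555555555555555|03|22|284 b 444444444444444:01:2023:1234 c'))
print(extract_parts('no card here'))
```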
Answered By: Mark Reed

Define both options in the regex so that it works with either delimiter. For example, the following regex accepts both "/" and "|" as the separator:

import re

cc = 'test 5555555555555555/03/22/284 test'
#cc = 'test 5555555555555555|03|22|284 test'

cc = re.findall('[0-9]{15,16}[/|][0-9]{2,4}[/|][0-9]{2,4}[/|][0-9]{3,4}', cc)


print(cc)
Answered By: Zain Ul Abidin