Failure when filtering string list with re.match

Question:

I’d like to filter a list of strings in Python using a regex. In the following case, I want to keep only the files with a ‘.npy’ extension.

The code that doesn’t work:

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = list(filter(regex.match, files))
print(selected_files)  # prints []

The same regex works for me in Ruby:

selected = files.select { |f| f =~ /_x\d+_y\d+\.npy/ }

What’s wrong with the Python code?

Asked By: miluz


Answers:

Just use search: match only tries to match at the beginning of the string, while search looks for a match anywhere in the string.

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = list(filter(regex.search, files))
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']
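To see the difference concretely, here is a small sketch comparing match(), search() and fullmatch() on one of the paths (fullmatch() needs Python 3.4+; the '.bak' string is just a made-up example):

```python
import re

path = '/a/b/c/la_seg_x005_y003.npy'
regex = re.compile(r'_x\d+_y\d+\.npy')

# match() only tries position 0; the path starts with '/', so it fails.
at_start = regex.match(path)                      # None

# search() scans the string and finds the pattern wherever it occurs.
found = regex.search(path).group()                # '_x005_y003.npy'

# fullmatch() requires the pattern to span the whole string.
whole = regex.fullmatch(path)                     # None

# match() is anchored only at the start: trailing text is still allowed.
prefix = regex.match('_x005_y003.npy.bak').group()  # '_x005_y003.npy'

print(at_start, found, whole, prefix)
```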
Answered By: SIslam

If you use match, the pattern must match starting at the very beginning of the input.
Either extend your regular expression:

regex = re.compile(r'.*_x\d+_y\d+\.npy')

Which would match:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']
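One caveat with the .* prefix: match() anchors only the start, so a name that contains the pattern followed by extra text also passes. A sketch with a hypothetical '.npy.bak' file added to show it; appending $ to the pattern rejects such names:

```python
import re

files = ['/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy.bak']  # hypothetical extra file name

loose = re.compile(r'.*_x\d+_y\d+\.npy')    # no end anchor
strict = re.compile(r'.*_x\d+_y\d+\.npy$')  # must end right after '.npy'

loose_hits = [f for f in files if loose.match(f)]
strict_hits = [f for f in files if strict.match(f)]

print(loose_hits)   # includes the '.npy.bak' name
print(strict_hits)  # only the real '.npy' file
```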

Or use re.search, which

scans through string looking for the first location where the regular expression pattern produces a match […]

Answered By: miku

re.match() looks for a match at the beginning of the string. You can use re.search() instead.
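The module-level functions behave the same way as the compiled-pattern methods; a quick sketch without calling re.compile():

```python
import re

pattern = r'_x\d+_y\d+\.npy'
path = '/a/b/c/la_seg_x004_y003.npy'

at_start = re.match(pattern, path)   # None: the path starts with '/', not '_x'
anywhere = re.search(pattern, path)  # a Match object for '_x004_y003.npy'

print(at_start)
print(anywhere.group())
```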

Answered By: Vlad
selected_files = filter(regex.match, files)

re.match('regex') is equivalent to re.search('^regex') (as long as re.MULTILINE is not used); think of it as a regex version of text.startswith('regex'). It only checks whether the string starts with a match for the pattern.
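The re.MULTILINE caveat matters because ^ then also matches after every newline, while re.match() still only tries the start of the string. A small sketch:

```python
import re

text = 'first line\n_x004_y003.npy'

# Without flags, '^' only matches at the start of the string, so both agree.
plain_match = re.match(r'_x\d+', text)     # None
plain_search = re.search(r'^_x\d+', text)  # None

# With re.MULTILINE, '^' also matches right after each newline, so search()
# finds the second line, but match() still only tries the string's start.
ml_search = re.search(r'^_x\d+', text, re.MULTILINE)
ml_match = re.match(r'_x\d+', text, re.MULTILINE)

print(plain_match, plain_search)  # None None
print(ml_search.group())          # _x004
print(ml_match)                   # None
```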

So, use re.search() instead:

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = list(filter(regex.search, files))
# list() is only needed in Python 3, where filter() returns a lazy iterator instead of a list
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']
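Because the Python 3 filter object is an iterator, it can only be consumed once. A short sketch of that, plus the equivalent list comprehension, which builds the list directly:

```python
import re

files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy']
regex = re.compile(r'_x\d+_y\d+\.npy')

lazy = filter(regex.search, files)  # a filter object (an iterator), not a list
first = list(lazy)                  # ['/a/b/c/la_seg_x005_y003.npy']
second = list(lazy)                 # []: the iterator is already exhausted

# A list comprehension builds the list in one step and can be reused freely.
selected = [f for f in files if regex.search(f)]
print(first, second, selected)
```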

And if you just want all of the .npy files, str.endswith() is a simpler choice than a regex:

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]


selected_files = list(filter(lambda x: x.endswith('.npy'), files))

print(selected_files)
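Two other standard-library options for a plain extension check, sketched below: fnmatch.filter() with a shell-style wildcard, and comparing pathlib's suffix. Note that fnmatch follows the OS case convention (case-insensitive on Windows), whereas the suffix comparison here is always case-sensitive:

```python
import fnmatch
from pathlib import PurePosixPath

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

# Shell-style wildcard matching over a list of names.
npy_files = fnmatch.filter(files, '*.npy')

# Compare the final suffix of each path instead of matching the whole string.
by_suffix = [f for f in files if PurePosixPath(f).suffix == '.npy']

print(npy_files)
print(by_suffix)
```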
Answered By: Remi Guan