Failure when filtering string list with re.match
Question:
I’d like to filter a list of strings in Python using a regex. In the following case, I want to keep only the files with a ‘.npy’ extension.
The code that doesn’t work:
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
selected_files = filter(regex.match, files)
print(selected_files)
The same regex works for me in Ruby:
selected = files.select { |f| f =~ /_x\d+_y\d+\.npy/ }
What’s wrong with the Python code?
Answers:
Just use search: match only tries the pattern at the very beginning of the string (it does not scan ahead), whereas search looks for it anywhere in the string.
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
# list() is needed in Python 3, where filter returns a lazy iterator
selected_files = list(filter(regex.search, files))
print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']
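To make the difference concrete, here is a minimal sketch comparing match, search and fullmatch on one of the paths from the question. The variable names path and pattern are only illustrative.

```python
import re

path = '/a/b/c/la_seg_x005_y003.npy'
pattern = re.compile(r'_x\d+_y\d+\.npy')

# match() only tries the pattern at position 0, where the path starts with '/'.
print(pattern.match(path))            # None

# search() tries every position and finds the pattern near the end.
print(pattern.search(path).group())   # _x005_y003.npy

# fullmatch() requires the whole string to match, so it fails here as well.
print(pattern.fullmatch(path))        # None

# match() succeeds once the pattern accounts for the leading text.
print(re.match(r'.*_x\d+_y\d+\.npy', path) is not None)   # True
```

Ruby's =~ operator behaves like search here, which is probably why the same pattern worked there.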
If you use match, the pattern must match starting at the very beginning of the input (it does not have to reach the end).
Either extend your regular expression:
regex = re.compile(r'.*_x\d+_y\d+\.npy')
Which would match:
['/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.npy']
Or use re.search, which
scans through string looking for the first location where the regular expression pattern produces a match […]
re.match() looks for a match at the beginning of the string. You can use re.search() instead.
selected_files = filter(regex.match, files)
re.match('regex') is roughly equivalent to re.search('^regex') (as long as the re.MULTILINE flag is not set). It is the regex counterpart of text.startswith('regex'): it only checks whether the string starts with the pattern.
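A small sketch of that equivalence and its one caveat. The two-line sample string is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample: the interesting part sits on the second line.
text = 'first line\n_x1_y2.npy'
pattern = r'_x\d+_y\d+\.npy'

# match() is anchored at the start of the string, like \A.
via_match = re.match(pattern, text)
via_A = re.search(r'\A' + pattern, text)

# '^' behaves the same way by default ...
via_caret = re.search('^' + pattern, text)

# ... but with re.MULTILINE it also matches right after each newline.
via_caret_multiline = re.search('^' + pattern, text, re.MULTILINE)

# match() still only tries position 0, even with re.MULTILINE.
via_match_multiline = re.match(pattern, text, re.MULTILINE)

print(via_match, via_A, via_caret)        # None None None
print(via_caret_multiline is not None)    # True
print(via_match_multiline)                # None
```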
So, use re.search()
instead:
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
selected_files = list(filter(regex.search, files))
# The list() call is only needed in Python 3, where filter returns a lazy iterator instead of a list
print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.npy']
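As a side note on the list() call, the sketch below shows what happens in Python 3 without it, using a shortened version of the file list. The names lazy, first_pass and second_pass are only illustrative. A list comprehension is a common alternative that avoids the question altogether.

```python
import re

files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy']
regex = re.compile(r'_x\d+_y\d+\.npy')

# In Python 3, filter() returns a lazy iterator, not a list.
lazy = filter(regex.search, files)
print(lazy)                 # prints a filter object, not the matches

first_pass = list(lazy)     # consumes the iterator
second_pass = list(lazy)    # nothing left the second time
print(first_pass)           # ['/a/b/c/la_seg_x005_y003.npy']
print(second_pass)          # []

# A list comprehension builds the list directly.
selected_files = [f for f in files if regex.search(f)]
print(selected_files)       # ['/a/b/c/la_seg_x005_y003.npy']
```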
And if you just want to get all of the .npy files, str.endswith() would be a better choice:
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
selected_files = list(filter(lambda x: x.endswith('.npy'), files))
print(selected_files)
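The same extension check can also be written as a list comprehension, or with pathlib if you prefer to compare the suffix explicitly. This is just a sketch. PurePosixPath is used on the assumption that the paths are POSIX-style, as in the question, and the names by_endswith, by_suffix and arrays are made up for illustration.

```python
from pathlib import PurePosixPath

files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x004_y003.png',
         '/a/b/c/la_seg_x004_y003.npy']

# Plain string check on the end of each path.
by_endswith = [f for f in files if f.endswith('.npy')]

# Same result by comparing the parsed file suffix.
by_suffix = [f for f in files if PurePosixPath(f).suffix == '.npy']

# endswith() also accepts a tuple if several extensions should pass.
arrays = [f for f in files if f.endswith(('.npy', '.npz'))]

print(by_endswith)
# ['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy']
```

Unlike the regex, these checks do not require the _x…_y… part of the name, so they are only equivalent when every .npy file follows that naming scheme.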