Select between list elements with regex
Question:
I have 2 lists:
images = ['ND_row_id_4026.png',
'ND_row_id_7693.png',
'ND_row_id_5285.png',
'ND_row_id_1045.png',
'ND_row_id_2135.png',
'ND_row_id_8155.png',
'ND_row_id_3135.png']
masks = ['MA_row_id_4026.png',
'MA_row_id_7693.png',
'MA_row_id_5285.png',
'MA_row_id_1045.png',
'MA_row_id_2135.png']
I want to keep the masks files. I have 2 folders, one with images and one with masks, and I want to keep all the masks that are in the masks list plus the corresponding images. The images list is bigger than the masks list.
So, from the images list I want to keep:
images = ['ND_row_id_4026.png',
'ND_row_id_7693.png',
'ND_row_id_5285.png',
'ND_row_id_1045.png',
'ND_row_id_2135.png']
Answers:
It seems (based on your sample data) a simple list comprehension should suffice, just replacing the prefix for comparison:
res = [im for im in images if im.replace('ND', 'MA') in masks]
Output (for your sample data):
[
'ND_row_id_4026.png',
'ND_row_id_7693.png',
'ND_row_id_5285.png',
'ND_row_id_1045.png',
'ND_row_id_2135.png'
]
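The question also asks to keep the masks together with their corresponding images. A small variation, assuming every mask name starts with exactly 'MA_' and every image name with 'ND_' (as in the sample data), pairs each mask with its image name and keeps only the pairs whose image actually exists:

```python
images = ['ND_row_id_4026.png', 'ND_row_id_7693.png', 'ND_row_id_5285.png',
          'ND_row_id_1045.png', 'ND_row_id_2135.png', 'ND_row_id_8155.png',
          'ND_row_id_3135.png']
masks = ['MA_row_id_4026.png', 'MA_row_id_7693.png', 'MA_row_id_5285.png',
         'MA_row_id_1045.png', 'MA_row_id_2135.png']

# Map each mask to the image name it should correspond to by swapping
# the leading 'MA_' for 'ND_' (assumes every mask starts with 'MA_').
image_for_mask = {ma: 'ND_' + ma[len('MA_'):] for ma in masks}

# A set of existing image names makes each membership test a hash lookup.
available = set(images)

# (image, mask) pairs, only for masks whose image is present.
pairs = [(image_for_mask[ma], ma) for ma in masks if image_for_mask[ma] in available]
print(pairs)
```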
If your lists are large, it is more efficient to pre-process the masks list once (it is the smaller of the two) and store the result in a set, so each membership test is a hash lookup instead of a scan through a list:
m2 = {ma.replace('MA', 'ND') for ma in masks}
res = [im for im in images if im in m2]
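One caveat: str.replace swaps every occurrence of 'MA' or 'ND' in a name, not only the prefix, so a name such as 'MA_row_id_MA1.png' would be changed in two places. A minimal sketch, assuming every name starts with exactly 'MA_' or 'ND_', anchors the substitution to the start of the string with re.sub and still uses a set for lookups:

```python
import re

images = ['ND_row_id_4026.png', 'ND_row_id_7693.png', 'ND_row_id_5285.png',
          'ND_row_id_1045.png', 'ND_row_id_2135.png', 'ND_row_id_8155.png',
          'ND_row_id_3135.png']
masks = ['MA_row_id_4026.png', 'MA_row_id_7693.png', 'MA_row_id_5285.png',
         'MA_row_id_1045.png', 'MA_row_id_2135.png']

# Rewrite only a leading 'MA_' into 'ND_'; the ^ anchor leaves any later
# 'MA' inside the name untouched.
wanted = {re.sub(r'^MA_', 'ND_', ma) for ma in masks}

# Filter the images, keeping their original order.
res = [im for im in images if im in wanted]
print(res)
```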
Try this:
images = [image for image in images if image in [x.replace('MA','ND') for x in masks]]
Explanation:
[...]: make a list.
[image for image in images]: iterate over the images list.
... if image in [...]: filter the result, keeping only the images that are in the other list.
[x.replace('MA','ND') for x in masks]: make a list of the mask names with MA replaced by ND.
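Because the inner list [x.replace('MA','ND') for x in masks] sits inside the filter, it is rebuilt for every image. A sketch of the same logic with that list built once (as a set) beforehand; both versions should give the same result:

```python
images = ['ND_row_id_4026.png', 'ND_row_id_7693.png', 'ND_row_id_5285.png',
          'ND_row_id_1045.png', 'ND_row_id_2135.png', 'ND_row_id_8155.png',
          'ND_row_id_3135.png']
masks = ['MA_row_id_4026.png', 'MA_row_id_7693.png', 'MA_row_id_5285.png',
         'MA_row_id_1045.png', 'MA_row_id_2135.png']

# One-liner: the inner list is recomputed for each image.
inline = [image for image in images if image in [x.replace('MA', 'ND') for x in masks]]

# Hoisted: translate the mask names once, then filter against a set.
translated = {x.replace('MA', 'ND') for x in masks}
hoisted = [image for image in images if image in translated]

print(hoisted)
```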
Maybe this could help you. It is written using sets.
Code:
import re
images = ['ND_row_id_4026.png',
'ND_row_id_7693.png',
'ND_row_id_5285.png',
'ND_row_id_1045.png',
'ND_row_id_2135.png',
'ND_row_id_8155.png',
'ND_row_id_3135.png']
masks = ['MA_row_id_4026.png',
'MA_row_id_7693.png',
'MA_row_id_5285.png',
'MA_row_id_1045.png',
'MA_row_id_2135.png']
# Extract the digits from each image file name (\d+ matches a run of digits)
forimages = set(j[0] for j in [re.findall(r'\d+', val) for val in images])
# Extract the digits from each mask file name
formasks = set(j[0] for j in [re.findall(r'\d+', val) for val in masks])
# Intersect the two sets to get the digits common to both, then rebuild the image names
my_mask_data = ['ND_row_id_{0}.png'.format(i) for i in sorted(forimages.intersection(formasks))]
# Printing the output
print(my_mask_data)
Output:
['ND_row_id_1045.png', 'ND_row_id_2135.png', 'ND_row_id_4026.png', 'ND_row_id_5285.png', 'ND_row_id_7693.png']
In case you find any issue in the above, feel free to change it.
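The set-based answer rebuilds the names from a hard-coded template and returns them in sorted string order, not in the original order of images. A sketch that keeps the original names and order, assuming (from the sample data only) that every name ends in '_row_id_<digits>.png'; the helper row_id and the pattern are hypothetical, not part of the answer above:

```python
import re

images = ['ND_row_id_4026.png', 'ND_row_id_7693.png', 'ND_row_id_5285.png',
          'ND_row_id_1045.png', 'ND_row_id_2135.png', 'ND_row_id_8155.png',
          'ND_row_id_3135.png']
masks = ['MA_row_id_4026.png', 'MA_row_id_7693.png', 'MA_row_id_5285.png',
         'MA_row_id_1045.png', 'MA_row_id_2135.png']

# Assumed naming scheme: '<prefix>_row_id_<digits>.png'; capture the digits.
ID_PATTERN = re.compile(r'_row_id_(\d+)\.png$')


def row_id(name):
    """Return the numeric id as a string, or None if the name doesn't match."""
    match = ID_PATTERN.search(name)
    return match.group(1) if match else None


# Ids present among the masks (drop None from any non-matching names).
mask_ids = {row_id(m) for m in masks} - {None}

# Keep the original image names, in their original order.
kept = [im for im in images if row_id(im) in mask_ids]
print(kept)
```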