Select matching elements between two lists with regex

Question:

I have 2 lists:

images = ['ND_row_id_4026.png',
          'ND_row_id_7693.png',
          'ND_row_id_5285.png',
          'ND_row_id_1045.png',
          'ND_row_id_2135.png',
          'ND_row_id_8155.png',
          'ND_row_id_3135.png']

masks = ['MA_row_id_4026.png',
         'MA_row_id_7693.png',
         'MA_row_id_5285.png',
         'MA_row_id_1045.png',
         'MA_row_id_2135.png']

I want to keep the mask files. I have two folders, one with images and one with masks, and I want to keep every mask that is in the masks list together with its corresponding image. The images list is longer than the masks list.

So, I want to keep the following from the images list:

images = ['ND_row_id_4026.png',
          'ND_row_id_7693.png',
          'ND_row_id_5285.png',
          'ND_row_id_1045.png',
          'ND_row_id_2135.png']
Asked By: George

||

Answers:

It seems (based on your sample data) a simple list comprehension should suffice, just replacing the prefix for comparison:

res = [im for im in images if im.replace('ND', 'MA') in masks]

Output (for your sample data):

[
 'ND_row_id_4026.png',
 'ND_row_id_7693.png',
 'ND_row_id_5285.png',
 'ND_row_id_1045.png',
 'ND_row_id_2135.png'
]

If your lists are large, it would be more efficient to pre-process the masks list (as it is the smaller of the two), so that the replacement runs once per mask rather than once per image:

m2 = [ma.replace('MA', 'ND') for ma in masks]
res = [im for im in images if im in m2]
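
Note that `im in m2` still scans the whole list for every image. For large inputs, a set gives average constant-time lookups. A minimal sketch, assuming the sample data from the question (the name `wanted` is only illustrative):

```python
images = ['ND_row_id_4026.png', 'ND_row_id_7693.png', 'ND_row_id_5285.png',
          'ND_row_id_1045.png', 'ND_row_id_2135.png', 'ND_row_id_8155.png',
          'ND_row_id_3135.png']
masks = ['MA_row_id_4026.png', 'MA_row_id_7693.png', 'MA_row_id_5285.png',
         'MA_row_id_1045.png', 'MA_row_id_2135.png']

# Build the set of expected image names once.
# Replacing only the first 'MA_' avoids touching the rest of the name.
wanted = {ma.replace('MA_', 'ND_', 1) for ma in masks}

# Keep the images whose name is in the set, preserving the original order.
res = [im for im in images if im in wanted]
print(res)
```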
Answered By: Nick

Try this

images = [image for image in images if image in [x.replace('MA','ND') for x in masks]]

Explanation:

[...]: make list

[image for image in images]: iterate along images list

... if image in [..]: filter the result and only return those that are in the other list.

[x.replace('MA','ND') for x in masks]: make a list where MA is replaced by ND in each mask name
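
The steps above can also be written as an explicit loop, which may be easier to follow. This is only a sketch using the sample data from the question; the names `expected` and `kept` are illustrative, and the renamed mask list is built once before the loop instead of once per image:

```python
images = ['ND_row_id_4026.png', 'ND_row_id_7693.png', 'ND_row_id_5285.png',
          'ND_row_id_1045.png', 'ND_row_id_2135.png', 'ND_row_id_8155.png',
          'ND_row_id_3135.png']
masks = ['MA_row_id_4026.png', 'MA_row_id_7693.png', 'MA_row_id_5285.png',
         'MA_row_id_1045.png', 'MA_row_id_2135.png']

# Make the list where MA is replaced by ND in each mask name.
expected = [x.replace('MA', 'ND') for x in masks]

kept = []
for image in images:        # iterate along the images list
    if image in expected:   # keep only images that have a matching mask
        kept.append(image)
print(kept)
```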

Answered By: Sandwichnick

Maybe this will help you. It is written using sets and a regex.

Code :

import re

images = ['ND_row_id_4026.png',
          'ND_row_id_7693.png',
          'ND_row_id_5285.png',
          'ND_row_id_1045.png',
          'ND_row_id_2135.png',
          'ND_row_id_8155.png',
          'ND_row_id_3135.png']

masks = ['MA_row_id_4026.png',
         'MA_row_id_7693.png',
         'MA_row_id_5285.png',
         'MA_row_id_1045.png',
         'MA_row_id_2135.png']

# Extract the numeric id from each image file name
forimages = set( j[0] for j in [ re.findall(r'\d+',val) for val in images ])

# Extract the numeric id from each mask file name
formasks =  set( j[0] for j in [ re.findall(r'\d+',val) for val in masks ])

# Intersect the two sets to get the ids common to both, then rebuild the image names
my_mask_data = [ 'ND_row_id_{0}.png'.format(i) for i in sorted(forimages.intersection(formasks)) ]

# Printing the output 
print(my_mask_data)  

Output :

['ND_row_id_1045.png', 'ND_row_id_2135.png', 'ND_row_id_4026.png', 'ND_row_id_5285.png', 'ND_row_id_7693.png']
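
The approach above rebuilds the file names from the ids and sorts them. If the original order of the images list matters, a variant is to filter the list by id instead. A minimal sketch, assuming every file name ends in `_<digits>.png` (the pattern and the helper `file_id` are assumptions for illustration, not part of the original answer):

```python
import re

images = ['ND_row_id_4026.png', 'ND_row_id_7693.png', 'ND_row_id_5285.png',
          'ND_row_id_1045.png', 'ND_row_id_2135.png', 'ND_row_id_8155.png',
          'ND_row_id_3135.png']
masks = ['MA_row_id_4026.png', 'MA_row_id_7693.png', 'MA_row_id_5285.png',
         'MA_row_id_1045.png', 'MA_row_id_2135.png']

# Assumed naming scheme: the id is the run of digits right before '.png'.
ID_PATTERN = re.compile(r'_(\d+)\.png$')

def file_id(name):
    """Return the numeric id in a file name as a string, or None if absent."""
    match = ID_PATTERN.search(name)
    return match.group(1) if match else None

# Collect the ids that have a mask; drop None so unparsable names never match.
mask_ids = {file_id(m) for m in masks}
mask_ids.discard(None)

# Keep the images whose id has a mask, in their original order.
kept = [im for im in images if file_id(im) in mask_ids]
print(kept)
```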

If you find any issue with the above, feel free to change it.

Answered By: codeholic24