How to filter a list of items based on a substring?

Question:

I have a list of file paths and I want to extract only the paths that contain 'mp4'.

lists = ['/Users/me/1. intro.mp4', 'The mp4 version.vlc',
         '/Users/2. intro.vtt', '/Users/1. ppt.rar', '/Users/2. ppt.mp4']

Expected output:

['/Users/me/1. intro.mp4', 'The mp4 version.vlc','/Users/2. ppt.mp4']

I tried the code below, but it's not giving me the correct output. My code looks like this:

lists = ['/Users/me/1. intro.mp4',
         '/Users/2. intro.vtt', '/Users/1. ppt.rar', '/Users/2. ppt.mp4']


def Filter(string, substr):
    return [str for str in string if
            any(sub in str for sub in substr)]


searchString = 'mp4'
result = Filter(lists, searchString)
print(f'{result}')

If I run the program, it gives me the following output:

['/Users/me/1. intro.mp4', '/Users/1. ppt.rar', '/Users/2. ppt.mp4']

Can anybody tell me how to fix this?

Asked By: Funny Boss


Answers:

Try this:

lists = ['/Users/me/1. intro.mp4',
         '/Users/2. intro.vtt', '/Users/1. ppt.rar', '/Users/2. ppt.mp4']

def filterSubstr(lists, substr):
    return [x for x in lists if substr in x]

searchString = 'mp4'
print(filterSubstr(lists, searchString))

Result:

['/Users/me/1. intro.mp4', '/Users/2. ppt.mp4']
Answered By: Wriar

You just need to check if substr is in each item in the list.

def Filter(string, substr):
    return [item for item in string if substr in item]

Your code, i.e.

any(sub in str for sub in substr)

checks whether ANY of the characters 'm', 'p', or '4' is in str, because the generator expression inside any() iterates over each character of substr itself rather than treating substr as a whole.

I would also avoid using 'str' as a variable name, as you've done, since it shadows the built-in str class.
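
To see the difference concretely, here is a small sketch that runs both checks on one path from the question. The variable names are illustrative and not part of the original code.

```python
# Illustrative names; the path and search string come from the question.
path = '/Users/1. ppt.rar'
search = 'mp4'

# Iterating over a string yields its characters one at a time.
chars = list(search)

# Character-by-character check: True because 'p' occurs in the path.
any_char_match = any(ch in path for ch in search)

# Whole-substring check: False because 'mp4' never occurs in the path.
substring_match = search in path

print(chars, any_char_match, substring_match)
```

This is why '/Users/1. ppt.rar' appeared in the unexpected output: it contains the letter 'p', even though it does not contain 'mp4'.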

Answered By: thariqfahry

I would suggest using the pathlib module, which makes it easy to check the file's actual extension. That is a more rigorous test than merely checking whether one string is a substring of another:

from pathlib import Path


file_paths = ['/Users/me/1. intro.mp4', '/Users/2. intro.vtt', '/Users/1. ppt.rar',
              '/Users/2. ppt.mp4']

def filter_on_extension(paths, ext):
    return [path for path in paths if Path(path).suffix == ext]

file_extension = '.mp4'
result = filter_on_extension(file_paths, file_extension)
print(result)  # -> ['/Users/me/1. intro.mp4', '/Users/2. ppt.mp4']
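
As a rough comparison, the sketch below applies both tests to the full list from the question, including the 'The mp4 version.vlc' entry. The names by_substring and by_suffix are made up for this illustration.

```python
from pathlib import Path

# The full list from the question, including the entry that only mentions "mp4".
paths = ['/Users/me/1. intro.mp4', 'The mp4 version.vlc',
         '/Users/2. intro.vtt', '/Users/1. ppt.rar', '/Users/2. ppt.mp4']

# Substring test: keeps any path that contains "mp4" anywhere.
by_substring = [p for p in paths if 'mp4' in p]

# Extension test: keeps only paths whose final suffix is exactly ".mp4".
by_suffix = [p for p in paths if Path(p).suffix == '.mp4']

print(by_substring)
print(by_suffix)
```

The substring test keeps 'The mp4 version.vlc', which matches the expected output in the question, while the extension test drops it. Which one is right depends on whether you want paths that mention 'mp4' or files that actually have the .mp4 extension.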
Answered By: martineau