How to get data from list of dict on basis of string

Question:

I have a list of dictionaries that looks like this:

data = [{'name': 'root/folder1/asd/file.csv'},
        {'name': 'root/folder1/bsd/file.csv'}, 
        {'name': 'root/folder1/folder2/folder3/new.csv'}, 
        {'name': 'root/folder1/folder2/folder3'}]

I want to take this list and pare it down to include only files that have a certain extension and exist in the shallowest folder in the list. That is, if a path has two / in it, and all other paths have at least two, then filter out all paths that have more than two. If the smallest number of slashes is three, then filter out anything with more than three.

This is what I started with:

import re

for path in data:
    name = path.get('name', '')
    if name.endswith('.exe') or name.endswith('.csv'):
        path_count = len(re.findall('/', name))
        path.update({'path_count': path_count})

Next I would find the minimum count with a second for loop. Is there a cleaner way to do this?

Asked By: newbiee

Answers:

It’s a little unclear what you’re asking, but you can leverage list comprehensions and their filters to get a list of objects that meet some arbitrary requirements:

extensions = {"exe", "csv"}

def ext_filter(path: str) -> bool:
    # Text after the last "." (the whole string if there is no ".").
    ext = path.split(".")[-1]
    return ext in extensions

def slash_filter(path: str, count: int) -> bool:
    return ext_filter(path) and count == path.count("/")

# Smallest slash count among the paths that have a wanted extension.
slash_count = min(path.get("name", "").count("/") for path in data if ext_filter(path.get("name", "")))

valid_paths = [path for path in data if slash_filter(path.get("name", ""), slash_count)]
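
As a quick check, here is a self-contained sketch of the same idea run against the sample data from the question. The names `has_wanted_extension` and `shallowest_files` are illustrative and not part of the answer above; using `os.path.splitext` and passing `default=None` to `min` (so an input with no matching paths returns an empty list instead of raising) are assumptions added here.

```python
import os

EXTENSIONS = {".exe", ".csv"}  # assumed set of wanted extensions


def has_wanted_extension(path):
    # os.path.splitext returns (root, ext); ext includes the leading dot.
    return os.path.splitext(path)[1] in EXTENSIONS


def shallowest_files(entries):
    names = [entry.get("name", "") for entry in entries]
    depths = [name.count("/") for name in names if has_wanted_extension(name)]
    min_depth = min(depths, default=None)  # None when nothing matches
    if min_depth is None:
        return []
    return [entry for entry, name in zip(entries, names)
            if has_wanted_extension(name) and name.count("/") == min_depth]


data = [{'name': 'root/folder1/asd/file.csv'},
        {'name': 'root/folder1/bsd/file.csv'},
        {'name': 'root/folder1/folder2/folder3/new.csv'},
        {'name': 'root/folder1/folder2/folder3'}]

print(shallowest_files(data))
# [{'name': 'root/folder1/asd/file.csv'}, {'name': 'root/folder1/bsd/file.csv'}]
```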
Answered By: Nathaniel Ford

data = [{'name': 'root/folder1/asd/file.csv'},
        {'name': 'root/folder1/bsd/file.csv'}, 
        {'name': 'root/folder1/folder2/folder3/new.csv'}, 
        {'name': 'root/folder1/folder2/folder3'}]


# Smallest number of slashes across every 'name' value in the list.
# Note: this minimum is taken over all entries, not only the .exe/.csv ones.
min_count = min(d['name'].count('/') for d in data)

for el in data:
    for key, item in el.items():
        if item.count('/') == min_count and item.endswith(('.exe', '.csv')):
            print(f'key {key} item {item}')

Output:

key name item root/folder1/asd/file.csv
key name item root/folder1/bsd/file.csv
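
One caveat may be worth illustrating: the minimum above is computed over every path, including paths without a wanted extension. The sketch below uses hypothetical data (not from the question) in which a shallower non-matching path exists, and compares the two ways of taking the minimum. The names `WANTED`, `sample`, and `select` are made up for this example.

```python
WANTED = ('.exe', '.csv')

# Hypothetical data: 'root/readme.txt' is shallower than any .csv file.
sample = [{'name': 'root/readme.txt'},
          {'name': 'root/folder1/file.csv'},
          {'name': 'root/folder1/sub/other.csv'}]


def select(entries, min_count):
    # Keep names at exactly min_count slashes that end in a wanted extension.
    return [d['name'] for d in entries
            if d['name'].count('/') == min_count and d['name'].endswith(WANTED)]


# Minimum over all paths versus minimum over matching paths only.
min_all = min(d['name'].count('/') for d in sample)
min_matching = min(d['name'].count('/') for d in sample
                   if d['name'].endswith(WANTED))

print(select(sample, min_all))       # []
print(select(sample, min_matching))  # ['root/folder1/file.csv']
```

With the question's own data both minimums happen to be 3, so the two approaches agree there.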
Answered By: LetzerWille

If running time matters, you can do it in a single pass, keeping only the shallowest matching paths seen so far.
For example:

results = []
n = None  # smallest slash count seen so far
for item in data:
    path = item['name']
    if path.endswith(('.exe', '.csv')):
        path_count = path.count('/')
        if n is None or path_count < n:
            # New shallowest level: discard anything collected so far.
            results = [path]
            n = path_count
        elif path_count == n:
            results.append(path)
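
The same single-pass idea could be wrapped in a function that returns the original dictionaries rather than the bare path strings. This is only a sketch: the function name `shallowest_single_pass` and the `suffixes` parameter are assumptions made for illustration.

```python
def shallowest_single_pass(entries, suffixes=('.exe', '.csv')):
    results = []
    best = None  # smallest slash count seen so far among matching paths
    for item in entries:
        path = item['name']
        if not path.endswith(suffixes):
            continue
        depth = path.count('/')
        if best is None or depth < best:
            # Found a shallower level: restart the result list.
            results = [item]
            best = depth
        elif depth == best:
            results.append(item)
    return results


data = [{'name': 'root/folder1/folder2/folder3/new.csv'},
        {'name': 'root/folder1/asd/file.csv'},
        {'name': 'root/folder1/bsd/file.csv'},
        {'name': 'root/folder1/folder2/folder3'}]

print(shallowest_single_pass(data))
# [{'name': 'root/folder1/asd/file.csv'}, {'name': 'root/folder1/bsd/file.csv'}]
```

The sample list is the question's data with the deepest path moved to the front, to show that an early deeper match is dropped once a shallower one appears.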
Answered By: tench11