How to avoid adding the same elements to a list repeatedly?

Question:

How can I add unique file paths to the groups_of_files list while avoiding duplication due to the cycles in my loop?

for file in files_names:
    for name_group, formats in groups_of_format.items():
        if file.split('.')[-1].upper() in groups_of_format.values():
            groups_of_files[groups_of_format.keys()].append(file)
Asked By: Pain


Answers:

Use sets instead of lists. A set keeps its elements unique by hashing them.

Something like:

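from collections import defaultdict

# Each group name maps to a set, so adding the same file twice has no effect.
groups_of_files = defaultdict(set)
for file in files_names:
  for name_group, formats in groups_of_format.items():
    # Compare the upper-cased extension against this group's formats.
    if file.split('.')[-1].upper() in formats:
      groups_of_files[name_group].add(file)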

I assumed that groups_of_files is a dictionary. In the code example it is a defaultdict(set): when a key is missing, no KeyError is raised; instead the key is created with an empty set as its value, and you can add your file to it. If file is of a custom type, make sure to define the __hash__ and __eq__ methods.
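
As a minimal sketch of that last point, assuming a hypothetical FilePath class (not part of the question) whose instances should count as duplicates when their path strings match:

```python
class FilePath:
    """Hypothetical custom type, used only for illustration."""

    def __init__(self, path):
        self.path = path

    def __eq__(self, other):
        # Two FilePath objects are equal when their paths match.
        if not isinstance(other, FilePath):
            return NotImplemented
        return self.path == other.path

    def __hash__(self):
        # Equal objects must produce equal hashes, so hash the same field.
        return hash(self.path)


unique_files = {FilePath('a.txt'), FilePath('a.txt'), FilePath('b.py')}
print(len(unique_files))  # 2
```

Without __eq__ and __hash__, the two FilePath('a.txt') objects would be treated as different elements, because the default comparison is by identity.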

If you need a list in the end anyway, you can convert the set by passing it to list().
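
For instance, with a small made-up set of names, list() returns the elements in no guaranteed order, while sorted() returns a list in a predictable order:

```python
unique_files = {'b.txt', 'a.txt', 'c.py'}  # sample data, assumed for illustration

as_list = list(unique_files)      # order is not guaranteed
as_sorted = sorted(unique_files)  # always a list, in ascending order

print(as_sorted)  # ['a.txt', 'b.txt', 'c.py']
```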

Answered By: Riccardo Petraglia

You can use a set to keep track of the files that have already been added to groups_of_files.

added_files = set()
for file in files_names:
    for name_group, formats in groups_of_format.items():
        if file.split('.')[-1].upper() in formats and file not in added_files:
            groups_of_files[name_group].append(file)
            added_files.add(file)
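
To try the loop above on its own, here is a self-contained sketch. The contents of groups_of_format and files_names are assumptions made for illustration, and groups_of_files is assumed to be a defaultdict(list).

```python
from collections import defaultdict

# Sample data, assumed for illustration only.
groups_of_format = {'text': ['TXT', 'MD'], 'code': ['PY']}
files_names = ['a.txt', 'b.txt', 'b.txt', 'c.py', 'notes.md']

groups_of_files = defaultdict(list)
added_files = set()
for file in files_names:
    for name_group, formats in groups_of_format.items():
        if file.split('.')[-1].upper() in formats and file not in added_files:
            groups_of_files[name_group].append(file)
            added_files.add(file)

print(dict(groups_of_files))
# {'text': ['a.txt', 'b.txt', 'notes.md'], 'code': ['c.py']}
```

Because there is a single tracking set, a file ends up only in the first group whose formats match it, and each list keeps the order in which the files were first seen.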

Answered By: Razvan I.

Build a dictionary keyed on the filename extensions, where each associated value is a set.

Subsequently, build the required dictionary by converting the sets to lists as follows:

import os

temp = dict()

files_names = ['a.txt', 'b.txt', 'b.txt', 'c.py', 'e.txt', 'f.py']

for file in files_names:
    _, ext = os.path.splitext(file)
    temp.setdefault(ext.upper()[1:], set()).add(file)

groups_of_files = {k: list(v) for k, v in temp.items()}

print(groups_of_files)

Output (the order within each list may vary, because sets are unordered):

{'TXT': ['e.txt', 'b.txt', 'a.txt'], 'PY': ['c.py', 'f.py']}
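
If the original order of the files matters, one possible variant (a sketch, not part of the code above) stores the names as keys of an inner dict instead of a set, since dicts preserve insertion order in Python 3.7+. The name ordered_groups is introduced here only for illustration.

```python
import os

files_names = ['a.txt', 'b.txt', 'b.txt', 'c.py', 'e.txt', 'f.py']

temp = {}
for file in files_names:
    _, ext = os.path.splitext(file)
    # Inner dict keys act as an insertion-ordered set; the value is unused.
    temp.setdefault(ext.upper()[1:], {})[file] = None

ordered_groups = {k: list(v) for k, v in temp.items()}
print(ordered_groups)
# {'TXT': ['a.txt', 'b.txt', 'e.txt'], 'PY': ['c.py', 'f.py']}
```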
Answered By: Pingu