In list of items, remove items with a lower value compared to items with the same sub string in python

Question:

The title was hard to make and I don’t really know how to word this.

I have a list of items titled:

    [10998321023D123T][v0].jpg
    [10998321023D123T][v12321].jpg
    [10998321023D123T][v62221].jpg
    [10DFSA783212131T][v0].jpg
    [10DFSA783212131T][v32112].jpg
    [10DFSA783212131T][v54541].jpg

My goal is to remove the items with a lower version, but keep version 0. So in this list I want to be left with

    [10998321023D123T][v0].jpg
    [10998321023D123T][v62221].jpg
    [10DFSA783212131T][v0].jpg
    [10DFSA783212131T][v54541].jpg

I’m stuck on how to do this. Any help would be amazing.

I thought of just making a new list for each item, like a new list of all items containing "10998321023D123T", and removing the lower versions that don’t equal 0.
This would be slow and take up a lot of memory. Is there a better way to do this?

Asked By: Wamy-Dev

||

Answers:

You can use a dictionary to keep track of the highest version for each string, then iterate over the dictionary and emit, for each string, the entry with version 0 and the entry with the highest version. The code below assumes that v0 always exists and that the file extension is always jpg, but it can be modified if not.

files = [
    '[10998321023D123T][v0].jpg',
    '[10998321023D123T][v12321].jpg',
    '[10998321023D123T][v62221].jpg',
    '[10DFSA783212131T][v0].jpg',
    '[10DFSA783212131T][v32112].jpg',
    '[10DFSA783212131T][v54541].jpg',
]

highest_version = {}
for file in files:
    # extract string and version from file name
    string, version, extension = file.split(']') 
    string = string[1:]
    version = int(version[2:])

    if string not in highest_version or version>highest_version[string]:
        highest_version[string] = version

output = []
for string, version in highest_version.items():
    output.append('['+string+'][v0].jpg')
    output.append('['+string+'][v'+str(version)+'].jpg')

print(output)
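
If v0 is not guaranteed to exist, or the extension varies, one possible modification is to filter the original names instead of rebuilding them. This is only a sketch under the assumption that every name has the `[key][vN].ext` layout; the helper names `parse` and `keep_latest` are made up for illustration.

```python
def parse(name):
    # '[KEY][v123].ext' -> ('KEY', 123); assumes exactly this layout
    key, version, _extension = name.split(']')
    return key[1:], int(version[2:])

def keep_latest(files):
    # first pass: record the highest version seen for each key
    highest = {}
    for name in files:
        key, version = parse(name)
        if version > highest.get(key, -1):
            highest[key] = version
    # second pass: keep version 0 and the highest version, in input order
    kept = []
    for name in files:
        key, version = parse(name)
        if version == 0 or version == highest[key]:
            kept.append(name)
    return kept

sample = [
    '[A][v0].png',
    '[A][v3].png',
    '[A][v7].png',
    '[B][v2].gif',
    '[B][v5].gif',
]
print(keep_latest(sample))
```

Because the original strings are kept rather than reconstructed, a missing v0 is simply absent from the result and the extension is never hard-coded.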
Answered By: Yazeed Alnumay

So I imagine you want to look at the version numbers (e.g. v100101) and see if they’re smaller than some threshold.

If I understand correctly, your data structure is a list of strings, where the name and version are separated into bracketed blocks, followed by the extension.

So if the version numbers are fixed length you can do the following.

Set a threshold, then construct a new list of the non-zero items lower than the threshold.

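threshold: int = 54541

# the file names from the question
original: list[str] = [
  "[10998321023D123T][v0].jpg",
  "[10998321023D123T][v12321].jpg",
  "[10998321023D123T][v62221].jpg",
  "[10DFSA783212131T][v0].jpg",
  "[10DFSA783212131T][v32112].jpg",
  "[10DFSA783212131T][v54541].jpg",
]

new_list: list[str] = []
for item in original:
  name, version, extension = item.split("]")

  name = name[1:]             # drop the leading "["
  version = int(version[2:])  # drop the leading "[v"
  extension = extension[1:]   # drop the leading "."

  # skip version 0 and anything at or above the threshold
  if version == 0 or version >= threshold:
    continue

  new_list.append(f"[{name}][v{version}].{extension}")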

You could also use list comprehension if you want to be fancy.
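
A minimal sketch of the list comprehension variant, using the same single-threshold idea as this answer. The helper `version_of` is a hypothetical name, and the sample data is copied from the question.

```python
original = [
    "[10998321023D123T][v0].jpg",
    "[10998321023D123T][v12321].jpg",
    "[10998321023D123T][v62221].jpg",
    "[10DFSA783212131T][v0].jpg",
    "[10DFSA783212131T][v32112].jpg",
    "[10DFSA783212131T][v54541].jpg",
]
threshold = 54541

def version_of(item):
    # "[name][v123].jpg" -> 123; the second "]"-separated block is "[v123"
    return int(item.split("]")[1][2:])

# non-zero versions strictly below the threshold
below_threshold = [item for item in original if 0 < version_of(item) < threshold]
print(below_threshold)
```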

Also I would use some kind of data structure (like a tuple) to store the name and version separately.

Note that you can use other methods besides split, but I’m on my phone right now and can’t remember them.
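
One alternative to split, offered only as a possibility since the answer does not say which method was meant, is a regular expression. The sketch below also stores the parts in a named tuple; `FileVersion`, `PATTERN` and `parse_item` are hypothetical names, and the pattern assumes the `[name][vN].ext` layout from the question.

```python
import re
from typing import NamedTuple

class FileVersion(NamedTuple):
    name: str
    version: int
    extension: str

# [name][v<digits>].<extension>
PATTERN = re.compile(r"\[([^\]]+)\]\[v(\d+)\]\.(\w+)")

def parse_item(item):
    match = PATTERN.fullmatch(item)
    if match is None:
        raise ValueError(f"unexpected file name: {item!r}")
    name, version, extension = match.groups()
    return FileVersion(name, int(version), extension)

print(parse_item("[10998321023D123T][v12321].jpg"))
```

Unlike split, the pattern rejects names that do not match the expected layout instead of failing later with a less obvious error.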

Hope this helps.

Answered By: Bálint Fodor

Consider each string in the list to be comprised of a key and a value where the key is the part between the first pair of square brackets and the value is in the second pair of square brackets.

For the value you can ignore the leading ‘v’.

Build a dictionary from the keys and their associated values.

Then work through the dictionary to reconstruct your list as follows:

_list = [
    '[10998321023D123T][v0].jpg',
    '[10998321023D123T][v12321].jpg',
    '[10998321023D123T][v62221].jpg',
    '[10DFSA783212131T][v0].jpg',
    '[10DFSA783212131T][v32112].jpg',
    '[10DFSA783212131T][v54541].jpg'
]

def get_key_and_value(s):
    k, v, *_ = s.split(']')
    return k, int(v[2:])

td = dict()

for e in _list:
    k, v = get_key_and_value(e)
    td.setdefault(k, []).append(v)

output = []

for k, v in td.items():
    for m in min(v), max(v):
        output.append(f'{k}][v{m}].jpg')

print(output)

Output:

['[10998321023D123T][v0].jpg', '[10998321023D123T][v62221].jpg', '[10DFSA783212131T][v0].jpg', '[10DFSA783212131T][v54541].jpg']
Answered By: DarkKnight

I would first split the data into zero and non-zero versions, so the zero versions never need to be compared, which saves some memory. The non-zero versions go into a dictionary that keeps only the highest version seen for each code.
I hope this code helps you

data = [
    "[10998321023D123T][v0].jpg",
    "[10998321023D123T][v12321].jpg",
    "[10998321023D123T][v62221].jpg",
    "[10DFSA783212131T][v0].jpg",
    "[10DFSA783212131T][v32112].jpg",
    "[10DFSA783212131T][v54541].jpg",
]

all_data = []
collection_with_out_zero = {}

for i in data:
    code = i.split("]")[0].replace("[", "")
    current_version = int(i.split("]")[1].replace("[", "").replace("v", ""))

    if current_version == 0:
        all_data.append(i)
    elif exists_data_version := collection_with_out_zero.get(code, None):
        if exists_data_version <= current_version:
            collection_with_out_zero[code] = current_version
    else:
        collection_with_out_zero[code] = current_version

for code, version in collection_with_out_zero.items():
    all_data.append(f'[{code}][v{version}].jpg')

print(all_data)

output:

['[10998321023D123T][v0].jpg', '[10DFSA783212131T][v0].jpg', '[10998321023D123T][v62221].jpg', '[10DFSA783212131T][v54541].jpg']
Answered By: amir salmani