In a list of items, remove items with a lower value compared to items with the same substring in Python
Question:
The title was hard to make and I don’t really know how to word this.
I have a list of items titled:
[10998321023D123T][v0].jpg
[10998321023D123T][v12321].jpg
[10998321023D123T][v62221].jpg
[10DFSA783212131T][v0].jpg
[10DFSA783212131T][v32112].jpg
[10DFSA783212131T][v54541].jpg
My goal is to keep, for each ID, only version 0 and the highest version, and to remove the versions in between. So in this list I want to be left with:
[10998321023D123T][v0].jpg
[10998321023D123T][v62221].jpg
[10DFSA783212131T][v0].jpg
[10DFSA783212131T][v54541].jpg
I'm stuck on how I would do this. Any help would be amazing.
I thought of just making a new list for each item, for example a new list of all items containing "10998321023D123T", and removing the lower versions that don't equal 0.
This would be slow and take up a lot of memory. Is there a better way to do this?
Answers:
You can use a dictionary to keep track of the highest version for each string, then iterate over the dictionary and emit only the entries with the highest version or version 0. The code below assumes that v0 always exists and that the file format is always jpg, but it can be modified if not.
files = [
    '[10998321023D123T][v0].jpg',
    '[10998321023D123T][v12321].jpg',
    '[10998321023D123T][v62221].jpg',
    '[10DFSA783212131T][v0].jpg',
    '[10DFSA783212131T][v32112].jpg',
    '[10DFSA783212131T][v54541].jpg',
]
highest_version = {}
for file in files:
    # extract string and version from file name
    string, version, extension = file.split(']')
    string = string[1:]
    version = int(version[2:])
    if string not in highest_version or version > highest_version[string]:
        highest_version[string] = version
output = []
for string, version in highest_version.items():
    output.append('[' + string + '][v0].jpg')
    output.append('[' + string + '][v' + str(version) + '].jpg')
print(output)
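If v0 is not guaranteed to exist, or the extension is not always jpg, one option is to filter the original strings instead of rebuilding them. The following is only a minimal sketch under the assumption that every name has exactly the layout `[NAME][vN].ext`; the helper names `parse` and `keep_zero_and_highest` are made up for illustration.

```python
def parse(file_name):
    """Split '[NAME][vN].ext' into (NAME, N); assumes exactly this layout."""
    name, version, _extension = file_name.split(']')
    return name[1:], int(version[2:])


def keep_zero_and_highest(files):
    # First pass: record the highest version seen for each name.
    highest = {}
    for file_name in files:
        name, version = parse(file_name)
        if version > highest.get(name, -1):
            highest[name] = version

    # Second pass: keep the original strings that are v0 or the highest version.
    kept = []
    for file_name in files:
        name, version = parse(file_name)
        if version == 0 or version == highest[name]:
            kept.append(file_name)
    return kept


files = [
    '[10998321023D123T][v0].jpg',
    '[10998321023D123T][v12321].jpg',
    '[10998321023D123T][v62221].jpg',
    '[10DFSA783212131T][v0].jpg',
    '[10DFSA783212131T][v32112].jpg',
    '[10DFSA783212131T][v54541].jpg',
]
print(keep_zero_and_highest(files))
```

Because the original strings are reused, the input order and whatever extension each file has are preserved, and no v0 entry is invented for a name that never had one.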
So I imagine you want to look at the version numbers (e.g. v100101) and see whether each one is smaller than some threshold.
If I understand correctly, your data structure is a list of strings in which the name and the version sit in separate bracketed blocks, followed by the extension.
So if the version numbers are fixed length you can do the following.
Set a threshold, then construct a new list of the items whose version is non-zero and lower than the threshold.
# `original` is the input list of file names
threshold: int = 54541
new_list: list[str] = []
for item in original:
    name, version, extension = item.split("]")
    name: str = name[1:]
    version: int = int(version[2:])
    extension: str = extension[1:]
    # skip v0 and anything at or above the threshold
    if version == 0 or version >= threshold:
        continue
    new_list.append(f"[{name}][v{version}].{extension}")
You could also use a list comprehension if you want to be fancy.
Also, I would use some kind of data structure (like a tuple) to store the name and version separately.
Note that you can use other methods besides split, but I'm on my phone right now and can't remember them.
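One alternative to split is a regular expression, and the tuple and list comprehension ideas fit naturally with it. This is only a rough sketch: the pattern, the `Entry` tuple and `parse_entry` are illustrative names that do not come from the question, and the single global threshold is the same assumption as in the loop above.

```python
import re
from typing import NamedTuple

# Assumed layout: [NAME][vDIGITS].EXT
PATTERN = re.compile(r"\[(?P<name>[^\]]+)\]\[v(?P<version>\d+)\]\.(?P<extension>\w+)")


class Entry(NamedTuple):
    name: str
    version: int
    extension: str


def parse_entry(item: str) -> Entry:
    match = PATTERN.fullmatch(item)
    if match is None:
        raise ValueError(f"unexpected file name: {item!r}")
    return Entry(match["name"], int(match["version"]), match["extension"])


original = [
    "[10998321023D123T][v0].jpg",
    "[10998321023D123T][v12321].jpg",
    "[10998321023D123T][v62221].jpg",
    "[10DFSA783212131T][v0].jpg",
    "[10DFSA783212131T][v32112].jpg",
    "[10DFSA783212131T][v54541].jpg",
]
threshold = 54541

entries = [parse_entry(item) for item in original]
# Non-zero versions strictly below the threshold, as in the loop above.
below_threshold = [e for e in entries if e.version != 0 and e.version < threshold]
print(below_threshold)
```

Parsing once into tuples means the comparison works on integers, so the version numbers do not need to be the same length.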
Hope this helps.
Consider each string in the list to be comprised of a key and a value where the key is the part between the first pair of square brackets and the value is in the second pair of square brackets.
For the value you can ignore the leading ‘v’.
Build a dictionary from the keys and their associated values.
Then work through the dictionary to reconstruct your list as follows:
_list = [
    '[10998321023D123T][v0].jpg',
    '[10998321023D123T][v12321].jpg',
    '[10998321023D123T][v62221].jpg',
    '[10DFSA783212131T][v0].jpg',
    '[10DFSA783212131T][v32112].jpg',
    '[10DFSA783212131T][v54541].jpg'
]
def get_key_and_value(s):
    k, v, *_ = s.split(']')
    return k, int(v[2:])
td = dict()
for e in _list:
    k, v = get_key_and_value(e)
    td.setdefault(k, []).append(v)
output = []
for k, v in td.items():
    for m in min(v), max(v):
        output.append(f'{k}][v{m}].jpg')
print(output)
Output:
['[10998321023D123T][v0].jpg', '[10998321023D123T][v62221].jpg', '[10DFSA783212131T][v0].jpg', '[10DFSA783212131T][v54541].jpg']
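One edge case in the approach above: when a key has only a single version, min(v) and max(v) are the same value, so that entry would be appended twice. A possible variation, sketched here with collections.defaultdict and a set to drop the duplicate (the function name `lowest_and_highest` is just for illustration, and the jpg extension is still assumed):

```python
from collections import defaultdict


def lowest_and_highest(names):
    # Group the integer versions under each key; the key keeps its leading '['.
    versions = defaultdict(list)
    for s in names:
        k, v, *_ = s.split(']')
        versions[k].append(int(v[2:]))

    output = []
    for k, v in versions.items():
        # The set collapses min and max into one value when they are equal.
        for m in sorted({min(v), max(v)}):
            output.append(f'{k}][v{m}].jpg')
    return output


print(lowest_and_highest(['[10998321023D123T][v0].jpg']))
```

Like the original, this keeps the lowest version rather than specifically version 0, so it matches the question only when v0 is present for every key.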
I would first split the data into two categories, zero and non-zero versions, so that the v0 entries are never compared or stored in the dictionary. The non-zero versions then go into a dictionary keyed by code, which keeps only the highest version seen for each code.
I hope this code helps you
data = [
    "[10998321023D123T][v0].jpg",
    "[10998321023D123T][v12321].jpg",
    "[10998321023D123T][v62221].jpg",
    "[10DFSA783212131T][v0].jpg",
    "[10DFSA783212131T][v32112].jpg",
    "[10DFSA783212131T][v54541].jpg",
]
all_data = []
collection_with_out_zero = {}
for i in data:
    code = i.split("]")[0].replace("[", "")
    current_version = int(i.split("]")[1].replace("[", "").replace("v", ""))
    if current_version == 0:
        all_data.append(i)
    elif exists_data_version := collection_with_out_zero.get(code, None):
        if exists_data_version <= current_version:
            collection_with_out_zero[code] = current_version
    else:
        collection_with_out_zero[code] = current_version
for code, version in collection_with_out_zero.items():
    all_data.append(f'[{code}][v{version}].jpg')
print(all_data)
Output:
['[10998321023D123T][v0].jpg', '[10DFSA783212131T][v0].jpg', '[10998321023D123T][v62221].jpg', '[10DFSA783212131T][v54541].jpg']