Python: remove duplicates in JSON based on 2 key/value pairs
Question:
I have a JSON file organised like the following one, and I would like to delete all duplicates based on a pair of two keys:
[{'name': 'anna', 'city': 'paris','code': '5'},
{'name': 'anna', 'city': 'paris','code': '2'},
{'name': 'henry', 'city': 'london','code': '1'},
{'name': 'henry', 'city': 'london','code': '3'},...]
Expected output:
[{'name': 'anna', 'city': 'paris'},{'name': 'henry', 'city': 'london'}]
I am struggling with this task, any ideas?
Answers:
You need to build a unique key from (name, city); for records that share the same pair, apply whatever condition decides which one to keep in the final result.
Once done, take the values of that mapping and that is the answer.
With the walrus operator (Python 3.8+) and a dict comprehension:
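The idea above can also be sketched without any special syntax, using a plain dict keyed on the (name, city) pair; `setdefault` keeps the first record seen for each pair:

```python
records = [
    {'name': 'anna', 'city': 'paris', 'code': '5'},
    {'name': 'anna', 'city': 'paris', 'code': '2'},
    {'name': 'henry', 'city': 'london', 'code': '1'},
    {'name': 'henry', 'city': 'london', 'code': '3'},
]

unique = {}
for record in records:
    key = (record['name'], record['city'])
    # setdefault only stores a value the first time the key appears,
    # so the first record per (name, city) pair wins
    unique.setdefault(key, {'name': record['name'], 'city': record['city']})

deduplicated = list(unique.values())
print(deduplicated)
# [{'name': 'anna', 'city': 'paris'}, {'name': 'henry', 'city': 'london'}]
```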
>>> l = [{'name': 'anna', 'city': 'paris', 'code': '5'}, {'name': 'anna', 'city': 'paris', 'code': '2'}, {'name': 'henry', 'city': 'london', 'code': '1'}, {'name': 'henry', 'city': 'london', 'code': '3'}]
>>> result = { (name:= subdict['name'], city:= subdict['city']): dict(name=name, city=city) for subdict in l}
>>> result
{('anna', 'paris'): {'name': 'anna', 'city': 'paris'}, ('henry', 'london'): {'name': 'henry', 'city': 'london'}}
>>> solution = list(result.values())
>>> solution
[{'name': 'anna', 'city': 'paris'}, {'name': 'henry', 'city': 'london'}]
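Note that a dict comprehension keeps the last record seen for each key. If you instead want to apply a condition to choose which duplicate survives, here is a hedged sketch that keeps, for each (name, city) pair, the record with the lowest 'code' (an assumed criterion, not something the question specifies):

```python
records = [
    {'name': 'anna', 'city': 'paris', 'code': '5'},
    {'name': 'anna', 'city': 'paris', 'code': '2'},
    {'name': 'henry', 'city': 'london', 'code': '1'},
    {'name': 'henry', 'city': 'london', 'code': '3'},
]

best = {}
for rec in records:
    key = (rec['name'], rec['city'])
    # keep the record whose 'code' compares lowest as an integer
    if key not in best or int(rec['code']) < int(best[key]['code']):
        best[key] = rec

print(list(best.values()))
# [{'name': 'anna', 'city': 'paris', 'code': '2'}, {'name': 'henry', 'city': 'london', 'code': '1'}]
```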
In pure Python you can select the fields you need from each dictionary row, collect them in a set (with `{}`) of hashable rows such as tuples, and then rebuild your rows from what you selected:
items = [
{'name': 'anna', 'city': 'paris','code': '5'},
{'name': 'anna', 'city': 'paris','code': '2'},
{'name': 'henry', 'city': 'london','code': '1'},
{'name': 'henry', 'city': 'london','code': '3'}
]
unique = {(item["name"], item["city"]) for item in items}
unique = [{"name": item[0], "city": item[1]} for item in unique]
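One caveat worth noting: a set does not preserve insertion order, so the rows may come out in any order. If order matters, the same two-step idea works with `dict.fromkeys`, whose keys are unique and (since Python 3.7) keep first-seen order:

```python
items = [
    {'name': 'anna', 'city': 'paris', 'code': '5'},
    {'name': 'anna', 'city': 'paris', 'code': '2'},
    {'name': 'henry', 'city': 'london', 'code': '1'},
    {'name': 'henry', 'city': 'london', 'code': '3'},
]

# dict keys deduplicate like a set but preserve insertion order
ordered_pairs = dict.fromkeys((item["name"], item["city"]) for item in items)
unique_ordered = [{"name": name, "city": city} for name, city in ordered_pairs]
print(unique_ordered)
# [{'name': 'anna', 'city': 'paris'}, {'name': 'henry', 'city': 'london'}]
```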
Here’s another approach (one of many) that utilises a set as follows:
input_list = [
{'name': 'anna', 'city': 'paris', 'code': '5'},
{'name': 'anna', 'city': 'paris', 'code': '2'},
{'name': 'henry', 'city': 'london', 'code': '1'},
{'name': 'henry', 'city': 'london', 'code': '3'}
]
output_list = []
seen = set()
for d in input_list:
    if (key := (d.get('name'), d.get('city'))) not in seen:
        output_list.append({k: v for k, v in d.items() if k != 'code'})
        seen.add(key)
print(output_list)
Output:
[{'name': 'anna', 'city': 'paris'}, {'name': 'henry', 'city': 'london'}]
Note:
There’s at least one benefit of doing it this way. Other answers build the new dictionaries to include the keys ‘name’ and ‘city’ and implicitly ignore ‘code’, which is fine for the data as shown. However, this approach builds the new dictionaries by excluding ‘code’. This means the dictionary structures (the input data) can change without having to alter the functional code: the ‘code’ key could be absent, and key/value pairs in addition to ‘name’ and ‘city’ could be introduced.
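To illustrate that point, here is the same exclude-'code' pattern run against hypothetical input where one record carries an extra 'country' key (invented for this sketch); the extra key survives untouched:

```python
input_list = [
    {'name': 'anna', 'city': 'paris', 'country': 'france', 'code': '5'},
    {'name': 'anna', 'city': 'paris', 'country': 'france', 'code': '2'},
    {'name': 'henry', 'city': 'london', 'code': '1'},
]

output_list = []
seen = set()
for d in input_list:
    if (key := (d.get('name'), d.get('city'))) not in seen:
        # copy everything except 'code'; any other keys pass through
        output_list.append({k: v for k, v in d.items() if k != 'code'})
        seen.add(key)

print(output_list)
# [{'name': 'anna', 'city': 'paris', 'country': 'france'}, {'name': 'henry', 'city': 'london'}]
```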