How to remove dictionary elements with outlier values in Python

Question:

Suppose my dictionary contains more than 100 elements, and one or two of them have values that differ from the rest; most values are the same (12 in the example below). How can I remove those few elements?

Diction = {1:12,2:12,3:23,4:12,5:12,6:12,7:12,8:2}

I want a dictionary object:

Diction = {1:12,2:12,4:12,5:12,6:12,7:12}
Asked By: Z. Zhang


Answers:

d = {1:12,2:12,3:23,4:12,5:12,6:12,7:12,8:2}
new_d = {}

unique_values = []
unique_count = []
most_common_value = None
highest_count = 0

# Find unique values
for v in d.values():
    if v not in unique_values:
        unique_values.append(v)

# Count how many times a value occurs in the given dictionary
def count(dictionary, unique_value):
    total = 0
    for v in dictionary.values():
        if v == unique_value:
            total += 1

    return total

for value in unique_values:
    occurrences = count(d, value)
    unique_count.append((value, occurrences))

# Find which value has the most occurrences
# (track the highest count separately from the value it belongs to)
for value, occurrences in unique_count:
    if occurrences > highest_count:
        highest_count = occurrences
        most_common_value = value

# Create a new dict with only the keys whose value is the most common one
for k, v in d.items():
    if v == most_common_value:
        new_d[k] = v

print(new_d)

Nothing fancy, but direct to the point. There should be many ways to optimize this.

Output: {1: 12, 2: 12, 4: 12, 5: 12, 6: 12, 7: 12}
Answered By: Niko
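
As one possible optimization of the approach above, the counting can be delegated to collections.Counter from the standard library. This is only a minimal sketch: the function name keep_most_common is made up for illustration, it assumes the values are hashable, and if several values tie for the highest count it keeps the one encountered first.

```python
from collections import Counter

def keep_most_common(d):
    """Return a new dict with only the items whose value is the most common one."""
    if not d:
        return {}
    # Counter tallies all values in one pass; most_common(1) returns [(value, count)].
    most_common_value, _ = Counter(d.values()).most_common(1)[0]
    return {k: v for k, v in d.items() if v == most_common_value}

d = {1: 12, 2: 12, 3: 23, 4: 12, 5: 12, 6: 12, 7: 12, 8: 2}
print(keep_most_common(d))  # {1: 12, 2: 12, 4: 12, 5: 12, 6: 12, 7: 12}
```

This avoids rescanning the dictionary once per unique value, so the work grows linearly with the number of elements.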

This approach may be slow for very large dictionaries because of the Python-level loops, and it requires NumPy, but it works:

import numpy as np

Diction = {1:12,2:12,3:23,4:12,5:12,6:12,7:12,8:2}

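# Collect the values into an array and count each unique value
dict_array = np.array(list(Diction.values()))
unique, counts = np.unique(dict_array, return_counts=True)
most_common = unique[np.argmax(counts)]

# Keep only the items whose value equals the most common one;
# store the original value so the result holds plain Python ints, not NumPy scalars
new_Diction = {}
for x in Diction:
    if Diction[x] == most_common:
        new_Diction[x] = Diction[x]

print(new_Diction)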

Output

{1: 12, 2: 12, 4: 12, 5: 12, 6: 12, 7: 12}
Answered By: greenerpastures
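
Both answers build a new dictionary. If the original object has to be modified in place instead, the sketch below shows one way to do it. The function name remove_outliers_in_place is hypothetical, and it assumes hashable values and that "outlier" simply means "not the most common value".

```python
from collections import Counter

def remove_outliers_in_place(d):
    """Delete every item of d whose value is not the most common value."""
    if not d:
        return
    most_common_value, _ = Counter(d.values()).most_common(1)[0]
    # Collect the keys first: deleting from a dict while iterating over it
    # raises "RuntimeError: dictionary changed size during iteration".
    outlier_keys = [k for k, v in d.items() if v != most_common_value]
    for k in outlier_keys:
        del d[k]

Diction = {1: 12, 2: 12, 3: 23, 4: 12, 5: 12, 6: 12, 7: 12, 8: 2}
remove_outliers_in_place(Diction)
print(Diction)  # {1: 12, 2: 12, 4: 12, 5: 12, 6: 12, 7: 12}
```

Any other reference to the same dictionary sees the change as well, which is the main difference from building a filtered copy.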