Store the values for each key as an array in a dictionary

Question:

I would like to normalize all values in the dictionary data and store them in another dictionary with the same keys. For each key, the values should be stored in a 1D array, so I did the following:

>>> data = {1: [0.6065306597126334], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}

>>> norm = {k: [v / sum(vals) for v in vals] for k, vals in data.items()} 

>>> norm
{1: [1], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

Now suppose the dictionary data contains only a zero value for one of its keys, like the value of the first key, 1:

>>> data = {1: [0.0], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}

Then normalizing the values of this dictionary results in [nan] values because of the division by zero:

>>> norm = {k: [v / sum(vals) for v in vals] for k, vals in data.items()}

__main__:1: RuntimeWarning: invalid value encountered in double_scalars
>>> norm
{1: [nan], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

So I inserted an if statement to work around this issue, but now I can’t store the values for each key as a 1D array.

The code:

>>> norm = {}
>>> for k, vals in data.items():
...     values = []
...     if sum(vals) == 0:
...        values.append(list(vals))
...     else:
...          for v in vals:
...              values.append(list([v/sum(vals)]))
...     norm[k]=values
... 
>>> norm
{1: [[1.0]], 2: [[0.4498162176582741], [0.4498162176582741], [0.10036756468345168]], 3: [[0.4498162176582741], [0.4498162176582741], [0.10036756468345168]], 4: [[0.5], [0.5]]}

I would like to get the norm dictionary as

norm = {1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

Also, for the dictionary data, when it contains a zero value for one of its keys, is there a better way to normalize it? I think my solution is not efficient.

P.S.: At the end of the for loop I tried norm[k] = np.array(values) instead of norm[k] = values, but the result was not what I needed.

Asked By: Noah16

Answers:

append, as mentioned above, adds a single element to a list, and that element can itself be a list; that’s why you currently have a list within a list. Ideally, you should use extend, which concatenates another list onto the first one.
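
To illustrate the difference, here is a minimal sketch. The small data dictionary and the variable names nested, flat and total are made up for this example and are not from the question. The zero-sum branch keeps the values unchanged, as the loop in the question does.

```python
# append adds its argument as ONE element, even if that argument is a list.
nested = []
nested.append([0.5, 0.5])
print(nested)  # [[0.5, 0.5]]

# extend adds each item of its argument separately.
flat = []
flat.extend([0.5, 0.5])
print(flat)  # [0.5, 0.5]

# The loop from the question, using extend (illustrative data).
data = {1: [0.0], 4: [2.0, 2.0]}
norm = {}
for k, vals in data.items():
    values = []
    total = sum(vals)
    if total == 0:
        # Assumption: a zero-sum group is copied over unchanged.
        values.extend(vals)
    else:
        values.extend(v / total for v in vals)
    norm[k] = values

print(norm)  # {1: [0.0], 4: [0.5, 0.5]}
```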

Answered By: walugembe peter

As mentioned in an answer, extend can be used to solve your problem. If you do want to use append, you could take the first element of your lists.

norm = {}
for k, vals in data.items():
    values = []
    if sum(vals) == 0:
        values.append(vals[0])
    else:
        for v in vals:
            values.append([v / sum(vals)][0])
    norm[k] = values

See difference between append vs extend list methods in python for an example of append vs extend

As for the optimization: completely removing the for loops won’t be possible, but you can shorten your solution while still maintaining readability:

norm = {}
for k, vals in data.items():
    if sum(vals) == 0:
        norm[k] = vals
    else:
        norm[k] = [x / sum(vals) for x in vals]
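
One possible refinement, as a sketch only: the loop above calls sum(vals) again for every element inside the list comprehension, and norm[k] = vals stores the same list object that data holds. The variant below computes the sum once per key and copies the list. The function name normalize and the sample data are assumptions made for this example.

```python
def normalize(data):
    """Return a new dict whose lists are scaled to sum to 1 (hypothetical helper)."""
    norm = {}
    for k, vals in data.items():
        total = sum(vals)  # computed once per key
        if total == 0:
            # Zero-sum groups are copied unchanged, so the result
            # does not share list objects with the input.
            norm[k] = list(vals)
        else:
            norm[k] = [x / total for x in vals]
    return norm


data = {1: [0.0], 2: [1.0, 3.0]}
result = normalize(data)
print(result)  # {1: [0.0], 2: [0.25, 0.75]}
```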

Answered By: DSC

Your dict/list comprehension fails when sum(vals) == 0:

>>> data = {1: [0.0], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}
>>> {k: [v / sum(vals) for v in vals] for k, vals in data.items()}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <dictcomp>
  File "<stdin>", line 1, in <listcomp>
ZeroDivisionError: float division by zero

You can introduce a ternary expression to handle the case:

>>> {k: [v / sum(vals) if sum(vals)!=0 else 1.0 for v in vals] for k, vals in data.items()}
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

If you want to avoid evaluating sum(vals) multiple times:

>>> {k: [v / s if s!=0 else 1.0 for v in vals] for k,vals,s in ((k, vals, sum(vals)) for k, vals in data.items())}
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

((k, vals, sum(vals)) for k, vals in data.items()) is a generator expression that yields k, vals and sum(vals) for every item.
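
To make that concrete, the sketch below materializes the generator into a list so the triples can be inspected, then feeds them to the same comprehension. The sample data and the name triples are chosen for illustration only.

```python
data = {1: [0.0], 2: [1.0, 3.0]}

# Each item becomes a (key, values, sum of values) triple.
triples = list((k, vals, sum(vals)) for k, vals in data.items())
print(triples)  # [(1, [0.0], 0.0), (2, [1.0, 3.0], 4.0)]

# The comprehension unpacks each triple, so s is computed once per key.
norm = {k: [v / s if s != 0 else 1.0 for v in vals] for k, vals, s in triples}
print(norm)  # {1: [1.0], 2: [0.25, 0.75]}
```

Note that with this expression every element of a zero-sum list becomes 1.0, so a list such as [0.0, 0.0] would turn into [1.0, 1.0].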

Answered By: jferard

This should work as well:

norm = {k: [v / sum(vals) for v in vals] if sum(vals) != 0 else [1] for k, vals in data.items()}

Answered By: obd