How do I fill a dictionary with indices in a for loop?

Question:

I have a transposed DataFrame tr:

7128 8719 14051 14636
JDUTC_0 2451957.36 2452149.36 2457243.98 2452531.89
JDUTC_1 2451957.37 2452149.36 2457243.99 2452531.90
JDUTC_2 2451957.37 2452149.36 2457244.00 2452531.91
JDUTC_3 NaN 2452149.36 NaN NaN
JDUTC_4 NaN 2452149.36 NaN NaN
JDUTC_5 NaN 2452149.36 NaN NaN
JDUTC_6 1.23 2452149.37 NaN NaN
JDUTC_7 NaN NaN NaN NaN
JDUTC_8 NaN NaN NaN NaN
JDUTC_9 NaN NaN NaN NaN

And I create dict ‘a’ with this block of code:

a = {}
b=[]
for _, contents in tr.items():
    b.clear()
    for ind, val in enumerate(contents):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    print(_)
    print(b)
    a[_] = b
    print(a)

Which gives me this output:

7128
[3, 4, 5, 7, 8, 9]
{7128: [3, 4, 5, 7, 8, 9]}
8719
[7, 8, 9]
{7128: [7, 8, 9], 8719: [7, 8, 9]}
14051
[3, 4, 5, 6, 7, 8, 9]
{7128: [3, 4, 5, 6, 7, 8, 9], 8719: [3, 4, 5, 6, 7, 8, 9], 14051: [3, 4, 5, 6, 7, 8, 9]}
14636
[3, 4, 5, 6, 7, 8, 9]
{7128: [3, 4, 5, 6, 7, 8, 9], 8719: [3, 4, 5, 6, 7, 8, 9], 14051: [3, 4, 5, 6, 7, 8, 9], 
14636: [3, 4, 5, 6, 7, 8, 9]}

What I expect dict ‘a’ to look like is this:

{7128: [3, 4, 5, 7, 8, 9]
 8719: [7, 8, 9]
14051: [3, 4, 5, 6, 7, 8, 9]
14636: [3, 4, 5, 6, 7, 8, 9]}

What am I doing wrong? Why does a[_] = b overwrite all the previous keys when print(_) verifies that _ is always the next column label?

Asked By: Jonathan Sullivan

||

Answers:

The problem is that you are assigning the same list object to every key.

a = {}
b = []  # <--- only one list 'b' is ever created
for _, contents in tr.items():
    b.clear()
    for ind, val in enumerate(contents):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    print(_)
    print(b)
    a[_] = b  # <--- the same list object is assigned to every key
    print(a)

See my comments in the code above.

b.clear()

This line only empties that same list in place; it does not create a new list.
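A minimal sketch of the same effect without pandas may make this easier to see. The names shared, result and fixed are made up for illustration; they are not from the question.

```python
# One list object, reused across iterations and stored under two keys.
shared = []
result = {}
for key, items in [("x", [1, 2]), ("y", [3])]:
    shared.clear()           # empties the same object in place
    shared.extend(items)
    result[key] = shared     # stores a reference to the list, not a copy

# Both keys refer to one and the same list object.
same_object = result["x"] is result["y"]

# Storing a copy at assignment time keeps each value independent.
fixed = {}
for key, items in [("x", [1, 2]), ("y", [3])]:
    shared.clear()
    shared.extend(items)
    fixed[key] = list(shared)  # independent copy of the current contents
```

Here result ends up holding the last contents under every key, while fixed keeps one separate list per key.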

To make the code behave as you intended, create a new list inside the loop.

a = {}
for _, contents in tr.items():
    b = [] # <--- new array/list is created
    for ind, val in enumerate(contents):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    print(_)
    print(b)
    a[_] = b # <--- Now you assign the new array 'b' to a[_]
    print(a)
Answered By: jkhadka

Using conventional variable names, and after the following setup, I would change your code:

import numpy as np
import pandas as pd

import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

s = StringIO("""idx 7128    8719    14051   14636
JDUTC_0 2451957.36  2452149.36  2457243.98  2452531.89
JDUTC_1 2451957.37  2452149.36  2457243.99  2452531.90
JDUTC_2 2451957.37  2452149.36  2457244.00  2452531.91
JDUTC_3 NaN 2452149.36  NaN NaN
JDUTC_4 NaN 2452149.36  NaN NaN
JDUTC_5 NaN 2452149.36  NaN NaN
JDUTC_6 1.23    2452149.37  NaN NaN
JDUTC_7 NaN NaN NaN NaN
JDUTC_8 NaN NaN NaN NaN
JDUTC_9 NaN NaN NaN NaN""")

tr = pd.read_csv(s, sep=r"\s+", index_col=0)  # any whitespace, tabs included, separates columns

(People should post minimal working code, but they often forget the imports and the code that builds the data frame.)

to:



a = {}
b = []
for name, values in tr.items():
    b.clear() # this is problematic as you know
    for ind, val in enumerate(values):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    a[name] = b

continue and pass are not necessary here: continue is the last statement in the loop body, and an else branch that only contains pass does nothing.
In Python, you are not forced to write the else branch:

for name, values in tr.items():
    b.clear() # This is still problematic at this state.
    for ind, val in enumerate(values):
        if np.isnan(val):
            b.append(ind)
    a[name] = b

Collecting data like this in a for loop is better done with a list comprehension:

a = {}
for name, values in tr.items():
    b = [ind for ind, val in enumerate(values) if np.isnan(val)]
    a[name] = b
# now the result is already correct!

And finally, you can wrap the list comprehension in a dictionary comprehension.
That turns the entire code into a one-liner, and a readable one once you are familiar with comprehensions:

a = {name: [i for i, x in enumerate(vals) if np.isnan(x)] for name, vals in tr.items()}

You can see the result:

a
# which returns:
{'7128': [3, 4, 5, 7, 8, 9],
 '8719': [7, 8, 9],
 '14051': [3, 4, 5, 6, 7, 8, 9],
 '14636': [3, 4, 5, 6, 7, 8, 9]}
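As an alternative sketch, the NaN test can be left to pandas instead of calling np.isnan on each value. This swaps in a different technique from the one above: Series.isna() builds a boolean mask and np.flatnonzero returns the positions where it is True. The small frame demo is made-up data for illustration, not the tr from the question.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; column labels are strings, as in the result above.
demo = pd.DataFrame({
    "7128": [1.0, np.nan, 2.0, np.nan],
    "8719": [1.0, 2.0, 3.0, np.nan],
})

# For each column, take the positional indices where the value is NaN.
nan_positions = {
    name: np.flatnonzero(col.isna().to_numpy()).tolist()
    for name, col in demo.items()
}
```

Each value is a fresh list built inside the comprehension, so no list is shared between keys.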

List comprehensions lean in the direction of functional programming (FP), which is precisely about avoiding mutation (such as the b.append() and b.clear() calls). As you have seen, your case demonstrates how easily a bug appears when mutation is involved. It also supports the argument that FP, although it looks brain-unfriendly at first sight, is actually the more brain-friendly way to program.

List comprehensions are the Pythonic form of "map", and an "if" inside a list comprehension is the Pythonic equivalent of "filter". Both are second nature to FP people.
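To illustrate the correspondence, here is a small sketch that computes the same NaN positions once with a comprehension and once with the built-in filter and map. The list values is made-up sample data, and math.isnan stands in for np.isnan so the snippet needs no third-party imports.

```python
import math

values = [2451957.36, float("nan"), 1.23, float("nan")]

# Comprehension: the "if" clause filters, the leading expression maps.
with_comprehension = [i for i, x in enumerate(values) if math.isnan(x)]

# The same logic with filter (keep the NaN pairs) and map (keep the index).
nan_pairs = filter(lambda pair: math.isnan(pair[1]), enumerate(values))
with_filter_map = list(map(lambda pair: pair[0], nan_pairs))
```

Both variants build a new list without mutating an existing one, which is the point made above.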

Answered By: Gwang-Jin Kim