Value not updated in for loop Python

Question:

I am testing the following simple example (see the comments in the code below for background). I have two questions. Thanks.

  • Why is b in bottle not updated even though the for loop calculates the right value?
  • Is there an easier way to do this without a for loop? I have heard that loops can take a long time to run when the data is bigger than in this simple example.

import pandas as pd

test = pd.DataFrame(
    [[1, 5],
     [1, 8],
     [1, 9],
     [2, 1],
     [3, 1],
     [4, 1]],
    columns=['a', 'b']
) # Original df
   
bottle = pd.DataFrame().reindex_like(test) # a blank df with the same shape
bottle['a'] = test['a'] # set 'a' in bottle to be the same in test
print(bottle)
   a   b
0  1 NaN
1  1 NaN
2  1 NaN
3  2 NaN
4  3 NaN
5  4 NaN

for index, row in bottle.iterrows():
    row['b'] = test[test['a'] == row['a']]['b'].sum()
    print(row['a'], row['b'])

1.0 22.0
1.0 22.0
1.0 22.0
2.0 1.0
3.0 1.0
4.0 1.0 # I can see for loop is doing what I need.
   
bottle
   a   b
0  1 NaN
1  1 NaN
2  1 NaN
3  2 NaN
4  3 NaN
5  4 NaN # However, 'b' in bottle is not updated by the for loop. Why? And how to fix that?

test['c'] = bottle['b'] # This is the end output I want to get, but not working due to the above. Also is there a way to achieve this without using for loop?
Asked By: LaTeXFan

Answers:

When you iterate over the DataFrame's rows with iterrows, the row variable is a copy of the current row, so changes made to it never reach bottle. On the next iteration, row is rebound to a fresh copy of the next row and your changes are discarded. To make your for loop work, assign to bottle.loc[index, "b"] instead of to row["b"].
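
The copy behaviour described above can be checked directly. This is a minimal sketch, assuming the same data as in the question; bottle is built here with an explicit NaN column instead of reindex_like, and the name unchanged is introduced only for illustration:

```python
import pandas as pd

test = pd.DataFrame({"a": [1, 1, 1, 2, 3, 4], "b": [5, 8, 9, 1, 1, 1]})
# Same shape as the question's bottle: 'a' copied from test, 'b' all NaN.
bottle = pd.DataFrame({"a": test["a"], "b": float("nan")})

# Writing to `row` only changes the temporary Series yielded by iterrows().
for index, row in bottle.iterrows():
    row["b"] = test.loc[test["a"] == row["a"], "b"].sum()
unchanged = bottle["b"].isna().all()  # bottle itself was never touched

# Writing through .loc targets the DataFrame itself.
for index, row in bottle.iterrows():
    bottle.loc[index, "b"] = test.loc[test["a"] == row["a"], "b"].sum()

print(unchanged)
print(bottle)
```

After the first loop every value in bottle's b column is still NaN; only the second loop, which writes through bottle.loc, fills it in.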

You can complete your task without a for loop by using pandas.DataFrame.groupby and transform as follows:

bottle["b"] = test.groupby("a")["b"].transform("sum")

bottle:

   a   b
0  1  22
1  1  22
2  1  22
3  2   1
4  3   1
5  4   1
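
For the end result the question asks for (a column c on test), the same transform call can probably be assigned straight to test, skipping bottle altogether. A minimal, self-contained sketch using the question's data; the names totals and via_map and the map-based alternative are additions for illustration, not part of the answer above:

```python
import pandas as pd

test = pd.DataFrame({"a": [1, 1, 1, 2, 3, 4], "b": [5, 8, 9, 1, 1, 1]})

# transform("sum") returns one value per row of test, aligned on its index,
# so the result can be assigned as a new column without any loop.
test["c"] = test.groupby("a")["b"].transform("sum")

# Illustrative alternative: compute one total per group, then look up
# each row's 'a' value in that per-group Series.
totals = test.groupby("a")["b"].sum()
via_map = test["a"].map(totals)

print(test)
print(via_map.tolist())
```

Both variants do the grouping once instead of filtering test again for every row, which is where the loop version spends its time on larger data.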
Answered By: Chrysophylaxs

The value of b in bottle is not updated because the loop never assigns to bottle itself. The statement row['b'] = ... changes only the row object for the current iteration, which is a copy of the data in bottle.

To fix this, you can modify the code as follows:

for index, row in bottle.iterrows():
    bottle.loc[index, 'b'] = test[test['a'] == row['a']]['b'].sum()

This will update the value of b in the bottle DataFrame for the current row in the loop.
