Why am I getting KeyErrors when accessing multiple rows in a dataframe?

Question:

I am trying to write the following condition to map MARKET_STATES.OPENING_AUCTION, however I am getting KeyError: (False, False, False) when I run the following:

market_info_df.loc[market_info_df['issue_status' == '10', 'market_phase' == '0',
                                          'trading_state' == '  '].iloc[0]] = MARKET_STATES.OPENING_AUCTION

Two questions:

  • What is it that the query does? My understanding is that using iloc[0] it finds the first instance where the three columns at the values stated. But then it maps something to MARKET_STATES.OPENING_AUCTION or?
  • How can I write a condition like the above where if the 3 columns match the values, then for the first occurrence of this, map another columns (say trade_type) to MARKET_STATES.OPENING_AUCTION?

Here is an example:

e.g:

row  issue_status  market_phase  trading_state                  trade_type
 0        20           0              '  '         ->        
 1        10           0              '  '         ->        OPENING_AUCTION
 2        20           0              '  '         ->       
 3        10           0              '  '         ->       
 4        10           0              '  '         ->       
 5        10           0              '  '         ->       
Asked By: Patrick Chong

||

Answers:

I’ll answer your second question first:

How can I write a condition like the above where if the 3 columns match the values, then for the first occurrence of this, map another columns (say trade_type) to MARKET_STATES.OPENING_AUCTION?

market_info_df = pd.DataFrame({'issue_status': {0: 20, 1: 10, 2: 20, 3: 10, 4: 10, 5: 10},
 'market_phase': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'trading_state': {0: '  ', 1: '  ', 2: '  ', 3: '  ', 4: '  ', 5: '  '}})

Any time you want to combine multiple conditions, remember that you need a logical and or logical or operation on those conditions.

When you have three conditions and you want to select rows where all of them are True, you need to and all the conditions. So in your case you’d need to do

mask = (market_info_df['issue_status'] == 10) & (market_info_df['market_phase'] == 0) & (market_info_df['trading_state'] == '  ')

To get a boolean mask that tells you which rows fulfill all those conditions. The & operator is overloaded by pandas to do the element-wise logical and for multiple Series objects. The parentheses are needed to override python’s operator precedence which puts & before ==

Note: My dataframe contains integers in the issue_status and market_phase columns, which is why I compare against integers e.g. market_info_df['issue_status'] == 10. If your columns contain strings (as your code would indicate), compare against strings (market_info_df['issue_status'] == "10")

Next, you want to set those rows for the "trade_type" column, so you’d do:

market_info_df.loc[mask, "trade_type"] = "OPENING_AUCTION"

Which sets those only rows:

   issue_status  market_phase trading_state       trade_type
0            20             0                            NaN
1            10             0                OPENING_AUCTION
2            20             0                            NaN
3            10             0                OPENING_AUCTION
4            10             0                OPENING_AUCTION
5            10             0                OPENING_AUCTION

Now, we can use idxmax() to get the index of the max value of the mask. Since this mask only contains True and False values, the first occurrence of the True value is the one that is returned. If we do this instead of setting all values of mask, we get:

market_info_df.loc[mask.idxmax(), "trade_type"] = "OPENING_AUCTION"

which gives:

   issue_status  market_phase trading_state       trade_type
0            20             0                            NaN
1            10             0                OPENING_AUCTION
2            20             0                            NaN
3            10             0                            NaN
4            10             0                            NaN
5            10             0                            NaN

As for your first question:

What is it that the query does? My understanding is that using iloc[0] it finds the first instance where the three columns at the values stated. But then it maps something to MARKET_STATES.OPENING_AUCTION or?

I have no idea what that code is supposed to do. The code inside the brackets i.e.:

'issue_status' == '10', 'market_phase' == '0', 'trading_state' == '  '

compares the string "issue_status" to the string "10" (which unsurprisingly is False) and so on, and returns a tuple that contains three Falses.

Then, market_info_df[False, False, False] tries to find the element in the dataframe at the False, False, False key, and throws the error when it fails to do so.

Answered By: Pranav Hosangadi
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.