How do you perform conditional operations on different elements in a Pandas DataFrame?

Question

Let’s say I have a Pandas Dataframe of the price and stock history of a product at 10 different points in time:

df = pd.DataFrame(index=[np.arange(10)])
df['price'] = 10,10,11,15,20,10,10,11,15,20
df['stock'] = 30,20,13,8,4,30,20,13,8,4
df

  price stock
0   10  30
1   10  20
2   11  13
3   15  8
4   20  4
5   10  30
6   10  20
7   11  13
8   15  8
9   20  4

How do I perform operations between specific rows that meet certain criteria?

In my example row 0 and row 5 meet the criteria "stock over 25" and row 4 and row 9 meet the criteria "stock under 5".

I would like to calculate:

df['price'][4] - df['price'][0] and
df['price'][9] - df['price'][5]

but not
df['price'][9] - df['price'][0] or
df['price'][4] - df['price'][5].

In other words, I would like to calculate the price change between the most recent event where stock was under 5 vs the most recent event where stock was over 25; over the whole series.

Of course, I would like to do this over larger datasets where picking them manually is not good.

Asked By: florianc63

||

Source

Answer 1

In order to give a better explanation, will separate the approach by creating two different functions:

The first one will be the event detection, let’s call it detect_event.
The second one will calculate the the price between the current event and the previous one, in the list generated by the first function. We will call it calculate_price_change.

Starting with the first function, here it is key to understand very well the goals we want to reach or the constraints/conditions we want to satisfy.

Will leave two, of more, potential options, given the various interpretations of the question:

A. The initial is what I could get from my initial understanding

B. The second part will be one of the interpretations one could get from @Iyar Lyn comment (I can see more interpretations, but won’t consider in this answer as the approach will be similar).

Within option A, we will create a function to detect where a stock is under 5 or 25

def detect_event(df):
    # Create a list of the indexes of the events where stock was under 5 or over 25
    events = []
    # Loop through the dataframe
    for i in range(len(df)):
        # If stock is under 5, add the index to the list
        if df['stock'][i] < 5:
            events.append(i)
        # If stock is over 25, add the index to the list
        elif df['stock'][i] > 25:
            events.append(i)
    # Return the list of indexes of the events where stock was under 5 or over 25
    return events

The comments make it self-explanatory, but, basically, this will return a list of indexes of the rows where stock is under 5 or over 25.

With OP’s df this will return

events = detect_event(df)

[Out]:
[0, 4, 5, 9]

Within the option B, assuming one wants to know the events where the stock went from under 5 to over 25, and vice-versa, consecutively (there are more ways to interpret this), then one can use the following function

def detect_event(df):
    # Create a list of the indexes of the events where we will store the elements in the conditions
    events = []

    for i, stock in enumerate(df['stock']):
        # If the index is 0, add the index of the first event to the list of events
        if i == 0:
            events.append(i)
        # If the index is not 0, check if the stock went from over 25 to under 5 or from under 5 to over 25
        else:
            # If the stock went from over 25 to under 5, add the index of the event to the list of events
            if stock < 5 and df['stock'][i-1] > 25:
                events.append(i)
            # If the stock went from under 5 to over 25, add the index of the event to the list of events
            elif stock > 25 and df['stock'][i-1] < 5:
                events.append(i)

    # Return the list of events
    return events

With OP’s df this will return

events = detect_event(df)

[Out]:
[0, 5]

Note that 0 is the element in the first position, that we are appending by default.

As for the second function, once the conditions are well defined, meaning we know clearly what we want, and adapted the first function, detect_event, accordingly, we can now detect the changes in the prices.

In order to detect the price change between the events that satisfy the conditions we defined previously, one will use a different function: calculate_price_change.

This function will take both the dataframe df and the list events generated by the previous function, and return a list with the prices diferences.

def calculate_price_change(df, events):
    # Create a list to store the price change between the most recent event where stock was under 5 vs the most recent event where stock was over 25
    price_change = []
    # Loop through the list of indexes of the events
    for i, event in enumerate(events):
        # If the index is 0, the price change is 0
        if i == 0:
            price_change.append(0)
        # If the index is not 0, calculate the price change between the current and past events
        else:
            price_change.append(df['price'][event] - df['price'][events[i-1]])
    return price_change

Now we if one calls this last function using the df and the list created with the first function detect_event, one gets the following

price_change = calculate_price_change(df, events)

[Out]:
[0, 10, -10, 10]

Notes:

As it is, the question gives room for multiple interpretations. That’s why my initial flag for "Needs details or clarity". For the future one might want to review: How do I ask a good question?
and its hyperlinks.
I understand that sometimes we won’t be able to specify everything that we want (as we might not even know – due to various reasons), so communication is key. Therefore, appreciate Iyar Lin‘s time and contributions as they helped improve this answer.

Answered By: Gonçalo Peres

Answer 2

First, set up data frame and add some calculations:

import pandas as pd
import numpy as np

df = pd.DataFrame(index=[np.arange(10)])
df['price'] = 10,10,11,15,20,10,10,11,15,20
df['stock'] = 30,20,13,8,4,30,20,13,8,4
df['stock_under_5'] = df['stock'] < 5
df['stock_over_25'] = df['stock'] > 25
df['cum_stock_under_5'] = df['stock_under_5'].cumsum()
df['change_stock_under_5'] = df['cum_stock_under_5'].diff()
df['change_stock_under_5'].iloc[0] = df['stock_under_5'].iloc[0]*1
df['next_row_change_stock_under_5'] = df['change_stock_under_5'].shift(-1)
df['cum_stock_over_25'] = df['stock_over_25'].cumsum()
df['change_stock_over_25'] = df['cum_stock_over_25'].diff()
df['change_stock_over_25'].iloc[0] = df['stock_over_25'].iloc[0]*1
df['next_row_change_stock_over_25'] = df['change_stock_over_25'].shift(-1)
df['row'] = np.arange(df.shape[0])
df['next_row'] = df['row'].shift(-1)
df['next_row_price'] = df['price'].shift(-1)

Next we find all windows where either the stock went over 25 or below 5 by grouping over the cumulative marker of those events.

changes = (
    df.groupby(['cum_stock_under_5', 'cum_stock_over_25'])
    .agg({'row':'first', 'next_row':'last', 'change_stock_under_5':'max', 'change_stock_over_25':'max', 
          'next_row_change_stock_under_5':'max', 'next_row_change_stock_over_25':'max', 
          'price':'first', 'next_row_price':'last'})
    .assign(price_change = lambda x: x['next_row_price'] - x['price'])
    .reset_index(drop=True)
)

For each window we find what happened at the beginning of the window: if change_stock_under_5 = 1 it means the window started with the stock going under 5, if change_stock_over_25 = 1 it started with the stock going over 25.

Same for the end of the window using the columns next_row_change_stock_under_5 and next_row_change_stock_over_25

Now, we can readily extract the stock price change in rows where the stock went from being over 25 to being under 5:

from_over_to_below = changes[(changes['change_stock_over_25']==1) & (changes['next_row_change_stock_under_5']==1)]

and the other way around:

from_below_to_over = changes[(changes['change_stock_under_5']==1) & (changes['next_row_change_stock_over_25']==1)]

You can for example calculate the average price change when the stock went from over 25 to below 5:

from_over_to_below.price_change.mean()

Answered By: Iyar Lin

How do you perform conditional operations on different elements in a Pandas DataFrame?

Question:

Answers: