How can I apply maths to a pandas dataframe comparing 2 specific row and column indexes

Question:

I have this dataframe

import pandas as pd
import numpy as np
np.random.seed(2022)

# make example data
close = np.sin(range(610)) + 10
high = close + np.random.rand(*close.shape)
open = high - np.random.rand(*close.shape)
low = high - 3
close[2] += 100  
dates = pd.date_range(end='2022-06-30', periods=len(close))

# insert into pd.dataframe
df = pd.DataFrame(index=dates, data=np.array([open, high, low, close]).T, columns=['Open', 'High', 'Low', 'Close'])
print(df)

Output

                 Open       High       Low       Close
2020-10-29   9.557631  10.009359  7.009359   10.000000
2020-10-30  10.794789  11.340529  8.340529   10.841471
2020-10-31  10.631242  11.022681  8.022681  110.909297
2020-11-01   9.639562  10.191094  7.191094   10.141120
2020-11-02   9.835697   9.928605  6.928605    9.243198
...               ...        ...       ...         ...
2022-06-26  10.738942  11.167593  8.167593   10.970521
2022-06-27  10.031187  10.868859  7.868859   10.321565
2022-06-28   9.991932  10.271633  7.271633    9.376964
2022-06-29   9.069759   9.684232  6.684232    9.005179
2022-06-30   9.479291  10.300242  7.300242    9.548028

Edit:
I now know several ways to achieve this; however, I am rewriting the question so that the original goal is clearer for future readers.

The goal here is to compare a specific value in the dataframe to another value in the dataframe.

For example:
Check when the value in the ‘Open’ column is less than the value in the ‘Close’ column.

One solution is to use itertuples; I have written an answer below explaining it.

Asked By: Callum

||

Answers:

The first comparison can be done with df.loc["A", "High"] > df.loc["C", "Low"], where "A" and "C" are row labels. To apply it to every row, you could do something like this:

# use .iloc so the lookup is positional, whatever the index type is
for i in range(2, len(df)):
    print(df["High"].iloc[i-2] > df["Low"].iloc[i])

I’m sure there are better ways to do it, but this would work.
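As a rough sketch of the two steps described in this answer, the snippet below builds a small frame with row labels A to E and runs both the single label-based comparison and the positional loop. The frame `small` and its random values are assumptions made for illustration; they are not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row frame with labels A..E, for illustration only.
rng = np.random.default_rng(0)
small = pd.DataFrame(
    rng.random((5, 4)),
    index=list("ABCDE"),
    columns=["Open", "High", "Low", "Close"],
)

# Single comparison by label: High in row A against Low in row C.
single = small.loc["A", "High"] > small.loc["C", "Low"]

# Same comparison for every row, pairing row i-2 with row i by position.
results = [
    bool(small["High"].iloc[i - 2] > small["Low"].iloc[i])
    for i in range(2, len(small))
]
print(single, results)
```

The first element of `results` is the A versus C comparison, so it should agree with `single`.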

Answered By: Markus Schmidgall

You can use the shift operation on a column to shift its rows up or down:

`df['High'] > df['Low'].shift(-2)`
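One detail worth knowing about this approach: shifting up by two leaves the last two rows without a partner, so they hold NaN, and any comparison against NaN evaluates to False. A minimal sketch on hand-picked toy values (the frame `toy` is hypothetical, not the question's data):

```python
import pandas as pd

# Hypothetical toy values chosen so the result is easy to check by hand.
toy = pd.DataFrame({"High": [5, 5, 5, 5, 5], "Low": [1, 9, 2, 8, 3]})

shifted = toy["Low"].shift(-2)   # Low from two rows ahead; last two are NaN
mask = toy["High"] > shifted     # comparisons against NaN are False

# Drop the trailing rows that have no partner two rows ahead.
valid = mask.iloc[:-2]
print(mask.tolist())
```

If those trailing False values would be misleading, slicing them off as in `valid` is one way to keep only the rows that were actually compared.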

To see what is going on, run the commands below:

df = pd.DataFrame(np.random.randn(5,4), list('ABCDE'), ['Open', 'High', 'Low', 'Close'])
# Low_shiftup holds the Low value from two rows further down
df['Low_shiftup'] = df['Low'].shift(-2)
print(df.head())
print(df['High'] > df['Low_shiftup'])

Answered By: charitha maduranga

As I explained in the question, I have now found multiple solutions to this problem, one of which is itertuples.

Here is how to use itertuples to solve the problem.

First, create the dataframe:

import pandas as pd
import numpy as np
np.random.seed(2022)

# make example data
close = np.sin(range(610)) + 10
high = close + np.random.rand(*close.shape)
open = high - np.random.rand(*close.shape)
low = high - 3
close[2] += 100
dates = pd.date_range(end='2022-06-30', periods=len(close))

# insert into pd.dataframe
df = pd.DataFrame(index=dates, data=np.array([open, high, low, close]).T, columns=['Open', 'High', 'Low', 'Close'])
print(df)

Now we use itertuples to iterate over the rows of the dataframe:

for row in df.itertuples():
    o = row.Open
    for r in df.itertuples():
        c = r.Close
        if o < c:
            print('O is less than C')
        else:
            print('O is greater than or equal to C')

This compares the open price of every row against the close price of every row (including its own), so it reports every pair in which an open price is less than a close price.
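For comparison, the same checks can be written without Python-level loops. The sketch below is not part of the original answer; it uses a hypothetical three-row frame `toy` and shows both a same-row comparison and an all-pairs comparison built with NumPy broadcasting.

```python
import pandas as pd

# Hypothetical toy prices, for illustration only.
toy = pd.DataFrame({"Open": [1.0, 4.0, 2.0], "Close": [3.0, 2.0, 5.0]})

# Same-row comparison: one boolean per row.
same_row = toy["Open"] < toy["Close"]

# All-pairs comparison, matching the nested loop: pairs[i, j] is True
# when Open in row i is less than Close in row j.
o = toy["Open"].to_numpy()
c = toy["Close"].to_numpy()
pairs = o[:, None] < c[None, :]
print(same_row.tolist())
print(pairs)
```

The all-pairs array has one entry per (row, row) combination, so for the 610-row frame in the question it would hold 610 × 610 booleans.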

This can be extended to check other conditions within the same loop by adding more variables and more if statements, and by using enumerate to track each row's position.

For example:

for idx, row in enumerate(df.itertuples()):
    o = row.Open
    h = row.High
    for i, r in enumerate(df.itertuples()):
        c = r.Close
        l = r.Low
        # only look at later rows whose Low is more than 2 below this row's High
        if i > idx and (h - 2) > l:
            if o < c:
                print('O is less than C')
            else:
                print('O is greater than or equal to C')

The above code uses enumerate to add a counter to each loop. The outer if statement restricts the ‘o < c’ check to pairs where the inner row comes after the outer row (i > idx) and the outer row's High minus 2 is greater than the inner row's Low.

As you can see, any value in the dataframe can be compared to another using the appropriate if statements.
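The positional condition can also be expressed with boolean arrays, which may be faster on larger frames. This is only a sketch under assumptions: the frame `toy` and its values are made up, and `np.triu` with `k=1` stands in for the `i > idx` test.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices, for illustration only.
toy = pd.DataFrame({
    "Open":  [1.0, 4.8, 2.0],
    "High":  [6.0, 5.0, 7.0],
    "Low":   [0.5, 4.5, 1.0],
    "Close": [3.0, 4.6, 3.5],
})
o, h, lo, c = (toy[col].to_numpy() for col in ["Open", "High", "Low", "Close"])
n = len(toy)

later = np.triu(np.ones((n, n), dtype=bool), k=1)  # True where j > i
gap = (h[:, None] - 2) > lo[None, :]               # (High_i - 2) > Low_j
checked = later & gap                              # pairs the loop would test
open_lt_close = checked & (o[:, None] < c[None, :])

# Positions (i, j) of the tested pairs where Open_i < Close_j.
hits = np.argwhere(open_lt_close)
print(hits.tolist())
```

Here `checked` marks the pairs that pass the outer if statement, and `hits` lists those among them for which the open price is less than the close price.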

Answered By: Callum