How can I apply maths to a pandas dataframe, comparing two specific row and column indexes?
Question:
I have this dataframe
import pandas as pd
import numpy as np
np.random.seed(2022)
# make example data
close = np.sin(range(610)) + 10
high = close + np.random.rand(*close.shape)
open = high - np.random.rand(*close.shape)
low = high - 3
close[2] += 100
dates = pd.date_range(end='2022-06-30', periods=len(close))
# insert into pd.dataframe
df = pd.DataFrame(index=dates, data=np.array([open, high, low, close]).T, columns=['Open', 'High', 'Low', 'Close'])
print(df)
Output
                 Open       High       Low       Close
2020-10-29   9.557631  10.009359  7.009359   10.000000
2020-10-30  10.794789  11.340529  8.340529   10.841471
2020-10-31  10.631242  11.022681  8.022681  110.909297
2020-11-01   9.639562  10.191094  7.191094   10.141120
2020-11-02   9.835697   9.928605  6.928605    9.243198
...               ...        ...       ...         ...
2022-06-26  10.738942  11.167593  8.167593   10.970521
2022-06-27  10.031187  10.868859  7.868859   10.321565
2022-06-28   9.991932  10.271633  7.271633    9.376964
2022-06-29   9.069759   9.684232  6.684232    9.005179
2022-06-30   9.479291  10.300242  7.300242    9.548028
Edit:
I now know several ways to achieve this, but I am rewriting the question so that the original goal is clearer for future readers.
The goal is to compare one specific value in the dataframe with another value in the dataframe.
For example:
Check when the value in the 'Open' column is less than the value in the 'Close' column.
One solution is to use itertuples; I have written an answer below explaining it.
Answers:
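The first step can be done with df.loc["A", "High"] > df.loc["C", "Low"], where "A" and "C" are row labels. To apply this to all rows, you could do something like the following:
for i in range(2, len(df)):
    # .iloc keeps the lookup positional even when the index holds labels or dates
    print(df["High"].iloc[i - 2] > df["Low"].iloc[i])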
I’m sure there are better ways to do it, but this would work.
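As a minimal sketch of the single-value comparison described above, the snippet below compares one cell with another, first by label and then by position. The frame `demo` and its values are made up for illustration and are not the question's data.

```python
import pandas as pd

# Hypothetical four-row frame with a date index, as in the question.
dates = pd.date_range("2022-01-01", periods=4)
demo = pd.DataFrame(
    {"High": [10.0, 11.0, 12.0, 9.0], "Low": [7.0, 8.0, 10.5, 6.0]},
    index=dates,
)

# Label-based lookup: High on the first date vs Low two days later.
by_label = (
    demo.loc[pd.Timestamp("2022-01-01"), "High"]
    > demo.loc[pd.Timestamp("2022-01-03"), "Low"]
)

# Position-based lookup of the same two cells.
by_position = demo["High"].iloc[0] > demo["Low"].iloc[2]

print(by_label, by_position)
```

Label-based access reads more clearly when you know the dates you care about; positional access is the natural choice inside a loop over row numbers.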
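You can use the shift operation on a column to move its rows up or down:
`df['High'] > df['Low'].shift(-2)`
To see what is going on, run the commands below:
df = pd.DataFrame(np.random.randn(5, 4), index=list('ABCDE'), columns=['Open', 'High', 'Low', 'Close'])
# shift(-2) moves each 'Low' value up two rows; the last two rows become NaN
df['Low_shiftup'] = df['Low'].shift(-2)
print(df.head())
# each row now compares its own 'High' with the 'Low' from two rows later
print(df['High'] > df['Low_shiftup'])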
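To check that the shifted comparison matches the explicit loop from the earlier answer, here is a small sketch on made-up values (the frame `demo` is hypothetical). It also shows what happens to the last two rows, which have no partner after the shift.

```python
import pandas as pd

# Hypothetical values chosen so the result is easy to verify by hand.
demo = pd.DataFrame({
    "High": [5, 1, 4, 2, 6, 3],
    "Low": [0, 9, 3, 3, 1, 7],
})

# Vectorised: row i compares High[i] with Low[i + 2].
vectorised = demo["High"] > demo["Low"].shift(-2)

# Loop equivalent, using positional access.
looped = [
    bool(demo["High"].iloc[i - 2] > demo["Low"].iloc[i])
    for i in range(2, len(demo))
]

print(vectorised.tolist())
print(looped)
```

The last two entries of `vectorised` are False because comparing against NaN yields False; drop them (for example with `.iloc[:-2]`) if you only want rows that have a valid partner.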
As I explained in the question, I have now found multiple solutions to this problem, one of which is itertuples.
Here is how to use itertuples to solve the problem.
First, create the dataframe:
import pandas as pd
import numpy as np
np.random.seed(2022)
# make example data
close = np.sin(range(610)) + 10
high = close + np.random.rand(*close.shape)
open = high - np.random.rand(*close.shape)
low = high - 3
close[2] += 100
dates = pd.date_range(end='2022-06-30', periods=len(close))
# insert into pd.dataframe
df = pd.DataFrame(index=dates, data=np.array([open, high, low, close]).T, columns=['Open', 'High', 'Low', 'Close'])
print(df)
Now we use itertuples to iterate over the rows of the dataframe:
for row in df.itertuples():
    o = row.Open
    # compare this row's open with the close of every row
    for r in df.itertuples():
        c = r.Close
        if o < c:
            print('O is less than C')
        else:
            print('O is greater than or equal to C')
This compares the open price of every row with the close price of every row, and reports for each pair whether the open is less than the close.
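For larger frames, the same all-pairs comparison can be sketched without Python loops by using NumPy broadcasting. This is an alternative technique, not the itertuples approach itself, and the frame `demo` is a made-up example.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame.
demo = pd.DataFrame({"Open": [1.0, 5.0, 3.0], "Close": [2.0, 4.0, 3.0]})

open_col = demo["Open"].to_numpy()[:, None]    # shape (n, 1)
close_row = demo["Close"].to_numpy()[None, :]  # shape (1, n)

# less[i, j] is True when the open of row i is below the close of row j.
less = open_col < close_row

# Row-position pairs (i, j) where the condition holds.
pairs = np.argwhere(less)
print(less)
print(pairs)
```

The result is an n-by-n boolean matrix, so memory grows with the square of the row count; for the 610-row example that is small, but it is worth keeping in mind for much longer frames.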
This can be extended to check other conditions within the same loop by adding more variables and if statements, and by using enumerate to track row positions.
For example:
for idx, row in enumerate(df.itertuples()):
    o = row.Open
    h = row.High
    for i, r in enumerate(df.itertuples()):
        c = r.Close
        l = r.Low
        # only consider later rows whose low is below this row's high minus 2
        if i > idx and (h - 2) > l:
            if o < c:
                print('O is less than C')
            else:
                print('O is greater than or equal to C')
The code above uses enumerate to add a counter to each loop. The outer if statement ensures that 'o < c' is checked only for pairs where the row supplying 'c' comes after the row supplying 'o' (i > idx) and where the earlier row's high minus 2 is greater than the later row's low.
As you can see, any value in the dataframe can be compared with any other by using the appropriate if statements.
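The positional condition can also be sketched in vectorised form, assuming the same rules as the loop above: the second row must come later, and the first row's high minus 2 must exceed the second row's low. The frame `demo` and the variable names are hypothetical, and this uses NumPy broadcasting rather than itertuples.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame.
demo = pd.DataFrame({
    "Open": [1.0, 5.0, 3.0],
    "High": [6.0, 7.0, 4.0],
    "Low": [0.5, 3.0, 2.5],
    "Close": [2.0, 4.0, 3.0],
})

opens = demo["Open"].to_numpy()[:, None]    # earlier row, shape (n, 1)
highs = demo["High"].to_numpy()[:, None]
closes = demo["Close"].to_numpy()[None, :]  # later row, shape (1, n)
lows = demo["Low"].to_numpy()[None, :]

n = len(demo)
# later[i, j] is True only where j > i (strict upper triangle).
later = np.triu(np.ones((n, n), dtype=bool), k=1)

# Pairs that pass the outer if statement of the loop.
eligible = later & ((highs - 2) > lows)

# Of those, the pairs where the earlier open is below the later close.
open_below_close = eligible & (opens < closes)
print(np.argwhere(open_below_close))
```

Each row of the printed array is an (earlier, later) pair of row positions; `demo.index[...]` can map them back to dates if the frame has a date index.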