How do I subtract the previous row from the current row in a pandas dataframe and apply it to every row; without using a loop?
Question:
I am using Python3.5 and I am working with pandas. I have loaded stock data from yahoo finance and have saved the files to csv. My DataFrames load this data from the csv. This is a copy of the ten rows of the csv file that is my DataFrame
Date Open High Low Close Volume Adj Close
1990-04-12 26.875000 26.875000 26.625 26.625 6100 250.576036
1990-04-16 26.500000 26.750000 26.375 26.750 500 251.752449
1990-04-17 26.750000 26.875000 26.750 26.875 2300 252.928863
1990-04-18 26.875000 26.875000 26.500 26.625 3500 250.576036
1990-04-19 26.500000 26.750000 26.500 26.750 700 251.752449
1990-04-20 26.750000 26.875000 26.750 26.875 2100 252.928863
1990-04-23 26.875000 26.875000 26.750 26.875 700 252.928863
1990-04-24 27.000000 27.000000 26.000 26.000 2400 244.693970
1990-04-25 25.250000 25.250000 24.875 25.125 9300 236.459076
1990-04-26 25.000000 25.250000 24.750 25.000 1200 235.282663
I know that I can use iloc, loc, ix but these values that I index will only give my specific rows and columns and will not perform the operation on every row.
For example: Row one of the data in the open column has a value of 26.875 and the row below it has 26.50. The price dropped .375 cents. I want to be able to capture the % of Increase or Decrease from the previous day so to finish this example .375 divided by 26.875 = 1.4% decrease from one day to the next. I want to be able to run this calculation on every row so I know how much it has increased or decreased from the previous day. The index functions I have tried but they are absolute, and I don’t want to use a loop. Is there a way I can do this with the ix, iloc, loc or another function?
Answers:
you can use pct_change() or/and diff() methods
Demo:
In [138]: df.Close.pct_change() * 100
Out[138]:
0 NaN
1 0.469484
2 0.467290
3 -0.930233
4 0.469484
5 0.467290
6 0.000000
7 -3.255814
8 -3.365385
9 -0.497512
Name: Close, dtype: float64
In [139]: df.Close.diff()
Out[139]:
0 NaN
1 0.125
2 0.125
3 -0.250
4 0.125
5 0.125
6 0.000
7 -0.875
8 -0.875
9 -0.125
Name: Close, dtype: float64
MaxU solutions suits in your case. If you want to perform more complex computations based on your previous rows you should use shift
I am using Python3.5 and I am working with pandas. I have loaded stock data from yahoo finance and have saved the files to csv. My DataFrames load this data from the csv. This is a copy of the ten rows of the csv file that is my DataFrame
Date Open High Low Close Volume Adj Close
1990-04-12 26.875000 26.875000 26.625 26.625 6100 250.576036
1990-04-16 26.500000 26.750000 26.375 26.750 500 251.752449
1990-04-17 26.750000 26.875000 26.750 26.875 2300 252.928863
1990-04-18 26.875000 26.875000 26.500 26.625 3500 250.576036
1990-04-19 26.500000 26.750000 26.500 26.750 700 251.752449
1990-04-20 26.750000 26.875000 26.750 26.875 2100 252.928863
1990-04-23 26.875000 26.875000 26.750 26.875 700 252.928863
1990-04-24 27.000000 27.000000 26.000 26.000 2400 244.693970
1990-04-25 25.250000 25.250000 24.875 25.125 9300 236.459076
1990-04-26 25.000000 25.250000 24.750 25.000 1200 235.282663
I know that I can use iloc, loc, ix but these values that I index will only give my specific rows and columns and will not perform the operation on every row.
For example: Row one of the data in the open column has a value of 26.875 and the row below it has 26.50. The price dropped .375 cents. I want to be able to capture the % of Increase or Decrease from the previous day so to finish this example .375 divided by 26.875 = 1.4% decrease from one day to the next. I want to be able to run this calculation on every row so I know how much it has increased or decreased from the previous day. The index functions I have tried but they are absolute, and I don’t want to use a loop. Is there a way I can do this with the ix, iloc, loc or another function?
you can use pct_change() or/and diff() methods
Demo:
In [138]: df.Close.pct_change() * 100
Out[138]:
0 NaN
1 0.469484
2 0.467290
3 -0.930233
4 0.469484
5 0.467290
6 0.000000
7 -3.255814
8 -3.365385
9 -0.497512
Name: Close, dtype: float64
In [139]: df.Close.diff()
Out[139]:
0 NaN
1 0.125
2 0.125
3 -0.250
4 0.125
5 0.125
6 0.000
7 -0.875
8 -0.875
9 -0.125
Name: Close, dtype: float64
MaxU solutions suits in your case. If you want to perform more complex computations based on your previous rows you should use shift