Pandas Dataframes: comparing values of two adjacent rows and adding a column
Question:
I have a pandas DataFrame in which I need to compare the values of two adjacent rows of a particular column and record the result in a new column, in the first row of each pair: 0 if the two values are equal, 1 if the value in the second row is greater than the first, or -1 if it is smaller. For example, applying this operation to the following DataFrame
DataFrame before the operation:
column1
0 2
1 2
2 4
3 4
4 5
5 3
6 2
7 1
8 55
9 3
should give the following output
column1 column2
0 2 0
1 2 1
2 4 0
3 4 1
4 5 -1
5 3 -1
6 2 -1
7 1 1
8 55 -1
9 3 0
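To pin down the rule before vectorizing it, here is a minimal plain-Python sketch (no pandas) that compares each value with the next one and writes the result in the first row of the pair. The last row has no successor, so it gets 0, as in the desired output above. The helper name adjacent_sign is hypothetical.

```python
def adjacent_sign(values):
    """Hypothetical helper: 1, 0 or -1 per row, comparing it with the NEXT row.

    The last row has no next row, so it gets 0.
    """
    result = []
    for current, following in zip(values, values[1:]):
        if following > current:
            result.append(1)
        elif following < current:
            result.append(-1)
        else:
            result.append(0)
    if values:
        result.append(0)  # no successor for the last row
    return result


column1 = [2, 2, 4, 4, 5, 3, 2, 1, 55, 3]
print(adjacent_sign(column1))  # [0, 1, 0, 1, -1, -1, -1, 1, -1, 0]
```

A loop like this is fine for checking the expected result, but the vectorized answers below are the idiomatic way to do it on a real DataFrame.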
Answers:
What we are looking for is the sign of the change. We break this up into 3 steps:
- diff takes the difference between each row and the prior row. This captures the change.
- x / abs(x) is a common way to capture the sign of something. We use it here when we divide d by d.abs().
- Finally, we have residual NaNs: one in the first position, because diff has no prior row there, and one wherever the change is zero, because 0 / 0 is NaN. We fill them in with zero.
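import pandas as pd
df = pd.DataFrame(dict(column1=[2, 2, 4, 4, 5, 3, 2, 1, 55, 3]))
d = df.column1.diff()        # change from the previous row (NaN in row 0)
d.div(d.abs()).fillna(0)     # sign of the change; NaN (row 0 and 0/0) -> 0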
0 0.0
1 0.0
2 1.0
3 0.0
4 1.0
5 -1.0
6 -1.0
7 -1.0
8 1.0
9 -1.0
Name: column1, dtype: float64
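The output above compares each row with the previous one, so the sign lands in the second row of each pair. A minimal sketch of the same x / abs(x) idea that compares each row with the next one instead, so the sign lands in the first row as the question asks. It assumes the same df and simply subtracts the current value from the shifted-up value.

```python
import pandas as pd

df = pd.DataFrame(dict(column1=[2, 2, 4, 4, 5, 3, 2, 1, 55, 3]))

# Change from each row to the NEXT row; the last row has no successor -> NaN
d = df.column1.shift(-1) - df.column1
# x / abs(x) gives the sign; 0/0 and the trailing NaN are filled with 0
df['column2'] = d.div(d.abs()).fillna(0)
print(df['column2'].tolist())
# [0.0, 1.0, 0.0, 1.0, -1.0, -1.0, -1.0, 1.0, -1.0, 0.0]
```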
You can combine the Series.diff() method with NumPy's np.sign() function:
In [27]: df['column2'] = np.sign(df.column1.diff().fillna(0))
In [28]: df
Out[28]:
column1 column2
0 2 0.0
1 2 0.0
2 4 1.0
3 4 0.0
4 5 1.0
5 3 -1.0
6 2 -1.0
7 1 -1.0
8 55 1.0
9 3 -1.0
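Unlike x / abs(x), np.sign returns 0 for a zero change, so the only NaN left is the one diff produces in the first row. A quick check, assuming the same df, that both approaches agree once the NaNs are filled with 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(column1=[2, 2, 4, 4, 5, 3, 2, 1, 55, 3]))
d = df.column1.diff()

via_sign = np.sign(d)       # NaN only in row 0; 0.0 where d == 0
via_div = d.div(d.abs())    # NaN in row 0 AND wherever d == 0 (0/0)

print(via_sign.isna().sum(), via_div.isna().sum())   # 1 3
print(via_sign.fillna(0).equals(via_div.fillna(0)))  # True
```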
But to get your desired DF, which puts the result in the first row of each pair (each row compared with the next one), shift the result up by one row:
In [30]: df['column3'] = np.sign(df.column1.diff().fillna(0)).shift(-1).fillna(0)
In [31]: df
Out[31]:
column1 column2 column3
0 2 0.0 0.0
1 2 0.0 1.0
2 4 1.0 0.0
3 4 0.0 1.0
4 5 1.0 -1.0
5 3 -1.0 -1.0
6 2 -1.0 -1.0
7 1 -1.0 1.0
8 55 1.0 -1.0
9 3 -1.0 0.0
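The desired output shows integers, while the columns above are float64 because NaN forces a float dtype. A minimal sketch, assuming the same df, that compares each row with the next one directly via diff(-1) and casts to int after filling the trailing NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(column1=[2, 2, 4, 4, 5, 3, 2, 1, 55, 3]))

# diff(-1) is x[i] - x[i+1], so negate it to get x[i+1] - x[i]
change_to_next = -df.column1.diff(-1)
# Sign of the change; the last row has no next row -> NaN -> 0, then cast to int
df['column2'] = np.sign(change_to_next).fillna(0).astype(int)
print(df['column2'].tolist())  # [0, 1, 0, 1, -1, -1, -1, 1, -1, 0]
```

The cast is only safe after fillna, since astype(int) raises on NaN.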