How to calculate differences between consecutive rows in pandas data frame?
Question:
I’ve got a data frame, df, with three columns: count_a, count_b and date; the counts are floats, and the dates are consecutive days in 2015.
I’m trying to find the difference between each day’s counts in both the count_a and count_b columns; that is, I’m trying to calculate the difference between each row and the preceding row for both of those columns. I’ve set the date as the index, but I’m having trouble figuring out how to do this. There were a couple of hints about using pd.Series and pd.DataFrame.diff, but I haven’t had any luck finding an applicable answer or set of instructions.
I’m a bit stuck, and would appreciate some guidance here.
Here’s what my data frame looks like:
import pandas as pd
from numpy import nan
from pandas import Timestamp

df = pd.DataFrame({'count_a': {Timestamp('2015-01-01 00:00:00'): 34175.0,
Timestamp('2015-01-02 00:00:00'): 72640.0,
Timestamp('2015-01-03 00:00:00'): 109354.0,
Timestamp('2015-01-04 00:00:00'): 144491.0,
Timestamp('2015-01-05 00:00:00'): 180355.0,
Timestamp('2015-01-06 00:00:00'): 214615.0,
Timestamp('2015-01-07 00:00:00'): 250096.0,
Timestamp('2015-01-08 00:00:00'): 287880.0,
Timestamp('2015-01-09 00:00:00'): 332528.0,
Timestamp('2015-01-10 00:00:00'): 381460.0,
Timestamp('2015-01-11 00:00:00'): 422981.0,
Timestamp('2015-01-12 00:00:00'): 463539.0,
Timestamp('2015-01-13 00:00:00'): 505395.0,
Timestamp('2015-01-14 00:00:00'): 549027.0,
Timestamp('2015-01-15 00:00:00'): 595377.0,
Timestamp('2015-01-16 00:00:00'): 649043.0,
Timestamp('2015-01-17 00:00:00'): 707727.0,
Timestamp('2015-01-18 00:00:00'): 761287.0,
Timestamp('2015-01-19 00:00:00'): 814372.0,
Timestamp('2015-01-20 00:00:00'): 867096.0,
Timestamp('2015-01-21 00:00:00'): 920838.0,
Timestamp('2015-01-22 00:00:00'): 983405.0,
Timestamp('2015-01-23 00:00:00'): 1067243.0,
Timestamp('2015-01-24 00:00:00'): 1164421.0,
Timestamp('2015-01-25 00:00:00'): 1252178.0,
Timestamp('2015-01-26 00:00:00'): 1341484.0,
Timestamp('2015-01-27 00:00:00'): 1427600.0,
Timestamp('2015-01-28 00:00:00'): 1511549.0,
Timestamp('2015-01-29 00:00:00'): 1594846.0,
Timestamp('2015-01-30 00:00:00'): 1694226.0,
Timestamp('2015-01-31 00:00:00'): 1806727.0,
Timestamp('2015-02-01 00:00:00'): 1899880.0,
Timestamp('2015-02-02 00:00:00'): 1987978.0,
Timestamp('2015-02-03 00:00:00'): 2080338.0,
Timestamp('2015-02-04 00:00:00'): 2175775.0,
Timestamp('2015-02-05 00:00:00'): 2279525.0,
Timestamp('2015-02-06 00:00:00'): 2403306.0,
Timestamp('2015-02-07 00:00:00'): 2545696.0,
Timestamp('2015-02-08 00:00:00'): 2672464.0,
Timestamp('2015-02-09 00:00:00'): 2794788.0},
'count_b': {Timestamp('2015-01-01 00:00:00'): nan,
Timestamp('2015-01-02 00:00:00'): nan,
Timestamp('2015-01-03 00:00:00'): nan,
Timestamp('2015-01-04 00:00:00'): nan,
Timestamp('2015-01-05 00:00:00'): nan,
Timestamp('2015-01-06 00:00:00'): nan,
Timestamp('2015-01-07 00:00:00'): nan,
Timestamp('2015-01-08 00:00:00'): nan,
Timestamp('2015-01-09 00:00:00'): nan,
Timestamp('2015-01-10 00:00:00'): nan,
Timestamp('2015-01-11 00:00:00'): nan,
Timestamp('2015-01-12 00:00:00'): nan,
Timestamp('2015-01-13 00:00:00'): nan,
Timestamp('2015-01-14 00:00:00'): nan,
Timestamp('2015-01-15 00:00:00'): nan,
Timestamp('2015-01-16 00:00:00'): nan,
Timestamp('2015-01-17 00:00:00'): nan,
Timestamp('2015-01-18 00:00:00'): nan,
Timestamp('2015-01-19 00:00:00'): nan,
Timestamp('2015-01-20 00:00:00'): nan,
Timestamp('2015-01-21 00:00:00'): nan,
Timestamp('2015-01-22 00:00:00'): nan,
Timestamp('2015-01-23 00:00:00'): nan,
Timestamp('2015-01-24 00:00:00'): 71.0,
Timestamp('2015-01-25 00:00:00'): 150.0,
Timestamp('2015-01-26 00:00:00'): 236.0,
Timestamp('2015-01-27 00:00:00'): 345.0,
Timestamp('2015-01-28 00:00:00'): 1239.0,
Timestamp('2015-01-29 00:00:00'): 2228.0,
Timestamp('2015-01-30 00:00:00'): 7094.0,
Timestamp('2015-01-31 00:00:00'): 16593.0,
Timestamp('2015-02-01 00:00:00'): 27190.0,
Timestamp('2015-02-02 00:00:00'): 37519.0,
Timestamp('2015-02-03 00:00:00'): 49003.0,
Timestamp('2015-02-04 00:00:00'): 63323.0,
Timestamp('2015-02-05 00:00:00'): 79846.0,
Timestamp('2015-02-06 00:00:00'): 101568.0,
Timestamp('2015-02-07 00:00:00'): 127120.0,
Timestamp('2015-02-08 00:00:00'): 149955.0,
Timestamp('2015-02-09 00:00:00'): 171440.0}})
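For comparison, if the date were still an ordinary column rather than the index, it could be moved into the index with set_index before differencing. This is a minimal sketch, assuming a small hypothetical frame with a date column holding only the first four count_a values from above; the variable names raw, df_small and daily are illustrative.

```python
import pandas as pd

# Hypothetical sample: 'date' is still an ordinary column here,
# holding the first four count_a values from the question.
raw = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04"]),
    "count_a": [34175.0, 72640.0, 109354.0, 144491.0],
    "count_b": [float("nan")] * 4,
})

# Move 'date' into the index so only the count columns are differenced.
df_small = raw.set_index("date")

# Each row minus the previous row, column by column; the first row becomes NaN.
daily = df_small.diff()
print(daily)
```

Once the dates are in the index they are carried through unchanged, so each difference stays labelled with the day it belongs to.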
Answers:
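You can use the .rolling(…).apply(…) method (the old top-level pd.rolling_apply function has been removed from recent pandas versions):
# each window is [previous, current] as a NumPy array, so x[1] - x[0] is current minus previous
diffs_a = df['count_a'].rolling(2).apply(lambda x: x[1] - x[0], raw=True)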
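In current pandas the windowed approach is spelled .rolling(window).apply(func). The sketch below, assuming a short hypothetical Series built from the first four count_a values (the names s and rolled are illustrative), checks that a two-row window gives the same result as diff. Passing raw=True hands each window to the function as a plain NumPy array, so x[0] and x[1] are positional.

```python
import pandas as pd

s = pd.Series(
    [34175.0, 72640.0, 109354.0, 144491.0],
    index=pd.date_range("2015-01-01", periods=4, freq="D"),
)

# Window of 2 rows; with raw=True each window arrives as a NumPy array
# [previous, current], so x[1] - x[0] is "current minus previous".
rolled = s.rolling(2).apply(lambda x: x[1] - x[0], raw=True)

# The first window is incomplete, so the first value is NaN, as with diff().
print(rolled)
print(rolled.equals(s.diff()))
```

This works, but it calls a Python function once per row, so for a plain row-to-row difference diff is the simpler and faster choice.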
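Alternatively, if it’s easier, you can operate on the underlying array directly (the result is one element shorter than the column and no longer carries the date index):
count_a_vals = df['count_a'].to_numpy()
# each value minus the one before it
diffs_a = count_a_vals[1:] - count_a_vals[:-1]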
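NumPy also provides np.diff, which computes the same current-minus-previous array. A minimal sketch, assuming a short hypothetical Series of the first four count_a values (names s, raw_diffs and aligned are illustrative), shows how to put a leading NaN back so the result lines up with the dates again.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [34175.0, 72640.0, 109354.0, 144491.0],
    index=pd.date_range("2015-01-01", periods=4, freq="D"),
)

vals = s.to_numpy()

# np.diff computes vals[1:] - vals[:-1]: one element shorter than the input.
raw_diffs = np.diff(vals)

# Prepend NaN so the result lines up with the original dates again.
aligned = pd.Series(np.concatenate(([np.nan], raw_diffs)), index=s.index)
print(aligned)
```

After re-alignment the Series matches what s.diff() returns directly.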
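DataFrame.diff should give the desired result. It subtracts the previous row from each row, column by column, so the first row (and any row where either value is NaN) comes out as NaN: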
>>> df.diff()
             count_a  count_b
2015-01-01       NaN      NaN
2015-01-02     38465      NaN
2015-01-03     36714      NaN
2015-01-04     35137      NaN
2015-01-05     35864      NaN
....
2015-02-07    142390    25552
2015-02-08    126768    22835
2015-02-09    122324    21485
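To see what diff does, and how NaNs in count_b propagate, here is a sketch on a hypothetical six-day excerpt of the question's data (2015-01-22 to 2015-01-27). It assumes the illustrative names excerpt, by_diff, by_shift and result. It shows that diff() equals subtracting the frame shifted down by one row, and joins the differences back as new columns with add_suffix.

```python
import numpy as np
import pandas as pd

# Hypothetical six-day excerpt of the question's data (2015-01-22 .. 2015-01-27).
excerpt = pd.DataFrame(
    {
        "count_a": [983405.0, 1067243.0, 1164421.0, 1252178.0, 1341484.0, 1427600.0],
        "count_b": [np.nan, np.nan, 71.0, 150.0, 236.0, 345.0],
    },
    index=pd.date_range("2015-01-22", periods=6, freq="D"),
)

# diff() is the same as subtracting the frame shifted down by one row.
by_diff = excerpt.diff()
by_shift = excerpt - excerpt.shift(1)
print(by_diff.equals(by_shift))

# Keep the original counts and add the daily changes as new columns.
result = excerpt.join(by_diff.add_suffix("_diff"))
print(result)
```

Note that count_b_diff is still NaN on 2015-01-24, the first day count_b has a value, because the preceding day is NaN. diff also accepts a periods argument, for example excerpt.diff(periods=7) for a week-over-week change on daily data.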