Calculating time difference with different base years in pandas
Question:
Let's assume I have the following data:
import pandas as pd
d = {'origin': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'], 'destination': ['b', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c'], 'year': [2000, 2001, 2002, 2003, 2004, 2005, 2000, 2001, 2002, 2003, 2004, 2005], 'value': [10, 17, 22, 7, 8, 14, 10, 2, 5, 7, 78, 23] }
data_frame = pd.DataFrame(data=d)
data_frame.set_index(['origin', 'destination'], inplace=True)
data_frame
What I want to achieve is the following. For each origin-destination pair (given as the index), I want to calculate the difference in the value column relative to a base year, using two base years.
The base year starts as 2000, so the value for 2000 is subtracted from the value of every year from 2000 onward (2000 itself included). Once the year reaches 2003, the base year becomes 2003, and the value for 2003 is subtracted from then on.
In case that is unclear, here is the final dataset I want to achieve:
d = {'origin': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'], 'destination': ['b', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c'], 'year': [2000, 2001, 2002, 2003, 2004, 2005, 2000, 2001, 2002, 2003, 2004, 2005], 'value': [10, 17, 22, 7, 8, 14, 10, 2, 5, 7, 78, 23], 'diff': [0, 7, 12, 0, 1, 7, 0, -8, -5, 0, 71, 16], }
data_frame = pd.DataFrame(data=d)
data_frame.set_index(['origin', 'destination'], inplace=True)
data_frame
For each origin-destination pair, the difference is calculated having the base year 2000 and then switches to 2003.
Thanks for your help
Answers:
You can create a Series of base values to subtract: replace value with missing values wherever year is not 2000 or 2003, and then forward fill the NaNs per group:
s = (data_frame['value'].where(data_frame['year'].isin([2000, 2003]))
     .groupby(level=[0, 1])
     .ffill())
data_frame['diff'] = data_frame['value'].sub(s)
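To make the steps above easier to follow, here is a self-contained sketch that rebuilds the sample data from the question and applies the same `where` / `groupby` / `ffill` idea. The names `base_years` and `base` are introduced here only for illustration, and the sketch assumes the rows of each origin-destination pair are already sorted by year, as they are in the sample data.

```python
import pandas as pd

# Sample data from the question, written compactly.
d = {
    'origin': ['a'] * 12,
    'destination': ['b'] * 6 + ['c'] * 6,
    'year': [2000, 2001, 2002, 2003, 2004, 2005] * 2,
    'value': [10, 17, 22, 7, 8, 14, 10, 2, 5, 7, 78, 23],
}
data_frame = pd.DataFrame(data=d).set_index(['origin', 'destination'])

base_years = [2000, 2003]  # illustrative name, not from the original answer

# Keep 'value' only on base-year rows; every other row becomes NaN.
base = data_frame['value'].where(data_frame['year'].isin(base_years))

# Carry each base value forward within its origin-destination pair.
# This relies on the rows being sorted by year inside each pair.
base = base.groupby(level=[0, 1]).ffill()

# Subtract the carried-forward base value from every row.
data_frame['diff'] = data_frame['value'].sub(base).astype(int)

print(data_frame)
```

With the sample data, the resulting `diff` column should match the one requested in the question. Rows that come before the first base year in a pair would get NaN, in which case the final `astype(int)` would have to be dropped.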
You can use:
def calc(row):
    # The base values are hard-coded from the sample data:
    # 10 is the value for 2000 and 7 is the value for 2003 in both pairs.
    if row['year'] < 2003:
        return row['value'] - 10
    return row['value'] - 7

data_frame['diff'] = data_frame.apply(calc, axis=1)
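The hard-coded 10 and 7 only work because both pairs in the sample data happen to share the same values in 2000 and 2003. As a possible variant, the sketch below looks the base value up per origin-destination pair instead. It is an assumption-laden illustration rather than part of the answer above: it works on a flat frame `df` (no index set), and the column names `base_year` and `base_value` are made up for this example.

```python
import numpy as np
import pandas as pd

# Sample data from the question, kept as plain columns for merging.
df = pd.DataFrame({
    'origin': ['a'] * 12,
    'destination': ['b'] * 6 + ['c'] * 6,
    'year': [2000, 2001, 2002, 2003, 2004, 2005] * 2,
    'value': [10, 17, 22, 7, 8, 14, 10, 2, 5, 7, 78, 23],
})

# Assign each row its base year: 2000 before 2003, otherwise 2003.
df['base_year'] = np.where(df['year'] < 2003, 2000, 2003)

# Lookup table: the value of every (origin, destination, year) combination,
# renamed so it can be joined on the base year.
base_values = df[['origin', 'destination', 'year', 'value']].rename(
    columns={'year': 'base_year', 'value': 'base_value'}
)

# Attach the base value of the matching pair and base year to every row.
df = df.merge(base_values, on=['origin', 'destination', 'base_year'], how='left')

df['diff'] = df['value'] - df['base_value']
print(df)
```

This avoids the row-wise `apply` and keeps working when the pairs have different base values, at the cost of an extra merge.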