Calculating time difference with different base years in pandas

Question:

Let’s assume I have the following data:

import pandas as pd

d = {'origin': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'], 'destination': ['b', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c'], 'year': [2000, 2001, 2002, 2003, 2004, 2005, 2000, 2001, 2002, 2003, 2004, 2005], 'value': [10, 17, 22, 7, 8, 14, 10, 2, 5, 7, 78, 23]}
data_frame = pd.DataFrame(data=d)
data_frame.set_index(['origin', 'destination'], inplace=True)
data_frame

For each origin-destination pair (given as the index), I want to calculate the difference between the value column and its value in a base year, where the base year changes over time.

At first, the year 2000 is the base, so the value for 2000 is subtracted from the value of every year from 2000 onward (including 2000 itself). Once the year reaches 2003, the base year becomes 2003, and the value for 2003 is subtracted from then on.

If that is unclear, here is the final dataset I want to achieve:

d = {'origin': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'], 'destination': ['b', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c'], 'year': [2000, 2001, 2002, 2003, 2004, 2005, 2000, 2001, 2002, 2003, 2004, 2005], 'value': [10, 17, 22, 7, 8, 14, 10, 2, 5, 7, 78, 23], 'diff': [0, 7, 12, 0, 1, 7, 0, -8, -5, 0, 71, 16]}
data_frame = pd.DataFrame(data=d)
data_frame.set_index(['origin', 'destination'], inplace=True)
data_frame

For each origin-destination pair, the difference is calculated with 2000 as the base year, and the base then switches to 2003.

Thanks for your help

Asked By: Avto Abashishvili


Answers:

You can build the Series to subtract by replacing value with missing values wherever year is not 2000 or 2003, and then forward-filling the NaNs within each group:

s = (data_frame['value'].where(data_frame['year'].isin([2000, 2003]))
                        .groupby(level=[0,1])
                        .ffill())
data_frame['diff'] = data_frame['value'].sub(s)
Answered By: jezrael
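
The where/ffill approach above can be sketched end to end as follows. This is a minimal, self-contained version built on the sample data from the question; the names base_years and base are illustrative and not part of the original answer. It assumes the rows of each pair are ordered by year and that every pair starts at a base year, otherwise the leading rows would get NaN.

```python
import pandas as pd

d = {
    'origin': ['a'] * 12,
    'destination': ['b'] * 6 + ['c'] * 6,
    'year': [2000, 2001, 2002, 2003, 2004, 2005] * 2,
    'value': [10, 17, 22, 7, 8, 14, 10, 2, 5, 7, 78, 23],
}
data_frame = pd.DataFrame(data=d).set_index(['origin', 'destination'])

base_years = [2000, 2003]  # illustrative name: years at which the base resets

# Keep the value only on base-year rows; every other row becomes NaN.
base = data_frame['value'].where(data_frame['year'].isin(base_years))
# Carry each base value forward within its origin-destination pair.
base = base.groupby(level=['origin', 'destination']).ffill()

# The result is float because the intermediate Series contained NaN.
data_frame['diff'] = data_frame['value'].sub(base)
print(data_frame)
```

Because the forward fill runs per group, a base value from one pair never leaks into the next pair.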

You can use:

def calc(row):
    # The bases are hard-coded: 10 and 7 are the 2000 and 2003 values,
    # which happen to be the same for both pairs in the sample data.
    if row['year'] < 2003:
        return row['value'] - 10
    return row['value'] - 7

data_frame['diff'] = data_frame.apply(calc, axis=1)
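
The row-wise function above relies on the constants 10 and 7, so it only works while every pair has those values in 2000 and 2003. A possible sketch that looks up the base per pair instead is shown below. The period column and the base_years list are hypothetical helpers introduced for illustration. The sketch assumes the rows of each pair are sorted by year, that each base year is present for each pair, and that no year precedes the first base year.

```python
import numpy as np
import pandas as pd

d = {
    'origin': ['a'] * 12,
    'destination': ['b'] * 6 + ['c'] * 6,
    'year': [2000, 2001, 2002, 2003, 2004, 2005] * 2,
    'value': [10, 17, 22, 7, 8, 14, 10, 2, 5, 7, 78, 23],
}
df = pd.DataFrame(data=d)

base_years = np.array([2000, 2003])  # hypothetical: must be sorted ascending

# Label each row with the latest base year at or before its own year.
# (Years earlier than base_years[0] would be mislabelled; assumed absent.)
pos = np.searchsorted(base_years, df['year'].to_numpy(), side='right') - 1
df['period'] = base_years[pos]

# Within each pair and period, the first row is the base-year row.
base = df.groupby(['origin', 'destination', 'period'])['value'].transform('first')
df['diff'] = df['value'] - base

data_frame = df.drop(columns='period').set_index(['origin', 'destination'])
print(data_frame)
```

This avoids the row-by-row apply call and keeps the result as integers, since no missing values are introduced along the way.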