Identify periods in a pandas series where several consecutive values are negative

Question

Given a pandas series with years as indices (in ascending order with no missing years):

import pandas as pd

growth = pd.Series({1990: 6.99, 1991: 5.53, 1992: -9.02, 1993: 1.05, 1994: 9.24, 1995: 0.16, 1996: 10.36, 1997: 2.68, 1998: 2.89, 1999: -0.82, 2000: -3.06, 2001: 1.44, 2002: -8.89, 2003: -17.0, 2004: -5.81, 2005: -5.71, 2006: -3.46, 2007: -3.65, 2008: -17.67, 2009: 12.02, 2010: 19.68, 2011: 14.19, 2012: 16.67, 2013: 1.99, 2014: 2.38, 2015: 1.78, 2016: 0.76, 2017: 4.7, 2018: 3.5, 2019: -8.1, 2020: -8.0})

I need to identify periods (start and end year) during which growth is negative for at least min_duration consecutive years.

I can do this by iterating through the series:

def get_negative_periods(s, min_duration):
    previous = 1
    negative_periods = []
    for year, value in s.items():
        if value < 0:
            if previous < 0:
                negative_periods[-1].append(year)
            else:
                negative_periods.append([year])
        previous = value
    return [(period[0], period[-1]) for period in negative_periods
        if len(period) >= min_duration]

For example, get_negative_periods(growth, 3) returns [(2002, 2008)], because 2002-2008 is the only period in which growth was negative for 3 or more consecutive years.

Is there a way to vectorize this instead of going row by row? (Returning a series or dataframe instead of tuples would be fine.)

Asked By: Stuart

Answer 1

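Try labelling each run of consecutive equal values in the boolean mask (a new group starts wherever the mask differs from the previous row), keep only the runs where the mask is True, and then keep the runs whose year range is at least min_duration: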

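def get_negative_periods(s, min_duration):
    # Boolean mask of negative years; for an unnamed Series it lands in column 0
    s = s.lt(0).reset_index()
    # Label each run of equal values, keeping labels only for negative runs
    g = s[0].ne(s[0].shift()).cumsum()[s[0].eq(True)]
    # First and last year of each negative run
    s = s.groupby(g)['index'].agg(['first', 'last'])
    # Years are consecutive, so last - first + 1 is the run length
    return s[(s['last'] - s['first']) + 1 >= min_duration]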


res = get_negative_periods(growth, 3)

res:

     first  last
0               
6.0   2002  2008

Or as a list of lists:

def get_negative_periods(s, min_duration):
    s = s.lt(0).reset_index()
    g = s[0].ne(s[0].shift()).cumsum()[s[0].eq(True)]
    s = s.groupby(g)['index'].agg(['first', 'last'])
    return s[(s['last'] - s['first']) + 1 >= min_duration].values.tolist()


lst = get_negative_periods(growth, 3)

lst:

[[2002, 2008]]
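
To see what the grouping key looks like, here is a minimal sketch of the intermediate steps on a toy series. The values and the names toy, neg, run_id and neg_runs are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Toy data (hypothetical values, chosen only to show the run labels)
toy = pd.Series([1.0, -2.0, -3.0, 4.0, -5.0],
                index=[2000, 2001, 2002, 2003, 2004])

# True where the value is negative
neg = toy.lt(0)

# A new label starts whenever the flag differs from the previous row,
# so consecutive equal flags share one label
run_id = neg.ne(neg.shift()).cumsum()

# Keep the labels of the negative years only
neg_runs = run_id[neg]

# First and last year of each negative run
periods = toy.index.to_series().groupby(neg_runs).agg(['first', 'last'])

print(run_id.tolist())
print(periods)
```

Here 2001 and 2002 share one label and 2004 gets its own, so grouping the years by label gives one row per negative run.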

Answered By: Henry Ecker

Answer 2

Here is another way:

min_duration = 3

(growth.rename_axis('year').lt(0).diff().ne(0).cumsum()
 .where(growth.lt(0)).reset_index(name='cc')
 .groupby('cc').agg(start=('year', 'first'), end=('year', 'last'))
 # end - start + 1 is the run length, since the years are consecutive
 .loc[lambda x: (x['end'] - x['start'] + 1).ge(min_duration)])

or:

m1 = growth.lt(0)
m2 = m1.diff().ne(0).cumsum()

(growth.loc[growth.groupby(m2).transform('count').ge(min_duration) & m1]
 .groupby(m2).agg(lambda x: x.iloc[[0, -1]].index.tolist()))
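
As a possible variation, the sketch below measures each run by the number of rows in it rather than by subtracting years, so it does not rely on the index being gap-free. The function name negative_periods_by_count and the column names start, end and length are hypothetical, chosen for illustration only.

```python
import pandas as pd

growth = pd.Series({1990: 6.99, 1991: 5.53, 1992: -9.02, 1993: 1.05, 1994: 9.24,
                    1995: 0.16, 1996: 10.36, 1997: 2.68, 1998: 2.89, 1999: -0.82,
                    2000: -3.06, 2001: 1.44, 2002: -8.89, 2003: -17.0, 2004: -5.81,
                    2005: -5.71, 2006: -3.46, 2007: -3.65, 2008: -17.67, 2009: 12.02,
                    2010: 19.68, 2011: 14.19, 2012: 16.67, 2013: 1.99, 2014: 2.38,
                    2015: 1.78, 2016: 0.76, 2017: 4.7, 2018: 3.5, 2019: -8.1,
                    2020: -8.0})


def negative_periods_by_count(s, min_duration):
    """Hypothetical helper: negative runs with at least min_duration rows."""
    neg = s.lt(0)
    # Label runs of consecutive equal flags
    run_id = neg.ne(neg.shift()).cumsum()
    # Index labels (years) of the negative rows only
    years = s.index.to_series()[neg]
    # One row per negative run: first year, last year, number of rows
    runs = years.groupby(run_id[neg]).agg(start='first', end='last', length='count')
    return runs[runs['length'] >= min_duration].reset_index(drop=True)


result = negative_periods_by_count(growth, 3)
print(result)
```

With the data from the question and min_duration set to 3, this should leave a single row for 2002 to 2008 with a length of 7.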

Answered By: rhug123
