Delete all columns in which a value repeats consecutively more than 3 times

Question:

I have a df that looks like this:

date stock1 stock2 stock3 stock4 stock5 stock6 stock7 stock8 stock9 stock10
10/20 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.9
11/20 0.1 0.9 0.3 0.4 0.3 0.5 0.3 0.2 0.4 0.1
12/20 0.1 0.6 0.9 0.5 0.6 0.7 0.8 0.7 0.9 0.1
10/20 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.9
11/20 0.8 0.9 0.3 0.4 0.3 0.5 0.3 0.2 0.9 0.1
12/20 0.3 0.6 0.9 0.5 0.6 0.7 0.8 0.7 0.9 0.1
10/20 0.1 0.2 0.3 0.4 0.5 0.7 0.7 0.8 0.9 0.9
11/20 0.8 0.9 0.3 0.4 0.3 0.7 0.3 0.2 0.4 0.1
12/20 0.3 0.6 0.9 0.5 0.6 0.7 0.8 0.7 0.9 0.1

I want to delete all columns in which the same value repeats, consecutively, more than 3 times. In this example, the columns "stock1", "stock6" and "stock9" should be deleted. The other columns also contain values that repeat more than 3 times, but not one after the other.
I think I can adapt the code from this question: Removing values that repeat more than 5 times in Pandas DataFrame, but I have not managed to do that yet.

Asked By: user17717499
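
For anyone who wants to try the answers below, the sample frame can be rebuilt from the question text; a minimal sketch using whitespace-separated parsing:

```python
from io import StringIO
import pandas as pd

# rebuild the sample frame from the question text
text = """\
date stock1 stock2 stock3 stock4 stock5 stock6 stock7 stock8 stock9 stock10
10/20 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.9
11/20 0.1 0.9 0.3 0.4 0.3 0.5 0.3 0.2 0.4 0.1
12/20 0.1 0.6 0.9 0.5 0.6 0.7 0.8 0.7 0.9 0.1
10/20 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.9
11/20 0.8 0.9 0.3 0.4 0.3 0.5 0.3 0.2 0.9 0.1
12/20 0.3 0.6 0.9 0.5 0.6 0.7 0.8 0.7 0.9 0.1
10/20 0.1 0.2 0.3 0.4 0.5 0.7 0.7 0.8 0.9 0.9
11/20 0.8 0.9 0.3 0.4 0.3 0.7 0.3 0.2 0.4 0.1
12/20 0.3 0.6 0.9 0.5 0.6 0.7 0.8 0.7 0.9 0.1
"""
df = pd.read_csv(StringIO(text), sep=r"\s+")
print(df.shape)  # (9, 11)
```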

||

Answers:

You might want to avoid apply here:

N = 3
df2 = df.set_index('date')
df2.loc[:,
   df2.ne(df2.shift()).cumsum()
      .stack()
      .groupby(level=1)
      .value_counts()
      .groupby(level=0).max()
      .le(N)]
Answered By: ansev
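
The run-length idea behind this answer can be checked on a smaller frame; a minimal sketch (columns a and b are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['10/20', '11/20', '12/20', '10/20', '11/20', '12/20'],
    'a': [0.1, 0.1, 0.1, 0.1, 0.5, 0.6],  # run of 4 identical values -> drop
    'b': [0.2, 0.9, 0.2, 0.9, 0.2, 0.9],  # no run longer than 1 -> keep
})

N = 3
df2 = df.set_index('date')
# run ids: increment each time a value differs from the one above it
runs = df2.ne(df2.shift()).cumsum()
# longest run per column = highest count of any single run id
max_run = runs.stack().groupby(level=1).value_counts().groupby(level=0).max()
result = df2.loc[:, max_run.le(N)]
print(result.columns.tolist())  # ['b']
```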

You can set "date" aside as the index, then check whether each row differs from the previous one and use that for groupby+cumcount.

Then compute the max count per column; if it is greater than N-1, drop the column:

df2 = df.set_index('date')
N = 3
mask = df2.apply(lambda c: c.groupby(c.ne(c.shift()).cumsum()).cumcount()).max().lt(N - 1)
df2.loc[:, mask]

output:

        stock2   stock3   stock4   stock5   stock7   stock8   stock10
date                                                                 
10/20       0.2      0.3      0.4      0.5      0.7      0.8      0.9
11/20       0.9      0.3      0.4      0.3      0.3      0.2      0.1
12/20       0.6      0.9      0.5      0.6      0.8      0.7      0.1
10/20       0.2      0.3      0.4      0.5      0.7      0.8      0.9
11/20       0.9      0.3      0.4      0.3      0.3      0.2      0.1
12/20       0.6      0.9      0.5      0.6      0.8      0.7      0.1
10/20       0.2      0.3      0.4      0.5      0.7      0.8      0.9
11/20       0.9      0.3      0.4      0.3      0.3      0.2      0.1
12/20       0.6      0.9      0.5      0.6      0.8      0.7      0.1

intermediate count of successive values:

>>> df2.apply(lambda c: c.groupby(c.ne(c.shift()).cumsum()).cumcount())

        stock1   stock2   stock3   stock4   stock5   stock6   stock7   stock8   stock9   stock10
date                                                                                            
10/20         0        0        0        0        0        0        0        0        0        0
11/20         1        0        1        1        0        0        0        0        0        0
12/20         2        0        0        0        0        0        0        0        0        1
10/20         3        0        0        0        0        0        0        0        1        0
11/20         0        0        1        1        0        0        0        0        2        0
12/20         0        0        0        0        0        0        0        0        3        1
10/20         0        0        0        0        0        1        0        0        4        0
11/20         0        0        1        1        0        2        0        0        0        0
12/20         0        0        0        0        0        3        0        0        0        1
Answered By: mozway
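
The cumcount trick above can be verified on a single Series; a small sketch (the values are illustrative, not from the question):

```python
import pandas as pd

s = pd.Series([0.1, 0.1, 0.1, 0.1, 0.8, 0.8])
# run ids increment whenever the value changes;
# cumcount then numbers the positions within each run starting at 0
counts = s.groupby(s.ne(s.shift()).cumsum()).cumcount()
print(counts.tolist())  # [0, 1, 2, 3, 0, 1]
# a run of length k reaches cumcount k-1, so max() + 1 is the longest run
print(counts.max() + 1)  # 4
```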
mask = df1.iloc[:, 1:].apply(lambda ss: (ss.diff() != 0).cumsum().value_counts().max()) < 3
df1[['date']].join(df1.iloc[:, 1:].loc[:, mask])
     date  stock2  stock3  stock4  stock5  stock7  stock8  stock10
0  10/20     0.2     0.3     0.4     0.5     0.7     0.8      0.9
1  11/20     0.9     0.3     0.4     0.3     0.3     0.2      0.1
2  12/20     0.6     0.9     0.5     0.6     0.8     0.7      0.1
3  10/20     0.2     0.3     0.4     0.5     0.7     0.8      0.9
4  11/20     0.9     0.3     0.4     0.3     0.3     0.2      0.1
5  12/20     0.6     0.9     0.5     0.6     0.8     0.7      0.1
6  10/20     0.2     0.3     0.4     0.5     0.7     0.8      0.9
7  11/20     0.9     0.3     0.4     0.3     0.3     0.2      0.1
8  12/20     0.6     0.9     0.5     0.6     0.8     0.7      0.1
Answered By: G.G
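
The diff-based run detection used here can also be checked in isolation; a minimal sketch (the Series is illustrative):

```python
import pandas as pd

s = pd.Series([0.5, 0.5, 0.5, 0.2])
# diff() != 0 marks the start of each run
# (the first element's diff is NaN, and NaN != 0 is True, so it starts run 1)
run_ids = (s.diff() != 0).cumsum()
print(run_ids.tolist())            # [1, 1, 1, 2]
print(run_ids.value_counts().max())  # 3, the longest consecutive run
```

Note that diff() only works on numeric columns, which is why the answer above slices the "date" column off with iloc[:, 1:] before applying it.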