Delete all columns for which a value repeats consecutively more than 3 times
Question:
I have a df that looks like this:
date | stock1 | stock2 | stock3 | stock4 | stock5 | stock6 | stock7 | stock8 | stock9 | stock10 |
---|---|---|---|---|---|---|---|---|---|---|
10/20 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.9 |
11/20 | 0.1 | 0.9 | 0.3 | 0.4 | 0.3 | 0.5 | 0.3 | 0.2 | 0.4 | 0.1 |
12/20 | 0.1 | 0.6 | 0.9 | 0.5 | 0.6 | 0.7 | 0.8 | 0.7 | 0.9 | 0.1 |
10/20 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.9 |
11/20 | 0.8 | 0.9 | 0.3 | 0.4 | 0.3 | 0.5 | 0.3 | 0.2 | 0.9 | 0.1 |
12/20 | 0.3 | 0.6 | 0.9 | 0.5 | 0.6 | 0.7 | 0.8 | 0.7 | 0.9 | 0.1 |
10/20 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.7 | 0.7 | 0.8 | 0.9 | 0.9 |
11/20 | 0.8 | 0.9 | 0.3 | 0.4 | 0.3 | 0.7 | 0.3 | 0.2 | 0.4 | 0.1 |
12/20 | 0.3 | 0.6 | 0.9 | 0.5 | 0.6 | 0.7 | 0.8 | 0.7 | 0.9 | 0.1 |
I want to delete all columns in which the same value repeats, consecutively, more than 3 times. In this example, the columns "stock1", "stock6" and "stock9" should be deleted. The other columns also contain values that repeat more than 3 times, but not one after the other.
I think I can adapt the code from this question: Removing values that repeat more than 5 times in Pandas DataFrame, but I have not managed to do that yet.
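For reproducibility, the frame can be constructed directly; the values below are copied from the table:

```python
import pandas as pd

# The example frame, values copied from the table above.
df = pd.DataFrame({
    "date":    ["10/20", "11/20", "12/20"] * 3,
    "stock1":  [0.1, 0.1, 0.1, 0.1, 0.8, 0.3, 0.1, 0.8, 0.3],
    "stock2":  [0.2, 0.9, 0.6] * 3,
    "stock3":  [0.3, 0.3, 0.9] * 3,
    "stock4":  [0.4, 0.4, 0.5] * 3,
    "stock5":  [0.5, 0.3, 0.6] * 3,
    "stock6":  [0.6, 0.5, 0.7, 0.6, 0.5, 0.7, 0.7, 0.7, 0.7],
    "stock7":  [0.7, 0.3, 0.8] * 3,
    "stock8":  [0.8, 0.2, 0.7] * 3,
    "stock9":  [0.9, 0.4, 0.9, 0.9, 0.9, 0.9, 0.9, 0.4, 0.9],
    "stock10": [0.9, 0.1, 0.1] * 3,
})
```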
Answers:
You might want to avoid apply here; set "date" as the index first so the shifted comparison and the column mask both line up:
N = 3
df2 = df.set_index('date')
df2.loc[:, df2.ne(df2.shift()).cumsum()
              .stack()
              .groupby(level=1)
              .value_counts()
              .groupby(level=0).max()
              .le(N)]
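A minimal, runnable sketch of this run-length mask on a toy frame (the column names "a" and "b" are made up for illustration):

```python
import pandas as pd

# Toy frame: "a" has a run of four equal values, "b" never
# repeats consecutively.
df = pd.DataFrame({
    "date": ["10/20", "11/20", "12/20", "01/21", "02/21"],
    "a":    [0.1, 0.1, 0.1, 0.1, 0.2],
    "b":    [0.1, 0.2, 0.1, 0.2, 0.1],
})

N = 3
d = df.set_index("date")
runs = d.ne(d.shift()).cumsum()          # per-column run ids
longest = (runs.stack()
               .groupby(level=1)         # group by column name
               .value_counts()           # length of each run
               .groupby(level=0).max())  # longest run per column
out = d.loc[:, longest.le(N)]            # keep columns with runs of <= 3
print(out.columns.tolist())              # -> ['b']
```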
You can set "date" aside as the index, then check whether each row differs from the previous one and take the cumulative sum of that comparison to build group keys for groupby + cumcount.
Then compute the max count per column; if it is greater than N-1, drop the column:
df2 = df.set_index('date')
N = 3
df2.loc[:, df2.apply(lambda c: c.groupby(c.ne(c.shift()).cumsum()).cumcount())
              .max().le(N-1)]
output:
stock2 stock3 stock4 stock5 stock7 stock8 stock10
date
10/20 0.2 0.3 0.4 0.5 0.7 0.8 0.9
11/20 0.9 0.3 0.4 0.3 0.3 0.2 0.1
12/20 0.6 0.9 0.5 0.6 0.8 0.7 0.1
10/20 0.2 0.3 0.4 0.5 0.7 0.8 0.9
11/20 0.9 0.3 0.4 0.3 0.3 0.2 0.1
12/20 0.6 0.9 0.5 0.6 0.8 0.7 0.1
10/20 0.2 0.3 0.4 0.5 0.7 0.8 0.9
11/20 0.9 0.3 0.4 0.3 0.3 0.2 0.1
12/20 0.6 0.9 0.5 0.6 0.8 0.7 0.1
intermediate count of successive values:
>>> df2.apply(lambda c: c.groupby(c.ne(c.shift()).cumsum()).cumcount())
stock1 stock2 stock3 stock4 stock5 stock6 stock7 stock8 stock9 stock10
date
10/20 0 0 0 0 0 0 0 0 0 0
11/20 1 0 1 1 0 0 0 0 0 0
12/20 2 0 0 0 0 0 0 0 0 1
10/20 3 0 0 0 0 0 0 0 1 0
11/20 0 0 1 1 0 0 0 0 2 0
12/20 0 0 0 0 0 0 0 0 3 1
10/20 0 0 0 0 0 1 0 0 4 0
11/20 0 0 1 1 0 2 0 0 0 0
12/20 0 0 0 0 0 3 0 0 0 1
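The building block here is ne(shift()).cumsum(), which assigns a new id each time the value changes; on a single Series (toy data) it looks like this:

```python
import pandas as pd

s = pd.Series([0.1, 0.1, 0.1, 0.2, 0.1])

# A new run id starts whenever the value differs from the previous one.
run_id = s.ne(s.shift()).cumsum()
print(run_id.tolist())              # -> [1, 1, 1, 2, 3]

# cumcount restarts at 0 within each run, so its max is run length - 1.
pos = s.groupby(run_id).cumcount()
print(pos.tolist())                 # -> [0, 1, 2, 0, 0]
```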
Another option: build run ids with diff/cumsum, count each run's length with value_counts, and keep only the columns whose longest run is at most 3:
df1[['date']].join(
    df1.iloc[:, 1:].loc[:, df1.iloc[:, 1:].apply(lambda ss: (ss.diff() != 0).cumsum().value_counts().max()) <= 3]
)
date stock2 stock3 stock4 stock5 stock7 stock8 stock10
0 10/20 0.2 0.3 0.4 0.5 0.7 0.8 0.9
1 11/20 0.9 0.3 0.4 0.3 0.3 0.2 0.1
2 12/20 0.6 0.9 0.5 0.6 0.8 0.7 0.1
3 10/20 0.2 0.3 0.4 0.5 0.7 0.8 0.9
4 11/20 0.9 0.3 0.4 0.3 0.3 0.2 0.1
5 12/20 0.6 0.9 0.5 0.6 0.8 0.7 0.1
6 10/20 0.2 0.3 0.4 0.5 0.7 0.8 0.9
7 11/20 0.9 0.3 0.4 0.3 0.3 0.2 0.1
8 12/20 0.6 0.9 0.5 0.6 0.8 0.7 0.1
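As a sanity check, the same longest-run criterion can be written in plain Python with itertools.groupby (a hypothetical helper, not taken from any of the answers above):

```python
from itertools import groupby

def longest_run(values):
    """Length of the longest stretch of consecutive equal values."""
    return max(sum(1 for _ in grp) for _, grp in groupby(values))

# Columns with longest_run(...) > 3 are the ones to drop.
print(longest_run([0.1, 0.1, 0.1, 0.1, 0.8, 0.3]))  # -> 4 (drop)
print(longest_run([0.2, 0.9, 0.6, 0.2, 0.9, 0.6]))  # -> 1 (keep)
```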