Function that returns a DataFrame without the leading 0s of a specific column

Question:

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17],
})

It has the form:

    n   col1    col2
0   0   A   0.00
1   1   A   0.00
2   2   A   0.00
3   3   B   0.00
4   0   B   3.30
5   1   B   0.00
6   2   B   4.00
7   0   C   1.94
8   1   C   0.00
9   2   C   6.17

I want a function that takes this DataFrame as its argument and returns a new DataFrame without the leading rows whose value in column 'col2' is 0.

My code

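def remove_lead_zeros(df):
    # Keeps only rows whose col2 is non-zero, so zeros that come
    # after the first non-zero value are dropped as well
    new_df = df[df['col2'] != 0]
    return new_df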

My function removes every row whose col2 is 0.0, whereas I only want to remove the leading ones.
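To make the difference concrete on the sample data, here is a small check (it redefines df so it runs on its own) that lists the index labels the filter above keeps and the ones it wrongly discards compared with the goal, which keeps labels 4 to 9 of the original frame.

```python
import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17],
})

# The filter from the question: keeps only rows whose col2 is non-zero
naive = df[df['col2'] != 0]
print(naive.index.tolist())  # [4, 6, 7, 9]

# Original index labels of the rows the goal keeps (before any reset)
expected = [4, 5, 6, 7, 8, 9]

# Zeros that sit after a non-zero value are discarded by mistake
print(sorted(set(expected) - set(naive.index)))  # [5, 8]
```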

Goal

The expected result is the following DataFrame:

    n   col1    col2
0   0   B     3.30
1   1   B     0.00
2   2   B     4.00
3   0   C     1.94
4   1   C     0.00
5   2   C     6.17

Any help would be highly appreciated (I'll upvote all answers). Thank you!

Asked By: Khaled DELLAL


Answers:

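Use groupby.cummax on the boolean Series of non-zero col2 values, then use the result for boolean indexing. Within each col1 group, the cumulative max stays False until the first non-zero value and is True from then on: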

out = df[df['col2'].ne(0).groupby(df['col1']).cummax()]

Output:

   n col1  col2
4  0    B  3.30
5  1    B  0.00
6  2    B  4.00
7  0    C  1.94
8  1    C  0.00
9  2    C  6.17

Intermediates to understand the logic:

   n col1  col2  ne(0)  groupby.cummax
0  0    A  0.00  False           False
1  1    A  0.00  False           False
2  2    A  0.00  False           False
3  3    B  0.00  False           False
4  0    B  3.30   True            True
5  1    B  0.00  False            True
6  2    B  4.00   True            True
7  0    C  1.94   True            True
8  1    C  0.00  False            True
9  2    C  6.17   True            True
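Since the question asks for a function, the one-liner can be wrapped as sketched below. This is an illustrative wrapper, not part of the original answer: the value_col and group_col parameters are hypothetical names added for reuse, and reset_index(drop=True) is added so the index runs 0..5 as in the goal.

```python
import pandas as pd


def remove_lead_zeros(df, value_col='col2', group_col='col1'):
    """Drop the leading rows of each group in which value_col is 0.

    value_col and group_col are hypothetical parameters for illustration.
    A row is kept once a non-zero value has appeared at or before it
    within its group; the index is then renumbered from 0.
    """
    seen_nonzero = df[value_col].ne(0).groupby(df[group_col]).cummax()
    return df[seen_nonzero].reset_index(drop=True)


df = pd.DataFrame({
    'n': [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17],
})

print(remove_lead_zeros(df))
```

Drop the reset_index call if you need to keep the original row labels (4 to 9).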
Answered By: mozway

You can use cumsum:

>>> df[df.groupby('col1')['col2'].cumsum().ne(0)]
   n col1  col2
4  0    B  3.30
5  1    B  0.00
6  2    B  4.00
7  0    C  1.94
8  1    C  0.00
9  2    C  6.17

As long as the running sum within a col1 group is 0, only leading zeros have been seen (this relies on col2 containing no negative values):

>>> pd.concat([df, df.groupby('col1')['col2'].cumsum()], axis=1)
   n col1  col2  col2
0  0    A  0.00  0.00  # remove
1  1    A  0.00  0.00  # remove
2  2    A  0.00  0.00  # remove
3  3    B  0.00  0.00  # remove
4  0    B  3.30  3.30  # keep
5  1    B  0.00  3.30  # keep
6  2    B  4.00  7.30  # keep
7  0    C  1.94  1.94  # keep
8  1    C  0.00  1.94  # keep
9  2    C  6.17  8.11  # keep
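The cumsum test can misfire when col2 mixes positive and negative values, because the running total can return to 0 after a non-zero value. The check below uses hypothetical single-group data to compare it with the cummax mask; taking abs() before the cumulative sum is one way (an editorial suggestion, not from the answer) to restore the intended behavior.

```python
import pandas as pd

# Hypothetical single-group data containing a negative value
df = pd.DataFrame({
    'col1': ['A'] * 5,
    'col2': [0, 1, -1, 0, 2],
})

# cumsum mask: the running total is 0, 1, 0, 0, 2, so rows 2 and 3 are lost
by_cumsum = df.groupby('col1')['col2'].cumsum().ne(0)
# cummax mask: stays True once any non-zero value has been seen
by_cummax = df['col2'].ne(0).groupby(df['col1']).cummax()
# Summing absolute values makes the running total non-decreasing
by_abs_cumsum = df['col2'].abs().groupby(df['col1']).cumsum().ne(0)

print(by_cumsum.tolist())      # [False, True, False, False, True]
print(by_cummax.tolist())      # [False, True, True, True, True]
print(by_abs_cumsum.tolist())  # [False, True, True, True, True]
```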
Answered By: Corralien

First, build a boolean Series marking where col2 is not 0, then take its cumulative max: the resulting mask is False up to the first non-zero value and True from then on, so you can apply it to your DataFrame. Finally, reset the index to get the expected result:

result = df[(df["col2"] != 0).cummax()].reset_index(drop=True)

where result looks like:

    n   col1 col2
0   0   B    3.30
1   1   B    0.00
2   2   B    4.00
3   0   C    1.94
4   1   C    0.00
5   2   C    6.17
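This mask is computed over the whole column, whereas the two answers above compute it within each col1 group. On the question's data both give the same rows, but they diverge when a later group also starts with zeros, as this check on hypothetical data shows. Which one is wanted depends on whether "first rows" means the start of the column or the start of each col1 group.

```python
import pandas as pd

# Hypothetical data in which the second group also starts with a zero
df = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'B'],
    'col2': [1.0, 0.0, 0.0, 2.0],
})

nonzero = df['col2'].ne(0)

# Whole-column mask: only zeros before the first non-zero row are dropped
global_mask = nonzero.cummax()
# Per-group mask: zeros at the start of each col1 group are dropped
group_mask = nonzero.groupby(df['col1']).cummax()

print(df[global_mask].index.tolist())  # [0, 1, 2, 3]
print(df[group_mask].index.tolist())   # [0, 1, 3]
```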
Answered By: Florent Monin