Dataframe fill rows with values based on condition

Question:

Let’s say I have this dataframe:

A | B | C
---------
n | b | c
n | b | c
n | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c
n | b | c
n | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c

I want to fill and replace the column A rows values with ‘x’. The rows to fill are the ones before ‘s’ and after ‘e’ but not in between.
So the result would be somthing like this :

A | B | C
---------
x | b | c
x | b | c
x | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c
x | b | c
x | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c

Here’s what I have tried :

def applyFunc(s):
    if 's' in str(s):
        return 'x'
    return ''

df['A'] = df['A'].apply(applyFunc)

But this only replaces rows where there is ‘s’.

Asked By: Mejdi Dallel

||

Answers:

First find the rows where a value is after ‘e’ or ‘s’ with:

A = d['A'] # enables shorter reference to df['A']
A.where(A.isin(['e', 's'])).ffill().fillna('e')
['e', 'e', 'e', 's', 's', 's', 's', 'e', 'e', 'e', 's', 's', 's', 's', 'e']

Then find the ‘n’ where is it after a ‘s’ and replace with ‘x’:

df['new_A'] = A.mask((A.where(A.isin(['e', 's'])).ffill().fillna('e').eq('e')&A.eq('n')), 'x')

output:

    A  B  C new_A
0   n  b  c     x
1   n  b  c     x
2   n  b  c     x
3   s  b  c     s
4   n  b  c     n
5   n  b  c     n
6   n  b  c     n
7   e  b  c     e
8   n  b  c     x
9   n  b  c     x
10  s  b  c     s
11  n  b  c     n
12  n  b  c     n
13  n  b  c     n
14  e  b  c     e

NB. I saved the output in a new column for clarity, but the real code should be df['A'] = …

Answered By: mozway

Assuming that there are no duplicate s or e within groups, we can Series.mask the n values between s and e. We can track if we’re between s and e by comparing there the Series.cumsum of s and e are equal:

df['A'] = df['A'].mask(
    df['A'].eq('s').cumsum().eq(df['A'].eq('e').cumsum()) & df['A'].eq('n'),
    'x'
)

df:

    A  B  C
0   x  b  c
1   x  b  c
2   x  b  c
3   s  b  c
4   n  b  c
5   n  b  c
6   n  b  c
7   e  b  c
8   x  b  c
9   x  b  c
10  s  b  c
11  n  b  c
12  n  b  c
13  n  b  c
14  e  b  c

Breakdown of steps as columns:

# See Where S are
df['S cumsum'] = df['A'].eq('s').cumsum()
# See where E are
df['E cumsum'] = df['A'].eq('e').cumsum()
# See where S and E are the same meaning we have seen both or neither but
# not one or the other
df['S == E cumsum'] = df['S cumsum'].eq(df['E cumsum'])
# See where A is n
df['S == E cumsum AND A == n'] = df['S == E cumsum'] & df['A'].eq('n')
    A  B  C  S cumsum  E cumsum  S == E cumsum  S == E cumsum AND A == n
0   n  b  c         0         0           True                      True
1   n  b  c         0         0           True                      True
2   n  b  c         0         0           True                      True
3   s  b  c         1         0          False                     False
4   n  b  c         1         0          False                     False
5   n  b  c         1         0          False                     False
6   n  b  c         1         0          False                     False
7   e  b  c         1         1           True                     False
8   n  b  c         1         1           True                      True
9   n  b  c         1         1           True                      True
10  s  b  c         2         1          False                     False
11  n  b  c         2         1          False                     False
12  n  b  c         2         1          False                     False
13  n  b  c         2         1          False                     False
14  e  b  c         2         2           True                     False

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'A': ['n', 'n', 'n', 's', 'n', 'n', 'n', 'e', 'n', 'n', 's', 'n', 'n', 'n',
          'e'],
    'B': ['b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b',
          'b'],
    'C': ['c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
          'c']
})

If there are duplicates we can filter out the desired start and end values (s and e) and take only even groups:
df:

df = pd.DataFrame({
    'A': ['n', 'n', 'n', 's', 's', 'n', 'n', 'e', 'n', 'n', 's', 'n', 'n', 'e',
          'e'],
    'B': ['b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b',
          'b'],
    'C': ['c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
          'c']
})

    A  B  C
0   n  b  c
1   n  b  c
2   n  b  c
3   s  b  c
4   s  b  c  # Duplicate S
5   n  b  c
6   n  b  c
7   e  b  c
8   n  b  c
9   n  b  c
10  s  b  c
11  n  b  c
12  n  b  c
13  e  b  c
14  e  b  c  # Duplicate E

Find s and e and filter to keep only even groups:

s = df.loc[df['A'].isin(['s', 'e']), 'A']
df['A'] = df['A'].mask(
    ((df.index.isin(s[s.ne(s.shift())].index).cumsum() % 2) == 0)
    & df['A'].eq('n'),
    'x'
)

df:

    A  B  C
0   x  b  c
1   x  b  c
2   x  b  c
3   s  b  c
4   s  b  c
5   n  b  c
6   n  b  c
7   e  b  c
8   x  b  c
9   x  b  c
10  s  b  c
11  n  b  c
12  n  b  c
13  e  b  c
14  e  b  c
Answered By: Henry Ecker

solution1:

df1.assign(col1=(df1.A=='s').cumsum())
    .assign(col2=(df1.A=='e').cumsum().shift().fillna(0))
    .assign(A=lambda dd:dd.A.mask(dd.col1==dd.col2,'x'))
        

   A    B   C  s    e
0   x   b    c  0  NaN
1   x   b    c  0  0.0
2   x   b    c  0  0.0
3   s   b    c  1  0.0
4   n   b    c  1  0.0
5   n   b    c  1  0.0
6   n   b    c  1  0.0
7   e   b    c  1  0.0
8   x   b    c  1 -1.0
9   x   b    c  1 -1.0
10  s   b    c  2 -1.0
11  n   b    c  2 -1.0
12  n   b    c  2 -1.0
13  n   b    c  2 -1.0
14  e   b    c  2 -1.0

solution2:

def function1(dd:pd.DataFrame):
    dd.loc[:dd.query("A=='s'").index.values[-1]-1,'A']='x'
    return dd.drop('col1',axis=1)

df1.assign(col1=(df1.A=='e').cumsum().shift()).groupby('col1').apply(function1)
Answered By: G.G
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.