Total count of strings within range in dataframe

Question:

I have a dataframe where I want to count the total number of occurrences of the word Yes within each range of rows that begins at a Dir row, and then add that count as a new column.

Type,Problem
Parent,
Dir,Yes
File,
Opp,Yes
Dir,
Metadata,
Subfolder,Yes
Dir,
Opp,Yes

So whenever the word Yes appears in the Problem column between one Dir row and the next, the count should appear next to the Dir at the beginning of that range (a Yes on the Dir row itself is included).

Expected output would be:

     Type   Problem     yes_count
   Parent       
      Dir       Yes             2
     File       
      Opp       Yes 
      Dir                       1
 Metadata       
Subfolder       Yes 
      Dir                       1
      Opp       Yes 

I could do something like yes_count = df['Problem'].str.count('Yes').sum() to get part of the way there. But how do I also account for the range?

Asked By: user53526356


Answers:

Use:

# is the row a "Yes"?
m1 = df['Problem'].eq('Yes')
# is the row a "Dir"?
m2 = df['Type'].eq('Dir')
# form groups starting on each "Dir"
g = m1.groupby(m2.cumsum())
# count the number of "Yes" per group
# assign only on "Dir"
df['yes_count'] = g.transform('sum').where(m2)
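To make the grouping step easier to follow, here is a minimal self-contained sketch of the same approach. The DataFrame is reconstructed from the sample in the question, assuming empty Problem cells are missing values; the name `group_id` is introduced here only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question; empty cells become missing values.
df = pd.DataFrame({
    "Type": ["Parent", "Dir", "File", "Opp", "Dir",
             "Metadata", "Subfolder", "Dir", "Opp"],
    "Problem": [None, "Yes", None, "Yes", None, None, "Yes", None, "Yes"],
})

m1 = df["Problem"].eq("Yes")  # True where the row is a "Yes"
m2 = df["Type"].eq("Dir")     # True where the row starts a new range

# Each "Dir" increments the running total, so all rows in a range share one id.
# Rows before the first "Dir" fall into group 0.
group_id = m2.cumsum()
print(group_id.tolist())  # [0, 1, 1, 1, 2, 2, 2, 3, 3]

# Sum the booleans per group, then keep the result only on the "Dir" rows.
df["yes_count"] = m1.groupby(group_id).transform("sum").where(m2)
print(df)
```

Because `True` counts as 1 when summed, the per-group sum of `m1` is the number of Yes rows in that range, including the Dir row itself.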

Output:


        Type Problem  yes_count
0     Parent     NaN        NaN
1        Dir     Yes        2.0
2       File     NaN        NaN
3        Opp     Yes        NaN
4        Dir     NaN        1.0
5   Metadata     NaN        NaN
6  Subfolder     Yes        NaN
7        Dir     NaN        1.0
8        Opp     Yes        NaN
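
A side note on the output above: the counts show as floats (2.0, 1.0) because `where` fills the non-Dir rows with NaN. If integer counts are preferred, one option is pandas' nullable `Int64` dtype. This is a sketch on the same sample data, not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question; empty cells are treated as missing.
df = pd.DataFrame({
    "Type": ["Parent", "Dir", "File", "Opp", "Dir",
             "Metadata", "Subfolder", "Dir", "Opp"],
    "Problem": [None, "Yes", None, "Yes", None, None, "Yes", None, "Yes"],
})

m1 = df["Problem"].eq("Yes")
m2 = df["Type"].eq("Dir")

counts = m1.groupby(m2.cumsum()).transform("sum").where(m2)

# The nullable integer dtype keeps whole-number counts and marks
# the non-"Dir" rows as <NA> instead of NaN.
df["yes_count"] = counts.astype("Int64")
print(df)
```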
Answered By: mozway