How to identify sequence order and cumsum the transactions?

Question:

I have the following dataframe:

df = pd.DataFrame({'id':[1,1,1,2,2,3,3,4,5,6,6,6,6,6,8,8,9,11,12,12],'letter':['A','A','Q','Q','Q','F','F','G','D','G','I','I','K','Q','E','S','S','I','I','F']})

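My objective is to add another column, tx, that works as follows: if there is a Q and, somewhere after it, an I, mark those rows as the 1st transaction, the next such pair as the 2nd, and so on. Both Q and I must exist, and each transaction must run from the last Q to the first I that follows it (last_Q -> first_I).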

So the end result should look like this:

[image of the expected output]

Asked By: ProcolHarum


Answers:

To easily find the Q-I pattern, I'd use a regex like Q[^QI]*?I, which also gives the positions where it matches.

Then use iloc with those positions to set your counter:

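import re

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6, 6, 6, 8, 8, 9, 11, 12, 12],
                   'letter': ['A', 'A', 'Q', 'Q', 'Q', 'F', 'F', 'G', 'D', 'G', 'I', 'I', 
                              'K', 'Q', 'E', 'S', 'S', 'I', 'I', 'F']})

df['tx'] = 0
count = 1
# join all letters into one string so that match positions equal row positions
for x in re.finditer("Q[^QI]*?I", df['letter'].str.cat()):
    # column 2 is 'tx': label every row from the Q through the I
    df.iloc[x.start():x.end(), 2] = count
    count += 1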
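
To see what the regex actually matches, here is a small sketch that prints only the match spans for the sample letters. The names `letters`, `joined` and `spans` are just for illustration. Note that the positions line up with row positions only because every letter is a single character; multi-character values would need a different approach.

```python
import re

letters = ['A', 'A', 'Q', 'Q', 'Q', 'F', 'F', 'G', 'D', 'G', 'I', 'I',
           'K', 'Q', 'E', 'S', 'S', 'I', 'I', 'F']
joined = "".join(letters)  # 'AAQQQFFGDGIIKQESSIIF'

# Each match starts at the last Q of a run and ends just after the first I,
# because [^QI] refuses to step over another Q or an I.
spans = [(m.start(), m.end()) for m in re.finditer("Q[^QI]*?I", joined)]
print(spans)  # [(4, 11), (13, 18)]
```

Each `(start, end)` pair is a half-open range, which is why it can be passed straight to an `iloc` slice.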
Answered By: azro

I would use boolean arithmetic:

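# map Q/I to True/False (all other letters become NaN)
m1 = df['letter'].map({'Q': True, 'I': False})
# ffill the values: True from a Q until the next I
m2 = m1.ffill().fillna(False)
# True where the next row is not a Q (used to keep only the last Q of a run)
m3 = m1.shift(-1).ne(True)
# rows from the last Q up to, but not including, the first I
m4 = m2&m3

# count the last Qs; m4.shift() extends each span by one row to include the I
df['tx'] = (m1&m3).cumsum().where(m4|m4.shift(), 0)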

Output:

    id letter  tx
0    1      A   0
1    1      A   0
2    1      Q   0
3    2      Q   0
4    2      Q   1
5    3      F   1
6    3      F   1
7    4      G   1
8    5      D   1
9    6      G   1
10   6      I   1
11   6      I   0
12   6      K   0
13   6      Q   2
14   8      E   2
15   8      S   2
16   9      S   2
17  11      I   2
18  12      I   0
19  12      F   0

Intermediates:

    id letter  tx     m1     m2     m3     m4
0    1      A   0    NaN  False   True  False
1    1      A   0    NaN  False  False  False
2    1      Q   0   True   True  False  False
3    2      Q   0   True   True  False  False
4    2      Q   1   True   True   True   True
5    3      F   1    NaN   True   True   True
6    3      F   1    NaN   True   True   True
7    4      G   1    NaN   True   True   True
8    5      D   1    NaN   True   True   True
9    6      G   1    NaN   True   True   True
10   6      I   1  False  False   True  False
11   6      I   0  False  False   True  False
12   6      K   0    NaN  False  False  False
13   6      Q   2   True   True   True   True
14   8      E   2    NaN   True   True   True
15   8      S   2    NaN   True   True   True
16   9      S   2    NaN   True   True   True
17  11      I   2  False  False   True  False
18  12      I   0  False  False   True  False
19  12      F   0    NaN  False   True  False
Answered By: mozway

Here is another way:

q = df['letter'].eq('Q')
i = df['letter'].eq('I')

m1 = q.diff(-1).ne(0) & q
m2 = i.diff().ne(0) & i

m1.cumsum().where(m1.where(m1|m2.shift()).ffill().fillna(False),0)

Output:

0     0
1     0
2     0
3     0
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    0
12    0
13    2
14    2
15    2
16    2
17    2
18    0
19    0
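
The two masks above are not explained, so here is a rough sketch of what they appear to select: the last Q of each run of Qs and the first I of each run of Is. It uses `shift` with `fill_value` instead of `diff`, which is my own substitution and not the code from this answer; the names `last_q` and `first_i` are hypothetical.

```python
import pandas as pd

letters = pd.Series(['A', 'A', 'Q', 'Q', 'Q', 'F', 'F', 'G', 'D', 'G', 'I', 'I',
                     'K', 'Q', 'E', 'S', 'S', 'I', 'I', 'F'])
q = letters.eq('Q')
i = letters.eq('I')

# Last Q of each run: a Q whose next row is not a Q.
last_q = q & ~q.shift(-1, fill_value=False)
# First I of each run: an I whose previous row is not an I.
first_i = i & ~i.shift(1, fill_value=False)

print(last_q[last_q].index.tolist())    # [4, 13]
print(first_i[first_i].index.tolist())  # [10, 17]
```

Those positions are the boundaries of the two labelled spans in the output above.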
Answered By: rhug123
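
For cross-checking the vectorized answers on small inputs, a plain loop can help. This is only a sketch of one reading of the rule in the question (each transaction runs from the last Q to the first I after it, everything else is 0). The function name `label_transactions` is made up for this example.

```python
def label_transactions(letters):
    """Label each position from the last Q through the first following I."""
    tx = [0] * len(letters)
    count = 0
    start = None  # position of the most recent Q not yet closed by an I
    for pos, letter in enumerate(letters):
        if letter == 'Q':
            start = pos  # a later Q replaces an earlier, still-open one
        elif letter == 'I' and start is not None:
            count += 1
            for k in range(start, pos + 1):
                tx[k] = count
            start = None  # further Is are ignored until the next Q
    return tx


letters = ['A', 'A', 'Q', 'Q', 'Q', 'F', 'F', 'G', 'D', 'G', 'I', 'I',
           'K', 'Q', 'E', 'S', 'S', 'I', 'I', 'F']
print(label_transactions(letters))
# [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 2, 2, 2, 2, 2, 0, 0]
```

On the sample data this gives the same labels as the outputs posted in the answers. A Q that is never followed by an I stays 0 under this reading.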