Detect the number of three-way conversations in a chat dataset using pandas

Question:

Three-way conversation:

red_01 sends a message to green_01, green_01 replies, and red_01 replies back. This counts as one three-way conversation in the dataset.

I am trying to find a way to count how many such conversations have occurred in the entire dataset. This is a one-to-one conversation system, by the way.

I have already sorted by timestamp and used nth(0) and nth(1) on the red users, which keeps only the first two messages sent by each red user. I have also used nth(0) on the green users, which keeps only the first message sent by each green user. But I cannot think of a way to count the conversations that follow the exact three-way sequence (red_0x sends a message, green_0x replies, red_0x replies back).

I have a dataset like this:

conversation_id  user_id   messages  timestamp
1                red_01
1                green_01
1                red_01
Asked By: n3a5p7s9t1e3r

Answers:

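import pandas as pd

# Two sample conversations: 1 is red -> green -> red, 2 is red -> red -> red.
test_df = pd.DataFrame(
    [
        [1, "red1", "", 1], [1, "green1", "", 2], [1, "red1", "", 3],
        [2, "red1", "", 1], [2, "red1", "", 2], [2, "red1", "", 3],
    ],
    columns=["conversation_id", "user_id", "messages", "timestamp"],
)

# A conversation qualifies if it has exactly three messages sent,
# in order, by a red user, a green user, and a red user.
criteria = (
    lambda x: len(x) == 3
    and x.iloc[0].user_id.startswith("red")
    and x.iloc[1].user_id.startswith("green")
    and x.iloc[2].user_id.startswith("red")
)

# Sort by time, keep only the qualifying conversations, then count them.
n_groups = (
    test_df
    .sort_values("timestamp")
    .groupby("conversation_id")
    .filter(criteria)
).groupby("conversation_id").ngroups

n_groups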

Output:

1
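
The filter above only accepts conversations with exactly three messages. If a conversation should also count when it continues past the third message, one possible variation is to inspect only the first three senders of each conversation. The sketch below is an assumption about what the asker might want, not part of the answer above: the sample rows and the helper name starts_with_red_green_red are hypothetical, and it additionally requires the third message to come from the same red user who opened the conversation.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame(
    [
        # Conversation 1: red, green, red, then a fourth message.
        [1, "red_01", "", 1], [1, "green_01", "", 2],
        [1, "red_01", "", 3], [1, "green_01", "", 4],
        # Conversation 2: red, red, green (wrong order).
        [2, "red_02", "", 1], [2, "red_02", "", 2], [2, "green_02", "", 3],
        # Conversation 3: green speaks first.
        [3, "green_03", "", 1], [3, "red_03", "", 2], [3, "green_03", "", 3],
    ],
    columns=["conversation_id", "user_id", "messages", "timestamp"],
)


def starts_with_red_green_red(user_ids):
    """True if the first three senders are red, green, then the same red user."""
    first = user_ids.tolist()[:3]
    return (
        len(first) == 3
        and first[0].startswith("red")
        and first[1].startswith("green")
        and first[2] == first[0]
    )


# One boolean per conversation, evaluated on time-ordered messages.
matches = (
    df.sort_values(["conversation_id", "timestamp"])
    .groupby("conversation_id")["user_id"]
    .apply(starts_with_red_green_red)
)

n_three_way = int(matches.sum())
print(n_three_way)
```

Here only conversation 1 matches, even though it has four messages. Whether longer conversations should count depends on how "three-way conversation" is defined for the dataset, so treat this as one option rather than a replacement for the exact-length check.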
Answered By: Nikolay Zakirov