Detect the number of three-way conversations in a chat dataset using pandas
Question:
Three-way conversation:
red_01 sends a message to green_01, green_01 replies, and red_01 replies back. That exchange counts as one three-way conversation in the dataset.
How can I count how many such conversations occur in the entire dataset? Note that this is a one-to-one conversation system.
I have already sorted by timestamp and used nth(0) and nth(1) on the red users, which keeps only the first two messages sent by each red user. I also used nth(0) on the green users, which keeps the first message sent by each green user. But I can't figure out how to count the conversations that follow the exact three-way sequence (red_0x sends a message, green_0x replies, red_0x replies back).
I have a dataset like this:
conversation_id | user_id | messages | timestamp
---|---|---|---
1 | red_01 | |
1 | green_01 | |
1 | red_01 | |
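Building on the positional nth idea above, one option is to number every message inside its own conversation (instead of per user color) and then compare positions 0, 1 and 2 directly. This is a minimal sketch, assuming each `user_id` starts with its color (`red_01`, `green_01`). The timestamps and the second conversation are made up for illustration.

```python
import pandas as pd

# Sample data mirroring the table above. Timestamps and conversation 2
# are hypothetical, added only for illustration.
df = pd.DataFrame({
    "conversation_id": [1, 1, 1, 2, 2, 2],
    "user_id": ["red_01", "green_01", "red_01", "red_02", "red_02", "green_02"],
    "messages": [""] * 6,
    "timestamp": [1, 2, 3, 1, 2, 3],
})

# Order messages chronologically inside each conversation.
df = df.sort_values(["conversation_id", "timestamp"])

# Position of each message within its conversation: 0, 1, 2, ...
df["pos"] = df.groupby("conversation_id").cumcount()

# Assumption: the color is the leading lowercase letters ("red_01" -> "red").
df["color"] = df["user_id"].str.extract(r"^([a-z]+)", expand=False)

# One row per conversation, one column per position (missing positions are NaN),
# restricted to positions 0..2.
seq = df.pivot(index="conversation_id", columns="pos", values="color")
seq = seq.reindex(columns=range(3))

# Number of messages per conversation, used for the "exactly three" rule.
size = df.groupby("conversation_id").size()

is_three_way = (
    (size == 3)
    & (seq[0] == "red")
    & (seq[1] == "green")
    & (seq[2] == "red")
)
n_three_way = int(is_three_way.sum())
print(n_three_way)  # 1
```

This stays vectorized (no Python function is called per group), which may help on large datasets. Dropping the `size == 3` term would instead count conversations whose first three messages are red, green, red, regardless of what follows.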
Answers:
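import pandas as pd
# Sample data: conversation 1 follows red -> green -> red,
# conversation 2 contains only messages from red1.
test_df = pd.DataFrame(
    [[1, "red1", "", 1], [1, "green1", "", 2], [1, "red1", "", 3],
     [2, "red1", "", 1], [2, "red1", "", 2], [2, "red1", "", 3]],
    columns=["conversation_id", "user_id", "messages", "timestamp"],
)
# A conversation qualifies if it has exactly three messages whose senders are
# red, green, red in chronological order. len() is checked first, so iloc[2]
# is never evaluated on shorter groups.
criteria = (
    lambda x: len(x) == 3
    and x.iloc[0].user_id.startswith("red")
    and x.iloc[1].user_id.startswith("green")
    and x.iloc[2].user_id.startswith("red")
)
# Sort chronologically, keep only qualifying conversations,
# then count the distinct conversation_ids that remain.
n_groups = (
    test_df
    .sort_values("timestamp")
    .groupby("conversation_id")
    .filter(criteria)
).groupby("conversation_id").ngroups
print(n_groups)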
Output:
1
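The `criteria` lambda only checks the color prefix of each sender, and the count goes through a filter followed by a second `groupby`. A more direct variant is sketched below. It returns one True/False flag per conversation and also requires the third message to come from the same red user who opened the conversation (the "red_0x replies back" part of the question), which may be useful if the one-to-one guarantee can't be fully relied on. `is_three_way` is a hypothetical helper name, and conversation 3 is an extra made-up example.

```python
import pandas as pd

# Same shape as test_df above, plus a hypothetical conversation 3
# in which a different red user sends the third message.
test_df = pd.DataFrame(
    [[1, "red1", "", 1], [1, "green1", "", 2], [1, "red1", "", 3],
     [2, "red1", "", 1], [2, "red1", "", 2], [2, "red1", "", 3],
     [3, "red1", "", 1], [3, "green1", "", 2], [3, "red2", "", 3]],
    columns=["conversation_id", "user_id", "messages", "timestamp"],
)


def is_three_way(users: pd.Series) -> bool:
    """True if the senders are exactly: red user, green user, same red user."""
    u = users.tolist()
    return (
        len(u) == 3
        and u[0].startswith("red")
        and u[1].startswith("green")
        and u[2] == u[0]
    )


# Sorting first keeps each conversation's messages in chronological order;
# groupby preserves that row order within every group.
flags = (
    test_df.sort_values(["conversation_id", "timestamp"])
    .groupby("conversation_id")["user_id"]
    .apply(is_three_way)
)
n_three_way = int(flags.sum())
print(n_three_way)  # 1
```

With the original `criteria`, conversation 3 would also pass (its third sender, red2, starts with "red"), so the count on this data would be 2 instead of 1.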