How to group words into a sentence based on speaker in a pandas DataFrame

Question

Please consider the following example:

I have a DataFrame

Index	Speaker	Word
0	spk_0	can
1	spk_0	you
2	spk_0	see
3	spk_0	my
4	spk_0	screen
5	spk_0	now
6	spk_0	?
7	spk_1	yes
0	spk_1	,
8	spk_1	now
9	spk_1	I
10	spk_1	can
11	spk_1	see
12	spk_1	your
13	spk_1	screen
14	spk_1	.
15	spk_0	Let
16	spk_0	me
17	spk_0	start
18	spk_0	then
19	spk_2	yes
20	spk_2	sure

I want to combine the Word column such that it should look like the following:

Index	Speaker	Sentence
0	spk_0	can you see my screen now ?
1	spk_1	yes , now I can see your screen .
2	spk_0	let me start then .
3	spk_2	Yes sure .

Can someone please help me find a solution to this problem?
I have already tried groupby, but it didn't work.

Asked By: shahzain

Answer 1

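You can group by runs of consecutive values in the Speaker column. Compare each value with the shifted (previous) value, take the cumulative sum of the changes to get an id for each run, then group by that id and aggregate with join: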

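# Label each run of consecutive rows that share the same speaker
g = df['Speaker'].ne(df['Speaker'].shift()).cumsum()
# Join the words within each run, then drop the helper index level
df = df.groupby(['Speaker', g], sort=False)['Word'].agg(' '.join).droplevel(-1).reset_index()
print(df)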
  Speaker                               Word
0   spk_0        can you see my screen now ?
1   spk_1  yes , now I can see your screen .
2   spk_0                  Let me start then
3   spk_2                           yes sure
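
To see why a plain groupby on Speaker is not enough, here is a self-contained sketch that splits the one-liner above into its intermediate steps. The sample DataFrame is rebuilt by hand from the question, and the names changed, run_id, merged and turns are only illustrative; renaming the result column to Sentence is an assumption based on the desired output in the question.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (22 words, 4 speaker turns).
df = pd.DataFrame({
    "Speaker": ["spk_0"] * 7 + ["spk_1"] * 9 + ["spk_0"] * 4 + ["spk_2"] * 2,
    "Word": ["can", "you", "see", "my", "screen", "now", "?",
             "yes", ",", "now", "I", "can", "see", "your", "screen", ".",
             "Let", "me", "start", "then",
             "yes", "sure"],
})

# True wherever the speaker differs from the previous row.
# The first row is always True because shift() puts NaN there.
changed = df["Speaker"].ne(df["Speaker"].shift())

# Cumulative sum of the change flags: each uninterrupted run of one
# speaker gets its own integer id (1, 2, 3, ...).
run_id = changed.cumsum().rename("run")

# Grouping by Speaker alone merges both spk_0 turns into a single row.
merged = df.groupby("Speaker", sort=False)["Word"].agg(" ".join)

# Grouping by Speaker and run id keeps the two spk_0 turns separate.
turns = (
    df.groupby(["Speaker", run_id], sort=False)["Word"]
      .agg(" ".join)
      .droplevel(-1)          # drop the helper "run" level
      .reset_index()
      .rename(columns={"Word": "Sentence"})
)
print(turns)
```

Here merged has three rows, because the second spk_0 turn is appended to the first, while turns has four rows, one per turn, in order of appearance.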

Answered By: jezrael
