How to group words into sentences by speaker in a pandas DataFrame
Question:
Please consider the following example:
I have a DataFrame
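Index | Speaker | Word |
---|---|---|
0 | spk_0 | can |
1 | spk_0 | you |
2 | spk_0 | see |
3 | spk_0 | my |
4 | spk_0 | screen |
5 | spk_0 | now |
6 | spk_0 | ? |
7 | spk_1 | yes |
0 | spk_1 | , |
8 | spk_1 | now |
9 | spk_1 | I |
10 | spk_1 | can |
11 | spk_1 | see |
12 | spk_1 | your |
13 | spk_1 | screen |
14 | spk_1 | . |
15 | spk_0 | Let |
16 | spk_0 | me |
17 | spk_0 | start |
18 | spk_0 | then |
19 | spk_2 | yes |
20 | spk_2 | sure |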
I want to combine the Word column so that the result looks like this:
Index | Speaker | Sentence |
---|---|---|
0 | spk_0 | can you see my screen now ? |
1 | spk_1 | yes , now I can see your screen . |
2 | spk_0 | let me start then . |
3 | spk_2 | Yes sure . |
Can someone please help me find a solution to this problem?
I have already tried groupby, but it didn't work.
Answers:
You can group by runs of consecutive values in the Speaker column. Build a helper key by comparing Speaker with its shifted copy and taking the cumulative sum, then group by Speaker plus that key and aggregate Word with join:
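# New group id each time Speaker differs from the previous row
g = df['Speaker'].ne(df['Speaker'].shift()).cumsum()
# Join the words of each (Speaker, run) group, then drop the helper level
df = df.groupby(['Speaker', g], sort=False)['Word'].agg(' '.join).droplevel(-1).reset_index()
print(df)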
Speaker Word
0 spk_0 can you see my screen now ?
1 spk_1 yes , now I can see your screen .
2 spk_0 Let me start then
3 spk_2 yes sure
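For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same shift/cumsum idea. The names `turns`, `run_id`, and `out` are made up for illustration, the sample frame is rebuilt from the question's words without the original index, and renaming the result column to `Sentence` is an assumption based on the desired output in the question.

```python
import pandas as pd

# Sample data from the question, written as (speaker, words) runs for brevity.
# `turns` is a hypothetical helper, not part of the original post.
turns = [
    ("spk_0", "can you see my screen now ?"),
    ("spk_1", "yes , now I can see your screen ."),
    ("spk_0", "Let me start then"),
    ("spk_2", "yes sure"),
]
rows = [(spk, word) for spk, text in turns for word in text.split()]
df = pd.DataFrame(rows, columns=["Speaker", "Word"])

# Run id: goes up by 1 every time Speaker differs from the previous row,
# so the two separate spk_0 turns get different ids.
run_id = df["Speaker"].ne(df["Speaker"].shift()).cumsum().rename("run")

out = (
    df.groupby(["Speaker", run_id], sort=False)["Word"]
      .agg(" ".join)                   # join the words of each run with spaces
      .droplevel("run")                # drop the helper level, keep Speaker
      .reset_index(name="Sentence")    # assumed column name, from the question
)
print(out)
```

Grouping on `Speaker` alone would merge both spk_0 turns into one row, which is probably why a plain groupby did not give the expected result; the run id is what keeps them apart. Note that this only joins the existing tokens, so it will not add the trailing "." or change capitalization shown in the desired output.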