Concatenate rows based on name with pandas dataframe


I am trying to concatenate rows of data from this source.

There are lines from the same speaker but they’re broken into different rows in the dataframe. I’m trying to concatenate those speaker blocks into one row instead of 2+ rows.

Here’s what I’ve tried but it doesn’t work. I’m still learning pandas and python.

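import pandas as pd

url = (r'')
data = pd.read_csv(url, on_bad_lines='skip')
# Drop the two columns that are not needed
data.drop('I', inplace=True, axis=1)
data.drop('I.1', inplace=True, axis=1)
# The first data row was read as the header, so rename those columns
data.rename(columns={'In delivering my son from me, I bury a second husband.': 'Text', 'COUNTESS': 'Speaker'}, inplace=True)
# Join the text for each speaker name
data = data.groupby(['Speaker'])['Text'].apply(' '.join).reset_index()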
Asked By: EthanMcQ



A little ugly, but you can group by consecutive values using a helper series based on shift() and cumsum(), then aggregate within the groupby:

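import pandas as pd

df = pd.read_csv('', on_bad_lines='skip', names=['act', 'scene', 'char', 'line'])
# Start a new group number every time the speaker differs from the previous row
g = df['char'].ne(df['char'].shift()).cumsum().rename('speakernumber')
# Collapse each consecutive block into one row, joining its lines with spaces
df = df.groupby(g).agg({'act': 'first', 'scene': 'first', 'char': 'first', 'line': ' '.join}).reset_index()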

I believe this will work across acts and scenes as well, although I didn't dig in deep enough to test.
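
To make the shift()/cumsum() trick easier to follow, here is a minimal sketch on a tiny hand-made dataframe. The sample rows and the name block_id are hypothetical and only for illustration; the column names are borrowed from the names= list above.

```python
import pandas as pd

# Hypothetical sample data (not from the original source file)
df = pd.DataFrame({
    "act": [1, 1, 1, 1, 1],
    "scene": [1, 1, 1, 1, 1],
    "char": ["COUNTESS", "COUNTESS", "BERTRAM", "COUNTESS", "COUNTESS"],
    "line": ["a", "b", "c", "d", "e"],
})

# ne(shift()) is True wherever the speaker differs from the previous row;
# cumsum() turns those True flags into a running block number.
block_id = df["char"].ne(df["char"].shift()).cumsum().rename("speakernumber")
print(block_id.tolist())  # [1, 1, 2, 3, 3]

# Group on the block number instead of the speaker name, so the two
# separate COUNTESS blocks stay separate.
merged = (
    df.groupby(block_id)
      .agg({"act": "first", "scene": "first", "char": "first", "line": " ".join})
      .reset_index()
)
print(merged)
```

Grouping on the name alone would have folded all four COUNTESS rows into one. One caveat, untested against the real data: if the same character speaks last in one scene and first in the next, those rows would fall into a single block unless act and scene are also part of the comparison.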

Answered By: JNevill