Concatenate strings from several rows using Pandas groupby

Question:

I want to merge several strings in a dataframe based on a groupby in Pandas.

This is my code so far:

import pandas as pd
from io import StringIO

data = StringIO("""
"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name1","aj","2014-12-01"
"name1","oj","2014-12-02"
"name2","fin","2014-11-01"
"name2","katt","2014-11-02"
"name2","mycket","2014-12-01"
"name2","lite","2014-12-01"
""")

# load string as stream into dataframe (header=None: the data has no header row,
# so header=0 would silently drop the first record)
df = pd.read_csv(data, header=None, names=["name","text","date"], parse_dates=[2])

# add column with month
df["month"] = df["date"].dt.month

I want the end result to have one row per name and month, with the strings from the "text" column concatenated (the original post showed this as an image).

I don’t get how I can use groupby and apply some sort of concatenation of the strings in the column “text”. Any help appreciated!

Asked By: mattiasostmar

Answers:

You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and pass it a lambda that joins the text entries:

In [119]:

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
    name         text  month
0  name1       hej,du     11
2  name1        aj,oj     12
4  name2     fin,katt     11
6  name2  mycket,lite     12

I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates.

EDIT: actually, I can just call apply and then reset_index:

In [124]:

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

Out[124]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite

Update

The lambda is unnecessary here:

In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()

Out[38]: 
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite
Answered By: EdChum
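
As a rough sketch of the difference between transform and apply/agg in the answer above: transform keeps one value per original row, while agg collapses each group to a single row. The DataFrame here is rebuilt by hand from the question's data, and the names per_row and per_group are only illustrative.

```python
import pandas as pd

# Hand-built copy of the question's data (month already extracted).
df = pd.DataFrame({
    "name": ["name1"] * 4 + ["name2"] * 4,
    "text": ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "month": [11, 11, 12, 12, 11, 11, 12, 12],
})

grouped = df.groupby(["name", "month"])["text"]

# transform broadcasts the joined string back onto every row of its group,
# so the result has the same length and index as df.
per_row = grouped.transform(lambda s: ",".join(s))

# agg reduces each group to one value, giving one row per (name, month).
per_group = grouped.agg(",".join).reset_index()

print(per_row)
print(per_group)
```

This is why the transform version needs drop_duplicates afterwards, whereas the apply/agg version only needs reset_index.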

The answer by EdChum gives you a lot of flexibility, but if you just want to concatenate strings into a column of list objects you can also use apply(list):

output_series = df.groupby(['name','month'])['text'].apply(list)
Answered By: Rutger Hofste
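
A small sketch of what the apply(list) approach above produces, assuming the same columns as in the question (the DataFrame is rebuilt by hand, and the names lists and joined are only illustrative). If a delimited string is needed later, Series.str.join can turn each list back into one.

```python
import pandas as pd

# Hand-built copy of the question's data (month already extracted).
df = pd.DataFrame({
    "name": ["name1"] * 4 + ["name2"] * 4,
    "text": ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "month": [11, 11, 12, 12, 11, 11, 12, 12],
})

# One Python list per (name, month) group, indexed by a MultiIndex.
lists = df.groupby(["name", "month"])["text"].apply(list)

# Series.str.join concatenates the elements of each list with the separator.
joined = lists.str.join(",")

print(lists)
print(joined)
```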

For me the above solutions were close but added some unwanted \n's and dtype: object, so here’s a modified version:

df.groupby(['name', 'month'])['text'].apply(lambda text: text.to_string(index=False)).str.replace('\n', '').reset_index()
Answered By: Nic Scozzaro

We can group by the ‘name’ and ‘month’ columns, then call agg() on the resulting pandas groupby object.

The aggregation functionality provided by agg() allows multiple statistics to be calculated per group in one call.

df.groupby(['name', 'month'], as_index = False).agg({'text': ' '.join})

Answered By: Ram Prajapati
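
To illustrate the "multiple statistics per group" point above, here is a minimal sketch using named aggregation (available in pandas 0.25 and later). The output column names n_rows and first_date are made up for this example, and the DataFrame is rebuilt by hand from the question's data.

```python
import pandas as pd

# Hand-built copy of the question's data.
df = pd.DataFrame({
    "name": ["name1"] * 4 + ["name2"] * 4,
    "text": ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "date": pd.to_datetime([
        "2014-11-01", "2014-11-02", "2014-12-01", "2014-12-02",
        "2014-11-01", "2014-11-02", "2014-12-01", "2014-12-01",
    ]),
})
df["month"] = df["date"].dt.month

# Each keyword is an output column: (input column, aggregation function).
summary = df.groupby(["name", "month"], as_index=False).agg(
    text=("text", " ".join),       # concatenated strings
    n_rows=("text", "count"),      # hypothetical extra statistic
    first_date=("date", "min"),    # hypothetical extra statistic
)

print(summary)
```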

If you want to collect your "text" values into a list:

df.groupby(['name', 'month'], as_index = False).agg({'text': list})
Answered By: Ismail

This is an old question, but just in case: I used the code below and it seems to work like a charm. Note that it does not group; it concatenates the text of all rows from a single month (here, month 8).

text = ''.join(df[df['date'].dt.month==8]['text'])
Answered By: MMSA

Try this line of code:

df.groupby(['name','month'])['text'].apply(','.join).reset_index()
Answered By: Ashish Anand

Thanks to all the other answers. The following is probably the most concise and feels the most natural: df.groupby("X")["A"].agg() aggregates over one or many selected columns.

df = pd.DataFrame({'A' : ['a', 'a', 'b', 'c', 'c'],
                   'B' : ['i', 'j', 'k', 'i', 'j'],
                   'X' : [1, 2, 2, 1, 3]})

  A  B  X
  a  i  1
  a  j  2
  b  k  2
  c  i  1
  c  j  3

df.groupby("X", as_index=False)["A"].agg(' '.join)

  X    A
  1  a c
  2  a b
  3    c

df.groupby("X", as_index=False)[["A", "B"]].agg(' '.join)

  X    A    B
  1  a c  i i
  2  a b  j k
  3    c    j
Answered By: Paul Rougieux