How can we melt a dataframe and list words under columns?

Question:

I have a dataframe that looks like this.

import pandas as pd

data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
        'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
        'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
        'transaction':[1,1,1,1,1,2,2,2,2]}

df = pd.DataFrame(data)
df

enter image description here

If I try a basic melt, like so…

df_melted = df.pivot_table(index='clean_words', columns='transaction')
df_melted.tail()

I get this…

enter image description here

What I really want is the transaction number as columns and the words listed down. So, if transaction1 was the column, these words would be listed in rows, under that column:

`'good','evening','how','are','you'`

Under transaction2, these words would be listed in rows, under that column:

'how','can','i','help'

How can I do that? The start_time and end_time are kind of superfluous here.

Asked By: ASH

||

Answers:

Is this the format you want?

>>> pd.DataFrame({'1': ['good', 'evening', 'how', 'are', 'you'], '2': ['how', 'can', 'I', 'help', None]})
         1     2
0     good   how
1  evening   can
2      how     I
3      are  help
4      you  None

I haven’t done that before but you could pivot your data and collect a list of words under each transaction column.

>>> df.pivot_table(columns='transaction', values='clean_words', aggfunc=list)
transaction                               1                    2
clean_words  [good, evening, how, are, you]  [how, can, i, help]

Or group by transaction and collect a list of words.

>>> df.groupby('transaction', as_index=False).agg(clean_words=pd.NamedAgg(column='clean_words', aggfunc=list))
   transaction                     clean_words
0            1  [good, evening, how, are, you]
1            2             [how, can, i, help]
Answered By: gearlessSheave
import pandas as pd
import numpy as np

data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
        'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
        'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
        'transaction':[1,1,1,1,1,2,2,2,2]}

df = pd.DataFrame(data)

df_melted = df.groupby('transaction')['clean_words'].apply(np.array).reset_index()

print(df_melted)

transaction                     clean_words
0            1  [good, evening, how, are, you]
1            2             [how, can, i, help]
Answered By: manoj