Join rows and concatenate attribute values in a csv file with pandas

Question:

I have a csv file structured like this:

enter image description here

As you can see, many lines are repeated (they represent the same entity) with the attribute ‘category’ being the only difference between each other. I would like to join those rows and include all the categories in a single value.

For example the attribute ‘category’ for Walmart should be: "Retail, Dowjones, SuperMarketChains".

Edit:

I would like the output table to be structured like this:

enter image description here

Edit 2:

What worked for me was:

df4.groupby(["ID azienda","Name","Company code", "Marketcap", "Share price", "Earnings", "Revenue", "Shares", "Employees"]
)['Category'].agg(list).reset_index()
Asked By: Maurizio Brini

||

Answers:

Not sure if you want a new table or just a list of the categories. Below is how you could make a table with the hashes if those are important

import pandas as pd
df = pd.DataFrame({
    'Name':['W','W','W','A','A','A'],
    'Category':['Retail','Dow','Chain','Ecom','Internet','Dow'],
    'Hash':[1,2,3,4,5,6],
})

# print(df)
#   Name  Category  Hash
# 0    W    Retail     1
# 1    W       Dow     2
# 2    W     Chain     3
# 3    A      Ecom     4
# 4    A  Internet     5
# 5    A       Dow     6

#Make a new df which has one row per company and one column per category, values are hashes
piv_df = df.pivot(
    index = 'Name',
    columns = 'Category',
    values = 'Hash',
)

# print(piv_df)
# Category  Chain  Dow  Ecom  Internet  Retail
# Name                                        
# A           NaN  6.0   4.0       5.0     NaN
# W           3.0  2.0   NaN       NaN     1.0
Answered By: mitoRibo

Quick and Dirty

df2=df.groupby("Name")['Category'].apply(','.join)

subst=dict(df2)
df['category']=df['Name'].replace(subst)
df.drop_duplicates('Name')

if you prefer multiple categories to be stored as a list in pandas column category…
change first line to

df2=df.groupby("Name")['Category'].apply(list)
Answered By: geekay
Categories: questions Tags: , , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.