Python – Pandas DataFrame manipulation

Question:

I have a DataFrame called products that lists orders, products, and the quantity of each product. Here's a screenshot:

[Screenshot: the products DataFrame]

I need to build a new DataFrame with one row per product name and two additional columns: the number of products ordered (essentially the sum of the quantity column for each product) and the total sales for each product (the sum of the total column for each product).

I wrote this code:

import pandas as pd

# Unique product IDs, in order of appearance
products_unique = products['product_id'].unique()

# Name(s) of each product
names = [
    products.loc[
        products['product_id'] == elem
    ]['name'].unique()
    for elem in products_unique
]

# Number of rows (orders) for each product
orders = [
    len(products.loc[
        products['product_id'] == elem
    ])
    for elem in products_unique
]

# Sum of the total column for each product
totals = [
    products.loc[
        products['product_id'] == elem
    ]['total'].sum()
    for elem in products_unique
]

chart_data = pd.DataFrame({
    'Prodotti': products_unique,
    'Nome': names,
    'Ordini': orders,
    'Totale': totals
})

This code does what I need, but there is something I don't understand. When I run it, I get this:

[Screenshot: the resulting chart_data DataFrame]

As you can see, the values in the Nome column (built from the names list) are displayed as lists. Why does this happen?
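A likely explanation: Series.unique() always returns an array (a NumPy ndarray for ordinary object columns), even when a product has only one name, so each element of names is a one-element array that pandas prints as [Prod A]. The minimal sketch below reproduces this on a small hypothetical products frame (the values are borrowed from the sample data in the second answer) and shows one possible fix: taking the first name of each product with .iloc[0].

```python
import pandas as pd

# Hypothetical sample data (same values as the second answer's df)
products = pd.DataFrame({
    'product_id': [7980, 7980, 8603],
    'name': ['Prod A', 'Prod A', 'Prod B'],
    'total': [10, 12, 14],
})

products_unique = products['product_id'].unique()

# As in the question: .unique() yields an array for every product,
# even when that product has a single name
names = [
    products.loc[products['product_id'] == elem, 'name'].unique()
    for elem in products_unique
]
print(type(names[0]), names[0])  # an array holding one name

# One possible fix: take the first name of each product as a plain scalar
names_fixed = [
    products.loc[products['product_id'] == elem, 'name'].iloc[0]
    for elem in products_unique
]
print(names_fixed)  # ['Prod A', 'Prod B']
```

If a product could have several different names, joining the unique values into one string, as the first answer does, keeps all of them instead of only the first.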

Also, is there a cleaner way to achieve this?

Thanks in advance to anyone who can help!

Asked By: Davide


Answers:

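# Group rows by product and aggregate each column
chart_data = products.groupby('product_id').agg({
    # Join the distinct names of each product into a single string
    'name': lambda x: ', '.join(x.unique()),
    # Sum and row count of the total column
    'total': ['sum', 'count']
})
# Flatten the two-level column labels produced by agg
chart_data.columns = ['Nome', 'Totale', 'Ordini']
# Move product_id from the index back into a column and rename it
chart_data.reset_index(inplace=True)
chart_data.rename(columns={'product_id': 'Prodotti'}, inplace=True)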
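Because one entry of the dict passed to agg is a list, pandas returns two-level column labels, and the flat names assigned afterwards must follow the same (column, function) order. A minimal sketch on a hypothetical products frame (same values as the second answer's sample) that prints the raw labels before flattening; chaining reset_index and rename instead of inplace=True is only a stylistic choice:

```python
import pandas as pd

# Hypothetical sample data (same values as the second answer's df)
products = pd.DataFrame({
    'product_id': [7980, 7980, 8603],
    'name': ['Prod A', 'Prod A', 'Prod B'],
    'total': [10, 12, 14],
})

raw = products.groupby('product_id').agg({
    'name': lambda x: ', '.join(x.unique()),
    'total': ['sum', 'count'],
})
# Two-level labels in dict order:
# ('name', '<lambda>'), ('total', 'sum'), ('total', 'count')
print(raw.columns.tolist())

# The flat names therefore follow the order: name, sum, count
chart_data = raw.copy()
chart_data.columns = ['Nome', 'Totale', 'Ordini']
chart_data = chart_data.reset_index().rename(columns={'product_id': 'Prodotti'})
print(chart_data)
```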
Answered By: saurabh srivastava

Use groupby followed by agg with named aggregations:

out = (df.groupby('name')
         .agg(Prodotti=('product_id', 'first'),
              Nome=('name', 'first'),
              Ordini=('total', 'size'),
              Totale=('total', 'sum'))
         .reset_index(drop=True))

Output:

>>> out
   Prodotti    Nome  Ordini  Totale
0      7980  Prod A       2      22
1      8603  Prod B       1      14

>>> df
   product_id    name  total
0        7980  Prod A     10
1        7980  Prod A     12
2        8603  Prod B     14
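The question also asks for the number of units ordered (the sum of the quantity column), which neither answer computes; both count rows instead. A minimal sketch extending the same named-aggregation idea, assuming a hypothetical quantity column (its name and values are invented for illustration) and grouping by both product_id and name so each product keeps its ID and name:

```python
import pandas as pd

# Hypothetical sample data: the 'quantity' column is an assumption based on
# the question's description; the other values match the df above
df = pd.DataFrame({
    'product_id': [7980, 7980, 8603],
    'name': ['Prod A', 'Prod A', 'Prod B'],
    'quantity': [1, 3, 2],
    'total': [10, 12, 14],
})

out = (df.groupby(['product_id', 'name'])
         .agg(Ordini=('total', 'size'),        # rows (orders) per product
              Quantita=('quantity', 'sum'),    # units ordered per product
              Totale=('total', 'sum'))         # sales per product
         .reset_index()
         .rename(columns={'product_id': 'Prodotti', 'name': 'Nome'}))
print(out)
```

Note that 'size' counts every row in a group, while 'count' skips missing values, so the two can differ if the total column contains NaN.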
Answered By: Corralien