Python – trouble pivoting, grouping, and summing dataframe columns

Question:

I have this code. I need to group by CustomerName and then sum the filegroups.

def consolidated_df():
    df = breakdown_df()
    df.pivot_table(index='CustomerName', columns='FileGroup', aggfunc="sum")
    return df

breakdown_df() looks like

ID    CustomerName    FileGroup    Size        Size(Bytes)
1     CustomerA       Database     99.8 M      104667648
1     CustomerA       Database     99.8 M      104667648
1     CustomerA       Backup       99.8 M      104667648
1     CustomerA       Backup       99.8 M      104667648
1     CustomerA       Site         99.8 M      104667648
1     CustomerA       Site         99.8 M      104667648
2     CustomerB       Database     99.8 M      104667648
2     CustomerB       Database     99.8 M      104667648
2     CustomerB       Backup       99.8 M      104667648
2     CustomerB       Backup       99.8 M      104667648
2     CustomerB       Site         99.8 M      104667648
2     CustomerB       Site         99.8 M      104667648

I am trying to roll it up into

ID    CustomerName    DatabaseSize    DatabaseSizeBytes    BackupSize     BackupSizeBytes     SiteSize            SiteSizeByte         TotalSize
1     CustomerA       [Total Size]    [Total Size Bytes]   [TotalSize]    [Total Size Bites]  [Total Site Size]   [Total Site Bites]   [Total Bytes for everything]
2     CustomerB       [Total Size]    [Total Size Bytes]   [TotalSize]    [Total Size Bites]  [Total Site Size]   [Total Site Bites]   [Total Bytes for everything]

I’m not so worried about actually summing Size because I can convert the bites. I just can’t seem to get my pivot to work and unsure where I am going wrong.

Asked By: DuckButts

||

Answers:

If you don’t explicitly set values, it’ll try to use all remaining columns…

out = df.pivot_table(index='CustomerName', columns='FileGroup', values='Size(Bytes)', aggfunc='sum')
print(out)

Output:

FileGroup        Backup   Database       Site
CustomerName                                 
CustomerA     209335296  209335296  209335296
CustomerB     209335296  209335296  209335296

You can also have margins if desired:

df.pivot_table(index='CustomerName', 
               columns='FileGroup', 
               values='Size(Bytes)', 
               aggfunc='sum', 
               margins=True, 
               margins_name='TotalSize').drop('TotalSize')

# Output:

FileGroup        Backup   Database       Site  TotalSize
CustomerName                                            
CustomerA     209335296  209335296  209335296  628005888
CustomerB     209335296  209335296  209335296  628005888
Answered By: BeRT2me

With help of another StackOverflow answer:

# https://stackoverflow.com/a/23773174/10035985
def sizeof_fmt(num, use_kibibyte=True):
    base, suffix = [(1000.0, "B"), (1024.0, "iB")][use_kibibyte]
    for x in ["B", *map(lambda x: x + suffix, list("kMGTP"))]:
        if -base < num < base:
            return "%3.1f %s" % (num, x)
        num /= base
    return "%3.1f %s" % (num, x)


x = df.pivot_table(
    index=["ID", "CustomerName"],
    columns=["FileGroup"],
    values="Size(Bytes)",
    aggfunc="sum",
)

columns_to_convert = list(x.columns)
x = pd.concat([x, x.add_suffix("SizeBytes")], axis=1)

x["TotalSize"] = x[columns_to_convert].sum(axis=1).apply(sizeof_fmt)

for c in columns_to_convert:
    x[c] = x[c].apply(sizeof_fmt)

x.columns.name, x.index.name = None, None
print(x.reset_index())

Prints:

   ID CustomerName     Backup   Database       Site  BackupSizeBytes  DatabaseSizeBytes  SiteSizeBytes  TotalSize
0   1    CustomerA  199.6 MiB  199.6 MiB  199.6 MiB        209335296          209335296      209335296  598.9 MiB
1   2    CustomerB  199.6 MiB  199.6 MiB  199.6 MiB        209335296          209335296      209335296  598.9 MiB
Answered By: Andrej Kesely
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.