Find the column name of the highest weighted average

Question

I have a DF that looks like this. In the investor columns, 1 means invested and 0 mean no. If there are more than 1 investors invested in a project, we can assume that they share the investment equally (eg: Investor A and B each invested 50000 in project "something"). I want to calculate the total amount that each investor put in and find out who invested the most.

Project	Invested Amount	Investor A	Investor B	Investor C
Something	100000	1	1	0
Another	5000000	0	0	1
Last	25000000	1	1	1

Right now I am thinking of filtering by each investor that has a 1 and then divide by them sum of all the investor column. Here is what I tried but I am still missing something:

Investor_A = df[df['Investor A'] == 1]
test = Investor_A['Invested Amount'] / (df.iloc[:,3:5].sum())

Expected output:

Investor A would have put in a total of 100000/2 + 25000000/3 = 8383333.33333. Investor B would have put in a total of 100000/2 + 25000000/3 = 8383333.33333. Investor C would have put in a total of 5000000 + 25000000/3 = 13333333.3333 –> Investor C invested the most amount of money.

Asked By: Nguyễn Duy

||

Source

Answer 1

Call filter() on the Investor columns and divide the Invested Amount column by the row-wise sum of the Investor columns. Then multiply it by the investors columns again to get the total share of each investor for each item. Then calling sum() would find the total investment of each investor and idxmax() would fetch the name of the investor.

investors = df.filter(like='Investor')
avg_invested_amount = df['Invested Amount'] / investors.sum(1)
investment_shares = investors.mul(avg_invested_amount, axis=0)
investment_per_investor = investment_shares.sum()
investment_per_investor.idxmax()
#'Investor C'

FYI, this code can be written in 2 lines (but much less legible):

investors = df.filter(like='Investor')
investors.mul(investors.sum(1).rdiv(df['Invested Amount']), axis=0).sum().idxmax()

Answered By: New England cottontail

Answer 2

@NewEnglandcottontail’s way is straights forward… Another way could be this;

import pandas as pd 

df = pd.DataFrame({"Project":["Project1","Project2","Project3"],
                   "Invested Amount":[100000,5000000,25000000],
                   "Investor A":[1,0,1],
                   "Investor B":[1,0,1],
                   "Investor C":[0,1,1]})

df1 = df.copy()
col_lst = [col for col in df1.columns if "Investor" in col]
df1[col_lst] = df1.apply(lambda x: x[col_lst] * x["Invested Amount"]/sum(x[col_lst]),axis=1)

lst = df1[col_lst].sum().to_list()
print("Investor who invested maximum ammount:",col_lst[lst.index(max(lst))])

Output;

Investor who invested maximum ammount: Investor C

Answered By: Sachin Kohli

Find the column name of the highest weighted average

Question:

Answers: