I want to multiply two columns in a pandas DataFrame and add the result into a new column

Question:

I’m trying to multiply two existing columns in a pandas Dataframe (orders_df): Prices (stock close price) and Amount (stock quantities) and add the calculation to a new column called Value. For some reason when I run this code, all the rows under the Value column are positive numbers, while some of the rows should be negative. Under the Action column in the DataFrame there are seven rows with the 'Sell' string and seven with the 'Buy' string.

for i in orders_df.Action:
 if i  == 'Sell':
  orders_df['Value'] = orders_df.Prices*orders_df.Amount
 elif i == 'Buy':
  orders_df['Value'] = -orders_df.Prices*orders_df.Amount)

Please let me know what i’m doing wrong !

Asked By: OAK

||

Answers:

You can use the DataFrame apply method:

order_df['Value'] = order_df.apply(lambda row: (row['Prices']*row['Amount']
                                               if row['Action']=='Sell'
                                               else -row['Prices']*row['Amount']),
                                   axis=1)

It is usually faster to use these methods rather than over for loops.

Answered By: Andy Hayden

If we’re willing to sacrifice the succinctness of Hayden’s solution, one could also do something like this:

In [22]: orders_df['C'] = orders_df.Action.apply(
               lambda x: (1 if x == 'Sell' else -1))

In [23]: orders_df   # New column C represents the sign of the transaction
Out[23]:
   Prices  Amount Action  C
0       3      57   Sell  1
1      89      42   Sell  1
2      45      70    Buy -1
3       6      43   Sell  1
4      60      47   Sell  1
5      19      16    Buy -1
6      56      89   Sell  1
7       3      28    Buy -1
8      56      69   Sell  1
9      90      49    Buy -1

Now we have eliminated the need for the if statement. Using DataFrame.apply(), we also do away with the for loop. As Hayden noted, vectorized operations are always faster.

In [24]: orders_df['Value'] = orders_df.Prices * orders_df.Amount * orders_df.C

In [25]: orders_df   # The resulting dataframe
Out[25]:
   Prices  Amount Action  C  Value
0       3      57   Sell  1    171
1      89      42   Sell  1   3738
2      45      70    Buy -1  -3150
3       6      43   Sell  1    258
4      60      47   Sell  1   2820
5      19      16    Buy -1   -304
6      56      89   Sell  1   4984
7       3      28    Buy -1    -84
8      56      69   Sell  1   3864
9      90      49    Buy -1  -4410

This solution takes two lines of code instead of one, but is a bit easier to read. I suspect that the computational costs are similar as well.

Answered By: Aman

I think an elegant solution is to use the where method (also see the API docs):

In [37]: values = df.Prices * df.Amount

In [38]: df['Values'] = values.where(df.Action == 'Sell', other=-values)

In [39]: df
Out[39]: 
   Prices  Amount Action  Values
0       3      57   Sell     171
1      89      42   Sell    3738
2      45      70    Buy   -3150
3       6      43   Sell     258
4      60      47   Sell    2820
5      19      16    Buy    -304
6      56      89   Sell    4984
7       3      28    Buy     -84
8      56      69   Sell    3864
9      90      49    Buy   -4410

Further more this should be the fastest solution.

Answered By: bmu

For me, this is the clearest and most intuitive:

values = []
for action in ['Sell','Buy']:
    amounts = orders_df['Amounts'][orders_df['Action'==action]].values
    if action == 'Sell':
        prices = orders_df['Prices'][orders_df['Action'==action]].values
    else:
        prices = -1*orders_df['Prices'][orders_df['Action'==action]].values
    values += list(amounts*prices)  
orders_df['Values'] = values

The .values method returns a numpy array allowing you to easily multiply element-wise and then you can cumulatively generate a list by ‘adding’ to it.

Answered By: Michael Silverstein

Since this question came up again, I think a good clean approach is using assign.

The code is quite expressive and self-describing:

df = df.assign(Value = lambda x: x.Prices * x.Amount * x.Action.replace({'Buy' : 1, 'Sell' : -1}))
Answered By: FLab

Good solution from bmu. I think it’s more readable to put the values inside the parentheses vs outside.

    df['Values'] = np.where(df.Action == 'Sell', 
                            df.Prices*df.Amount, 
                           -df.Prices*df.Amount)

Using some pandas built in functions.

    df['Values'] = np.where(df.Action.eq('Sell'), 
                            df.Prices.mul(df.Amount), 
                           -df.Prices.mul(df.Amount))

To make things neat, I take Hayden’s solution but make a small function out of it.

def create_value(row):
    if row['Action'] == 'Sell':
        return row['Prices'] * row['Amount']
    else:
        return -row['Prices']*row['Amount']

so that when we want to apply the function to our dataframe, we can do..

df['Value'] = df.apply(lambda row: create_value(row), axis=1)

…and any modifications only need to occur in the small function itself.

Concise, Readable, and Neat!

Answered By: Jenobi

First, multiply the columns Prices and Amount. Afterwards use mask to negate the values if the condition is True:

df.assign(
    Values=(df["Prices"] * df["Amount"]).mask(df["Action"] == "Buy", lambda x: -x)
)
Answered By: rachwa
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.