How to group each individual transaction from a buy and sell history

Question:

Given a data frame that is sorted by symbol and date. I would like to add a new column that identifies each individual transaction. Each individual transaction means that the position is opened and closed after a set of buy and sell orders.

Input data

time           symbol        side      price     size
08-19-2021      AAPL          buy        150       1 ----> position opened here
08-20-2021      AAPL          buy        100       1
08-21-2021      AAPL          buy        200       1
08-22-2021      AAPL          sell        300      1
08-23-2021      AAPL          sell       350       0.5
08-24-2021      AAPL          sell        400      1.5 ----> position closed here
08-25-2021      AAPL          buy        150       2  -----> position opened here
08-26-2021      AAPL          sell        100      2  ----> position closed here
08-27-2021      AAPL          buy        100       2 ---> position opened here
08-28-2021      AAPL          sell        150      1.5
08-29-2021      AAPL          buy        100       0.5
08-29-2021      AAPL          sell        50       0.5
08-29-2021      AAPL          sell       50        0.5 ---> position closed here
08-29-2021      AAPL          buy        100       1 ----> position opened here
08-29-2021      AAPL          sell        50       0.5 ---> position still open here

Desired output (added column trade #)

time           symbol        side      price     size   trade #
08-19-2021      AAPL          buy        150       1      0
08-20-2021      AAPL          buy        100       1      0
08-21-2021      AAPL          buy        200       1      0
08-22-2021      AAPL          sell        300      1      0
08-23-2021      AAPL          sell       350       0.5    0
08-24-2021      AAPL          sell        400      1.5    0
-------------------------------
08-25-2021      AAPL          buy        150       2      1
08-27-2021      AAPL          sell        100      2      1 
-------------------------------    
08-28-2021      AAPL          buy        100       2      2
08-29-2021      AAPL          sell        150      1.5    2
08-29-2021      AAPL          buy        100       0.5    2
08-29-2021      AAPL          sell        50       0.5    2
08-29-2021      AAPL          sell       50        0.5    2
------------------------------
08-29-2021      AAPL          buy        100       1      3
08-29-2021      AAPL          sell        50       0.5    3

The last trade is still open so I would like to still have a number for it.

Input data

pd.DataFrame.from_dict({'symbol': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL'], 
'side': ['buy', 'buy', 'buy', 'sell', 'sell', 'sell', 'buy', 'sell', 'buy', 'sell', 'buy', 'sell', 'sell', 'buy', 'sell'], 
'size': [1,1,1,1,0.5,1.5,2,2,2,1.5, 0.5, 0.5, 0.5, 1, 0.5]})
Asked By: user2512443

||

Answers:

Your position is basically closed when your total size is zero. So in a first step compute the size of your position by

df['position_size']=(df.side.map({'buy':1,'sell':-1})*df['size']).cumsum()

Then you can identify the points where the position changes using by checking its position_size. The cummulative sum of these changepoints is your trade number.

df['trade_nr']=(df.position_size==0).shift(1,fill_value=0).cumsum()
Answered By: Arnau
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.