Filter DataFrame rows by multiple columns and add them together

Question:

So I have a dataframe structured like:

Date        Metric  Value
2020-01-01  Low     34.5
2020-01-01  High    36.5
2020-01-01  Open    23.5
2020-01-02  Low     32.5
...

I am trying to create another frame where, for every date, there is a new 'Volume' column holding High minus Low for that date. The frame is not keyed on the dates, so I assume the rows need to be joined and the values in the resulting columns subtracted? I'm not sure exactly how to do this. I'm trying to get the final result to look like this:

Date        Volume
2020-01-01  2.00
2020-01-02  6.45 
Asked By: pinecone6969

||

Answers:

One approach could be as follows:

  • First, select from df only the rows whose Metric is High or Low, using Series.isin.
  • Next, use df.pivot to reshape the df so that High and Low become columns, and assign a new column Volume holding High minus Low (see: Series.sub).
  • Finally, make some cosmetic changes: drop columns High and Low, reset the index (see: df.reset_index), and clear df.columns.name (which df.pivot automatically sets to Metric).
import pandas as pd

data = {'Date': {0: '2020-01-01', 1: '2020-01-01', 2: '2020-01-01', 
                 3: '2020-01-02', 4: '2020-01-02', 5: '2020-01-02'}, 
        'Metric': {0: 'Low', 1: 'High', 2: 'Open', 3: 'Low', 4: 'High', 
                   5: 'Open'}, 
        'Value': {0: 34.5, 1: 36.5, 2: 23.5, 3: 32.5, 4: 38.95, 5: 32.5}}
df = pd.DataFrame(data)

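# Keep only the Low/High rows, then reshape so each Metric becomes a column.
res = df[df['Metric'].isin(['Low', 'High'])].pivot(
    index='Date', columns='Metric', values='Value')

# Volume = High - Low; drop the helper columns and turn Date back into a column.
res = res.assign(Volume=res['High'].sub(res['Low'])).drop(
    columns=['High', 'Low']).reset_index()
res.columns.name = None  # remove the leftover 'Metric' label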

print(res)

         Date  Volume
0  2020-01-01    2.00
1  2020-01-02    6.45
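
One caveat: df.pivot raises a ValueError when the same Date/Metric pair occurs more than once. If that can happen in your data, a minimal sketch of a workaround is to aggregate per date before subtracting. The frame df_dup and its extra High row are made up for illustration, and taking the largest High and smallest Low per date is an assumption about what you would want in that case.

```python
import pandas as pd

# Hypothetical data: 2020-01-01 has two 'High' rows, which df.pivot would reject.
df_dup = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "Metric": ["Low", "High", "High", "Low", "High"],
    "Value": [34.5, 36.5, 37.0, 32.5, 38.95],
})

# Assumption: with duplicates, use the largest High and the smallest Low per date.
high = df_dup[df_dup["Metric"] == "High"].groupby("Date")["Value"].max()
low = df_dup[df_dup["Metric"] == "Low"].groupby("Date")["Value"].min()

# Both Series are indexed by Date, so the subtraction aligns on the dates.
volume = (high - low).rename("Volume").reset_index()
print(volume)
```

Dates that have only a High or only a Low end up with NaN in Volume, because the subtraction aligns on the Date index.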
Answered By: ouroboros1

You can create two DataFrames by filtering on Low and High, then join them on Date. Finally, subtract the Low column from the High column.

import pandas as pd

data=[
    ("2020-01-01","Low",34.5),
    ("2020-01-01","High",36.5),
    ("2020-01-01","Open",23.5),
    ("2020-01-02","Low",32.5),
    ("2020-01-02","High",38.95),
]

columns = ["Date", "Metric", "Value"]

df = pd.DataFrame(data=data, columns=columns)

df_low = df[df["Metric"]=="Low"].rename(columns={"Value": "Low"}).drop("Metric", axis=1)
df_high = df[df["Metric"]=="High"].rename(columns={"Value": "High"}).drop("Metric", axis=1)
df2 = df_low.merge(df_high, on="Date", how="inner")
df2["Volume"] = df2["High"] - df2["Low"]
print(df2)

[Out]:
         Date   Low   High  Volume
0  2020-01-01  34.5  36.50    2.00
1  2020-01-02  32.5  38.95    6.45
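
Note that how="inner" silently drops any date that is missing either its Low or its High row. If you would rather see those dates, here is a rough sketch using how="outer" together with indicator=True. The frame df_gap and its 2020-01-03 row are invented for illustration and are not part of the question's data.

```python
import pandas as pd

df_gap = pd.DataFrame(
    [
        ("2020-01-01", "Low", 34.5),
        ("2020-01-01", "High", 36.5),
        ("2020-01-02", "Low", 32.5),
        ("2020-01-02", "High", 38.95),
        ("2020-01-03", "Low", 30.0),  # hypothetical date with no High row
    ],
    columns=["Date", "Metric", "Value"],
)

# Split into one frame per metric, keeping only Date and the renamed value column.
low = df_gap.loc[df_gap["Metric"] == "Low", ["Date", "Value"]].rename(columns={"Value": "Low"})
high = df_gap.loc[df_gap["Metric"] == "High", ["Date", "Value"]].rename(columns={"Value": "High"})

# The outer join keeps every date; the _merge column records which side(s) had it.
merged = low.merge(high, on="Date", how="outer", indicator=True)
merged["Volume"] = merged["High"] - merged["Low"]
print(merged)
```

Rows where _merge is not "both" have NaN in Volume, which makes the incomplete dates easy to filter out or inspect.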
Answered By: Azhar Khan