Filter DataFrame rows by multiple columns and add them together
Question:
So I have a dataframe structured like:
Date Metric Value
2020-01-01 Low 34.5
2020-01-01 High 36.5
2020-01-01 Open 23.5
2020-01-02 Low 32.5
...
I am trying to create another frame in which every date has a new 'Volume' column holding High minus Low for that specific date. The frame is not keyed on the dates, so does it need to be joined and the values in different columns then combined? I'm not sure exactly how to do this. I'm trying to get the final result to look like this:
Date Volume
2020-01-01 2.00
2020-01-02 6.45
Answers:
One approach could be as follows:
- First, select from df only the rows which have High and Low in column Metric, using Series.isin.
- Next, use df.pivot to reshape the df and assign a new column Volume, containing the result of the values in column Low subtracted from those in column High (see: Series.sub).
- Finally, we add some cosmetic changes: we drop columns High and Low, reset the index (see: df.reset_index), and get rid of df.columns.name (which is automatically set to Metric during df.pivot).
import pandas as pd

data = {'Date': {0: '2020-01-01', 1: '2020-01-01', 2: '2020-01-01',
                 3: '2020-01-02', 4: '2020-01-02', 5: '2020-01-02'},
        'Metric': {0: 'Low', 1: 'High', 2: 'Open', 3: 'Low', 4: 'High',
                   5: 'Open'},
        'Value': {0: 34.5, 1: 36.5, 2: 23.5, 3: 32.5, 4: 38.95, 5: 32.5}}
df = pd.DataFrame(data)

# keep only the Low/High rows, then spread them into one column per metric
res = df[df.Metric.isin(['Low', 'High'])].pivot(index='Date', columns='Metric',
                                                values='Value')

# Volume = High - Low; drop the helper columns and turn Date back into a column
res = res.assign(Volume=res['High'].sub(res['Low'])).drop(
    ['High', 'Low'], axis=1).reset_index(drop=False)

# pivot leaves 'Metric' as the name of the columns index; clear it
res.columns.name = None
print(res)

         Date  Volume
0  2020-01-01    2.00
1  2020-01-02    6.45
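To see why the subtraction lines up per date, it can help to look at the intermediate frame that df.pivot produces. The following is only a sketch on the same sample data; the names wide and volume are my own and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2020-01-01"] * 3 + ["2020-01-02"] * 3,
    "Metric": ["Low", "High", "Open"] * 2,
    "Value": [34.5, 36.5, 23.5, 32.5, 38.95, 32.5],
})

# After the pivot there is one row per Date and one column per remaining
# Metric, so High and Low for the same date sit side by side.
wide = df[df["Metric"].isin(["Low", "High"])].pivot(
    index="Date", columns="Metric", values="Value"
)
print(wide)

# Subtracting the two columns aligns on the Date index; rename the resulting
# Series and move Date back into a regular column.
volume = (wide["High"] - wide["Low"]).rename("Volume").reset_index()
print(volume)
```

Because the result here is built from a Series, there is no leftover columns name to clear and no helper columns to drop.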
You can create two DataFrames by filtering on Low and High, then join them on Date. Finally, subtract column Low from column High.
import pandas as pd

data = [
    ("2020-01-01", "Low", 34.5),
    ("2020-01-01", "High", 36.5),
    ("2020-01-01", "Open", 23.5),
    ("2020-01-02", "Low", 32.5),
    ("2020-01-02", "High", 38.95),
]
columns = ["Date", "Metric", "Value"]
df = pd.DataFrame(data=data, columns=columns)

# one frame per metric, with Value renamed to the metric it holds
df_low = df[df["Metric"] == "Low"].rename(columns={"Value": "Low"}).drop("Metric", axis=1)
df_high = df[df["Metric"] == "High"].rename(columns={"Value": "High"}).drop("Metric", axis=1)

# join the two frames on Date and compute the difference
df2 = df_low.merge(df_high, on="Date", how="inner")
df2["Volume"] = df2["High"] - df2["Low"]
print(df2)

[Out]:
         Date   Low   High  Volume
0  2020-01-01  34.5  36.50    2.00
1  2020-01-02  32.5  38.95    6.45
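The output above still carries the Low and High columns, while the question asks for just Date and Volume. A minimal sketch of trimming the merged frame down, assuming each date has at most one Low and one High row (the validate argument makes merge raise if that does not hold); the names low, high, merged and result are only illustrative:

```python
import pandas as pd

data = [
    ("2020-01-01", "Low", 34.5),
    ("2020-01-01", "High", 36.5),
    ("2020-01-01", "Open", 23.5),
    ("2020-01-02", "Low", 32.5),
    ("2020-01-02", "High", 38.95),
]
df = pd.DataFrame(data, columns=["Date", "Metric", "Value"])

# Select the rows for each metric and keep only Date plus the renamed value.
low = df.loc[df["Metric"] == "Low", ["Date", "Value"]].rename(columns={"Value": "Low"})
high = df.loc[df["Metric"] == "High", ["Date", "Value"]].rename(columns={"Value": "High"})

# Assumption: one Low and one High per date; validate checks this at merge time.
merged = low.merge(high, on="Date", how="inner", validate="one_to_one")

# Compute Volume and keep only the two columns the question asks for.
result = merged.assign(Volume=merged["High"] - merged["Low"])[["Date", "Volume"]]
print(result)
```

With how="inner", a date that is missing either its Low or its High row is dropped from the result rather than showing up with NaN.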