Pandas multiplication of two dataframes with uneven date rows and label columns
Question:
I have the following sample of historical Bitcoin rates in US dollars:
BTC_USD_rates = {
"Open": {"01/01/2022": 46217.5, "02/01/2022": 47738.7, "03/01/2022": 47293.9, "04/01/2022": 46435.7, "05/01/2022": 45833.1, "06/01/2022": 43431.6, "07/01/2022": 43097.9, "08/01/2022": 41551.3},
"Low": {"01/01/2022": 46217.5, "02/01/2022": 46718.2, "03/01/2022": 45704.0, "04/01/2022": 45602.1, "05/01/2022": 42535.1, "06/01/2022": 42481.1, "07/01/2022": 40810.0, "08/01/2022": 40574.3},
"High": {"01/01/2022": 47917.6, "02/01/2022": 47944.9, "03/01/2022": 47556.0, "04/01/2022": 47505.4, "05/01/2022": 47019.4, "06/01/2022": 43772.3, "07/01/2022": 43127.7, "08/01/2022": 42304.4},
"Close": {"01/01/2022": 47738.0, "02/01/2022": 47311.8, "03/01/2022": 46430.2, "04/01/2022": 45837.3, "05/01/2022": 43425.9, "06/01/2022": 43097.5, "07/01/2022": 41546.7, "08/01/2022": 41672.0},
"Volume": {"01/01/2022": 31239, "02/01/2022": 27020, "03/01/2022": 41062, "04/01/2022": 55589, "05/01/2022": 83744, "06/01/2022": 63076, "07/01/2022": 88358, "08/01/2022": 52544},
}
df1 = pd.DataFrame.from_dict(BTC_USD_rates)
df1
Open Low High Close Volume
01/01/2022 46217.5 46217.5 47917.6 47738.0 31239
02/01/2022 47738.7 46718.2 47944.9 47311.8 27020
03/01/2022 47293.9 45704.0 47556.0 46430.2 41062
04/01/2022 46435.7 45602.1 47505.4 45837.3 55589
05/01/2022 45833.1 42535.1 47019.4 43425.9 83744
06/01/2022 43431.6 42481.1 43772.3 43097.5 63076
07/01/2022 43097.9 40810.0 43127.7 41546.7 88358
08/01/2022 41551.3 40574.3 42304.4 41672.0 52544
And then for the same period I have the following historical New Zealand Dollars to $1 US Dollar rates:
USD_NZD_rates = {
"Open": {"03/01/2022": 1.465, "04/01/2022": 1.4719, "06/01/2022": 1.4717, "07/01/2022": 1.4819},
"Low": {"03/01/2022": 1.4583, "04/01/2022": 1.4651, "06/01/2022": 1.4708, "07/01/2022": 1.4733},
"High": {"03/01/2022": 1.4763, "04/01/2022": 1.4784, "06/01/2022": 1.4854, "07/01/2022": 1.4849},
"Close": {"03/01/2022": 1.4732, "04/01/2022": 1.4669, "06/01/2022": 1.4817, "07/01/2022": 1.4741},
}
df2 = pd.DataFrame.from_dict(USD_NZD_rates)
df2
Open Low High Close
03/01/2022 1.4650 1.4583 1.4763 1.4732
04/01/2022 1.4719 1.4651 1.4784 1.4669
06/01/2022 1.4717 1.4708 1.4854 1.4817
07/01/2022 1.4819 1.4733 1.4849 1.4741
What I need to accomplish is convert each date’s Open
, Low
, High
and Close
BTC rates to NZD’s using the USD_NZD Close
rate for each respective date.
There are two caveats however, and those are preventing me to get there by just going plain vanilla like df1.multiply(df2["Close"], axis="index")
:
- Ignore the
Volume
column in df1
.
df2
doesn’t bring the USD_NZD Close
rates I need for some dates (01/01/2022, 02/01/2022, 05/01/2022 and 08/01/2022), so for such cases I need the method to make sure each of the two situations are dealt with accordingly:
- For those initial missing dates (01/01/2022 and 02/01/2022) the FIRST available date must be used (03/01/2022) as the USD_NZD
Close
rate that will be used to convert all 4 BTC_USD rates.
- In case of any missing date down the dataframe (05/01/2022 and 08/01/2022), the PREVIOUS available date must be used (04/01/2022 and 07/01/2022 respectively) as the USD_NZD
Close
rate that will be used to convert all 4 BTC_USD rates.
How can I get there considering all those exceptions?!
Answers:
You can try filling in the rates with bfill
and ffill
like this:
new_rates = df2.reindex(df1.index.union(df2.index))
# open or close?
new_rates['Open'] =new_rates['Open'].bfill()
# fill missing data with previously available data
new_rates['Close'] = new_rates['Close'].ffill()
new_rates = new_rates.bfill(axis=1).ffill(axis=1)
df1.mul(new_rates, fill_value=1)
Output:
Close High Low Open Volume
01/01/2022 69936.17000 70199.28400 67708.63750 67708.63750 31239.0
02/01/2022 69311.78700 70239.27850 68442.16300 69937.19550 27020.0
03/01/2022 68400.97064 70206.92280 66650.14320 69285.56350 41062.0
04/01/2022 67238.73537 70231.98336 66811.63671 68348.70683 55589.0
05/01/2022 63701.45271 68972.75786 62394.73819 67452.57327 83744.0
06/01/2022 63857.56575 65019.37442 62481.20188 63918.28572 63076.0
07/01/2022 61243.99047 64040.32173 60125.37300 63866.77801 88358.0
08/01/2022 61428.69520 62360.91604 59810.57563 61250.77133 52544.0
Based on @Quang Hoang first stab at it I managed to get to the answer I needed:
First, following his proposed method I expanded the df2
dataframe by adding those missing dates that show up under df1
:
df2 = df2.reindex(df1.index.union(df2.index))
Open Low High Close
01/01/2022 NaN NaN NaN NaN
02/01/2022 NaN NaN NaN NaN
03/01/2022 1.4650 1.4583 1.4763 1.4732
04/01/2022 1.4719 1.4651 1.4784 1.4669
05/01/2022 NaN NaN NaN NaN
06/01/2022 1.4717 1.4708 1.4854 1.4817
07/01/2022 1.4819 1.4733 1.4849 1.4741
08/01/2022 NaN NaN NaN NaN
Then, also piggybacking on his method, but with some necessary tweaks, I managed to backward fill the initial missing dates with the rates from the first available date, and then forward fill the rest of the missing dates with rates from their respective prior available dates:
df2 = df2.bfill()
df2 = df2.ffill()
Open Low High Close
01/01/2022 1.4650 1.4583 1.4763 1.4732
02/01/2022 1.4650 1.4583 1.4763 1.4732
03/01/2022 1.4650 1.4583 1.4763 1.4732
04/01/2022 1.4719 1.4651 1.4784 1.4669
05/01/2022 1.4719 1.4651 1.4784 1.4669
06/01/2022 1.4717 1.4708 1.4854 1.4817
07/01/2022 1.4819 1.4733 1.4849 1.4741
08/01/2022 1.4819 1.4733 1.4849 1.4741
Since I need to convert all df1
rates (Open
, Low
, High
and Close
) by the df2
Close
rates only, I ran these additional lines to apply the Close
rates column to the other columns, and also created a Volume
column with all 1
values since this column should not be converted – I’m pretty sure there might be a way to do it all in one line of code, but that’s the only way I know how to do it:
df2['Open'] = df2['Close']
df2['Low'] = df2['Close']
df2['High'] = df2['Close']
df2['Volume'] = 1
Open Low High Close Volume
01/01/2022 1.4732 1.4732 1.4732 1.4732 1
02/01/2022 1.4732 1.4732 1.4732 1.4732 1
03/01/2022 1.4732 1.4732 1.4732 1.4732 1
04/01/2022 1.4669 1.4669 1.4669 1.4669 1
05/01/2022 1.4669 1.4669 1.4669 1.4669 1
06/01/2022 1.4817 1.4817 1.4817 1.4817 1
07/01/2022 1.4741 1.4741 1.4741 1.4741 1
08/01/2022 1.4741 1.4741 1.4741 1.4741 1
And finally I run the multiplication of df1
to df2
to get to the sought for result:
df1 = df1.mul(df2, axis = 0)
Open Low High Close Volume
01/01/2022 68087.62100 68087.62100 70592.20832 70327.62160 31239
02/01/2022 70328.65284 68825.25224 70632.42668 69699.74376 27020
03/01/2022 69673.37348 67331.13280 70059.49920 68400.97064 41062
04/01/2022 68116.52833 66893.72049 69685.67126 67238.73537 55589
05/01/2022 67232.57439 62394.73819 68972.75786 63701.45271 83744
06/01/2022 64352.60172 62944.24587 64857.41691 63857.56575 63076
07/01/2022 63530.61439 60158.02100 63574.54257 61243.99047 88358
08/01/2022 61250.77133 59810.57563 62360.91604 61428.69520 52544
I have the following sample of historical Bitcoin rates in US dollars:
BTC_USD_rates = {
"Open": {"01/01/2022": 46217.5, "02/01/2022": 47738.7, "03/01/2022": 47293.9, "04/01/2022": 46435.7, "05/01/2022": 45833.1, "06/01/2022": 43431.6, "07/01/2022": 43097.9, "08/01/2022": 41551.3},
"Low": {"01/01/2022": 46217.5, "02/01/2022": 46718.2, "03/01/2022": 45704.0, "04/01/2022": 45602.1, "05/01/2022": 42535.1, "06/01/2022": 42481.1, "07/01/2022": 40810.0, "08/01/2022": 40574.3},
"High": {"01/01/2022": 47917.6, "02/01/2022": 47944.9, "03/01/2022": 47556.0, "04/01/2022": 47505.4, "05/01/2022": 47019.4, "06/01/2022": 43772.3, "07/01/2022": 43127.7, "08/01/2022": 42304.4},
"Close": {"01/01/2022": 47738.0, "02/01/2022": 47311.8, "03/01/2022": 46430.2, "04/01/2022": 45837.3, "05/01/2022": 43425.9, "06/01/2022": 43097.5, "07/01/2022": 41546.7, "08/01/2022": 41672.0},
"Volume": {"01/01/2022": 31239, "02/01/2022": 27020, "03/01/2022": 41062, "04/01/2022": 55589, "05/01/2022": 83744, "06/01/2022": 63076, "07/01/2022": 88358, "08/01/2022": 52544},
}
df1 = pd.DataFrame.from_dict(BTC_USD_rates)
df1
Open Low High Close Volume
01/01/2022 46217.5 46217.5 47917.6 47738.0 31239
02/01/2022 47738.7 46718.2 47944.9 47311.8 27020
03/01/2022 47293.9 45704.0 47556.0 46430.2 41062
04/01/2022 46435.7 45602.1 47505.4 45837.3 55589
05/01/2022 45833.1 42535.1 47019.4 43425.9 83744
06/01/2022 43431.6 42481.1 43772.3 43097.5 63076
07/01/2022 43097.9 40810.0 43127.7 41546.7 88358
08/01/2022 41551.3 40574.3 42304.4 41672.0 52544
And then for the same period I have the following historical New Zealand Dollars to $1 US Dollar rates:
USD_NZD_rates = {
"Open": {"03/01/2022": 1.465, "04/01/2022": 1.4719, "06/01/2022": 1.4717, "07/01/2022": 1.4819},
"Low": {"03/01/2022": 1.4583, "04/01/2022": 1.4651, "06/01/2022": 1.4708, "07/01/2022": 1.4733},
"High": {"03/01/2022": 1.4763, "04/01/2022": 1.4784, "06/01/2022": 1.4854, "07/01/2022": 1.4849},
"Close": {"03/01/2022": 1.4732, "04/01/2022": 1.4669, "06/01/2022": 1.4817, "07/01/2022": 1.4741},
}
df2 = pd.DataFrame.from_dict(USD_NZD_rates)
df2
Open Low High Close
03/01/2022 1.4650 1.4583 1.4763 1.4732
04/01/2022 1.4719 1.4651 1.4784 1.4669
06/01/2022 1.4717 1.4708 1.4854 1.4817
07/01/2022 1.4819 1.4733 1.4849 1.4741
What I need to accomplish is convert each date’s Open
, Low
, High
and Close
BTC rates to NZD’s using the USD_NZD Close
rate for each respective date.
There are two caveats however, and those are preventing me to get there by just going plain vanilla like df1.multiply(df2["Close"], axis="index")
:
- Ignore the
Volume
column indf1
. df2
doesn’t bring the USD_NZDClose
rates I need for some dates (01/01/2022, 02/01/2022, 05/01/2022 and 08/01/2022), so for such cases I need the method to make sure each of the two situations are dealt with accordingly:- For those initial missing dates (01/01/2022 and 02/01/2022) the FIRST available date must be used (03/01/2022) as the USD_NZD
Close
rate that will be used to convert all 4 BTC_USD rates. - In case of any missing date down the dataframe (05/01/2022 and 08/01/2022), the PREVIOUS available date must be used (04/01/2022 and 07/01/2022 respectively) as the USD_NZD
Close
rate that will be used to convert all 4 BTC_USD rates.
- For those initial missing dates (01/01/2022 and 02/01/2022) the FIRST available date must be used (03/01/2022) as the USD_NZD
How can I get there considering all those exceptions?!
You can try filling in the rates with bfill
and ffill
like this:
new_rates = df2.reindex(df1.index.union(df2.index))
# open or close?
new_rates['Open'] =new_rates['Open'].bfill()
# fill missing data with previously available data
new_rates['Close'] = new_rates['Close'].ffill()
new_rates = new_rates.bfill(axis=1).ffill(axis=1)
df1.mul(new_rates, fill_value=1)
Output:
Close High Low Open Volume
01/01/2022 69936.17000 70199.28400 67708.63750 67708.63750 31239.0
02/01/2022 69311.78700 70239.27850 68442.16300 69937.19550 27020.0
03/01/2022 68400.97064 70206.92280 66650.14320 69285.56350 41062.0
04/01/2022 67238.73537 70231.98336 66811.63671 68348.70683 55589.0
05/01/2022 63701.45271 68972.75786 62394.73819 67452.57327 83744.0
06/01/2022 63857.56575 65019.37442 62481.20188 63918.28572 63076.0
07/01/2022 61243.99047 64040.32173 60125.37300 63866.77801 88358.0
08/01/2022 61428.69520 62360.91604 59810.57563 61250.77133 52544.0
Based on @Quang Hoang first stab at it I managed to get to the answer I needed:
First, following his proposed method I expanded the df2
dataframe by adding those missing dates that show up under df1
:
df2 = df2.reindex(df1.index.union(df2.index))
Open Low High Close
01/01/2022 NaN NaN NaN NaN
02/01/2022 NaN NaN NaN NaN
03/01/2022 1.4650 1.4583 1.4763 1.4732
04/01/2022 1.4719 1.4651 1.4784 1.4669
05/01/2022 NaN NaN NaN NaN
06/01/2022 1.4717 1.4708 1.4854 1.4817
07/01/2022 1.4819 1.4733 1.4849 1.4741
08/01/2022 NaN NaN NaN NaN
Then, also piggybacking on his method, but with some necessary tweaks, I managed to backward fill the initial missing dates with the rates from the first available date, and then forward fill the rest of the missing dates with rates from their respective prior available dates:
df2 = df2.bfill()
df2 = df2.ffill()
Open Low High Close
01/01/2022 1.4650 1.4583 1.4763 1.4732
02/01/2022 1.4650 1.4583 1.4763 1.4732
03/01/2022 1.4650 1.4583 1.4763 1.4732
04/01/2022 1.4719 1.4651 1.4784 1.4669
05/01/2022 1.4719 1.4651 1.4784 1.4669
06/01/2022 1.4717 1.4708 1.4854 1.4817
07/01/2022 1.4819 1.4733 1.4849 1.4741
08/01/2022 1.4819 1.4733 1.4849 1.4741
Since I need to convert all df1
rates (Open
, Low
, High
and Close
) by the df2
Close
rates only, I ran these additional lines to apply the Close
rates column to the other columns, and also created a Volume
column with all 1
values since this column should not be converted – I’m pretty sure there might be a way to do it all in one line of code, but that’s the only way I know how to do it:
df2['Open'] = df2['Close']
df2['Low'] = df2['Close']
df2['High'] = df2['Close']
df2['Volume'] = 1
Open Low High Close Volume
01/01/2022 1.4732 1.4732 1.4732 1.4732 1
02/01/2022 1.4732 1.4732 1.4732 1.4732 1
03/01/2022 1.4732 1.4732 1.4732 1.4732 1
04/01/2022 1.4669 1.4669 1.4669 1.4669 1
05/01/2022 1.4669 1.4669 1.4669 1.4669 1
06/01/2022 1.4817 1.4817 1.4817 1.4817 1
07/01/2022 1.4741 1.4741 1.4741 1.4741 1
08/01/2022 1.4741 1.4741 1.4741 1.4741 1
And finally I run the multiplication of df1
to df2
to get to the sought for result:
df1 = df1.mul(df2, axis = 0)
Open Low High Close Volume
01/01/2022 68087.62100 68087.62100 70592.20832 70327.62160 31239
02/01/2022 70328.65284 68825.25224 70632.42668 69699.74376 27020
03/01/2022 69673.37348 67331.13280 70059.49920 68400.97064 41062
04/01/2022 68116.52833 66893.72049 69685.67126 67238.73537 55589
05/01/2022 67232.57439 62394.73819 68972.75786 63701.45271 83744
06/01/2022 64352.60172 62944.24587 64857.41691 63857.56575 63076
07/01/2022 63530.61439 60158.02100 63574.54257 61243.99047 88358
08/01/2022 61250.77133 59810.57563 62360.91604 61428.69520 52544