Pandas multiplication of two dataframes with uneven date rows and label columns

Question:

I have the following sample of historical Bitcoin rates in US dollars:

BTC_USD_rates = {
    "Open": {"01/01/2022": 46217.5, "02/01/2022": 47738.7, "03/01/2022": 47293.9, "04/01/2022": 46435.7, "05/01/2022": 45833.1, "06/01/2022": 43431.6, "07/01/2022": 43097.9, "08/01/2022": 41551.3},
    "Low": {"01/01/2022": 46217.5, "02/01/2022": 46718.2, "03/01/2022": 45704.0, "04/01/2022": 45602.1, "05/01/2022": 42535.1, "06/01/2022": 42481.1, "07/01/2022": 40810.0, "08/01/2022": 40574.3},
    "High": {"01/01/2022": 47917.6, "02/01/2022": 47944.9, "03/01/2022": 47556.0, "04/01/2022": 47505.4, "05/01/2022": 47019.4, "06/01/2022": 43772.3, "07/01/2022": 43127.7, "08/01/2022": 42304.4},
    "Close": {"01/01/2022": 47738.0, "02/01/2022": 47311.8, "03/01/2022": 46430.2, "04/01/2022": 45837.3, "05/01/2022": 43425.9, "06/01/2022": 43097.5, "07/01/2022": 41546.7, "08/01/2022": 41672.0},
    "Volume": {"01/01/2022": 31239, "02/01/2022": 27020, "03/01/2022": 41062, "04/01/2022": 55589, "05/01/2022": 83744, "06/01/2022": 63076, "07/01/2022": 88358, "08/01/2022": 52544},
}
df1 = pd.DataFrame.from_dict(BTC_USD_rates)
df1

            Open    Low     High    Close   Volume
01/01/2022  46217.5 46217.5 47917.6 47738.0 31239
02/01/2022  47738.7 46718.2 47944.9 47311.8 27020
03/01/2022  47293.9 45704.0 47556.0 46430.2 41062
04/01/2022  46435.7 45602.1 47505.4 45837.3 55589
05/01/2022  45833.1 42535.1 47019.4 43425.9 83744
06/01/2022  43431.6 42481.1 43772.3 43097.5 63076
07/01/2022  43097.9 40810.0 43127.7 41546.7 88358
08/01/2022  41551.3 40574.3 42304.4 41672.0 52544

And then for the same period I have the following historical New Zealand Dollars to $1 US Dollar rates:

USD_NZD_rates = {
    "Open": {"03/01/2022": 1.465, "04/01/2022": 1.4719, "06/01/2022": 1.4717, "07/01/2022": 1.4819},
    "Low": {"03/01/2022": 1.4583, "04/01/2022": 1.4651, "06/01/2022": 1.4708, "07/01/2022": 1.4733},
    "High": {"03/01/2022": 1.4763, "04/01/2022": 1.4784, "06/01/2022": 1.4854, "07/01/2022": 1.4849},
    "Close": {"03/01/2022": 1.4732, "04/01/2022": 1.4669, "06/01/2022": 1.4817, "07/01/2022": 1.4741},
}
df2 = pd.DataFrame.from_dict(USD_NZD_rates)
df2

            Open    Low     High    Close
03/01/2022  1.4650  1.4583  1.4763  1.4732
04/01/2022  1.4719  1.4651  1.4784  1.4669
06/01/2022  1.4717  1.4708  1.4854  1.4817
07/01/2022  1.4819  1.4733  1.4849  1.4741

What I need to accomplish is convert each date’s Open, Low, High and Close BTC rates to NZD’s using the USD_NZD Close rate for each respective date.

There are two caveats however, and those are preventing me to get there by just going plain vanilla like df1.multiply(df2["Close"], axis="index"):

  1. Ignore the Volume column in df1.
  2. df2 doesn’t bring the USD_NZD Close rates I need for some dates (01/01/2022, 02/01/2022, 05/01/2022 and 08/01/2022), so for such cases I need the method to make sure each of the two situations are dealt with accordingly:
    • For those initial missing dates (01/01/2022 and 02/01/2022) the FIRST available date must be used (03/01/2022) as the USD_NZD Close rate that will be used to convert all 4 BTC_USD rates.
    • In case of any missing date down the dataframe (05/01/2022 and 08/01/2022), the PREVIOUS available date must be used (04/01/2022 and 07/01/2022 respectively) as the USD_NZD Close rate that will be used to convert all 4 BTC_USD rates.

How can I get there considering all those exceptions?!

Asked By: leolapa

||

Answers:

You can try filling in the rates with bfill and ffill like this:

new_rates = df2.reindex(df1.index.union(df2.index))

# open or close?
new_rates['Open'] =new_rates['Open'].bfill()

# fill missing data with previously available data
new_rates['Close'] = new_rates['Close'].ffill()

new_rates = new_rates.bfill(axis=1).ffill(axis=1)

df1.mul(new_rates, fill_value=1)

Output:

                  Close         High          Low         Open   Volume
01/01/2022  69936.17000  70199.28400  67708.63750  67708.63750  31239.0
02/01/2022  69311.78700  70239.27850  68442.16300  69937.19550  27020.0
03/01/2022  68400.97064  70206.92280  66650.14320  69285.56350  41062.0
04/01/2022  67238.73537  70231.98336  66811.63671  68348.70683  55589.0
05/01/2022  63701.45271  68972.75786  62394.73819  67452.57327  83744.0
06/01/2022  63857.56575  65019.37442  62481.20188  63918.28572  63076.0
07/01/2022  61243.99047  64040.32173  60125.37300  63866.77801  88358.0
08/01/2022  61428.69520  62360.91604  59810.57563  61250.77133  52544.0
Answered By: Quang Hoang

Based on @Quang Hoang first stab at it I managed to get to the answer I needed:

First, following his proposed method I expanded the df2 dataframe by adding those missing dates that show up under df1:

df2 = df2.reindex(df1.index.union(df2.index))

            Open    Low     High    Close
01/01/2022  NaN     NaN     NaN     NaN
02/01/2022  NaN     NaN     NaN     NaN
03/01/2022  1.4650  1.4583  1.4763  1.4732
04/01/2022  1.4719  1.4651  1.4784  1.4669
05/01/2022  NaN     NaN     NaN     NaN
06/01/2022  1.4717  1.4708  1.4854  1.4817
07/01/2022  1.4819  1.4733  1.4849  1.4741
08/01/2022  NaN     NaN     NaN     NaN

Then, also piggybacking on his method, but with some necessary tweaks, I managed to backward fill the initial missing dates with the rates from the first available date, and then forward fill the rest of the missing dates with rates from their respective prior available dates:

df2 = df2.bfill()
df2 = df2.ffill()

            Open    Low     High    Close
01/01/2022  1.4650  1.4583  1.4763  1.4732
02/01/2022  1.4650  1.4583  1.4763  1.4732
03/01/2022  1.4650  1.4583  1.4763  1.4732
04/01/2022  1.4719  1.4651  1.4784  1.4669
05/01/2022  1.4719  1.4651  1.4784  1.4669
06/01/2022  1.4717  1.4708  1.4854  1.4817
07/01/2022  1.4819  1.4733  1.4849  1.4741
08/01/2022  1.4819  1.4733  1.4849  1.4741

Since I need to convert all df1 rates (Open, Low, High and Close) by the df2 Close rates only, I ran these additional lines to apply the Close rates column to the other columns, and also created a Volume column with all 1 values since this column should not be converted – I’m pretty sure there might be a way to do it all in one line of code, but that’s the only way I know how to do it:

df2['Open'] = df2['Close']
df2['Low'] = df2['Close']
df2['High'] = df2['Close']
df2['Volume'] = 1

            Open    Low     High    Close   Volume
01/01/2022  1.4732  1.4732  1.4732  1.4732  1
02/01/2022  1.4732  1.4732  1.4732  1.4732  1
03/01/2022  1.4732  1.4732  1.4732  1.4732  1
04/01/2022  1.4669  1.4669  1.4669  1.4669  1
05/01/2022  1.4669  1.4669  1.4669  1.4669  1
06/01/2022  1.4817  1.4817  1.4817  1.4817  1
07/01/2022  1.4741  1.4741  1.4741  1.4741  1
08/01/2022  1.4741  1.4741  1.4741  1.4741  1

And finally I run the multiplication of df1 to df2 to get to the sought for result:

df1 = df1.mul(df2, axis = 0)

            Open        Low         High        Close       Volume
01/01/2022  68087.62100 68087.62100 70592.20832 70327.62160 31239
02/01/2022  70328.65284 68825.25224 70632.42668 69699.74376 27020
03/01/2022  69673.37348 67331.13280 70059.49920 68400.97064 41062
04/01/2022  68116.52833 66893.72049 69685.67126 67238.73537 55589
05/01/2022  67232.57439 62394.73819 68972.75786 63701.45271 83744
06/01/2022  64352.60172 62944.24587 64857.41691 63857.56575 63076
07/01/2022  63530.61439 60158.02100 63574.54257 61243.99047 88358
08/01/2022  61250.77133 59810.57563 62360.91604 61428.69520 52544
Answered By: leolapa