Merge multiple (2+) dataframes in Pandas with a proper column name

Question:

everyone. I have the following list of dataframes:

df_eurusd = DownloadData('EUR/USD',start_date,end_date,timeframe).GetData()
df_usdjpy = DownloadData('USD/JPY',start_date,end_date,timeframe).GetData()
df_gbpusd = DownloadData('GBP/USD',start_date,end_date,timeframe).GetData()
df_usdcad = DownloadData('USD/CAD',start_date,end_date,timeframe).GetData()
df_usdsek = DownloadData('USD/SEK',start_date,end_date,timeframe).GetData()
df_usdchf = DownloadData('USD/CHF',start_date,end_date,timeframe).GetData()

tickers = {
    'EUR/USD' : df_eurusd,
    'USD/JPY' : df_usdjpy,
    'GBP/USD' : df_gbpusd,
    'USD/CAD' : df_usdcad,
    'USD/SEK' : df_usdsek,
    'USD/CHF' : df_usdchf
}

And each dataframe have the following format:

                     Timestamp     Open     High      Low    Close
0    2023-01-10 00:00:00+00:00  1.07324  1.07381  1.07205  1.07273
1    2023-01-10 01:00:00+00:00  1.07273  1.07396  1.07259  1.07360
2    2023-01-10 02:00:00+00:00  1.07360  1.07424  1.07212  1.07308
3    2023-01-10 03:00:00+00:00  1.07309  1.07341  1.07255  1.07321
4    2023-01-10 04:00:00+00:00  1.07323  1.07339  1.07278  1.07306
...                        ...      ...      ...      ...      ...
5729 2023-12-08 17:00:00+00:00  1.07448  1.07575  1.07402  1.07560
5730 2023-12-08 18:00:00+00:00  1.07560  1.07623  1.07528  1.07571
5731 2023-12-08 19:00:00+00:00  1.07572  1.07616  1.07565  1.07587
5732 2023-12-08 20:00:00+00:00  1.07587  1.07664  1.07569  1.07636
5733 2023-12-08 21:00:00+00:00  1.07636  1.07642  1.07578  1.07592

What I need to do is to obtain something like this:

                     Timestamp  EUR/USD  USD/JPY  GBP/USD  USD/CAD  ...
0    2023-01-10 00:00:00+00:00  1.07324  1.07381  1.07205  1.07273
1    2023-01-10 01:00:00+00:00  1.07273  1.07396  1.07259  1.07360
2    2023-01-10 02:00:00+00:00  1.07360  1.07424  1.07212  1.07308
...                        ...      ...      ...      ...      ...

Basically extract the ‘Close’ column for each dataframe and insert it on a new dataframe which collect all ‘Close’ columns indexed by the same date (‘Timestamp’).

I tried a lot of things that other people suggested here but I’m not able to achieve it. An important part would be not having to manually type the name of each column of the final df because, of course, I might use up to 100 dataframes and I can’t manually type each of them.

Thank’s.

Issues with proposed solution

I used the following solution:

tickers = ['EUR/USD','USD/JPY','GBP/USD','USD/CAD','USD/SEK','USD/CHF']#,'DOLLAR.IDX/USD']

dfs_close = []
for ticker in tickers:
  df = DownloadData(ticker, start_date,end_date,timeframe).getData()
  df = df[['Timestamp', 'Close']].set_index('Timestamp').copy()
  dfs_close.append(df)
  print(ticker,'n ---')
  print(df)

merged_df = pd.concat(dfs_close, axis=1, keys=tickers)
merged_df.columns  = merged_df.columns.droplevel(1)


print(merged_df)

As you may have notices I inserted a print statement inside the loop in order to see if data is correctly donwloaded and another print of the final result. Here’s all the printed values:

EUR/USD 
 ---
                             Close
Timestamp                         
2023-01-10 00:00:00+00:00  1.07273
2023-01-10 01:00:00+00:00  1.07360
2023-01-10 02:00:00+00:00  1.07308
2023-01-10 03:00:00+00:00  1.07321
2023-01-10 04:00:00+00:00  1.07306
...                            ...
2023-12-08 17:00:00+00:00  1.07560
2023-12-08 18:00:00+00:00  1.07571
2023-12-08 19:00:00+00:00  1.07587
2023-12-08 20:00:00+00:00  1.07636
2023-12-08 21:00:00+00:00  1.07592

[5734 rows x 1 columns]
USD/JPY 
 ---
                               Close
Timestamp                           
2023-01-10 00:00:00+00:00    1.07273
2023-01-10 01:00:00+00:00    1.07360
2023-01-10 02:00:00+00:00    1.07308
2023-01-10 03:00:00+00:00    1.07321
2023-01-10 04:00:00+00:00    1.07306
...                              ...
2023-12-08 17:00:00+00:00  144.80500
2023-12-08 18:00:00+00:00  144.93400
2023-12-08 19:00:00+00:00  144.96400
2023-12-08 20:00:00+00:00  144.98700
2023-12-08 21:00:00+00:00  144.90200

[5734 rows x 1 columns]
GBP/USD 
 ---
                             Close
Timestamp                         
2023-01-10 00:00:00+00:00  1.21738
2023-01-10 01:00:00+00:00  1.21796
2023-01-10 02:00:00+00:00  1.21685
2023-01-10 03:00:00+00:00  1.21675
2023-01-10 04:00:00+00:00  1.21595
...                            ...
2023-12-08 17:00:00+00:00  1.25452
2023-12-08 18:00:00+00:00  1.25508
2023-12-08 19:00:00+00:00  1.25498
2023-12-08 20:00:00+00:00  1.25475
2023-12-08 21:00:00+00:00  1.25445

[5734 rows x 1 columns]
USD/CAD 
 ---
                             Close
Timestamp                         
2023-01-10 00:00:00+00:00  1.33937
2023-01-10 01:00:00+00:00  1.33780
2023-01-10 02:00:00+00:00  1.33939
2023-01-10 03:00:00+00:00  1.33889
2023-01-10 04:00:00+00:00  1.33923
...                            ...
2023-12-08 17:00:00+00:00  1.35982
2023-12-08 18:00:00+00:00  1.35900
2023-12-08 19:00:00+00:00  1.35836
2023-12-08 20:00:00+00:00  1.35870
2023-12-08 21:00:00+00:00  1.35770

[5733 rows x 1 columns]
USD/SEK 
 ---
                              Close
Timestamp                          
2023-01-10 00:00:00+00:00  10.41087
2023-01-10 01:00:00+00:00  10.39731
2023-01-10 02:00:00+00:00  10.40288
2023-01-10 03:00:00+00:00  10.40570
2023-01-10 04:00:00+00:00  10.40753
...                             ...
2023-12-08 17:00:00+00:00  10.47489
2023-12-08 18:00:00+00:00  10.47439
2023-12-08 19:00:00+00:00  10.47332
2023-12-08 20:00:00+00:00  10.46797
2023-12-08 21:00:00+00:00  10.45791

[5724 rows x 1 columns]
USD/CHF 
 ---
                             Close
Timestamp                         
2023-01-10 00:00:00+00:00  1.07273
2023-01-10 01:00:00+00:00  1.07360
2023-01-10 02:00:00+00:00  1.07308
2023-01-10 03:00:00+00:00  1.07321
2023-01-10 04:00:00+00:00  1.07306
...                            ...
2023-12-08 17:00:00+00:00  0.88063
2023-12-08 18:00:00+00:00  0.88077
2023-12-08 19:00:00+00:00  0.88033
2023-12-08 20:00:00+00:00  0.87974
2023-12-08 21:00:00+00:00  0.87965

[5734 rows x 1 columns]
                           EUR/USD    USD/JPY  GBP/USD  USD/CAD   USD/SEK  USD/CHF
Timestamp                                                                         
2023-01-10 00:00:00+00:00  1.07273    1.07273  1.21738  1.33937  10.41087  1.07273
2023-01-10 01:00:00+00:00  1.07360    1.07360  1.21796  1.33780  10.39731  1.07360
2023-01-10 02:00:00+00:00  1.07308    1.07308  1.21685  1.33939  10.40288  1.07308
2023-01-10 03:00:00+00:00  1.07321    1.07321  1.21675  1.33889  10.40570  1.07321
2023-01-10 04:00:00+00:00  1.07306    1.07306  1.21595  1.33923  10.40753  1.07306
...                            ...        ...      ...      ...       ...      ...
2023-12-08 17:00:00+00:00  1.07560  144.80500  1.25452  1.35982  10.47489  0.88063
2023-12-08 18:00:00+00:00  1.07571  144.93400  1.25508  1.35900  10.47439  0.88077
2023-12-08 19:00:00+00:00  1.07587  144.96400  1.25498  1.35836  10.47332  0.88033
2023-12-08 20:00:00+00:00  1.07636  144.98700  1.25475  1.35870  10.46797  0.87974
2023-12-08 21:00:00+00:00  1.07592  144.90200  1.25445  1.35770  10.45791  0.87965

As you see there’s a problem with the data (i.e. first values of EUR/USD = USD/JPY). I thought it was a bug with the API and I tried using a time.sleep between each request but that did not solved the issue. I think the issue is related to pandas. Does anyone know why this happen?

Asked By: Edoardo

||

Answers:

You can use concat and a dictionary comprehension:

out = pd.concat({k: x.set_index('Timestamp')['Close']
                 for k, x in tickers.items()},
                axis=1).reset_index()

This sets The Timestamp as index for each dataframe, slices only the needed column (Close) and passes it to concat. Concat uses the key of the dictionary as column name and the value as column value. The data is aligned on the same Timestamp, which is set back as column in the end with reset_index

Answered By: mozway

Let say you want to get EUR/USDand USD/JPY :

  • Create list of the targets : tickers = ['EUR/USD', 'USD/JPY']
  • For each element in tickers:
    • Get data using DownloadData
    • Extract the Timestamp and Close
    • Add data to the list dfs_close
  • Use pd.concat to concatenate the dfs. Use keys argument to create a multi-level column with the tickers
  • Remove the Close column level

Here is the full code :

tickers = ['EUR/USD', 'USD/JPY']

dfs_close = []

for ticker in tickers:
  df = DownloadData(ticker, start_date,end_date,timeframe).GetData()
  df = df[['Timestamp', 'Close']].set_index('Timestamp').copy()
  dfs_close.append(df)

merged_df = pd.concat(dfs_close, axis=1, keys=tickers)

merged_df.columns  = merged_df.columns.droplevel(1)
Answered By: Hamza NABIL