How do I merge multiple CSV files into one DataFrame in pandas?

Question:

I have a folder with a lot of CSV files, each containing measurements of signal data. They have the following structure:

Frequency [kHz],Power [dbm]
852000,-135.812845793404
852008,-142.13849097071088
852016,-138.21218081816156
852024,-137.32593610384734
852032,-139.464539680863

I want to merge these files into a DataFrame with Frequency as the key column, because the frequency is the same in every file. So it should look something like this in the DataFrame:

Frequency [kHz] | Power [dbm] | Power [dbm] | Power [dbm] | ...

So I wrote the following code:

df = pd.DataFrame()
for f in csv_files:
    csv = pd.read_csv(f)
    df = pd.merge(df, csv, on='Frequency [kHz]', sort=False)

But the only thing I get is a KeyError: 'Frequency [kHz]'

The closest I came to my desired result was with pd.concat([pd.read_csv(f) for f in csv_files], axis=0, sort=False), but then there are still those Frequency columns in between.
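
The KeyError presumably comes from the first loop iteration, where df is still an empty DataFrame with no 'Frequency [kHz]' column to merge on. A minimal sketch, assuming csv_files holds the file paths, that seeds the merge with the first file instead and uses suffixes to keep the repeated power columns apart:

import pandas as pd

# Assumption: csv_files is the list of CSV paths
frames = [pd.read_csv(f) for f in csv_files]

# Start from the first file so the merge key actually exists,
# then merge the remaining files onto it one at a time.
df = frames[0]
for i, other in enumerate(frames[1:], start=1):
    df = pd.merge(df, other, on='Frequency [kHz]', how='outer',
                  suffixes=('', f'_{i}'))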

Asked By: Zombiefleischer


Answers:

You can read them all into a dictionary and use concat:

import pandas as pd
import glob

path = 'path' 
all_files = glob.glob(path + "/*.csv")

df_dict1 = {}

for filename in all_files:
    # Use the shared frequency column as the index so concat can align on it
    df_dict1[filename] = pd.read_csv(filename, index_col='Frequency [kHz]')

# Concatenate side by side; the dict keys (file names) become the top column level
df = pd.concat(df_dict1, axis=1)
# Drop the file-name level, leaving one 'Power [dbm]' column per file,
# indexed by 'Frequency [kHz]'
df = df.droplevel(0, axis=1)
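
If you need to tell the power columns apart later, a variation (just a sketch, assuming the file name is an acceptable column label) keeps the file name as the column name instead of dropping that level:

# Variation: label each power column with its file name so the
# otherwise identical 'Power [dbm]' columns stay distinguishable.
series_by_file = {}
for filename in all_files:
    series_by_file[filename] = pd.read_csv(
        filename, index_col='Frequency [kHz]')['Power [dbm]']

# Concatenating a dict of Series along axis=1 uses the keys as column names
df_labelled = pd.concat(series_by_file, axis=1)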
Answered By: DHJ

I think you can collect them all as DataFrames and then merge them with functools.reduce, like so:

import pandas as pd
from functools import reduce

data_frames = []
for f in csv_files:
    df = pd.read_csv(f)
    data_frames.append(df)

df_merged = reduce(lambda left, right: pd.merge(left, right,
                                                on=['Frequency [kHz]'],
                                                how='outer'),
                   data_frames)
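
One caveat with the repeated 'Power [dbm]' names: after a few merges pandas starts appending _x/_y suffixes to disambiguate them, and recent versions may refuse the merge once those suffixed names collide. A sketch of one way around that, assuming the file name is an acceptable column label, renames each power column before merging:

# Assumption: csv_files is the list of CSV paths.
# Renaming each power column to its file name avoids clashing
# _x/_y suffixes when many frames share the 'Power [dbm]' name.
data_frames = []
for f in csv_files:
    data_frames.append(pd.read_csv(f).rename(columns={'Power [dbm]': f}))

df_merged = reduce(lambda left, right: pd.merge(left, right,
                                                on='Frequency [kHz]',
                                                how='outer'),
                   data_frames)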
Answered By: Jasmin Heifa