Combining every pair of CSV files out of n files and generating new CSV files in a folder using pandas

Question

I have 71 CSV files. I need to merge them two at a time and write each merged pair to a new file. That should produce nCr = 2485 files in total, but my code generates nPr = 4970 files. For example, if my folder contains a.csv and b.csv, it generates both "a & b.csv" and "b & a.csv". How can I avoid that? My code is below.

for i in range(0,len(name)):
    file1 = name["file_names"][i]
    for j in range(1,len(name)):
        file2 = name["file_names"][j]
        df1 = pd.read_csv(path+file1+".csv")
        df2 = pd.read_csv(path+file2+".csv")
        df = df1.append(df2)
        df = df.sort_values(by=['Entry Time'], ascending=True)
        df = df.reset_index(drop=True)
        new_file = file1 + " & " + file2
        df.to_csv(destination_path + new_file+".csv")
        print(new_file)
        i = i + 1
        j = j + 1

Asked By: Arunava Datta


Answer 1

You’re looking to make combinations of files:

from itertools import combinations

import pandas as pd

# `name`, `path` and `destination_path` are the variables from the question.
# If `name['file_names']` is a pandas column that may contain duplicates,
# use `name['file_names'].unique()` instead.
for file1, file2 in combinations(name['file_names'], 2):
    df1 = pd.read_csv(path+file1+".csv")
    df2 = pd.read_csv(path+file2+".csv")
    # Use concat: DataFrame.append was deprecated and then removed in pandas 2.0.
    df = pd.concat([df1, df2])
    # Passing `ignore_index=True` makes a separate `reset_index` call unnecessary:
    df = df.sort_values(by=['Entry Time'], ascending=True, ignore_index=True)
    new_file = file1 + " & " + file2
    df.to_csv(destination_path + new_file + ".csv")
    print(new_file)

>>> len(list(combinations(range(71), 2)))
2485
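
To see why the original loops overshoot, here is a small sketch using three made-up file names (`a`, `b`, `c` are placeholders, not the real files). The question's loops pair `range(0, n)` with `range(1, n)`, which visits n × (n − 1) index pairs; for n = 71 that is 4970. If you would rather keep index-based loops than use `itertools`, starting the inner index at `i + 1` should give the same pairs as `combinations`:

```python
from itertools import combinations, permutations

# Hypothetical file names, for illustration only.
names = ["a", "b", "c"]

# Unordered pairs: each pair appears once, never with itself.
pairs = list(combinations(names, 2))

# Ordered pairs: both (a, b) and (b, a) appear.
ordered = list(permutations(names, 2))

# Index-based equivalent of combinations: the inner loop starts at i + 1,
# so a pair is only produced when the second index is larger than the first.
loop_pairs = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
]

print(pairs)
print(len(pairs), len(ordered))
print(loop_pairs == pairs)
```

With 71 names the same pattern gives 2485 pairs, matching the count above.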

Answered By: BeRT2me
