Combining CSV files two at a time out of n files and generating new CSV files in a folder using pandas
Question:
I have 71 CSV files. I need to merge them two at a time and write each merged pair to a new file. The total number of files generated should therefore be nCr = 2485, but my code generates nPr = 4970 files. If my folder contains two files named a.csv and b.csv, it generates both a & b.csv and b & a.csv. How can I avoid that? My code is below.
for i in range(0,len(name)):
    file1 = name["file_names"][i]
    for j in range(1,len(name)):
        file2 = name["file_names"][j]
        df1 = pd.read_csv(path+file1+".csv")
        df2 = pd.read_csv(path+file2+".csv")
        df = df1.append(df2)
        df = df.sort_values(by=['Entry Time'], ascending=True)
        df = df.reset_index(drop=True)
        new_file = file1 + " & " + file2
        df.to_csv(destination_path + new_file+".csv")
        print(new_file)
        i = i + 1
        j = j + 1
Answers:
You’re looking to make combinations of files:
from itertools import combinations

# If `name['file_names']` isn't unique, and is a pandas column, use `name['file_names'].unique()` instead.
for file1, file2 in combinations(name['file_names'], 2):
    df1 = pd.read_csv(path+file1+".csv")
    df2 = pd.read_csv(path+file2+".csv")
    df = pd.concat([df1, df2])  # Use concat, append is deprecated.
    # We can avoid `reset_index` by adding `ignore_index=True`:
    df = df.sort_values(by=['Entry Time'], ascending=True, ignore_index=True)
    new_file = file1 + " & " + file2
    df.to_csv(destination_path + new_file + ".csv")
    print(new_file)

>>> len(list(combinations(range(71), 2)))
2485
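
To see why the original loops overshoot, note that the outer loop runs over all n indices and the inner loop over n - 1 of them (it starts at 1), so the body runs n × (n - 1) times and visits both orderings of most pairs. The following is a minimal sketch, not part of the original answer, showing that an index-based loop whose inner range starts at `i + 1` yields the same pairs as `combinations`. The list `file_names` is a hypothetical stand-in for `name['file_names']`.

```python
from itertools import combinations

# Hypothetical stand-in for name['file_names'] (assumed unique).
file_names = ["a", "b", "c", "d"]
n = len(file_names)

# How many times the loop body in the question runs:
# outer loop over n indices, inner loop over n - 1 indices.
question_loop_count = sum(1 for i in range(0, n) for j in range(1, n))

# Index-based fix: start the inner loop at i + 1 so that each
# unordered pair is visited once and no file is paired with itself.
pairs_by_index = [
    (file_names[i], file_names[j])
    for i in range(n)
    for j in range(i + 1, n)
]

# The itertools version used in the answer.
pairs_by_itertools = list(combinations(file_names, 2))

print(question_loop_count)                   # 12, i.e. n * (n - 1)
print(len(pairs_by_index))                   # 6, i.e. nCr for n = 4
print(pairs_by_index == pairs_by_itertools)  # True
```

Either form works; `combinations` is simply shorter and avoids the index bookkeeping.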