Import multiple excel files into python pandas and concatenate them into one dataframe

Question:

I would like to read several excel files from a directory into pandas and concatenate them into one big dataframe. I have not been able to figure it out though. I need some help with the for loop and building a concatenated dataframe:
Here is what I have so far:

import sys
import csv
import glob
import pandas as pd

# get data file names
path =r'C:DRODCL_rawdata_filesexcelfiles'
filenames = glob.glob(path + "/*.xlsx")

dfs = []

for df in dfs: 
    xl_file = pd.ExcelFile(filenames)
    df=xl_file.parse('Sheet1')
    dfs.concat(df, ignore_index=True)
Asked By: jonas

||

Answers:

As mentioned in the comments, one error you are making is that you are looping over an empty list.

Here is how I would do it, using an example of having 5 identical Excel files that are appended one after another.

(1) Imports:

import os
import pandas as pd

(2) List files:

path = os.getcwd()
files = os.listdir(path)
files

Output:

['.DS_Store',
 '.ipynb_checkpoints',
 '.localized',
 'Screen Shot 2013-12-28 at 7.15.45 PM.png',
 'test1 2.xls',
 'test1 3.xls',
 'test1 4.xls',
 'test1 5.xls',
 'test1.xls',
 'Untitled0.ipynb',
 'Werewolf Modelling',
 '~$Random Numbers.xlsx']

(3) Pick out ‘xls’ files:

files_xls = [f for f in files if f[-3:] == 'xls']
files_xls

Output:

['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']

(4) Initialize empty dataframe:

df = pd.DataFrame()

(5) Loop over list of files to append to empty dataframe:

for f in files_xls:
    data = pd.read_excel(f, 'Sheet1')
    df = df.append(data)

(6) Enjoy your new dataframe. 🙂

df

Output:

  Result  Sample
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
Answered By: ericmjl

this works with python 2.x

be in the directory where the Excel files are

see http://pbpython.com/excel-file-combine.html

import numpy as np
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)

# now save the data frame
writer = pd.ExcelWriter('output.xlsx')
all_data.to_excel(writer,'sheet1')
writer.save()    
Answered By: john blue
import pandas as pd

import os

os.chdir('...')

#read first file for column names

fdf= pd.read_excel("first_file.xlsx", sheet_name="sheet_name")

#create counter to segregate the different file's data

fdf["counter"]=1

nm= list(fdf)

c=2

#read first 1000 files

for i in os.listdir():

  print(c)

  if c<1001:

    if "xlsx" in i:

      df= pd.read_excel(i, sheet_name="sheet_name")

      df["counter"]=c

      if list(df)==nm:

        fdf=fdf.append(df)

        c+=1

      else:

        print("headers name not match")

    else:

      print("not xlsx")


fdf=fdf.reset_index(drop=True)

#relax
Answered By: TBhavnani
import pandas as pd
import os

files = [file for file in os.listdir('./Salesfolder')]
all_month_sales= pd.DataFrame()
for file in files
    df= pd.read_csv("./Salesfolder/"+file)
    all_months_data=pd.concat([all_months_sales,df])
all_months_data.to_csv("all_data.csv",index=False)

You can go and read all your .xls files from folder (Salesfolder in my case) and same for your local path. Using iteration through whcih you can put them into empty data frame and you can concatnate your data frame to this . I have also exported to another csv for all months data into one csv file

Answered By: aruna kumar

This can be done in this way:

import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("/path/to/directory/*.xlsx"):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)

all_data.to_csv("new_combined_file.csv")  
Answered By: Abhilash Ramteke

#shortcut

import pandas as pd 
from glob import glob

dfs=[]
for f in glob("data/*.xlsx"):
    dfs.append(pd.read_excel(f))
df=pd.concat(dfs, ignore_index=True)
Answered By: Dan

You can use list comprehension inside concat:

import os
import pandas

path = '/path/to/directory/'
filenames = [file for file in os.listdir(path) if file.endswith('.xlsx')]

df = pd.concat([pd.read_excel(path + file) for file in filenames], ignore_index=True)

With ignore_index = True the index of df will be labeled 0, …, n – 1.

Answered By: rachwa

There is an even neater way to do that.

# import libraries
import glob
import pandas as pd

# get the absolute path of all Excel files 
allExcelFiles = glob.glob("/path/to/Excel/files/*.xlsx")

# read all Excel files at once
df = pd.concat(pd.read_excel(excelFile) for excelFile in allExcelFiles)
Answered By: zoumpp
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.