I am trying to read an Excel file this way:
newFile = pd.ExcelFile("PATHFileName.xlsx")
ParsedData = pd.io.parsers.ExcelFile.parse(newFile)
This throws an error saying two arguments are expected, and I don't know what the second argument is. What I am trying to achieve is to convert an Excel file to a DataFrame. Am I doing it the right way, or is there another way to do this using pandas?
Close: first you call ExcelFile, but then you call the .parse method and pass it the sheet name.
>>> xl = pd.ExcelFile("dummydata.xlsx")
>>> xl.sheet_names
[u'Sheet1', u'Sheet2', u'Sheet3']
>>> df = xl.parse("Sheet1")
>>> df.head()
                  Tid  dummy1    dummy2    dummy3    dummy4    dummy5
0 2006-09-01 00:00:00       0  5.894611  0.605211  3.842871  8.265307
1 2006-09-01 01:00:00       0  5.712107  0.605211  3.416617  8.301360
2 2006-09-01 02:00:00       0  5.105300  0.605211  3.090865  8.335395
3 2006-09-01 03:00:00       0  4.098209  0.605211  3.198452  8.170187
4 2006-09-01 04:00:00       0  3.338196  0.605211  2.970015  7.765058

     dummy6  dummy7    dummy8    dummy9
0  0.623354       0  2.579108  2.681728
1  0.554211       0  7.210000  3.028614
2  0.567841       0  6.940000  3.644147
3  0.581470       0  6.630000  4.016155
4  0.595100       0  6.350000  3.974442
What you’re doing is calling the method which lives on the class itself, rather than the instance, which is okay (although not very idiomatic), but if you’re doing that you would also need to pass the sheet name:
>>> parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")
>>> parsed.columns
Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)
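The "two arguments expected" error comes from how Python binds methods, not from anything specific to pandas. Here is a minimal sketch of that point, using a hypothetical FakeExcelFile class (not part of pandas) as a stand-in for pd.ExcelFile, and assuming, as in the error reported in the question, that parse requires a sheet name:

```python
class FakeExcelFile:
    """Hypothetical stand-in for pd.ExcelFile, used only to show method binding."""

    def __init__(self, sheets):
        # sheets: dict mapping a sheet name to some payload
        self.sheets = sheets

    def parse(self, sheet_name):
        return self.sheets[sheet_name]


xl = FakeExcelFile({"Sheet1": "data from Sheet1"})

# Idiomatic: call the method on the instance; `self` is filled in automatically.
via_instance = xl.parse("Sheet1")

# Equivalent but less idiomatic: call through the class and pass the instance yourself.
via_class = FakeExcelFile.parse(xl, "Sheet1")

# Passing only the instance leaves `sheet_name` missing, which raises TypeError.
missing_arg_error = None
try:
    FakeExcelFile.parse(xl)
except TypeError as exc:
    missing_arg_error = exc
```

Both calls return the same object; the class-level call just makes the first argument (the instance) explicit.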
I thought I should add here that if you want to access rows or columns to loop through them, you can do this:
import pandas as pd
# open the file
xlsx = pd.ExcelFile("PATHFileName.xlsx")
# get the first sheet as an object
sheet1 = xlsx.parse(0)
# get the first column as a list you can loop through
# where there is a 0 in the code below, change it to the row or column number you want
column = sheet1.icol(0).real
# get the first row as a list you can loop through
row = sheet1.irow(0).real
irow(i) and icol(i) are deprecated now. You can use sheet1.iloc[:, i] to get the i-th column and sheet1.iloc[i, :] to get the i-th row.
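As a rough sketch of the iloc approach, the snippet below uses a small in-memory DataFrame as a stand-in for a parsed sheet (the column names and values are made up for illustration) and converts the selections to plain lists with tolist():

```python
import pandas as pd

# Small in-memory frame standing in for a sheet returned by xlsx.parse(0).
sheet1 = pd.DataFrame({"Tid": ["a", "b", "c"], "dummy1": [1, 2, 3]})

# All rows of column 0, as a plain Python list.
first_column = sheet1.iloc[:, 0].tolist()

# All columns of row 0, as a plain Python list.
first_row = sheet1.iloc[0, :].tolist()

for value in first_column:
    print(value)
```

If you only need to iterate, you can also loop over the Series directly without calling tolist().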
This is a much simpler and easier way.
import pandas
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname='Sheet 1')
# or using sheet index starting 0
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname=2)
Check out the documentation for full details.
The sheetname keyword is deprecated in newer pandas versions; use sheet_name instead.
I think this should satisfy your need:
import pandas as pd
# Read the Excel sheet into a pandas DataFrame
df = pd.read_excel("PATHFileName.xlsx", sheet_name=0)  # corrected argument name
You just need to feed the path to your file to pd.read_excel:
import pandas as pd
file_path = "./my_excel.xlsx"
data_frame = pd.read_excel(file_path)
Check out the documentation to explore parameters like skiprows, which ignores rows when loading the Excel file.
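To illustrate skiprows, here is a small sketch that first writes a throwaway workbook with two title rows above the real header and then reads it back. The file name and contents are invented for the example, and it assumes an Excel engine such as openpyxl is installed:

```python
import os
import tempfile

import pandas as pd

# Build a throwaway workbook: two title rows, then a header row, then data.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "with_title_rows.xlsx")
raw = pd.DataFrame(
    [
        ["Report title", None],
        ["generated for demo", None],
        ["x", "y"],
        [1, 2],
        [3, 4],
    ]
)
raw.to_excel(path, header=False, index=False)

# Skip the two title rows; the next row ("x", "y") becomes the header.
df = pd.read_excel(path, skiprows=2)
print(df)
```

skiprows also accepts a list of row indices if the rows to drop are not contiguous.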
import pandas as pd
data = pd.read_excel(r'**YourPath**.xlsx')
print(data)
Here is an updated method with syntax that is more common in Python code. It also prevents you from opening the same file multiple times.
import pandas as pd

sheet1, sheet2 = None, None
with pd.ExcelFile("PATHFileName.xlsx") as reader:
    sheet1 = pd.read_excel(reader, sheet_name='Sheet1')
    sheet2 = pd.read_excel(reader, sheet_name='Sheet2')
To load a sheet without naming it explicitly, you can refer to it by its position in the workbook instead (often you will simply want the first sheet):
import pandas as pd
myexcel = pd.ExcelFile("C:/filename.xlsx")
myexcel = myexcel.parse(myexcel.sheet_names[0])
Since .sheet_names returns a list of sheet names, it is easy to load one or more sheets by indexing into that list.
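As a sketch of picking sheets by position through sheet_names, the example below creates a temporary two-sheet workbook so that it is self-contained. The sheet names and data are made up, and it assumes an Excel engine such as openpyxl is installed:

```python
import os
import tempfile

import pandas as pd

# Build a throwaway two-sheet workbook so the example can run on its own.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)

with pd.ExcelFile(path) as xl:
    # Sheet names, in workbook order.
    names = xl.sheet_names
    # Load only the first sheet as a DataFrame.
    first = xl.parse(names[0])
    # Load every sheet into a dict keyed by sheet name.
    all_sheets = {name: xl.parse(name) for name in names}
```

Opening the workbook once and parsing several sheets from the same ExcelFile avoids re-reading the file for each sheet.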
All of these work for me:
In : import pandas as pd
In : df = pd.read_excel('FileName.xlsx')  # if there is only one sheet in the Excel file
In : df = pd.read_excel('FileName.xlsx', sheet_name=0)
In : df = pd.read_excel('FileName.xlsx', sheet_name='Sheet 1')
#load pandas library
import pandas as pd
#set path where the file is
path = "./myfile.xlsx"
#load the file into dataframe df
df = pd.read_excel(path)
#check the first 5 rows
df.head()