Reference dict variables for data manipulation purposes

Question:

I have successfully iterated through multiple directories to create a list of dictionaries (one per Excel file) of DataFrames (one per sheet). However: a) how can I read in only the worksheets whose names match one of two lists, and exclude all other worksheets, so that I don't load an unnecessary amount of data into memory?

sheet_list = ["Total Residents", "Total (excluding Non-Residents)", "Individuals", "Corporations", "Other"] 
sheet_list2 = ["City1", "City2", "City3", "City4", "City5", "City6"]
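One way to avoid loading every worksheet is to open the workbook first, inspect its sheet names, and request only the matching ones from `pd.read_excel`. A minimal sketch, assuming your `skiprows`/`nrows`/`usecols` settings; `read_selected_sheets` is a made-up helper name:

```python
import pandas as pd

sheet_list = ["Total Residents", "Total (excluding Non-Residents)",
              "Individuals", "Corporations", "Other"]
sheet_list2 = ["City1", "City2", "City3", "City4", "City5", "City6"]
wanted = set(sheet_list) | set(sheet_list2)

def read_selected_sheets(path):
    """Read only the worksheets whose names appear in `wanted`."""
    with pd.ExcelFile(path) as xls:
        keep = [s for s in xls.sheet_names if s in wanted]
        # Only the requested sheets are parsed; the rest are never read.
        return {s: pd.read_excel(xls, sheet_name=s,
                                 skiprows=16, nrows=360, usecols="E:AR")
                for s in keep}
```

Sheets whose names are not in either list are skipped entirely, so memory use scales with the sheets you actually need.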

and b) what is the best way to reference the dict values? Currently my list df_list has 33 elements (dicts), each dict has 14-30 keys (worksheets), and most worksheets hold 360 rows x 40 columns of data. I need to select specific columns/rows by column index using the list and dict keys. However, how would I know whether my lists and dict objects have been read in in the correct order, without adding an extra key/reference ID?
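One way to sidestep the ordering question is to not rely on list position at all: key the outer container by each workbook's base name, so lookups are explicit instead of positional. A sketch under that assumption; `workbook_key` and `load_workbooks` are hypothetical helper names:

```python
import os

import pandas as pd

def workbook_key(path):
    """'/data/2020/1515CC.xlsx' -> '1515CC'."""
    return os.path.splitext(os.path.basename(path))[0]

def load_workbooks(file_list):
    """Map each workbook's base name to its {sheet_name: DataFrame} dict."""
    return {workbook_key(p): pd.read_excel(p, sheet_name=None,
                                           skiprows=16, nrows=360,
                                           usecols="E:AR")
            for p in file_list}

# dfs = load_workbooks(file_list)
# col = dfs["1515CC"]["Total Residents"].iloc[:, 9]  # 10th column of the slice
```

With this layout the read-in order of the files no longer matters, because every lookup names the workbook and sheet directly.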

For example, my files are named: 1515CC, 2525CC, 3535CC, 1515DD, 2525DD, 3535DD, where the values in 1515CC's "Total Residents" sheet should equal those in 1515DD's "City1" sheet, and I need to cross-check and validate that they are equal by slicing the "N" column (the 9th column of the slice) from the two sheets and comparing them.
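The cross-check itself can be a small comparison helper. Note that after `usecols="E:AR"`, Excel column N lands at 0-based position 9 of each DataFrame (E is position 0). `columns_match` is a hypothetical name for such a helper:

```python
import pandas as pd

def columns_match(df_a, df_b, col_idx=9):
    """True if the col_idx-th column of both frames holds identical values."""
    a = df_a.iloc[:, col_idx].reset_index(drop=True)
    b = df_b.iloc[:, col_idx].reset_index(drop=True)
    # Series.equals also treats NaNs in the same positions as equal.
    return a.equals(b)

# Usage, assuming the two sheets are already loaded as DataFrames:
# columns_match(df_1515cc_total_residents, df_1515dd_city1)
```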

import fnmatch
import os

import pandas as pd

# Create a list of files by iterating through selected directories
file_list = []
excludes = ["graphs", "archive"]
for root, directories, files in os.walk(root_path, topdown=True):
    directories[:] = [d for d in directories if d not in excludes]
    for filename in files:
        if fnmatch.fnmatch(filename, "0*.xlsx"):
            file_list.append(os.path.join(root,filename))

df_list = [pd.read_excel(path, sheet_name=None, skiprows=16, nrows=360, usecols="E:AR")
           for path in file_list]
Asked By: Sod


Answers:

Following @srinath's recommendation, I now join the root path with the filename, like so: file_list.append(os.path.join(root, filename)). This change has been made in my question, and the title has been revised to reflect the change in status. Thank you to everyone and @srinath.

Answered By: Sod