Recalling a variable created in an if statement

Question:

I am iterating through files in a directory, but I only need the files with a .csv extension. I then need the paths to those files for use later in the code. To determine whether a file is a .csv I use this:

for subdir in os.listdir(root):
    for file in os.listdir(os.path.join(root,subdir)):
        if file.endswith(ext):
            print(file)

This gives me all the files with .csv extension. Now I want to create a string with the path to these files so I use this:

for subdir in os.listdir(root):
    for file in os.listdir(os.path.join(root,subdir)):
        if file.endswith(ext):
           
            datoteka = root + '\\' + subdir + '\\' + file

The path to each file is now stored in the string datoteka, and I want to use it inside the for loop that also contains the if statement.

But I get an error that datoteka is not defined. After some quick research I found out that I cannot use variables that were defined inside an if statement outside of that if statement. Is there a way to pull the variable out?

I need to perform some data analysis on the files (datoteka contains the path):

for subdir in os.listdir(root):
    for file in os.listdir(os.path.join(root,subdir)):
        if file.endswith(ext):
            datoteka = root + '\\' + subdir + '\\' + file
    df = pd.read_csv(datoteka, encoding = 'cp1252')

This gives the following error:
[Image: screenshot of the error message]

Is there another way I could get my paths without defining datoteka inside that if statement?

Asked By: CH4

Answers:

Use glob and build a list of files like this:

from os.path import join
from glob import glob
from pandas import read_csv

ROOT = 'root' # root directory
SUBDIR = 'sub' # sub directory

list_of_csvs = glob(join(ROOT, SUBDIR, '*.csv'))  # glob() already returns a list

# now iterate over the list

for file in list_of_csvs:
    df = read_csv(file)

Or, if you don’t need to keep the list of files and want a recursive search then it’s just:

from os.path import join
from glob import glob
from pandas import read_csv

ROOT = 'root' # root directory

for file in glob(join(ROOT, '**', '*.csv'), recursive=True):
    df = read_csv(file)

Answered By: Stuart

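In Python an if block does not create its own scope, so "datoteka" is visible after the block, but only once the assignment inside it has actually run. If read_csv is reached before any file has matched, the name has never been assigned, and that is what raises the error. Initialize the variable before the inner for loop and check it before using it.

Example: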

for subdir in os.listdir(root):
    datoteka = None
    for file in os.listdir(os.path.join(root,subdir)):
        if file.endswith(ext):
            datoteka = root + '\\' + subdir + '\\' + file
    if datoteka is not None:
        df = pd.read_csv(datoteka, encoding = 'cp1252')
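
Note that the example above reads only the last matching file in each subdirectory, because datoteka is overwritten on every match. If every CSV file should be processed, one option is to build and use the path inside the if block itself. Below is a minimal sketch under that assumption. iter_csv_paths is a hypothetical helper name, not something from the original posts. It uses os.path.join instead of hand-written backslashes so the separator suits the platform, and it skips entries in root that are not directories, which is an added assumption about the folder layout.

```python
import os


def iter_csv_paths(root, ext=".csv"):
    """Yield the full path of every file ending in `ext` one level below `root`."""
    for subdir in sorted(os.listdir(root)):
        subdir_path = os.path.join(root, subdir)
        if not os.path.isdir(subdir_path):
            continue  # assumption: plain files directly in root are ignored
        for name in sorted(os.listdir(subdir_path)):
            if name.endswith(ext):
                # The path is built and handed out here, inside the if block,
                # so every matching file is seen, not only the last one.
                yield os.path.join(subdir_path, name)


# Hypothetical usage with pandas (pandas is not imported in this sketch):
# for datoteka in iter_csv_paths(root):
#     df = pd.read_csv(datoteka, encoding='cp1252')
#     ...  # analyse df here, while it still refers to this file
```

Doing the analysis inside the loop means there is never a moment where datoteka is used without having been assigned.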