How to do a for loop to load pickle files?

Question:

I am trying to automate loading 12 pickle files that have similar names using a for loop.

I have AirBnB data for 3 different cities (Jersey City, New York City and Rio). Each city has 4 types of files (listings, calendar, locale, and reviews), so I have 12 files in total. The file names are very similar (city_fileType.pkl).

  jc_listings.pkl, jc_calendar.pkl, jc_locale.pkl, jc_reviews.pkl  # Jersey city dataset
  nyc_listings.pkl, nyc_calendar.pkl, nyc_locale.pkl, nyc_reviews.pkl # New York City dataset
  rio_listings.pkl, rio_calendar.pkl, rio_locale.pkl, rio_reviews.pkl # Rio city dataset

I am trying to automate the loading of these files.

When I run the code:

import pandas as pd

path_data = '../Data/' # local path

jc_listings = pd.read_pickle(path_data+'jc_listings.pkl')

jc_listings.info()

This works fine.

But when I try to automate it, it does not work properly. I am trying:

# load data
path_data = '../Data/'

#list of all data names
city_data = ['jc_listings','jc_calendar','jc_locale','jc_reviews',
             'nyc_listings','nyc_calendar','nyc_locale','nyc_reviews',
             'rio_listings','rio_calendar','rio_locale','rio_reviews']

# loop to load all the data with respective name
for city in city_data:
    data_name = city
    print(data_name) # just to inspect and troubleshoot
    city = pd.read_pickle(path_data+data_name+'.pkl')
    print(type(city)) # just to inspect and troubleshoot

This runs without errors and the printouts look fine. However, when I try

rio_reviews.info()

I get the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [37], line 3
      1 # inspecting the data
----> 3 rio_reviews.info()

NameError: name 'rio_reviews' is not defined
Asked By: Marcio Bernardo


Answers:

The loop assigns each loaded DataFrame to the loop variable city, overwriting it on every iteration. No variable named rio_reviews is ever created, which is why you get the NameError.
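
A minimal sketch of what happens in the loop. The plain dictionary is only a stand-in for the DataFrame that pd.read_pickle would return, and the shortened list is an assumption made for illustration:

```python
# Shortened list of names, for illustration only.
city_data = ['jc_listings', 'rio_reviews']

for city in city_data:
    data_name = city
    # Stand-in for pd.read_pickle(path_data + data_name + '.pkl').
    city = {'source': data_name + '.pkl'}

# `city` was rebound on every pass, so it holds only the last object.
last_loaded = city

# The string 'rio_reviews' was never turned into a variable name.
try:
    rio_reviews
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(last_loaded)
print(error_message)
```

The name on the left of the assignment is always city, regardless of the string it held a moment earlier, so the earlier objects are simply discarded.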

Answered By: Mohammad Haddy

I would suggest another approach:

import pandas as pd
from pathlib import Path

data = Path('../Data')

cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']
dfs = {}

for city in cities:
    dfs[city] = {}  # create the inner dictionary before filling it
    for file in files:
        dfs[city][file] = pd.read_pickle(data / f'{city}_{file}.pkl')

That gives you a nested dictionary dfs, keyed first by city and then by file type, from which you can access each city's data like this:

dfs['jc']['listings'].info()
dfs['rio']['reviews'].info()

… for example.
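
To try the nested-dictionary layout without the real AirBnB files, here is a rough sketch that writes 12 throwaway pickle files to a temporary directory and loads them back. It uses the standard-library pickle module as a stand-in for pd.read_pickle, and the string payloads are made up for the demonstration:

```python
import pickle
import tempfile
from pathlib import Path

cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)

    # Write 12 throwaway pickle files named city_fileType.pkl.
    for city in cities:
        for file in files:
            with open(data / f'{city}_{file}.pkl', 'wb') as fh:
                pickle.dump(f'contents of {city}_{file}', fh)

    # Load them back into a nested dictionary: dfs[city][file].
    dfs = {}
    for city in cities:
        dfs[city] = {}  # inner dictionary must exist before it is filled
        for file in files:
            with open(data / f'{city}_{file}.pkl', 'rb') as fh:
                dfs[city][file] = pickle.load(fh)

print(sorted(dfs))
print(dfs['rio']['reviews'])
```

With real data you would swap the pickle.load call for pd.read_pickle and point data at '../Data'.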

We can further simplify the code using itertools.product:

import pandas as pd
from pathlib import Path
from itertools import product

data = Path('../Data')

cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']
dfs = {}

for city, file in product(cities, files):
    # setdefault creates the inner dictionary the first time a city is seen
    dfs.setdefault(city, {})[file] = pd.read_pickle(data / f'{city}_{file}.pkl')
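
If itertools.product is unfamiliar, this small sketch shows the pairs it generates and the file names built from them. It only builds strings and does not touch the disk:

```python
from itertools import product

cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']

# product yields every (city, file) combination, cities varying slowest.
pairs = list(product(cities, files))
file_names = [f'{city}_{file}.pkl' for city, file in pairs]

print(len(pairs))
print(file_names[:4])
```

The order matches the nested for loops, so the two versions load the files in the same sequence.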

Answered By: accdias