Housing Dataset from 'Hands-On Machine Learning' Fails to Load

Question:

I have followed other solutions posted on Stack Overflow for loading the housing dataset, which mostly suggest calling fetch_housing_data() first. However, even after doing that, I still get a FileNotFoundError saying there is no such file or directory: 'datasets/housing'. Here is my code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import tarfile
from six.moves import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join('datasets', 'housing')
HOUSING_URL = DOWNLOAD_ROOT + HOUSING_PATH + '/housing.tgz'


def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.mkdir(housing_path)
    tgz_path = os.path.join(housing_path, 'housing.tgz')
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()


fetch_housing_data()


def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, 'housing.csv')
    return pd.read_csv(csv_path)


dataset = load_housing_data()
dataset.head()


I expected the call to fetch_housing_data() to download and extract the dataset from the link provided in the book, but it still raises an error.

Asked By: A.K Krishnamurthy


Answers:

The traceback I got when running the code looked like:

Traceback (most recent call last):
  File "/home/hayesall/answer.py", line 23, in <module>
    fetch_housing_data()
  File "/home/hayesall/answer.py", line 15, in fetch_housing_data
    os.mkdir(housing_path)
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/housing'

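This happens because the datasets directory does not exist yet. os.mkdir creates only the last directory in the path, so it raises FileNotFoundError when a parent directory (here, datasets) is missing.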
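A minimal sketch of the difference, run inside a throwaway temporary directory so nothing in your working directory is touched (the paths are only illustrative): os.mkdir fails when the parent is missing, while os.makedirs creates the whole chain.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "datasets", "housing")

    # os.mkdir creates only the last path component, so it fails
    # because the intermediate 'datasets' directory does not exist yet.
    try:
        os.mkdir(nested)
        mkdir_failed = False
    except FileNotFoundError:
        mkdir_failed = True

    # os.makedirs creates every missing directory along the path.
    os.makedirs(nested)
    makedirs_created = os.path.isdir(nested)

    # exist_ok=True turns a repeated call into a no-op instead of raising FileExistsError.
    os.makedirs(nested, exist_ok=True)

print(mkdir_failed, makedirs_created)  # True True
```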

If you first run:

mkdir datasets

Then re-run the code, and you will see two files under datasets/housing/:

datasets/
└── housing
    ├── housing.csv
    └── housing.tgz

Alternatively, replace the os.mkdir call with os.makedirs, which recursively creates any missing intermediate directories along the path:

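def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        # makedirs also creates the missing parent 'datasets' directory
        os.makedirs(housing_path)
    # the rest of the function is unchanged

Answered By: Alexander L. Hayes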
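For reference, here is a sketch of the whole function with that fix applied. It assumes Python 3, so it imports urllib.request directly instead of going through six.moves, passes exist_ok=True to os.makedirs in place of the isdir check, and opens the archive with a with statement so it is closed even if extraction fails. As a small extra tweak, the URL is built with "/" explicitly, because os.path.join would insert "\" on Windows.

```python
import os
import tarfile
import urllib.request

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
# Build the URL with "/" explicitly; os.path.join would use "\\" on Windows.
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # Create 'datasets/' and 'datasets/housing/' as needed; no error if they exist.
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    # Download the archive into the directory it will be extracted to.
    urllib.request.urlretrieve(housing_url, tgz_path)
    # The with block closes the archive even if extraction raises.
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)
```

Calling fetch_housing_data() followed by load_housing_data() from the question should then find datasets/housing/housing.csv.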