Loading multiple JSON files from different folders in Python

Question:

I have a folder structure that looks like this:

data
  ABC
    monday_data
      monday.json
    tuesday_data
      tuesday.json
  YXZ
    wednesday_data
      wednesday.json
    etc.

I want to unpack all of these JSON files into a pandas dataframe in Python.

I have spent a lot of time trying to get this to work, but without success.

What would be the most efficient way to do this?

Asked By: mbih


Answers:

You can use rglob from pathlib.Path to get the paths of all files under a directory that end with a given extension:

from pathlib import Path

for path in Path('data').rglob('*.json'):
    print(path)

Outputs

data/ABC/monday_data/monday.json
data/ABC/tuesday_data/tuesday.json
data/YXZ/wednesday_data/wednesday.json

Now you can simply read this data into a dataframe according to your requirements.
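For example, a minimal sketch, assuming each file holds flat, record-oriented JSON that pd.read_json can parse directly (the 'data' directory name comes from the question):

from pathlib import Path

import pandas as pd

# read every JSON file under data/ into its own dataframe
frames = [pd.read_json(path) for path in Path('data').rglob('*.json')]

# stack them into a single dataframe with a fresh index
df = pd.concat(frames, ignore_index=True)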

Answered By: InsertCheesyLine
import os
import glob
import pandas as pd

# set the path to the directory where the JSON files are located
path = 'data/'

# use glob to find all the JSON files in the directory + its subdirectories
json_files = glob.glob(os.path.join(path, '**/*.json'), recursive=True)

This is how you can get all the paths to your JSON files. I am not sure how you want to load them into a dataframe.

You can try something like this.

# create an empty list to store the dataframes
dfs = []

# loop over the JSON files and read each file into a dataframe
for file in json_files:
    df = pd.read_json(file)
    dfs.append(df)

# concatenate the dataframes into a single dataframe
df = pd.concat(dfs, ignore_index=True)
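
Note that if the files contain nested JSON rather than flat records, pd.read_json may not produce the table you expect. A minimal sketch of an alternative, assuming the nesting should be flattened into dotted column names with pandas' json_normalize:

import glob
import json
import os

import pandas as pd

# find all JSON files under data/ and its subdirectories
json_files = glob.glob(os.path.join('data/', '**/*.json'), recursive=True)

dfs = []
for file in json_files:
    with open(file) as f:
        data = json.load(f)
    # flatten nested objects into columns like "a.b.c"
    dfs.append(pd.json_normalize(data))

df = pd.concat(dfs, ignore_index=True)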
Answered By: Amadeus