loading multiple json files from different folders in python
Question:
I have a folder structure that looks like this:
1. data
1.1. ABC
1.1.1 monday_data
monday.json
1.1.2 tuesday_data
tuesday.json
1.2. YXZ
1.2.1 wednesday_data
wednesday.json
1.2.2
etc
I want to unpack all of these json files into a pandas dataframe in python.
I have spent a lot of time trying to get this to work, but without success.
What would be the most efficient way to do this?
Answers:
You can use rglob from pathlib.Path to get the paths of all files under a directory that end with a certain extension:
from pathlib import Path
for path in Path('data').rglob('*.json'):
    print(path)
Outputs
data/ABC/monday_data/monday.json
data/ABC/tuesday_data/tuesday.json
data/YXZ/wednesday_data/wednesday.json
Now you can simply read this data into a dataframe according to your requirements.
import os
import glob
import pandas as pd
# set the path to the directory where the JSON files are located
path = 'data/'
# use glob to find all the JSON files in the directory + its subdirectories
json_files = glob.glob(os.path.join(path, '**/*.json'), recursive=True)
This is how you can get all paths to your JSON files.
I am not sure how you want to load all of them into a dataframe.
You can try something like this.
# create an empty list to store the dataframes
dfs = []
# loop over the JSON files and read each file into a dataframe
for file in json_files:
    df = pd.read_json(file)
    dfs.append(df)
# concatenate the dataframes into a single dataframe
df = pd.concat(dfs, ignore_index=True)
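Putting the pieces together, here is a minimal end-to-end sketch using pathlib. It assumes each JSON file holds data that pd.read_json can parse into rows with compatible columns; the source_file column is just an illustrative extra, added so you can tell which file each row came from:

```python
from pathlib import Path

import pandas as pd


def load_json_tree(root: str) -> pd.DataFrame:
    """Recursively read every *.json under `root` into one DataFrame."""
    frames = []
    for path in Path(root).rglob('*.json'):
        df = pd.read_json(path)
        # Illustrative: remember which file each row came from
        df['source_file'] = str(path)
        frames.append(df)
    if not frames:
        # No JSON files found under root
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

If the files have different structures, pd.concat will fill the missing columns with NaN, so it may be worth checking the resulting columns before continuing.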