Pandas row to json
Question:
I have a dataframe in pandas and my goal is to write each row of the dataframe as a new json file.
I’m a bit stuck right now. My intuition was to iterate over the rows of the dataframe (using df.iterrows) and use json.dumps to write each file, but to no avail.
Any thoughts?
Answers:
Pandas DataFrames have a to_json method that will do it for you:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
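For example (a minimal sketch with a made-up two-row frame), orient='records' serializes each row as one JSON object:

```python
import pandas as pd

# toy frame standing in for the asker's data
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# each row becomes one JSON object in a list
print(df.to_json(orient="records"))
# [{"a":1,"b":"x"},{"a":2,"b":"y"}]
```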
If you want each row in its own file you can iterate over the index (and use the index to help name them):
for i in df.index:
    df.loc[i].to_json("row{}.json".format(i))
Looping over indices is very inefficient.
A faster technique:
df['json'] = df.apply(lambda x: x.to_json(), axis=1)
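To see what this produces (a minimal sketch with a made-up frame), each row Series is serialized as a {column: value, ...} object:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Series.to_json serializes one row as {column: value, ...}
df["json"] = df.apply(lambda x: x.to_json(), axis=1)
print(df["json"].iloc[0])
# {"a":1,"b":"x"}
```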
Using apply, this can be done as
import json

def writejson(row):
    # row["json"] holds a JSON string, so parse it first to avoid
    # double-encoding it when pretty-printing with indent=2
    with open(row["filename"] + ".json", "w") as outfile:
        json.dump(json.loads(row["json"]), outfile, indent=2)

in_df.apply(writejson, axis=1)
This assumes the dataframe has a column named “filename” containing the target filename for each row’s json file.
Extending @MrE’s answer: if you’re looking to convert multiple columns from a single row into another column holding the content in json format (rather than writing separate json files), I’ve had speed issues while using:
df['json'] = df.apply(lambda x: x.to_json(), axis=1)
I achieved significant speed improvements on a dataset of 175K records with 5 columns using this line of code instead:
df['json'] = df.to_json(orient='records', lines=True).splitlines()
Speed went from >1 min to 350 ms.
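The speedup comes from replacing a per-row Python call with one vectorized serialization of the whole frame followed by a plain string split. A minimal sketch with a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# one call serializes the whole frame as newline-delimited JSON;
# splitlines() then yields one JSON string per row
df["json"] = df.to_json(orient="records", lines=True).splitlines()
print(df["json"].tolist())
# ['{"a":1,"b":"x"}', '{"a":2,"b":"y"}']
```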
Here’s a simple solution:
Transform the dataframe to json with one record per line, then simply split the lines:
list_of_jsons = df.to_json(orient='records', lines=True).splitlines()
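To close the loop on the original question (one file per row), each string in the list can then be written out directly. A sketch, assuming a made-up frame and illustrative filenames:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
list_of_jsons = df.to_json(orient="records", lines=True).splitlines()

# write each record to its own file; the "row{i}.json" naming is
# illustrative, not from the original answer
outdir = tempfile.mkdtemp()
for i, line in enumerate(list_of_jsons):
    with open(os.path.join(outdir, "row{}.json".format(i)), "w") as f:
        f.write(line)
```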