Converting JSON to CSV using Python. How to remove certain text/characters if found, and how to better format the cell?

Question:

I apologise in advance if I have not provided enough information, am using the wrong terminology, or am not formatting my question correctly. This is my first time asking a question here.

This is the Python script: https://pastebin.com/WWViemwf

This is the JSON file (it contains the first 4 elements: hydrogen, helium, lithium and beryllium): https://pastebin.com/fyiijpBG

As seen, I’m converting the file from ".json" to ".csv".

The JSON file sometimes contains fields that say "NotApplicable" or "Unknown". It also contains some strange-looking text that I’m not familiar with.

For example here:

        "LiquidDensity": {
            "data": "NotAvailable",
            "tex_description": "\text{liquid density}"
        },

And here:

                "MagneticMoment": {
                    "data": "Unknown",
                    "tex_description": "\text{magnetic dipole moment}"
                },

Here is the code I’ve written to convert from ".json" to ".csv":

        #liquid density
        liquid_density = element_data["LiquidDensity"]["data"]
        if isinstance(liquid_density, dict):
            liquid_density_value = liquid_density["value"]
            liquid_density_unit = liquid_density["tex_unit"]
        else:
            liquid_density_value = liquid_density
            liquid_density_unit = ""
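
The same dict-or-string check is repeated for every property, so it could be factored into one helper. A minimal sketch, assuming each property has the {"data": ...} shape shown in the snippets above; get_value_and_unit and the sample data are hypothetical, not part of the original script.

```python
def get_value_and_unit(element_data, key):
    """Return (value, unit) for a property such as "LiquidDensity".

    If "data" is a dict, it is assumed to hold "value" and "tex_unit"
    keys (as in the AtomicMass snippet); otherwise the raw string
    (e.g. "NotAvailable") is returned as the value with an empty unit.
    """
    data = element_data[key]["data"]
    if isinstance(data, dict):
        # .get() avoids a KeyError if some entry lacks a unit
        return data.get("value", ""), data.get("tex_unit", "")
    return data, ""


# Hypothetical sample mirroring the snippets in the question
element_data = {
    "LiquidDensity": {"data": "NotAvailable"},
    "AtomicMass": {"data": {"value": "4.002602", "tex_unit": "\\text{u}"}},
}

liquid_density_value, liquid_density_unit = get_value_and_unit(element_data, "LiquidDensity")
atomic_mass_value, atomic_mass_unit = get_value_and_unit(element_data, "AtomicMass")
print(liquid_density_value, repr(liquid_density_unit))  # NotAvailable ''
print(atomic_mass_value, atomic_mass_unit)  # 4.002602 \text{u}
```

Each property then needs one line instead of seven.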

However, in the CSV file it shows up like this:


I’m also trying to remove some characters that I’m seeing in the ".csv" file.

In the JSON file, the data looks like this:

        "AtomicMass": {
            "data": {
                "value": "4.002602",
                "tex_unit": "\text{u}"
            },
            "tex_description": "\text{atomic mass}"
        },

And this is how I convert it, using Python:

        #atomic mass
        atomic_mass = element_data["AtomicMass"]["data"]
        if isinstance(atomic_mass, dict):
            atomic_mass_value = atomic_mass["value"]
            atomic_mass_unit = atomic_mass["tex_unit"]
        else:
            atomic_mass_value = atomic_mass
            atomic_mass_unit = ""
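
The unfamiliar text is most likely the TeX markup in the tex_unit and tex_description strings (for example \text{u}), which the code copies into the CSV unchanged. A minimal sketch that unwraps \text{...} with a regular expression, assuming units only use that simple wrapper; other TeX commands would need extra rules. One detail worth knowing: in JSON, \t is the escape sequence for a tab, so if the file really contains "\text{u}" with a single backslash, json.load returns a tab followed by ext{u}. The pattern below accepts either form; clean_tex is a hypothetical helper name.

```python
import json
import re

# "\text{...}" written with a literal backslash, or with a tab character
# (which is what json.load produces from a single-backslash "\text" in the file).
TEX_TEXT = re.compile(r"(?:\\t|\t)ext\{(.*?)\}")


def clean_tex(s):
    r"""Replace each \text{...} wrapper with its contents and trim whitespace."""
    return TEX_TEXT.sub(r"\1", s).strip()


# Simulate a single-backslash file: the JSON text here is {"tex_unit": "\text{u}"}.
parsed = json.loads('{"tex_unit": "\\text{u}"}')
print(parsed["tex_unit"] == "\text{u}")  # True: a tab followed by "ext{u}"
print(clean_tex(parsed["tex_unit"]))  # u
print(clean_tex("\\text{atomic mass}"))  # atomic mass
```

In the loop above it would be applied when reading the unit, e.g. atomic_mass_unit = clean_tex(atomic_mass["tex_unit"]).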

What have I done wrong?

I’ve tried replacing:

        #melting point
        melting_point = element_data["MeltingPoint"]["data"]
        if isinstance(melting_point, dict):
            melting_point_value = melting_point["value"]
            melting_point_unit = melting_point["tex_unit"]
        else:
            melting_point_value = melting_point
            melting_point_unit = ""

With:

        #melting point
        melting_point = element_data["MeltingPoint"]["data"]
        if isinstance(melting_point, dict):
            melting_point_value = melting_point["value"]
            melting_point_unit = melting_point["tex_unit"]
        elif melting_point == "NotApplicable" or melting_point == "Unknown":
            melting_point_value = ""
            melting_point_unit = ""
        else:
            melting_point_value = melting_point
            melting_point_unit = ""

However, that doesn’t seem to work.
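
One possible gap in the elif: if a property uses "NotAvailable" (as LiquidDensity does in the snippet above), it is neither "NotApplicable" nor "Unknown", so it falls through to the else branch unchanged. A minimal sketch that tests membership in a set instead; the set lists only the placeholders mentioned in this question, and the JSON file may use others. split_value_unit is a hypothetical name.

```python
# Placeholder strings mentioned in the question; extend if the file uses others.
MISSING = {"NotAvailable", "NotApplicable", "Unknown"}


def split_value_unit(data):
    """Return (value, unit) for a property's "data" field, blanking placeholders."""
    if isinstance(data, dict):
        return data["value"], data["tex_unit"]
    if data in MISSING:
        # Placeholder: leave both CSV cells empty
        return "", ""
    return data, ""


melting_point_value, melting_point_unit = split_value_unit("NotAvailable")
print(repr(melting_point_value), repr(melting_point_unit))  # '' ''
```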

Asked By: DuNeemo


Answers:

Your code is fine; what went wrong is the part that writes the CSV. Let me pull out the relevant piece.

        # I will only use Liquid Density as an example, so I won't show the others
        headers = [..., "Liquid Density", ...]

        # liquid_density data-reading part
        liquid_density = element_data["LiquidDensity"]["data"]
        if isinstance(liquid_density, dict):
            liquid_density_value = liquid_density["value"]
            liquid_density_unit = liquid_density["tex_unit"]
        else:
            liquid_density_value = liquid_density
            liquid_density_unit = ""

        # your writing of the data into the CSV
        writer.writerow([..., liquid_density, ...])

You write liquid_density directly into your CSV, which is why it shows the dictionary. If you want to write only the value, I believe you should change the write line to:

        writer.writerow([..., liquid_density_value, ...])
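
Putting the pieces together, a minimal sketch of the writing step that puts the extracted value (and its unit, in a separate column) into each row instead of the raw data object. It assumes the loaded JSON has been turned into a list of per-element dicts shaped like the snippets in the question; the real layout of the pastebin file isn't shown here, so the loop may need adjusting. PROPERTIES, write_elements_csv and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical subset of property keys, in column order.
PROPERTIES = ["AtomicMass", "LiquidDensity"]


def write_elements_csv(elements, f):
    """Write a header, then one row per element with value and unit columns."""
    writer = csv.writer(f)
    header = []
    for prop in PROPERTIES:
        header += [prop, prop + " unit"]
    writer.writerow(header)
    for element_data in elements:
        row = []
        for prop in PROPERTIES:
            data = element_data[prop]["data"]
            if isinstance(data, dict):
                # Write the extracted pieces, not the dict itself.
                row += [data["value"], data["tex_unit"]]
            else:
                # Plain strings such as "NotAvailable" get an empty unit.
                row += [data, ""]
        writer.writerow(row)


elements = [
    {
        "AtomicMass": {"data": {"value": "4.002602", "tex_unit": "\\text{u}"}},
        "LiquidDensity": {"data": "NotAvailable"},
    }
]

buf = io.StringIO()
write_elements_csv(elements, buf)
print(buf.getvalue())
```

For a real file, pass open("elements.csv", "w", newline="") instead of the StringIO buffer; the csv module documentation recommends newline="" so that extra blank lines don't appear between rows on Windows.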

Answered By: Joshua