While using df.to_json it created the character \u00a0 in the JSON: how to remove it in a pandas DataFrame

Question:

While using df.to_json, the character \u00a0 appeared in the JSON output. How can I remove it from the pandas DataFrame?

Here is the JSON output:

[
    {
        "dx_code":"A000",
        "formatted_code":"A00.0",
        "valid_for_coding":"0.0",
        "short_desc":null,
        "long_desc":null,
        "list_id":"Chronic_Body_Sys",
        "option_id":"1",
        "title":"Infectious and parasitic\u00a0"
    },
    {
        "dx_code":"A00",
        "formatted_code":"A00",
        "valid_for_coding":0.0,
        "short_desc":"Cholera",
        "long_desc":"Cholera",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    },
    {
        "dx_code":"A000",
        "formatted_code":"A00.0",
        "valid_for_coding":1.0,
        "short_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
        "long_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    },
    {
        "dx_code":"A001",
        "formatted_code":"A00.1",
        "valid_for_coding":1.0,
        "short_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
        "long_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    }
]

This is the code I used:

testdata.to_json('testfile.json',indent=4,orient='records')

This \u00a0 character is not present in the data, and I don't know how to remove it. Any suggestions? I am working on a DataFrame in a Jupyter notebook.

Answers:

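The \u00a0 sequence is the JSON escape for U+00A0, the no-break space. It shows up because to_json escapes non-ASCII characters by default (force_ascii=True); passing force_ascii=False writes the character itself rather than the escape. Either way, deserializing (loading) this JSON should work just fine, as Python knows how to handle the character.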

TL;DR
It is the Unicode character for a no-break space, typically present in the source text for formatting reasons. Pass force_ascii=False if you do not want the \u00a0 escape in the file, but reading this JSON should work just fine either way.
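
To make the effect of force_ascii concrete, here is a minimal sketch. The one-row DataFrame is made up for illustration and is not the asker's data; it only assumes that the title value ends with a no-break space.

```python
import json

import pandas as pd

# Hypothetical one-row frame; the title ends with a no-break space (U+00A0).
df = pd.DataFrame({"title": ["Infectious and parasitic disease\u00a0"]})

# Default behaviour (force_ascii=True): non-ASCII characters are escaped.
escaped = df.to_json(orient="records")

# force_ascii=False: the character is written as-is, with no escape sequence.
raw = df.to_json(orient="records", force_ascii=False)

print(escaped)
print(raw)

# Both strings decode to the same Python value when loaded back.
print(json.loads(escaped) == json.loads(raw))
```

Note that neither setting removes the no-break space from the data; the option only controls how it is written.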

Answered By: sami-amer

You should be able to keep this character without issue.

If you really want to remove it, remember that to_json returns a string when no path is given, so you can use a simple replace on the escaped sequence:

s = df.to_json().replace('\\u00a0', '')

Saving to a file:

with open('testfile.json', 'w') as f:
    f.write(df.to_json(indent=4, orient='records').replace('\\u00a0', ''))
Answered By: mozway

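For whitespace, we can use strip(), which removes leading and trailing whitespace, including the no-break space. For my question, I can use this:

# strip() returns a new Series, so assign it back to the column
testdata['title'] = testdata['title'].str.strip()
# and then
testdata.to_json()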
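
strip() only touches the ends of each string. As a hedged sketch of cleaning the data before export when the no-break space may also sit in the middle of a value, the following replaces it with an ordinary space first. The sample frame and the text_columns list are hypothetical and stand in for your own data.

```python
import pandas as pd

# Hypothetical sample; "title" has trailing and embedded no-break spaces.
df = pd.DataFrame({
    "dx_code": ["A00", "A000"],
    "title": [
        "Infectious and parasitic disease\u00a0",
        "Infectious\u00a0and parasitic disease\u00a0",
    ],
})

# Assumed list of text columns to clean; adjust to your own frame.
text_columns = ["title"]

for col in text_columns:
    df[col] = (
        df[col]
        .str.replace("\u00a0", " ", regex=False)  # embedded NBSP -> ordinary space
        .str.strip()                              # drop leading/trailing whitespace
    )

cleaned_json = df.to_json(orient="records")
print(cleaned_json)
```

Replacing with a space rather than an empty string keeps words separated when the character appears between them.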