df.to_json writes the character \u00a0 into the JSON: how to remove it from a pandas DataFrame
Question:
While using df.to_json, the character \u00a0 appeared in the JSON. How can I remove it from the pandas DataFrame?
Here is the JSON output:
[
    {
        "dx_code":"A000",
        "formatted_code":"A00.0",
        "valid_for_coding":"0.0",
        "short_desc":null,
        "long_desc":null,
        "list_id":"Chronic_Body_Sys",
        "option_id":"1",
        "title":"Infectious and parasitic\u00a0"
    },
    {
        "dx_code":"A00",
        "formatted_code":"A00",
        "valid_for_coding":0.0,
        "short_desc":"Cholera",
        "long_desc":"Cholera",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    },
    {
        "dx_code":"A000",
        "formatted_code":"A00.0",
        "valid_for_coding":1.0,
        "short_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
        "long_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    },
    {
        "dx_code":"A001",
        "formatted_code":"A00.1",
        "valid_for_coding":1.0,
        "short_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
        "long_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    }
]
This is the code I used:
testdata.to_json('testfile.json', indent=4, orient='records')
I don't see this \u00a0 character in the data, and I don't know how to remove it. Any suggestions? I am working on a DataFrame in a Jupyter notebook.
Answers:
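\u00a0 is the JSON escape for U+00A0, the no-break space. to_json does not add it: the character is already at the end of the title values, where it looks like an ordinary space. With the default force_ascii=True, to_json writes every non-ASCII character as a \uXXXX escape; with force_ascii=False it writes the character itself. Neither setting removes it. Either way, deserializing (loading) this JSON should work just fine, as Python knows how to handle the character.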
TL;DR
It is the Unicode character for a no-break space, and it comes from the data itself; to_json only escapes it. Pass force_ascii=False if you want the raw character instead of the escape, but reading this JSON should work just fine.
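To check both claims (the character lives in the data, and force_ascii only changes how it is written), here is a minimal sketch. The one-row frame is made up for illustration and is not the asker's data.

```python
import json
import pandas as pd

# Hypothetical one-row frame with a trailing no-break space (U+00A0) in "title".
df = pd.DataFrame({"title": ["Infectious and parasitic disease\u00a0"]})

# Default force_ascii=True: non-ASCII characters are written as \uXXXX escapes.
escaped = df.to_json(orient="records")

# force_ascii=False: the character is written as-is instead of being escaped.
raw = df.to_json(orient="records", force_ascii=False)

# Either form loads back to the same Python string, no-break space included.
loaded_escaped = json.loads(escaped)[0]["title"]
loaded_raw = json.loads(raw)[0]["title"]
```

In both cases the loaded title still ends with the no-break space, so getting rid of it means changing the data, not the serialization options.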
You should be able to keep this character without issue.
If you really want to remove it, remember that to_json returns a string when no path is given, so you can use a simple:
s = df.to_json().replace('\\u00a0', '')
Saving to file:
with open('testfile.json', 'w') as f:
    f.write(df.to_json(indent=4, orient='records').replace('\\u00a0', ''))
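One detail that is easy to get wrong with this approach: with the default force_ascii=True the JSON text contains the six-character escape sequence, not the character itself, so the pattern needs a doubled backslash. A small sketch on a made-up frame:

```python
import json
import pandas as pd

# Hypothetical frame; the title ends with a no-break space (U+00A0).
df = pd.DataFrame({"title": ["Cholera\u00a0"]})
text = df.to_json(orient="records")  # default force_ascii=True

# "\u00a0" is the single character itself, which the escaped text does not contain.
unchanged = text.replace("\u00a0", "")

# "\\u00a0" is the six-character escape sequence that is actually in the text.
cleaned = text.replace("\\u00a0", "")

titles = [row["title"] for row in json.loads(cleaned)]
```

Note that this removes every occurrence in the output, including any no-break space in the middle of a value, which may or may not be what you want.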
For leading and trailing whitespace, str.strip() removes it all, including the no-break space. For my question I can use this:
testdata['title'] = testdata['title'].str.strip()
# and then
testdata.to_json()
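If more than one column might carry stray whitespace, the same idea can be applied to every string value before serializing. This is only a sketch: the frame and the helper name strip_strings are invented for the example.

```python
import pandas as pd

# Hypothetical frame with trailing no-break spaces and a missing value.
df = pd.DataFrame({
    "title": ["Infectious and parasitic\u00a0", "Infectious and parasitic disease\u00a0"],
    "short_desc": [None, "Cholera\u00a0"],
    "valid_for_coding": [0.0, 1.0],
})

def strip_strings(value):
    # Only touch str values; numbers and missing values pass through unchanged.
    return value.strip() if isinstance(value, str) else value

# Apply the helper to each element of each column.
cleaned = df.apply(lambda col: col.map(strip_strings))

result = cleaned.to_json(orient="records")
```

Python's strip() treats U+00A0 as whitespace, so no special pattern is needed; only leading and trailing characters are removed.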