Python HTML Encoding \xc2\xa0
Question:
I’ve been struggling with this one for a while. I’m trying to write strings to HTML but have issues with the format once I’ve cleaned them. Here’s an example:
paragraphs = ['Grocery giant and household name Woolworths is battered and bruised. ',
'But behind the problems are still the makings of a formidable company']
x = str(" ")
for item in paragraphs:
x = x + str(item)
x
Output:
"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised.
But behind the problems are still the makings of a formidable\xc2\xa0company"
Desired output:
"Grocery giant and household name Woolworths is battered and bruised.
But behind the problems are still the makings of a formidable company"
I’m hoping you’re able to explain why this happens and how I can fix it. Thanks in advance!
Answers:
\xc2\xa0 is the byte sequence 0xC2 0xA0, which is the UTF-8 encoding of the non-breaking space (U+00A0). It is an invisible whitespace character, not a regular space. When Python displays the repr of a byte string, non-ASCII bytes such as these appear as \x escapes. For more information, see Wikipedia: https://en.wikipedia.org/wiki/Non-breaking_space
I copied what you pasted in the question and got the expected output.
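One possible fix, sketched here under some assumptions: the code runs on Python 3, the scraped strings contain real non-breaking spaces (U+00A0), and a plain space is an acceptable substitute. The helper name clean_nbsp and the sample byte string are made up for illustration and are not from the question.

```python
# Assumed input: the question's sentences with non-breaking spaces (U+00A0)
# where the output showed \xc2\xa0.
paragraphs = [
    "Grocery giant and household name\u00a0Woolworths is battered and\u00a0bruised. ",
    "But behind the problems are still the makings of a formidable\u00a0company",
]


def clean_nbsp(text):
    """Replace each non-breaking space (U+00A0) with a regular space."""
    return text.replace("\u00a0", " ")


# Join the cleaned paragraphs instead of concatenating in a loop.
x = "".join(clean_nbsp(item) for item in paragraphs)
print(x)

# If the data arrives as UTF-8 bytes, decode it to text before cleaning.
raw = b"household name\xc2\xa0Woolworths"  # hypothetical sample bytes
decoded = clean_nbsp(raw.decode("utf-8"))
print(decoded)
```

Unlike the loop in the question, this starts from an empty string, so the result has no leading space. If the non-breaking spaces should survive in the HTML output, writing them as the entity &nbsp; instead of replacing them is another option.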