Normalize string from webpage
Question:
Trying to normalize the string "PartII\xa0I \x96 FINANCIAL\n INFORMATION". In general, all that should be left (once non-UTF-8 characters are excluded) are letters, numbers and dots. Therefore the expected output is "PartII FINANCIAL INFORMATION". The text comes from this SEC form.
Solutions tried, where text is the string:
text.encode('utf-8', errors='ignore').decode('utf-8')
unicodedata.normalize(decoding, text)
Answers:
Use this; it should work for you:
text.encode('ascii', errors='ignore').decode('utf-8')
Also, if you need to remove the newline (\n),
use this:
text.replace('\n', "").encode('ascii', errors='ignore').decode('utf-8')
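For context on why the first attempt in the question changes nothing: every character in a Python str (lone surrogates aside) can be encoded as UTF-8, so errors='ignore' has nothing to drop. Encoding to ASCII is what actually discards characters such as \xa0 and \x96. The steps above can be combined into one helper. This is only a minimal sketch: the function name normalize_text and the sample string are illustrative assumptions rather than part of the original answer, and the NFKD step plus whitespace collapsing are additions that may or may not suit your data.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Reduce text to ASCII and tidy whitespace (hypothetical helper)."""
    # NFKD maps compatibility characters to plain equivalents,
    # e.g. the non-breaking space \xa0 becomes a regular space.
    text = unicodedata.normalize("NFKD", text)
    # Drop anything that still is not ASCII, e.g. the control character \x96.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Collapse runs of whitespace (including \n) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


# Illustrative input, assumed for this example only.
sample = "Part\xa0I \x96 FINANCIAL\n INFORMATION"
print(normalize_text(sample))
```

Unlike text.replace('\n', ""), collapsing whitespace keeps a space where the newline was, so words on separate lines are not glued together.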