Normalize string from webpage

Question:

I am trying to normalize the string "PartII\xa0I \x96 FINANCIAL\n INFORMATION". In general, all that should be left (once non-UTF-8 characters are excluded) are letters, numbers and dots. Therefore the expected output is "PartII FINANCIAL INFORMATION". The text comes from this SEC form.

Solutions tried, where text is the string:

  1. text.encode('utf-8', errors='ignore').decode('utf-8')
  2. unicodedata.normalize(decoding, text)
Asked By: A259


Answers:

Use this; it should work for you:

text.encode('ascii', errors='ignore').decode('utf-8')

If you also need to remove the newline character (\n), use this:

text.replace('\n', "").encode('ascii', errors='ignore').decode('utf-8')
Answered By: Mahmoud Nasser
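
For reference, the encode/decode approach from the answer can be wrapped into a small helper. This is only a sketch under a few assumptions that are not in the original posts: the function name normalize_text is hypothetical, the sample string is a simplified illustration rather than the exact text from the form, NFKC normalization is added so that the no-break space (\xa0) becomes a regular space instead of being dropped, and runs of whitespace (including newlines) are collapsed to a single space.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical helper: reduce scraped text to plain ASCII with single spaces."""
    # NFKC maps compatibility characters to plain equivalents,
    # e.g. the no-break space '\xa0' becomes an ordinary space.
    text = unicodedata.normalize("NFKC", text)
    # Drop every character that cannot be represented in ASCII
    # (for example the control character '\x96').
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # str.split() with no arguments splits on any whitespace, including '\n',
    # so joining the pieces collapses all whitespace runs to one space.
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "Part\xa0I \x96 FINANCIAL\n INFORMATION"  # illustrative input
    print(normalize_text(sample))  # Part I FINANCIAL INFORMATION
```

Note that this keeps punctuation and other ASCII symbols; if only letters, digits and dots should survive, an extra filtering step would be needed.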