Convert ANSI escape to UTF-8 in Python
Question:
I may be wrong in assessing whether this string is ANSI or something else, but it comes from RTF documents with this header:
{\rtf1\ansi\ansicpg1252
The string of interest from the document is:
ansi_string = r'3 \u176? \u177? 0.2\u176? (2\u952?)'
When I open the document in Word, it shows: 3° ± 0.2° 2θ
My questions are:
1) What are these escape codes?
2) Is it possible to convert this string to UTF-8 using Python's built-in methods?
Answers:
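For the first question: these are RTF Unicode control words. \uN inserts the character whose UTF-16 code unit is the decimal number N, and the character right after it (here ?) is a fallback for readers that do not understand Unicode. Unicode-aware readers skip that fallback (one character by default; the count is set with \ucN). The \ansicpg1252 in the header says that 8-bit characters and \'hh hex escapes are in Windows code page 1252, while \uN escapes carry Unicode directly, so the string is not really "ANSI". The sketch below uses a hypothetical helper, rtf_u_param_to_char, to show the mapping for the three codes in the sample, including the RTF convention that values above 32767 are written as negative numbers. Since a Python 3 str is already Unicode, getting UTF-8 only requires .encode("utf-8") once the escapes are decoded.

```python
# Hypothetical helper: turn the numeric parameter N of an RTF \uN control
# word into a character. RTF writes N as a signed 16-bit number, so code
# units above 32767 appear as negative values and need 65536 added.
def rtf_u_param_to_char(n: int) -> str:
    if n < 0:
        n += 65536
    return chr(n)


# The three escapes from the sample string: \u176, \u177 and \u952.
for n in (176, 177, 952):
    ch = rtf_u_param_to_char(n)
    # Print the code, the character, its code point and its UTF-8 bytes.
    print(n, ch, f"U+{ord(ch):04X}", ch.encode("utf-8"))
# 176 ° U+00B0 b'\xc2\xb0'
# 177 ± U+00B1 b'\xc2\xb1'
# 952 θ U+03B8 b'\xce\xb8'
```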
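I don't think this is the best answer, but to show what I want, here is working code. It uses the .NET Windows Forms RichTextBox control through the clr module (pythonnet or IronPython), so it needs .NET with Windows Forms available.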
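import clr  # provided by pythonnet (or IronPython)
clr.AddReference("System")
clr.AddReference("System.Windows.Forms")
import System.Windows.Forms as WinForms

def rtf_to_text(rtf_str):
    # Wrap the fragment in a minimal RTF document with the same header as the source files
    rtf = r"{\rtf1\ansi\ansicpg1252" + "\n" + rtf_str + "\n" + "}"
    # Let the RichTextBox control parse the RTF and return the plain text
    richTextBox = WinForms.RichTextBox()
    richTextBox.Rtf = rtf
    return richTextBox.Text

print(rtf_to_text(r'3 \u176? \u177? 0.2\u176? (2\u952?)'))
# --> 3 ° ± 0.2° (2θ)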
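To avoid the .NET dependency, one portable option is to decode the escapes yourself with the standard re module. This is a minimal sketch, not a full RTF parser: it assumes the input is a plain text fragment like the one above (no groups or other control words), that each \uN is followed by a single fallback character (the default \uc1), and that \'hh hex escapes are in code page 1252 as the header declares. decode_rtf_escapes is a hypothetical name.

```python
import re

# \uN (N may be negative), an optional space delimiter, then one fallback
# character or \'hh escape that Unicode-aware readers skip (assumes \uc1);
# or a standalone \'hh hex escape in the document's ANSI code page.
_RTF_ESCAPE = re.compile(r"\\u(-?\d+)[ ]?(?:\\'[0-9a-fA-F]{2}|.)?|\\'([0-9a-fA-F]{2})")


def decode_rtf_escapes(text: str, codepage: str = "cp1252") -> str:
    r"""Hypothetical helper: replace RTF \uN and \'hh escapes with characters."""

    def replace(m: re.Match) -> str:
        if m.group(1) is not None:
            n = int(m.group(1))
            if n < 0:  # signed 16-bit value, as RTF writes it
                n += 65536
            return chr(n)
        # \'hh: one byte in the ANSI code page named by \ansicpgN
        return bytes([int(m.group(2), 16)]).decode(codepage)

    decoded = _RTF_ESCAPE.sub(replace, text)
    # Characters outside the BMP arrive as two \uN surrogate halves;
    # a UTF-16 round trip joins valid pairs into single characters.
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")


text = decode_rtf_escapes(r'3 \u176? \u177? 0.2\u176? (2\u952?)')
print(text)                  # 3 ° ± 0.2° (2θ)
print(text.encode("utf-8"))  # the same text as UTF-8 bytes
```

Unlike the RichTextBox approach, this ignores all other RTF markup, so for whole documents a real RTF parser is likely the safer choice.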