Reading strings with special characters in Python
Question:
I have a string with special characters as follows
req_str = 'N\x08NA\x08AM\x08ME'
## If I print it, I correctly get the word "NAME"
>>> print(req_str)
NAME
Now I want to extract the word NAME from the string.
I tried
''.join(c for c in 'N\x08NA\x08AM\x08ME' if c.isprintable())
## this produces
'NNAAMME'
I understand this has something to do with a special encoding, but I am not very familiar with string encodings. How can I extract the word 'NAME' as a string in this situation?
Answers:
According to the ASCII table, \x08 is the backspace character. It can also be written as \b:
req_str1 = "N\x08NA\x08AM\x08ME"
req_str2 = "N\bNA\bAM\bME"
print(req_str1)
print(req_str2)
print(req_str1 == req_str2)
output:
NAME
NAME
True
Basically, the terminal writes an N, then applies the backspace, then writes another N over it. That's why you see only one N in the final output. The same thing happens for A and M. The final E is written only once, with no backspace after it.
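To see that the backspace characters really are part of the string, and that only the terminal hides them, it can help to look at the string without printing it directly. The following is a minimal sketch using only built-ins; the variable name codes is just for illustration.

```python
req_str = 'N\x08NA\x08AM\x08ME'

# repr() shows escape sequences instead of letting the terminal act on them.
print(repr(req_str))

# Listing the code points makes every backspace (code point 8) visible.
codes = [ord(c) for c in req_str]
print(codes)

# The string holds 10 characters, 3 of which are backspaces.
print(len(req_str), req_str.count('\x08'))
```

This also explains the result in the question: the backspaces are the only non-printable characters, so filtering with isprintable() drops them and keeps all seven letters.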
To extract NAME, you can do what the terminal does with it (thanks to @DarkKnight):
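def extract(s):
    BS = "\x08"  # backspace character
    r = []
    for c in s:
        if c == BS:
            # a backspace erases the previously kept character, if any
            r = r[:-1]
        else:
            r.append(c)
    return ''.join(r)

req_str = 'N\x08NA\x08AM\x08ME'
s = extract(req_str)
print(len(req_str))  # 10
print(s)             # NAME
print(len(s))        # 4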
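The same idea can also be sketched with a regular expression: repeatedly delete every "character followed by a backspace" pair until none are left. This is only an alternative illustration, not part of the original answer; the names extract_re and _PAIR are made up here. It is meant to behave like the loop above, including ignoring a backspace that has nothing to erase.

```python
import re

# One non-backspace character followed by one backspace.
_PAIR = re.compile(r"[^\x08]\x08")

def extract_re(s):
    # Remove pairs repeatedly so that consecutive backspaces
    # (e.g. 'ab\x08\x08') erase more than one character.
    while True:
        s, n = _PAIR.subn("", s)
        if n == 0:
            break
    # Any backspaces still present are at the start, with nothing to erase.
    return s.replace("\x08", "")

print(extract_re('N\x08NA\x08AM\x08ME'))
```

For long inputs the list-based loop is likely the better choice, since it makes a single pass over the string.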
Additional information: if you wonder where this comes from, in the old days printers and typewriters used this technique to type a character twice in the same position to make it bold. It's called overstriking or overtyping.