Truncate a string to a specific number of bytes in Python
Question:
How can I truncate a string to be no longer than 50 bytes?
a = ('asdfzx안녕하세요awelkjawletjawetr방갑습니다.dlgawklejtwgasdgsdfgd'
     'sdfasdfsdafa궁금해요rewgargasregawergedrhsedhesrdhrthdrfjydjdrktydjdyj')
max_bytes = 50
# Desired (pseudocode): a = a truncated to at most max_bytes bytes
Answers:
You can use sys.getsizeof() to get the size and then check it in an if statement. getsizeof() returns the in-memory size of an object in bytes. Hope my example gives you an idea:
from sys import getsizeof

a = '4200547985984359347509gbrtbhrtbrtbargrefefwefwef'
b = 'hello'

# Compare the size of each encoded bytes object against the 50-byte limit
if getsizeof(a.encode('ascii')) > 50:
    print("Error:a")
if getsizeof(b.encode('ascii')) > 50:
    print("Error:b")
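One caveat worth checking before relying on this: sys.getsizeof() measures the whole Python object, not just the encoded payload, so it reports more than the number of bytes that would be written out. The sketch below compares the two numbers. The variable names are illustrative, and the exact getsizeof() value depends on the Python implementation and version, so only the relationship is shown.

```python
import sys

text = 'hello'
encoded = text.encode('ascii')

# Number of bytes the text occupies once encoded.
encoded_length = len(encoded)

# Size of the bytes object in memory, which includes interpreter overhead.
object_size = sys.getsizeof(encoded)

print(encoded_length)                # 5
print(object_size > encoded_length)  # True on CPython
```

If the limit is on the encoded data, len(text.encode(...)) is probably the number to compare against.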
You could just use slicing if you are using an encoding with one byte per character:
a = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
a[:50]
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
Note that this only makes sense if you are writing the string out somewhere. If the size of the Python object representation matters more in your use case, @Achilles's answer should work.
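To illustrate why the encoding matters here, a minimal sketch (the sample string is taken from the question, the variable names are made up): slicing a str counts characters, so with a multi-byte encoding such as UTF-8 the result can exceed the byte limit.

```python
text = 'asdfzx안녕하세요'

# Slicing a str counts characters, not bytes.
sliced = text[:10]

print(len(sliced))                  # 10 characters
# 6 ASCII characters (1 byte each) + 4 Hangul syllables (3 bytes each in UTF-8)
print(len(sliced.encode('utf-8')))  # 18 bytes

# With a one-byte-per-character encoding the two counts agree.
ascii_text = 'a' * 80
print(len(ascii_text[:50].encode('latin-1')))  # 50 bytes
```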
Try this (to truncate to 10 bytes):
s = "asdfzx안녕하세요awelkjawletjawetr방갑습니"
s.encode('utf-8')[:10].decode('utf-8', 'ignore')
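The same idea can be wrapped in a small helper. This is only a sketch, assuming UTF-8 is the target encoding and a non-negative limit; truncate_utf8 is a hypothetical name, not a standard library function. Cutting the encoded bytes may split a multi-byte character, and errors='ignore' drops that incomplete trailing sequence when decoding.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return a prefix of text whose UTF-8 encoding is at most max_bytes long."""
    encoded = text.encode('utf-8')
    # The cut may land inside a multi-byte character;
    # errors='ignore' discards the incomplete bytes at the end.
    return encoded[:max_bytes].decode('utf-8', errors='ignore')


s = 'asdfzx안녕하세요awelkjawletjawetr방갑습니'
short = truncate_utf8(s, 10)
print(short)                       # asdfzx안
print(len(short.encode('utf-8')))  # 9
```

The result is 9 bytes rather than 10 because the tenth byte is the first byte of a three-byte character, which cannot be kept on its own.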