How to remove those “\x00\x00” in a string?
I have many strings like this (example shown below). I can use re.sub to replace those “\x00”, but I am wondering whether there is a better way to do it. Converting between unicode, bytes and string is always confusing.
>>> text = 'Hello\x00\x00\x00\x00'
>>> text.rstrip('\x00')
'Hello'
It removes all \x00 characters at the end of the string.
>>> a = 'Hellox00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00' >>> a.replace('x00','') 'Hello'
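The two calls above behave differently once a null appears somewhere other than the end of the string. A small sketch, using a made-up sample string, shows the difference:

```python
# Hypothetical sample: one null in the middle, two at the end.
sample = 'Hel\x00lo\x00\x00'

trailing_removed = sample.rstrip('\x00')   # only the trailing nulls are removed
all_removed = sample.replace('\x00', '')   # every null is removed

print(repr(trailing_removed))  # 'Hel\x00lo'
print(repr(all_removed))       # 'Hello'
```

Which one is right depends on whether nulls inside the text are data or padding.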
I think the more general solution is to use:
cleanstring = nullterminatedstring.split('\x00', 1)[0]
This splits the string using \x00 as the delimiter, 1 time. When a null is present, split(...) returns a 2-element list: everything before the null and everything after it (the delimiter itself is removed). Appending [0] returns only the portion of the string before the first null (\x00) character, which I believe is what you’re looking for.
The convention in some languages, particularly C-like ones, is that a single null character marks the end of the string. For example, you should also expect to see strings in which leftover data follows the first null character.
The answer supplied here will handle that situation as well as the other examples.
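As a sketch of that idea, the helper below keeps only the text before the first null. The name `until_first_null` and the sample strings are made up for illustration; `str.partition` is used here because it returns the part before the first separator, which is the same piece the split approach keeps.

```python
def until_first_null(s: str) -> str:
    """Return the part of s before the first NUL character (all of s if there is none)."""
    head, _sep, _tail = s.partition('\x00')
    return head


print(until_first_null('Hello\x00\x00\x00'))          # Hello
print(until_first_null('Hello\x00leftover\x00\x00'))  # Hello
print(until_first_null('Hello'))                      # Hello
```

Note that this discards everything after the first null, so it is only appropriate when the data really follows the C convention.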
Building on the answers supplied, I suggest that strip() is more generic than rstrip() for cleaning up a data packet, as strip() removes chars from the beginning and the end of the supplied string, whereas rstrip() simply removes chars from the end of the string.
However, strip() does not treat NUL chars as whitespace by default, so you need to specify them explicitly. This can catch you out, as print() will of course not show the NUL chars. My solution was to clean the string using ".strip().strip('\x00')":
>>> arbBytesFromSocket = b'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii')
>>> print(arbBytesAsString)
hello
>>> str(arbBytesAsString)
'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii').strip().strip('\x00')
>>> str(arbBytesAsString)
'hello'
>>>
This gives you the required string/byte array without the NUL chars on each end, and it preserves any NUL chars inside the "data packet", which is useful for received byte data that may contain valid NUL chars (e.g. a C-type structure). NB: in this case the packet must be "wrapped", i.e. surrounded by non-NUL chars (prefix and suffix), so that only the unwanted NUL chars are detected and stripped.
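The same cleanup can also be done on the bytes before decoding, since `bytes.strip` accepts the set of byte values to remove. A minimal sketch, with a made-up packet that is wrapped in `<` and `>` and carries one NUL inside the payload:

```python
# Hypothetical packet: NUL padding on both ends, one valid NUL inside the payload.
packet = b'\x00\x00<a\x00b>\x00\x00'

payload = packet.strip(b'\x00')   # removes leading and trailing NULs only
text = payload.decode('ascii')

print(repr(text))  # '<a\x00b>'
```

The interior NUL survives because strip() stops at the first byte on each side that is not in the given set.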
I tried strip and rstrip and they didn’t work, but this one did: split and then join the result.
if '\x00' in name:
    name = ' '.join(name.split('\x00'))
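For what it is worth, splitting on a separator and joining with a space gives the same result as a single `str.replace` call. A quick check, with a made-up value for `name`:

```python
name = 'first\x00second\x00'  # hypothetical input

joined = ' '.join(name.split('\x00'))
replaced = name.replace('\x00', ' ')

print(repr(joined))        # 'first second '
print(joined == replaced)  # True
```

Either way a trailing null becomes a trailing space, so a final `.strip()` may be wanted.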
I ran into this problem copying lists out of Excel. The process was:
The problem was that reading the clipboard intermittently returned multiple ‘\x00’ at the end of the text.
I have changed from win32clipboard to pyperclip to read the clipboard, and that seems to have resolved the problem.
Neil wrote, ‘…you might want to put some thought into why you have them in the first place.’
For my own issue with this error, this led me to the problem. The saved file I was reading from was in unicode. Once I re-saved the file as plain ASCII text, the problem was solved.
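If the file was saved as UTF-16 (an assumption; the post only says "unicode", which is how some Windows editors label UTF-16), reading it with an 8-bit encoding would produce exactly this kind of interleaved NUL. A sketch of decoding it explicitly instead of re-saving the file, using a temporary file with made-up content:

```python
import os
import tempfile

# Write a small UTF-16 file to stand in for the saved file (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-16') as f:
    f.write('Hello')

# Reading with a mismatched 8-bit encoding leaves NULs between the characters.
with open(path, encoding='latin-1') as f:
    wrong = f.read()

# Reading with the matching encoding gives clean text.
with open(path, encoding='utf-16') as f:
    right = f.read()

print('\x00' in wrong)  # True
print(right)            # Hello
```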