Replacing '\x..' literal string in Python
Question:
I have directories that contain ‘\x..’ characters such as ‘\x00’:
#ls
c\x00mb
and I want to rename them without these, since when I copy those files to Windows they become unusable.
So my Python script goes through those directories and detects the problematic characters the following way:
if '\\x' in dir: # dir is the name of the current directory
At first I thought I could get rid of this problem by using the re module in Python:
new_dir_name = re.sub('\x00', r'', dir) # I am using \x00 as an example
But this didn’t work. Is there a way I can replace those characters using Python?
EDIT:
To help identify the character: when I pipe ls to xxd, the ‘\’ character appears in the ASCII representation. In hexadecimal it shows ‘5c’.
Answers:
This string.replace worked for me:
dir = r'foo\x00bar'
print dir
dir.replace(r'\x00', '')
print dir
Output is:
foo\x00bar
foobar
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
A regular expression could also work for the general case, but you’ll have to escape the backslash so that \x itself isn’t interpreted as a regular expression escape.
For the general case of removing \x followed by two hexadecimal digits:
import re
dir = r'foo\x1Dbar'
print dir
re.sub(r'\\x[0-9A-F]{2}', '', dir)
print dir
Output is:
foo\x1Dbar
foobar
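For Python 3, the regular-expression approach above might be sketched as follows. This is only a minimal sketch: it assumes the name holds a literal backslash followed by x and two hexadecimal digits, the helper name strip_hex_escapes is hypothetical, and accepting lowercase digits is an addition to the pattern shown above. Note that re.sub returns a new string, so the result has to be captured.

```python
import re

# A literal backslash, then 'x', then exactly two hexadecimal digits.
# Lowercase digits (a-f) are accepted too; that is an assumption beyond
# the pattern in the answer above.
HEX_ESCAPE = re.compile(r'\\x[0-9A-Fa-f]{2}')

def strip_hex_escapes(name):
    """Return a copy of name with every literal backslash-x-NN sequence removed."""
    # sub() does not modify name; it returns the cleaned copy.
    return HEX_ESCAPE.sub('', name)

print(strip_hex_escapes(r'foo\x1Dbar'))  # foobar
```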
This interpreter session should show the difference between your dirname having an actual null character in it, versus having a backslash followed by an x followed by two 0s.
>>> bad_dir_name = "c\x00mb"
>>> bad_dir_name
'c\x00mb'
>>> good_dir_name = bad_dir_name.replace("\x00", "")
>>> good_dir_name
'cmb'
>>>
>>> bad_dir_name2 = "c\\x00mb"
>>> bad_dir_name2
'c\\x00mb'
>>> good_dir_name2 = bad_dir_name2.replace("\\", "") # remove the backslash
>>> good_dir_name2
'cx00mb'
In either case, string.replace is the way to go.
The \x represents a hexadecimal character escape.
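To tell the two cases apart in a script, something like the following sketch may help. The variable names are hypothetical. Since xxd reports the byte 5c in the question, the literal-backslash case is the likely one there.

```python
# Case 1: the escape is interpreted, so the string holds one NUL character.
real_null = "c\x00mb"
# Case 2: a raw string keeps the backslash, 'x', '0', '0' as four characters.
literal = r"c\x00mb"

print(len(real_null), len(literal))  # 4 7
print("\\" in real_null)             # False
print("\\" in literal)               # True
print(hex(ord("\\")))                # 0x5c, the byte xxd shows

# Removing the literal four-character sequence from case 2:
print(literal.replace("\\x00", ""))  # cmb
```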
The answer provided by tavnab does not work for me.
The reason is that string.replace is not an in-place method: it returns a new string and leaves the original unchanged. Also, in Python 3 the print function requires parentheses. So, the updated answer should be:
dir = r'foo\x00bar'
print(dir)
dir = dir.replace(r'\x00', '')
print(dir)
which then indeed produces the following output:
foo\x00bar
foobar
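Putting the pieces together for the original goal of renaming the directories, a rough Python 3 sketch could look like this. It assumes the names contain literal \xNN sequences, as the xxd output in the question suggests. The function name rename_escaped_dirs and the choice to skip a rename when the target already exists are assumptions, not something stated in the thread.

```python
import os
import re

# A literal backslash, 'x', and two hexadecimal digits.
HEX_ESCAPE = re.compile(r'\\x[0-9A-Fa-f]{2}')

def rename_escaped_dirs(parent):
    """Rename directories directly under parent whose names contain a
    literal backslash-x-NN sequence; return (old, new) name pairs."""
    renamed = []
    for name in os.listdir(parent):
        old_path = os.path.join(parent, name)
        new_name = HEX_ESCAPE.sub('', name)
        # Skip unchanged names, names that would become empty, and non-directories.
        if new_name == name or not new_name or not os.path.isdir(old_path):
            continue
        new_path = os.path.join(parent, new_name)
        if os.path.exists(new_path):
            continue  # assumption: skip rather than overwrite an existing entry
        os.rename(old_path, new_path)
        renamed.append((name, new_name))
    return renamed
```

Calling rename_escaped_dirs('.') would process the current directory. It only looks one level deep, so nested directories would need a walk from the bottom up.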