Replacing '\x..' literal string in Python

Question:

I have directories whose names contain '\x..' sequences such as '\x00':

# ls
c\x00mb

and I want to rename them without these sequences, since the files become unusable when I copy them to Windows.
So my Python script goes through those directories and detects the problematic characters the following way:

if '\\x' in dir: # dir is the name of the current directory

First I thought I could get rid of this problem by using the re module in python:

new_dir_name = re.sub('\x00', r'', dir) # I am using \x00 as an example

But this didn't work. Is there a way I can replace those characters using Python?

EDIT:
To clarify which character this is: when I pipe ls to xxd, the '\' character appears in the ASCII representation. In hexadecimal it shows '5c'.

Asked By: aze

Answers:

This string.replace worked for me:

dir = r'foo\x00bar'
print dir
dir.replace(r'\x00', '')
print dir

Output is:

foo\x00bar
foobar

string.replace(s, old, new[, maxreplace])

Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.

A regular expression could also work for the general case, but you'll have to escape the backslash so that \x itself isn't interpreted as a regular expression escape.

For the general case of removing \x followed by two hexadecimal digits:

import re
dir = r'foo\x1Dbar'
print dir
re.sub(r'\\x[0-9A-F]{2}', '', dir)
print dir

Output is:

foo\x1Dbar
foobar
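
For readers on Python 3, here is a minimal sketch of the same regular-expression idea. It assumes the name contains the literal text \xNN (a backslash, an x and two hex digits), and it also accepts lowercase hex digits, which the pattern above does not. The helper name strip_hex_escapes is hypothetical. Note that re.sub returns a new string, so the result has to be assigned.

```python
import re

# Pattern: a literal backslash, the letter x, then exactly two hex digits.
# The raw string keeps the doubled backslash intact for the regex engine.
HEX_ESCAPE = re.compile(r'\\x[0-9A-Fa-f]{2}')


def strip_hex_escapes(name):
    # Hypothetical helper: returns a new string; the input is not modified.
    return HEX_ESCAPE.sub('', name)


name = r'foo\x1Dbar'
cleaned = strip_hex_escapes(name)
print(name)
print(cleaned)
```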
Answered By: tavnab

This interpreter session should show the difference between your dirname having an actual null character in it and having a backslash followed by an x followed by two 0s.

>>> bad_dir_name = "c\x00mb"
>>> bad_dir_name
'c\x00mb'
>>> good_dir_name = bad_dir_name.replace("\x00", "")
>>> good_dir_name
'cmb'
>>>
>>> bad_dir_name2 = "c\\x00mb"
>>> bad_dir_name2
'c\\x00mb'
>>> good_dir_name2 = bad_dir_name2.replace("\\", "") # remove the backslash
>>> good_dir_name2
'cx00mb'

In either case, the string.replace is the way to go.
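
To check which of the two cases a given name falls into, something like the following sketch may help. The function name describe is made up for illustration. On POSIX systems a file name cannot contain a real NUL byte, so a name listed by ls as in the question is most likely the literal-text case, which matches the '5c' seen in xxd.

```python
def describe(name):
    # Hypothetical helper that reports which kind of "\x00" a name contains.
    if '\x00' in name:
        return 'real NUL character'
    if '\\x' in name:
        return 'literal backslash-x text'
    return 'clean'


real_nul = 'c\x00mb'   # 4 characters: c, NUL, m, b
literal = r'c\x00mb'   # 7 characters: c, backslash, x, 0, 0, m, b

print(len(real_nul), describe(real_nul))
print(len(literal), describe(literal))

# str.replace returns a new string in both cases.
print(real_nul.replace('\x00', ''))
print(literal.replace('\\x00', ''))
```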

Answered By: turbulencetoo

The \x represents a hexadecimal character escape.

The answer provided by tavnab does not work for me.

The reason is that string.replace is not an in-place method: it returns a new string and leaves the original unchanged. Also, in Python 3 the print function requires parentheses. So, the updated answer should be:

dir = r'foo\x00bar'
print(dir)
dir = dir.replace(r'\x00', '')
print(dir)

which then indeed produces the following output:

foo\x00bar
foobar
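
Applied to the original task of renaming directories, a rough sketch could look like this. It assumes a POSIX file system (where a backslash is allowed in names) and that the names contain the literal text \xNN. The function rename_escaped_dirs and the temporary directory used for the demonstration are illustrative only. Names that would collide with an existing entry, or that would become empty, are skipped.

```python
import os
import re
import tempfile

# A literal backslash, the letter x, then two hex digits.
HEX_ESCAPE = re.compile(r'\\x[0-9A-Fa-f]{2}')


def rename_escaped_dirs(root):
    """Rename directories under root whose names contain a backslash-x escape."""
    renamed = []
    # topdown=False visits children before their parents, so renaming a
    # directory never invalidates a path that is still waiting to be visited.
    for parent, dirnames, _ in os.walk(root, topdown=False):
        for name in dirnames:
            new_name = HEX_ESCAPE.sub('', name)
            if not new_name or new_name == name:
                continue
            src = os.path.join(parent, name)
            dst = os.path.join(parent, new_name)
            if not os.path.exists(dst):  # do not overwrite an existing entry
                os.rename(src, dst)
                renamed.append((name, new_name))
    return renamed


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, r'c\x00mb'))
    print(rename_escaped_dirs(root))
    print(os.listdir(root))
```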
Answered By: J. Adan