When is it "safe" to mix path separators in Python strings representing Windows paths?

Question:

This minimal example (running in the PyCharm debugger):

import os
from os.path import join
import subprocess

src_path = r'C:/TEMP/source'
dest_path = r'C:/TEMP/dest'


if __name__ == "__main__":
    for root, _, files in os.walk(src_path):
        for name in files:
            src_file_path = join(root, name)
            rel_dest_file_path = os.path.join(dest_path, os.path.dirname(os.path.relpath(src_file_path, src_path)))
            rdfp = join(rel_dest_file_path, name)
            sfp = src_file_path
            cmd = "['copy', '/v', %s, %s]" % (sfp, rdfp)
            print('calling shell subprocess %s' % cmd)
            subprocess.call(['copy', '/v', sfp, rdfp], shell=True)

Produces this output:

calling shell subprocess ['copy', '/v', C:/TEMP/source\foo bar.txt, C:/TEMP/dest\foo bar.txt]
1 file(s) copied.
calling shell subprocess ['copy', '/v', C:/TEMP/source\foo.txt, C:/TEMP/dest\foo.txt]
The syntax of the command is incorrect.

Process finished with exit code 0

Why doesn’t the path to the file named "foo bar.txt" also produce a command syntax error? Why does the path instead lead to a successful file copy?

I can fix the syntax problem in the example by explicitly using the Windows path separator in the initial raw string literal path assignments, which makes sense to me:

src_path = r'C:\TEMP\source'
dest_path = r'C:\TEMP\dest'

What doesn’t make sense is why a blank space in the "mixed slash" path also "solves" the syntax issue.

Any references or pointers?

Asked By: geneSummons

||

Answers:

The short answer: Be consistent, using OS-preferred separators, always. Don’t rely on the situations that happen to protect you by accident.

The explanation of your specific case: On Windows, a program is launched with a single command-line string, not a vector of arguments as on POSIX systems. You passed a list as the command, so it must be converted to a single string. Python does this with an internal function, list2cmdline, which adds quoting around empty arguments and around any argument containing a space or tab. As a result, your code quotes the paths only when they contain a space:

>>> print(subprocess.list2cmdline(['copy', '/v', r"C:/TEMP/source\foo bar.txt", r"C:/TEMP/dest\foo bar.txt"]))
copy /v "C:/TEMP/source\foo bar.txt" "C:/TEMP/dest\foo bar.txt"

>>> print(subprocess.list2cmdline(['copy', '/v', r"C:/TEMP/source\foo.txt", r"C:/TEMP/dest\foo.txt"]))
copy /v C:/TEMP/source\foo.txt C:/TEMP/dest\foo.txt
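The quoting rule can be checked one argument at a time. This is a minimal sketch; the sample file names are made up for illustration, and list2cmdline is an undocumented helper whose behavior could change between Python versions.

```python
import subprocess

# Hypothetical sample arguments: plain, containing a space,
# containing a tab, and empty.
samples = ["plain.txt", "has space.txt", "has\ttab.txt", ""]

# Map each argument to the command-line fragment list2cmdline builds for it.
results = {arg: subprocess.list2cmdline([arg]) for arg in samples}

for arg, fragment in results.items():
    # repr() makes the added double quotes (and the tab) visible.
    print(repr(arg), "->", repr(fragment))
```

Only the arguments that are empty or contain whitespace come back wrapped in double quotes; forward slashes and backslashes alone never trigger quoting.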

It looks like copy's argument parsing is okay with forward slashes in the path if the path is quoted, but unquoted, it gets confused (probably because the parser sees the unprotected forward slashes as introducing switches, as in /v).
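One way to avoid depending on the quoting is to normalize the separators before building the command. The sketch below assumes the paths from the question and uses ntpath (the Windows flavor of os.path) so it behaves the same on any OS; on Windows itself, os.path.normpath does the same thing. It only builds the command line and does not run copy.

```python
import ntpath
import subprocess

# Mixed-separator paths like the ones the question's loop produces.
mixed_src = "C:/TEMP/source\\foo.txt"
mixed_dst = "C:/TEMP/dest\\foo.txt"

# normpath converts forward slashes to backslashes (Windows rules).
src = ntpath.normpath(mixed_src)
dst = ntpath.normpath(mixed_dst)

# Show the single string that would be handed to the shell.
cmdline = subprocess.list2cmdline(["copy", "/v", src, dst])
print(cmdline)  # copy /v C:\TEMP\source\foo.txt C:\TEMP\dest\foo.txt
```

With no forward slashes left in the paths, there is nothing for copy to misread as a switch, whether or not the argument ends up quoted.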

A more general rule here is that WinAPI calls are okay with mixed separators, but individual programs' argument parsing might not be. But again, skip the problem in the first place by building paths with os.path.join, os.sep, the / overload for Path objects, etc., rather than hardcoding separators. The exception is a path so OS-specific that it can't exist on a mismatched OS; in your case, you're on Windows with Windows-only paths, and should just use raw string literals with backslashes.
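A short sketch of the options mentioned above, using the question's directory and a hypothetical file name. ntpath and PureWindowsPath apply Windows rules on any OS, which keeps the example runnable everywhere; on Windows you would normally use os.path and Path directly.

```python
import ntpath
from pathlib import PureWindowsPath

# join only inserts a separator; it does not rewrite the ones already
# present, which is how the question ended up with mixed paths.
mixed = ntpath.join("C:/TEMP/source", "foo.txt")

# Starting from a raw string with backslashes keeps everything consistent.
consistent = ntpath.join(r"C:\TEMP\source", "foo.txt")

# The / overload on Path objects normalizes separators when converted to str.
via_path = str(PureWindowsPath("C:/TEMP/source") / "foo.txt")

print(mixed)       # C:/TEMP/source\foo.txt
print(consistent)  # C:\TEMP\source\foo.txt
print(via_path)    # C:\TEMP\source\foo.txt
```

Note that join alone does not repair a hardcoded forward-slash prefix, so the starting literal still matters.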

Answered By: ShadowRanger