Cross-platform splitting of a path in Python

Question:

I’d like something that has the same effect as this:

>>> path = "/foo/bar/baz/file"
>>> path_split = path.rsplit('/')[1:]
>>> path_split
['foo', 'bar', 'baz', 'file']

But that will work with Windows paths too. I know that there is an os.path.split() but that doesn’t do what I want, and I didn’t see anything that does.

Asked By: Aaron Yodaiken

||

Answers:

Use the functionality provided in os.path, e.g.

os.path.split(path)

As noted elsewhere, you can call it repeatedly to split longer paths.

Answered By: Benjamin Bannier

Someone said "use os.path.split". This got deleted unfortunately, but it is the right answer.

os.path.split(path)

Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ).
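
To make the quoted rules concrete, here is a small sketch that prints the (head, tail) pair for a few inputs. It uses posixpath (the POSIX flavour of os.path) so that it behaves the same on any platform; the sample paths are only illustrations.

```python
import posixpath  # the POSIX flavour of os.path, importable on any platform

# Each call returns a (head, tail) pair, as described in the documentation.
examples = ["/foo/bar/baz/file", "/foo/bar/", "file", "", "/"]
for p in examples:
    print(repr(p), "->", posixpath.split(p))

# '/foo/bar/baz/file' -> ('/foo/bar/baz', 'file')
# '/foo/bar/' -> ('/foo/bar', '')   tail is empty when the path ends in a slash
# 'file' -> ('', 'file')            head is empty when there is no slash
# '' -> ('', '')
# '/' -> ('/', '')                  the root keeps its slash
```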

So it’s not just splitting the dirname and filename. You can apply it several times to get the full path in a portable and correct way. Code sample:

from os.path import split

dirname = path
path_split = []
while True:
    dirname, leaf = split(dirname)
    if leaf:
        path_split = [leaf] + path_split  # Adds one element at the beginning of the list
    else:
        # Uncomment the following line to also keep whatever is left in dirname
        # (the drive or root, e.g. "Z:\\" or "/"; empty for a relative path)
        # path_split = [dirname] + path_split
        break

Please credit the original author if that answer gets undeleted.

Answered By: Kos

Use the functionality provided in os.path, e.g.

os.path.split(path)

(This answer was by someone else and was mysteriously and incorrectly deleted, since it’s a working answer; if you want to split each part of the path apart, you can call it multiple times, and each call will pull a component off of the end.)

Answered By: Glenn Maynard

So keep using os.path.split until you get to what you want. Here’s an ugly implementation using an infinite loop:

import os.path
def parts(path):
    components = [] 
    while True:
        (path,tail) = os.path.split(path)
        if tail == "":
            components.reverse()
            return components
        components.append(tail)

Stick that in parts.py, import parts, and voila:

>>> parts.parts("foo/bar/baz/loop")
['foo', 'bar', 'baz', 'loop']

Probably a nicer implementation using generators or recursion out there…
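
One possible generator-based variant, as a rough sketch: the names iter_parts and parts_from_generator are made up for this example, and the behaviour is meant to match the loop above (a leading root is dropped, and a path ending in a slash yields nothing).

```python
import os.path

def iter_parts(path):
    """Yield the components of path from last to first (hypothetical helper)."""
    while True:
        path, tail = os.path.split(path)
        if not tail:
            # Nothing left to peel off: empty path, root, or trailing slash.
            return
        yield tail

def parts_from_generator(path):
    """Collect the components and put them back in left-to-right order."""
    return list(iter_parts(path))[::-1]

print(parts_from_generator("foo/bar/baz/loop"))  # ['foo', 'bar', 'baz', 'loop']
```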

Answered By: Spacedman

The OP specified “will work with Windows paths too”. There are a few wrinkles with Windows paths.

Firstly, Windows has the concept of multiple drives, each with its own current working directory, and 'c:foo' and 'c:\foo' are often not the same. Consequently it is a very good idea to separate out any drive designator first, using os.path.splitdrive(). Then reassembling the path (if required) can be done correctly by
drive + os.path.join(*other_pieces)
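
A minimal sketch of that drive handling, using ntpath (the Windows flavour of os.path) so it runs identically on any platform. The helper name reassemble and the sample paths are assumptions made for illustration.

```python
import ntpath  # Windows path rules, importable on any platform

def reassemble(drive, pieces):
    """Rebuild a path from a drive designator and its pieces (hypothetical helper)."""
    return drive + ntpath.join(*pieces)

# The drive-relative and the absolute form split differently.
print(ntpath.splitdrive(r"c:foo\bar"))   # ('c:', 'foo\\bar')
print(ntpath.splitdrive(r"c:\foo\bar"))  # ('c:', '\\foo\\bar')

# Keeping the root as its own piece preserves the distinction on the way back.
print(reassemble("c:", ["foo", "bar"]))        # c:foo\bar
print(reassemble("c:", ["\\", "foo", "bar"]))  # c:\foo\bar
```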

Secondly, Windows paths can contain slashes or backslashes or a mixture. Consequently, using os.sep when parsing an unnormalised path is not useful.
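
As a quick illustration of the mixed-separator problem (the sample path is made up, and ntpath is used so the result does not depend on the platform running it):

```python
import ntpath  # Windows path rules, importable on any platform

mixed = "foo/bar\\baz/file"

# Splitting on the separator alone misses the forward slashes.
naive = mixed.split(ntpath.sep)

# normpath rewrites every slash as a backslash first. It also collapses
# "." and ".." segments, which may or may not be wanted.
normalised = ntpath.normpath(mixed).split(ntpath.sep)

print(naive)       # ['foo/bar', 'baz/file']
print(normalised)  # ['foo', 'bar', 'baz', 'file']
```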

More generally:

The results produced for 'foo' and 'foo/' should not be identical.

The loop termination condition seems to be best expressed as “os.path.split() treated its input as unsplittable”.

Here’s a suggested solution, with tests, including a comparison with @Spacedman’s solution:

import os.path

def os_path_split_asunder(path, debug=False):
    parts = []
    while True:
        newpath, tail = os.path.split(path)
        if debug: print("%r %r" % (path, (newpath, tail)))
        if newpath == path:
            assert not tail
            if path: parts.append(path)
            break
        parts.append(tail)
        path = newpath
    parts.reverse()
    return parts

def spacedman_parts(path):
    components = [] 
    while True:
        (path,tail) = os.path.split(path)
        if not tail:
            return components
        components.insert(0,tail)

if __name__ == "__main__":
    tests = [
        '',
        'foo',
        'foo/',
        'foo\\',
        '/foo',
        '\\foo',
        'foo/bar',
        '/',
        'c:',
        'c:/',
        'c:foo',
        'c:/foo',
        'c:/users/john/foo.txt',
        '/users/john/foo.txt',
        'foo/bar/baz/loop',
        'foo/bar/baz/',
        '//hostname/foo/bar.txt',
        ]
    for i, test in enumerate(tests):
        # Single-argument print(...) calls produce the same output on Python 2 and 3.
        print("\nTest %d: %r" % (i, test))
        drive, path = os.path.splitdrive(test)
        print("drive, path %r %r" % (drive, path))
        a = os_path_split_asunder(path)
        b = spacedman_parts(path)
        print("a ... %r" % a)
        print("b ... %r" % b)
        print(a == b)

and here’s the output (Python 2.7.1, Windows 7 Pro):

Test 0: ''
drive, path '' ''
a ... []
b ... []
True

Test 1: 'foo'
drive, path '' 'foo'
a ... ['foo']
b ... ['foo']
True

Test 2: 'foo/'
drive, path '' 'foo/'
a ... ['foo', '']
b ... []
False

Test 3: 'foo\\'
drive, path '' 'foo\\'
a ... ['foo', '']
b ... []
False

Test 4: '/foo'
drive, path '' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False

Test 5: '\\foo'
drive, path '' '\\foo'
a ... ['\\', 'foo']
b ... ['foo']
False

Test 6: 'foo/bar'
drive, path '' 'foo/bar'
a ... ['foo', 'bar']
b ... ['foo', 'bar']
True

Test 7: '/'
drive, path '' '/'
a ... ['/']
b ... []
False

Test 8: 'c:'
drive, path 'c:' ''
a ... []
b ... []
True

Test 9: 'c:/'
drive, path 'c:' '/'
a ... ['/']
b ... []
False

Test 10: 'c:foo'
drive, path 'c:' 'foo'
a ... ['foo']
b ... ['foo']
True

Test 11: 'c:/foo'
drive, path 'c:' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False

Test 12: 'c:/users/john/foo.txt'
drive, path 'c:' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False

Test 13: '/users/john/foo.txt'
drive, path '' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False

Test 14: 'foo/bar/baz/loop'
drive, path '' 'foo/bar/baz/loop'
a ... ['foo', 'bar', 'baz', 'loop']
b ... ['foo', 'bar', 'baz', 'loop']
True

Test 15: 'foo/bar/baz/'
drive, path '' 'foo/bar/baz/'
a ... ['foo', 'bar', 'baz', '']
b ... []
False

Test 16: '//hostname/foo/bar.txt'
drive, path '' '//hostname/foo/bar.txt'
a ... ['//', 'hostname', 'foo', 'bar.txt']
b ... ['hostname', 'foo', 'bar.txt']
False

Answered By: John Machin

One more attempt, this time with a maxsplit option, intended as a replacement for os.path.split():

import os.path

def pathsplit(pathstr, maxsplit=1):
    """split relative path into list"""
    path = [pathstr]
    while True:
        oldpath = path[:]
        path[:1] = list(os.path.split(path[0]))
        if path[0] == '':
            path = path[1:]
        elif path[1] == '':
            path = path[:1] + path[2:]
        if path == oldpath:
            return path
        if maxsplit is not None and len(path) > maxsplit:
            return path

Answered By: anatoly techtonik

Here’s an explicit implementation of the approach that simply calls os.path.split repeatedly. It uses a slightly different loop termination condition than the accepted answer.

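import os.path

def splitpath(path):
    parts = []
    (path, tail) = os.path.split(path)
    while path and tail:
        parts.append(tail)
        (path, tail) = os.path.split(path)
    parts.append(os.path.join(path, tail))
    # A list comprehension instead of map() keeps this working on Python 3,
    # where map() returns an iterator that cannot be sliced.
    return [os.path.normpath(p) for p in parts][::-1]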

With this, os.path.join(*splitpath(path)) should be equivalent to path,
in the sense that both refer to the same file or directory.
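
A small self-contained check of that round-trip property, as a sketch: the function is restated against posixpath so the result is the same on every platform, and the sample paths are only illustrations.

```python
import posixpath  # POSIX path rules, importable on any platform

def splitpath(path):
    # Same loop as above, restated against posixpath for a reproducible check.
    parts = []
    path, tail = posixpath.split(path)
    while path and tail:
        parts.append(tail)
        path, tail = posixpath.split(path)
    parts.append(posixpath.join(path, tail))
    return [posixpath.normpath(p) for p in parts][::-1]

for p in ["/home/dave/src/python", "home/dave/src/python", "home/dave/"]:
    rebuilt = posixpath.join(*splitpath(p))
    # Compare normalised forms: the strings may differ, the location should not.
    assert posixpath.normpath(rebuilt) == posixpath.normpath(p)
    print(repr(p), "->", splitpath(p))
```

One thing this check brings out: a path ending in a slash, such as 'home/dave/', comes back as the single element ['home/dave'], because the first os.path.split call returns an empty tail and the loop body never runs.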

Tested on Linux:

In [51]: current='/home/dave/src/python'

In [52]: splitpath(current)
Out[52]: ['/', 'home', 'dave', 'src', 'python'] 

In [53]: splitpath(current[1:])
Out[53]: ['home', 'dave', 'src', 'python']

In [54]: splitpath( os.path.join(current, 'module.py'))
Out[54]: ['/', 'home', 'dave', 'src', 'python', 'module.py']

In [55]: splitpath( os.path.join(current[1:], 'module.py'))
Out[55]: ['home', 'dave', 'src', 'python', 'module.py']

I hand-checked a few DOS paths by replacing os.path with the ntpath module. The results look OK to me, but I’m not too familiar with the ins and outs of DOS paths.

Answered By: Dave

Python 3.4 introduced a new module, pathlib. pathlib.Path provides file-system-related methods, while pathlib.PurePath operates completely independently of the file system:

>>> from pathlib import PurePath
>>> path = "/foo/bar/baz/file"
>>> path_split = PurePath(path).parts
>>> path_split
('\\', 'foo', 'bar', 'baz', 'file')

You can use PurePosixPath and PureWindowsPath explicitly when desired:

>>> from pathlib import PureWindowsPath, PurePosixPath
>>> PureWindowsPath(path).parts
('\\', 'foo', 'bar', 'baz', 'file')
>>> PurePosixPath(path).parts
('/', 'foo', 'bar', 'baz', 'file')

And of course, it works with Windows paths as well:

>>> wpath = r"C:\foo\bar\baz\file"
>>> PurePath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PureWindowsPath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PurePosixPath(wpath).parts
('C:\\foo\\bar\\baz\\file',)
>>>
>>> wpath = r"C:\foo/bar/baz/file"
>>> PurePath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PureWindowsPath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PurePosixPath(wpath).parts
('C:\\foo', 'bar', 'baz', 'file')
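
To get exactly the list the question asks for (the components without the leading root or drive), one option is to drop the anchor from parts. This is only a sketch; the helper name components is made up for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

def components(pure_path):
    """Return the parts of a pure path without its anchor (hypothetical helper).

    The anchor is the drive and/or root, e.g. '/' or 'C:\\'. When present,
    it is always the first element of .parts.
    """
    parts = pure_path.parts
    return list(parts[1:]) if pure_path.anchor else list(parts)

print(components(PurePosixPath("/foo/bar/baz/file")))       # ['foo', 'bar', 'baz', 'file']
print(components(PureWindowsPath(r"C:\foo\bar\baz\file")))  # ['foo', 'bar', 'baz', 'file']
print(components(PureWindowsPath("foo/bar")))               # ['foo', 'bar']
```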

Huzzah for Python devs constantly improving the language!

Answered By: John Crawford