Find the common path prefix of a list of paths

Question:

My problem is to find the common path prefix of a given set of files.

Literally I was expecting that “os.path.commonprefix” would do just that. Unfortunately, the fact that commonprefix is located in path is rather misleading, since it actually will search for string prefixes.

The question to me is, how can this actually be solved for paths? The issue was briefly mentioned in this (fairly high rated) answer but only as a side-note and the proposed solution (appending slashes to the input of commonprefix) imho has issues, since it will fail for instance for:

os.path.commonprefix(['/usr/var1/log/', '/usr/var2/log/'])
# returns /usr/var but it should be /usr

To prevent others from falling into the same trap, it might be worthwhile to discuss this issue in a separate question: Is there a simple / portable solution for this problem that does not rely on nasty checks on the file system (i.e., access the result of commonprefix and check whether it is a directory and if not returns a os.path.dirname of the result)?

Asked By: bluenote10

||

Answers:

Assuming you want the common directory path, one way is to:

  1. Use only directory paths as input. If your input value is a file name, call os.path.dirname(filename) to get its directory path.
  2. “Normalize” all the paths so that they are relative to the same thing and don’t include double separators. The easiest way to do this is by calling os.path.abspath( ) to get the path relative to the root. (You might also want to use os.path.realpath( ) to remove symbolic links.)
  3. Add a final separator (found portably with os.path.sep or os.sep) to the end of all the normalized directory paths.
  4. Call os.path.dirname( ) on the result of os.path.commonprefix( ).

In code (without removing symbolic links):

def common_path(directories):
    norm_paths = [os.path.abspath(p) + os.path.sep for p in directories]
    return os.path.dirname(os.path.commonprefix(norm_paths))

def common_path_of_filenames(filenames):
    return common_path([os.path.dirname(f) for f in filenames])
Answered By: Dan Getz

Awhile ago I ran into this where os.path.commonprefix is a string prefix and not a path prefix as would be expected. So I wrote the following:

def commonprefix(l):
    # this unlike the os.path.commonprefix version
    # always returns path prefixes as it compares
    # path component wise
    cp = []
    ls = [p.split('/') for p in l]
    ml = min( len(p) for p in ls )

    for i in range(ml):

        s = set( p[i] for p in ls )         
        if len(s) != 1:
            break

        cp.append(s.pop())

    return '/'.join(cp)

it could be made more portable by replacing '/' with os.path.sep.

Answered By: Dan D.

A robust approach is to split the path into individual components and then find the longest common prefix of the component lists.

Here is an implementation which is cross-platform and can be generalized easily to more than two paths:

import os.path
import itertools

def components(path):
    '''
    Returns the individual components of the given file path
    string (for the local operating system).

    The returned components, when joined with os.path.join(), point to
    the same location as the original path.
    '''
    components = []
    # The loop guarantees that the returned components can be
    # os.path.joined with the path separator and point to the same
    # location:    
    while True:
        (new_path, tail) = os.path.split(path)  # Works on any platform
        components.append(tail)        
        if new_path == path:  # Root (including drive, on Windows) reached
            break
        path = new_path
    components.append(new_path)

    components.reverse()  # First component first 
    return components

def longest_prefix(iter0, iter1):
    '''
    Returns the longest common prefix of the given two iterables.
    '''
    longest_prefix = []
    for (elmt0, elmt1) in itertools.izip(iter0, iter1):
        if elmt0 != elmt1:
            break
        longest_prefix.append(elmt0)
    return longest_prefix

def common_prefix_path(path0, path1):
    return os.path.join(*longest_prefix(components(path0), components(path1)))

# For Unix:
assert common_prefix_path('/', '/usr') == '/'
assert common_prefix_path('/usr/var1/log/', '/usr/var2/log/') == '/usr'
assert common_prefix_path('/usr/var/log1/', '/usr/var/log2/') == '/usr/var'
assert common_prefix_path('/usr/var/log', '/usr/var/log2') == '/usr/var'
assert common_prefix_path('/usr/var/log', '/usr/var/log') == '/usr/var/log'
# Only for Windows:
# assert common_prefix_path(r'C:ProgramsMe', r'C:Programs') == r'C:Programs'
Answered By: Eric O Lebigot

I’ve made a small python package commonpath to find common paths from a list. Comes with a few nice options.

https://github.com/faph/Common-Path

Answered By: faph

It seems that this issue has been corrected in recent versions of Python. New in version 3.5 is the function os.path.commonpath(), which returns the common path instead of the common string prefix.

Answered By: cjac
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.