Find broken symlinks with Python

Question:

If I call os.stat() on a broken symlink, python throws an OSError exception. This makes it useful for finding them. However, there are a few other reasons that os.stat() might throw a similar exception. Is there a more precise way of detecting broken symlinks with Python under Linux?

Asked By: postfuturist

||

Answers:

Can I mention testing for hardlinks without python? /bin/test has the FILE1 -ef FILE2 condition that is true when files share an inode.

Therefore, something like find . -type f -exec test {} -ef /path/to/file ; -print works for hard link testing to a specific file.

Which brings me to reading man test and the mentions of -L and -h which both work on one file and return true if that file is a symbolic link, however that doesn’t tell you if the target is missing.

I did find that head -0 FILE1 would return an exit code of 0 if the file can be opened and a 1 if it cannot, which in the case of a symbolic link to a regular file works as a test for whether it’s target can be read.

Answered By: dlamblin

I’m not a python guy but it looks like os.readlink()? The logic I would use in perl is to use readlink() to find the target and the use stat() to test to see if the target exists.

Edit: I banged out some perl that demos readlink. I believe perl’s stat and readlink and python’s os.stat() and os.readlink()are both wrappers for the system calls, so this should translate reasonable well as proof of concept code:

wembley 0 /home/jj33/swap > cat p
my $f = shift;

while (my $l = readlink($f)) {
  print "$f -> $ln";
  $f = $l;
}

if (!-e $f) {
  print "$f doesn't existn";
}
wembley 0 /home/jj33/swap > ls -l | grep ^l
lrwxrwxrwx    1 jj33  users          17 Aug 21 14:30 link -> non-existant-file
lrwxrwxrwx    1 root     users          31 Oct 10  2007 mm -> ../systems/mm/20071009-rewrite//
lrwxrwxrwx    1 jj33  users           2 Aug 21 14:34 mmm -> mm/
wembley 0 /home/jj33/swap > perl p mm
mm -> ../systems/mm/20071009-rewrite/
wembley 0 /home/jj33/swap > perl p mmm
mmm -> mm
mm -> ../systems/mm/20071009-rewrite/
wembley 0 /home/jj33/swap > perl p link
link -> non-existant-file
non-existant-file doesn't exist
wembley 0 /home/jj33/swap >
Answered By: jj33

os.lstat() may be helpful. If lstat() succeeds and stat() fails, then it’s probably a broken link.

Answered By: Greg Hewgill

os.path

You may try using realpath() to get what the symlink points to, then trying to determine if it’s a valid file using is file.

(I’m not able to try that out at the moment, so you’ll have to play around with it and see what you get)

Answered By: Jason Baker

A common Python saying is that it’s easier to ask forgiveness than permission. While I’m not a fan of this statement in real life, it does apply in a lot of cases. Usually you want to avoid code that chains two system calls on the same file, because you never know what will happen to the file in between your two calls in your code.

A typical mistake is to write something like:

if os.path.exists(path):
    os.unlink(path)

The second call (os.unlink) may fail if something else deleted it after your if test, raise an Exception, and stop the rest of your function from executing. (You might think this doesn’t happen in real life, but we just fished another bug like that out of our codebase last week – and it was the kind of bug that left a few programmers scratching their head and claiming ‘Heisenbug’ for the last few months)

So, in your particular case, I would probably do:

try:
    os.stat(path)
except OSError, e:
    if e.errno == errno.ENOENT:
        print 'path %s does not exist or is a broken symlink' % path
    else:
        raise e

The annoyance here is that stat returns the same error code for a symlink that just isn’t there and a broken symlink.

So, I guess you have no choice than to break the atomicity, and do something like

if not os.path.exists(os.readlink(path)):
    print 'path %s is a broken symlink' % path

This is not atomic but it works.

os.path.islink(filename) and not os.path.exists(filename)

Indeed by RTFM
(reading the fantastic manual) we see

os.path.exists(path)

Return True if path refers to an existing path. Returns False for broken symbolic links.

It also says:

On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.

So if you are worried about permissions, you should add other clauses.

Answered By: am70

I had a similar problem: how to catch broken symlinks, even when they occur in some parent dir? I also wanted to log all of them (in an application dealing with a fairly large number of files), but without too many repeats.

Here is what I came up with, including unit tests.

fileutil.py:

import os
from functools import lru_cache
import logging

logger = logging.getLogger(__name__)

@lru_cache(maxsize=2000)
def check_broken_link(filename):
    """
    Check for broken symlinks, either at the file level, or in the
    hierarchy of parent dirs.
    If it finds a broken link, an ERROR message is logged.
    The function is cached, so that the same error messages are not repeated.

    Args:
        filename: file to check

    Returns:
        True if the file (or one of its parents) is a broken symlink.
        False otherwise (i.e. either it exists or not, but no element
        on its path is a broken link).

    """
    if os.path.isfile(filename) or os.path.isdir(filename):
        return False
    if os.path.islink(filename):
        # there is a symlink, but it is dead (pointing nowhere)
        link = os.readlink(filename)
        logger.error('broken symlink: {} -> {}'.format(filename, link))
        return True
    # ok, we have either:
    #   1. a filename that simply doesn't exist (but the containing dir
           does exist), or
    #   2. a broken link in some parent dir
    parent = os.path.dirname(filename)
    if parent == filename:
        # reached root
        return False
    return check_broken_link(parent)

Unit tests:

import logging
import shutil
import tempfile
import os

import unittest
from ..util import fileutil


class TestFile(unittest.TestCase):

    def _mkdir(self, path, create=True):
        d = os.path.join(self.test_dir, path)
        if create:
            os.makedirs(d, exist_ok=True)
        return d

    def _mkfile(self, path, create=True):
        f = os.path.join(self.test_dir, path)
        if create:
            d = os.path.dirname(f)
            os.makedirs(d, exist_ok=True)
            with open(f, mode='w') as fp:
                fp.write('hello')
        return f

    def _mklink(self, target, path):
        f = os.path.join(self.test_dir, path)
        d = os.path.dirname(f)
        os.makedirs(d, exist_ok=True)
        os.symlink(target, f)
        return f

    def setUp(self):
        # reset the lru_cache of check_broken_link
        fileutil.check_broken_link.cache_clear()

        # create a temporary directory for our tests
        self.test_dir = tempfile.mkdtemp()

        # create a small tree of dirs, files, and symlinks
        self._mkfile('a/b/c/foo.txt')
        self._mklink('b', 'a/x')
        self._mklink('b/c/foo.txt', 'a/f')
        self._mklink('../..', 'a/b/c/y')
        self._mklink('not_exist.txt', 'a/b/c/bad_link.txt')
        bad_path = self._mkfile('a/XXX/c/foo.txt', create=False)
        self._mklink(bad_path, 'a/b/c/bad_path.txt')
        self._mklink('not_a_dir', 'a/bad_dir')

    def tearDown(self):
        # Remove the directory after the test
        shutil.rmtree(self.test_dir)

    def catch_check_broken_link(self, expected_errors, expected_result, path):
        filename = self._mkfile(path, create=False)
        with self.assertLogs(level='ERROR') as cm:
            result = fileutil.check_broken_link(filename)
            logging.critical('nothing')  # trick: emit one extra message, so the with assertLogs block doesn't fail
        error_logs = [r for r in cm.records if r.levelname is 'ERROR']
        actual_errors = len(error_logs)
        self.assertEqual(expected_result, result, msg=path)
        self.assertEqual(expected_errors, actual_errors, msg=path)

    def test_check_broken_link_exists(self):
        self.catch_check_broken_link(0, False, 'a/b/c/foo.txt')
        self.catch_check_broken_link(0, False, 'a/x/c/foo.txt')
        self.catch_check_broken_link(0, False, 'a/f')
        self.catch_check_broken_link(0, False, 'a/b/c/y/b/c/y/b/c/foo.txt')

    def test_check_broken_link_notfound(self):
        self.catch_check_broken_link(0, False, 'a/b/c/not_found.txt')

    def test_check_broken_link_badlink(self):
        self.catch_check_broken_link(1, True, 'a/b/c/bad_link.txt')
        self.catch_check_broken_link(0, True, 'a/b/c/bad_link.txt')

    def test_check_broken_link_badpath(self):
        self.catch_check_broken_link(1, True, 'a/b/c/bad_path.txt')
        self.catch_check_broken_link(0, True, 'a/b/c/bad_path.txt')

    def test_check_broken_link_badparent(self):
        self.catch_check_broken_link(1, True, 'a/bad_dir/c/foo.txt')
        self.catch_check_broken_link(0, True, 'a/bad_dir/c/foo.txt')
        # bad link, but shouldn't log a new error:
        self.catch_check_broken_link(0, True, 'a/bad_dir/c')
        # bad link, but shouldn't log a new error:
        self.catch_check_broken_link(0, True, 'a/bad_dir')

if __name__ == '__main__':
    unittest.main()
Answered By: Pierre D

I used this variant, When symlink is broken it will return false for the path.exists and true for path.islink, so combining this two facts we may use the following:

def kek(argum):
    if path.exists("/root/" + argum) == False and path.islink("/root/" + argum) == True:
        print("The path is a broken link, location: " + os.readlink("/root/" + argum))
    else:
        return "No broken links fond"

For Python 3, you can use the pathlib module. From its docs,

If the path points to a symlink, exists() returns whether the symlink points to an existing file or directory.

So this works too.

import pathlib

path = pathlib.Path("/path/to/somewhere")

if path.is_symlink() and not path.exists():
    print(f"found dangling symlink at {path}")
Answered By: Jim
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.