How do you sort files numerically?

Question:

I’m processing some files in a directory and need the files to be sorted numerically. I found some examples on sorting—specifically with using the lambda pattern—at wiki.python.org, and I put this together:

import re

file_names = """ayurveda_1.tif
ayurveda_11.tif
ayurveda_13.tif
ayurveda_2.tif
ayurveda_20.tif
ayurveda_22.tif""".split('n')

num_re = re.compile('_(d{1,2}).')

file_names.sort(
    key=lambda fname: int(num_re.search(fname).group(1))
)

Is there a better way to do this?

Asked By: Zach Young

||

Answers:

This is called “natural sorting” or “human sorting” (as opposed to lexicographical sorting, which is the default). Ned B wrote up a quick version of one.

import re

def tryint(s):
    try:
        return int(s)
    except:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
        "z23a" -> ["z", 23, "a"]
    """
    return [ tryint(c) for c in re.split('([0-9]+)', s) ]

def sort_nicely(l):
    """ Sort the given list in the way that humans expect.
    """
    l.sort(key=alphanum_key)

It’s similar to what you’re doing, but perhaps a bit more generalized.

Answered By: Daniel DiPaolo

If you are using key= in your sort method you shouldn’t use cmp which has been removed from the latest versions of Python. key should be equated to a function which takes a record as input and returns any object which will compare in the order you want your list sorted. It doesn’t need to be a lambda function and might be clearer as a stand alone function. Also regular expressions can be slow to evaluate.

You could try something like the following to isolate and return the integer part of the file name:

def getint(name):
    basename = name.partition('.')
    alpha, num = basename.split('_')
    return int(num)
tiffiles.sort(key=getint)
Answered By: Don O'Donnell

Partition results in Tuple

def getint(name):
    (basename, part, ext) = name.partition('.')
    (alpha, num) = basename.split('_')
    return int(num)
Answered By: Prabhath Kota

Just use :

tiffFiles.sort(key=lambda var:[int(x) if x.isdigit() else x for x in re.findall(r'[^0-9]|[0-9]+', var)])

is faster than use try/except.

Answered By: dkmatt0

This is a modified version of @Don O’Donnell’s answer, because I couldn’t get it working as-is, but I think it’s the best answer here as it’s well-explained.

def getint(name):
    _, num = name.split('_')
    num, _ = num.split('.')
    return int(num)

print(sorted(tiffFiles, key=getint))

Changes:

1) The alpha string doesn’t get stored, as it’s not needed (hence _, num)

2) Use num.split('.') to separate the number from .tiff

3) Use sorted instead of list.sort, per https://docs.python.org/2/howto/sorting.html

Answered By: StatsSorceress

@April provided a good solution in How is Pythons glob.glob ordered? that you could try

#First, get the files:
import glob
import re

files = glob.glob1(img_folder,'*'+output_image_format)

# Sort files according to the digits included in the filename
files = sorted(files, key=lambda x:float(re.findall("(d+)",x)[0]))
Answered By: yoonghm
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.