# Is there a built in function for string natural sort?

## Question:

I have a list of strings for which I would like to perform a natural alphabetical sort.

For instance, the following list is naturally sorted (what I want):

``````['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
``````

And here’s the "sorted" version of the above list (what I get using `sorted()`):

``````['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
``````

I’m looking for a sort function which behaves like the first one.

Try this:

``````import re

def natural_sort(l):
convert = lambda text: int(text) if text.isdigit() else text.lower()
alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
return sorted(l, key=alphanum_key)
``````

Output:

``````['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
``````

Code adapted from here: Sorting for Humans : Natural Sort Order.

``````>>> import re
>>> sorted(lst, key=lambda x: int(re.findall(r'd+\$', x)[0]))
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
``````

One option is to turn the string into a tuple and replace digits using expanded form http://wiki.answers.com/Q/What_does_expanded_form_mean

that way a90 would become (“a”,90,0) and a1 would become (“a”,1)

below is some sample code (which isn’t very efficient due to the way It removes leading 0’s from numbers)

``````alist=["something1",
"something12",
"something17",
"something2",
"something25and_then_33",
"something25and_then_34",
"something29",
"beta1.1",
"beta2.3.0",
"beta2.33.1",
"a001",
"a2",
"z002",
"z1"]

def key(k):
nums=set(list("0123456789"))
chars=set(list(k))
chars=chars-nums
for i in range(len(k)):
for c in chars:
k=k.replace(c+"0",c)
l=list(k)
base=10
j=0
for i in range(len(l)-1,-1,-1):
try:
l[i]=int(l[i])*base**j
j+=1
except:
j=0
l=tuple(l)
print l
return l

print sorted(alist,key=key)
``````

output:

``````('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 1)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 7)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 3)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 4)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 9)
('b', 'e', 't', 'a', 1, '.', 1)
('b', 'e', 't', 'a', 2, '.', 3, '.')
('b', 'e', 't', 'a', 2, '.', 30, 3, '.', 1)
('a', 1)
('a', 2)
('z', 2)
('z', 1)
['a001', 'a2', 'beta1.1', 'beta2.3.0', 'beta2.33.1', 'something1', 'something2', 'something12', 'something17', 'something25and_then_33', 'something25and_then_34', 'something29', 'z1', 'z002']
``````

I wrote a function based on http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html which adds the ability to still pass in your own ‘key’ parameter. I need this in order to perform a natural sort of lists that contain more complex objects (not just strings).

``````import re

def natural_sort(list, key=lambda s:s):
"""
Sort the list into natural alphanumeric order.
"""
def get_alphanum_key_func(key):
convert = lambda text: int(text) if text.isdigit() else text
return lambda s: [convert(c) for c in re.split('([0-9]+)', key(s))]
sort_key = get_alphanum_key_func(key)
list.sort(key=sort_key)
``````

For example:

``````my_list = [{'name':'b'}, {'name':'10'}, {'name':'a'}, {'name':'1'}, {'name':'9'}]
natural_sort(my_list, key=lambda x: x['name'])
print my_list
[{'name': '1'}, {'name': '9'}, {'name': '10'}, {'name': 'a'}, {'name': 'b'}]
``````

Here’s a much more pythonic version of Mark Byer’s answer:

``````import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
return [int(text) if text.isdigit() else text.lower()
for text in _nsre.split(s)]
``````

Now this function can be used as a key in any function that uses it, like `list.sort`, `sorted`, `max`, etc.

As a lambda:

``````lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(d+)', s)]
``````

Fully reproducible demo code:

``````import re
natsort = lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(d+)', s)]
L = ["a1", "a10", "a11", "a2", "a22", "a3"]
print(sorted(L, key=natsort))
# ['a1', 'a2', 'a3', 'a10', 'a11', 'a22']
``````

There is a third party library for this on PyPI called natsort (full disclosure, I am the package’s author). For your case, you can do either of the following:

``````>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsorted(x, alg=ns.IGNORECASE)  # or alg=ns.IC
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
``````

You should note that `natsort` uses a general algorithm so it should work for just about any input that you throw at it. If you want more details on why you might choose a library to do this rather than rolling your own function, check out the `natsort` documentation’s How It Works page, in particular the Special Cases Everywhere! section.

If you need a sorting key instead of a sorting function, use either of the below formulas.

``````>>> from natsort import natsort_keygen, ns
>>> l1 = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> l2 = l1[:]
>>> natsort_key1 = natsort_keygen(key=lambda y: y.lower())
>>> l1.sort(key=natsort_key1)
>>> l1
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
>>> l2.sort(key=natsort_key2)
>>> l2
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
``````

Update November 2020

Given that a popular request/question is "how to sort like Windows Explorer?" (or whatever is your operating system’s file system browser), as of `natsort` version 7.1.0 there is a function called `os_sorted` to do exactly this. On Windows, it will sort in the same order as Windows Explorer, and on other operating systems it should sort like whatever is the local file system browser.

``````>>> from natsort import os_sorted
>>> os_sorted(list_of_paths)
``````

For those needing a sort key, you can use `os_sort_keygen` (or `os_sort_key` if you just need the defaults).

Caveat – Please read the API documentation for this function before you use to understand the limitations and how to get best results.

The above answers are good for the specific example that was shown, but miss several useful cases for the more general question of natural sort. I just got bit by one of those cases, so created a more thorough solution:

``````def natural_sort_key(string_or_number):
"""
by Scott S. Lawton <[email protected]> 2014-12-11; public domain and/or CC0 license

handles cases where simple 'int' approach fails, e.g.
['0.501', '0.55'] floating point with different number of significant digits
[0.01, 0.1, 1]    already numeric so regex and other string functions won't work (and aren't required)
['elm1', 'Elm2']  ASCII vs. letters (not case sensitive)
"""

def try_float(astring):
try:
return float(astring)
except:
return astring

if isinstance(string_or_number, basestring):
string_or_number = string_or_number.lower()

if len(re.findall('[.]d', string_or_number)) <= 1:
# assume a floating point value, e.g. to correctly sort ['0.501', '0.55']
# '.' for decimal is locale-specific, e.g. correct for the Anglosphere and Asia but not continental Europe
return [try_float(s) for s in re.split(r'([d.]+)', string_or_number)]
else:
# assume distinct fields, e.g. IP address, phone number with '.', etc.
# caveat: might want to first split by whitespace
# TBD: for unicode, replace isdigit with isdecimal
return [int(s) if s.isdigit() else s for s in re.split(r'(d+)', string_or_number)]
else:
# consider: add code to recurse for lists/tuples and perhaps other iterables
return string_or_number
``````

Test code and several links (on and off of StackOverflow) are here:
http://productarchitect.com/code/better-natural-sort.py

Feedback welcome. That’s not meant to be a definitive solution; just a step forward.

``````data = ['elm13', 'elm9', 'elm0', 'elm1', 'Elm11', 'Elm2', 'elm10']
``````

Let’s analyse the data. The digit capacity of all elements is 2. And there are 3 letters in common literal part `'elm'`.

So, the maximal length of element is 5. We can increase this value to make sure (for example, to 8).

Bearing that in mind, we’ve got a one-line solution:

``````data.sort(key=lambda x: '{0:0>8}'.format(x).lower())
``````

without regular expressions and external libraries!

``````print(data)

>>> ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'elm13']
``````

Explanation:

``````for elm in data:
print('{0:0>8}'.format(elm).lower())

>>>
0000elm0
0000elm1
0000elm2
0000elm9
000elm10
000elm11
000elm13
``````
And now for something more* elegant (pythonic) -just a touch

There are many implementations out there, and while some have come close, none quite captured the elegance modern python affords.

• Tested using python(3.5.1)
• Included an additional list to demonstrate that it works when the
numbers are mid string
• Didn’t test, however, I am assuming that if your list was sizable it would be more efficient to compile the regex beforehand
• I’m sure someone will correct me if this is an erroneous assumption

Quicky

``````from re import compile, split
dre = compile(r'(d+)')
mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
``````

Full-Code

``````#!/usr/bin/python3
# coding=utf-8
"""
Natural-Sort Test
"""

from re import compile, split

dre = compile(r'(d+)')
mylist = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13', 'elm']
mylist2 = ['e0lm', 'e1lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm', 'e01lm']

mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
mylist2.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

print(mylist)
# ['elm', 'elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(mylist2)
# ['e0lm', 'e1lm', 'e01lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm']
``````

Caution when using

• `from os.path import split`
• you will need to differentiate the imports

Inspiration from

Most likely `functools.cmp_to_key()` is closely tied to the underlying implementation of python’s sort. Besides, the cmp parameter is legacy. The modern way is to transform the input items into objects that support the desired rich comparison operations.

Under CPython 2.x, objects of disparate types can be ordered even if the respective rich comparison operators haven’t been implemented. Under CPython 3.x, objects of different types must explicitly support the comparison. See How does Python compare string and int? which links to the official documentation. Most of the answers depend on this implicit ordering. Switching to Python 3.x will require a new type to implement and unify comparisons between numbers and strings.

``````Python 2.7.12 (default, Sep 29 2016, 13:30:34)
>>> (0,"foo") < ("foo",0)
True
``````
``````Python 3.5.2 (default, Oct 14 2016, 12:54:53)
>>> (0,"foo") < ("foo",0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()
``````

There are three different approaches. The first uses nested classes to take advantage of Python’s `Iterable` comparison algorithm. The second unrolls this nesting into a single class. The third foregoes subclassing `str` to focus on performance. All are timed; the second is twice as fast while the third almost six times faster. Subclassing `str` isn’t required, and was probably a bad idea in the first place, but it does come with certain conveniences.

The sort characters are duplicated to force ordering by case, and case-swapped to force lower case letter to sort first; this is the typical definition of “natural sort”. I couldn’t decide on the type of grouping; some might prefer the following, which also brings significant performance benefits:

``````d = lambda s: s.lower()+s.swapcase()
``````

Where utilized, the comparison operators are set to that of `object` so they won’t be ignored by `functools.total_ordering`.

``````import functools
import itertools

@functools.total_ordering
class NaturalStringA(str):
def __repr__(self):
return "{}({})".format
( type(self).__name__
, super().__repr__()
)
d = lambda c, s: [ c.NaturalStringPart("".join(v))
for k,v in
itertools.groupby(s, c.isdigit)
]
d = classmethod(d)
@functools.total_ordering
class NaturalStringPart(str):
d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
d = staticmethod(d)
def __lt__(self, other):
if not isinstance(self, type(other)):
return NotImplemented
try:
return int(self) < int(other)
except ValueError:
if self.isdigit():
return True
elif other.isdigit():
return False
else:
return self.d(self) < self.d(other)
def __eq__(self, other):
if not isinstance(self, type(other)):
return NotImplemented
try:
return int(self) == int(other)
except ValueError:
if self.isdigit() or other.isdigit():
return False
else:
return self.d(self) == self.d(other)
__le__ = object.__le__
__ne__ = object.__ne__
__gt__ = object.__gt__
__ge__ = object.__ge__
def __lt__(self, other):
return self.d(self) < self.d(other)
def __eq__(self, other):
return self.d(self) == self.d(other)
__le__ = object.__le__
__ne__ = object.__ne__
__gt__ = object.__gt__
__ge__ = object.__ge__
``````
``````import functools
import itertools

@functools.total_ordering
class NaturalStringB(str):
def __repr__(self):
return "{}({})".format
( type(self).__name__
, super().__repr__()
)
d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
d = staticmethod(d)
def __lt__(self, other):
if not isinstance(self, type(other)):
return NotImplemented
groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
zipped = itertools.zip_longest(*groups)
for s,o in zipped:
if s is None:
return True
if o is None:
return False
s_k, s_v = s[0], "".join(s[1])
o_k, o_v = o[0], "".join(o[1])
if s_k and o_k:
s_v, o_v = int(s_v), int(o_v)
if s_v == o_v:
continue
return s_v < o_v
elif s_k:
return True
elif o_k:
return False
else:
s_v, o_v = self.d(s_v), self.d(o_v)
if s_v == o_v:
continue
return s_v < o_v
return False
def __eq__(self, other):
if not isinstance(self, type(other)):
return NotImplemented
groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
zipped = itertools.zip_longest(*groups)
for s,o in zipped:
if s is None or o is None:
return False
s_k, s_v = s[0], "".join(s[1])
o_k, o_v = o[0], "".join(o[1])
if s_k and o_k:
s_v, o_v = int(s_v), int(o_v)
if s_v == o_v:
continue
return False
elif s_k or o_k:
return False
else:
s_v, o_v = self.d(s_v), self.d(o_v)
if s_v == o_v:
continue
return False
return True
__le__ = object.__le__
__ne__ = object.__ne__
__gt__ = object.__gt__
__ge__ = object.__ge__
``````
``````import functools
import itertools
import enum

class OrderingType(enum.Enum):
PerWordSwapCase         = lambda s: s.lower()+s.swapcase()
PerCharacterSwapCase    = lambda s: "".join(c.lower()+c.swapcase() for c in s)

class NaturalOrdering:
@classmethod
def by(cls, ordering):
def wrapper(string):
return cls(string, ordering)
return wrapper
def __init__(self, string, ordering=OrderingType.PerCharacterSwapCase):
self.string = string
self.groups = [ (k,int("".join(v)))
if k else
(k,ordering("".join(v)))
for k,v in
itertools.groupby(string, str.isdigit)
]
def __repr__(self):
return "{}({})".format
( type(self).__name__
, self.string
)
def __lesser(self, other, default):
if not isinstance(self, type(other)):
return NotImplemented
for s,o in itertools.zip_longest(self.groups, other.groups):
if s is None:
return True
if o is None:
return False
s_k, s_v = s
o_k, o_v = o
if s_k and o_k:
if s_v == o_v:
continue
return s_v < o_v
elif s_k:
return True
elif o_k:
return False
else:
if s_v == o_v:
continue
return s_v < o_v
return default
def __lt__(self, other):
return self.__lesser(other, default=False)
def __le__(self, other):
return self.__lesser(other, default=True)
def __eq__(self, other):
if not isinstance(self, type(other)):
return NotImplemented
for s,o in itertools.zip_longest(self.groups, other.groups):
if s is None or o is None:
return False
s_k, s_v = s
o_k, o_v = o
if s_k and o_k:
if s_v == o_v:
continue
return False
elif s_k or o_k:
return False
else:
if s_v == o_v:
continue
return False
return True
# functools.total_ordering doesn't create single-call wrappers if both
# __le__ and __lt__ exist, so do it manually.
def __gt__(self, other):
op_result = self.__le__(other)
if op_result is NotImplemented:
return op_result
return not op_result
def __ge__(self, other):
op_result = self.__lt__(other)
if op_result is NotImplemented:
return op_result
return not op_result
# __ne__ is the only implied ordering relationship, it automatically
# delegates to __eq__
``````
``````>>> import natsort
>>> import timeit
>>> l1 = ['Apple', 'corn', 'apPlE', 'arbour', 'Corn', 'Banana', 'apple', 'banana']
>>> l2 = list(map(str, range(30)))
>>> l3 = ["{} {}".format(x,y) for x in l1 for y in l2]
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringA)', number=10000, globals=globals()))
362.4729259099986
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringB)', number=10000, globals=globals()))
189.7340817489967
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalOrdering.by(OrderingType.PerCharacterSwapCase))', number=10000, globals=globals()))
69.34636392899847
>>> print(timeit.timeit('natsort.natsorted(l3+["0"], alg=natsort.ns.GROUPLETTERS | natsort.ns.LOWERCASEFIRST)', number=10000, globals=globals()))
98.2531585780016
``````

Natural sorting is both pretty complicated and vaguely defined as a problem. Don’t forget to run `unicodedata.normalize(...)` beforehand, and consider use `str.casefold()` rather than `str.lower()`. There are probably subtle encoding issues I haven’t considered. So I tentatively recommend the natsort library. I took a quick glance at the github repository; the code maintenance has been stellar.

All the algorithms I’ve seen depend on tricks such as duplicating and lowering characters, and swapping case. While this doubles the running time, an alternative would require a total natural ordering on the input character set. I don’t think this is part of the unicode specification, and since there are many more unicode digits than `[0-9]`, creating such a sorting would be equally daunting. If you want locale-aware comparisons, prepare your strings with `locale.strxfrm` per Python’s Sorting HOW TO.

Given:

``````data = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
``````

Similar to SergO’s solution, a 1-liner without external libraries would be:

``````data.sort(key=lambda x: int(x[3:]))
``````

or

``````sorted_data = sorted(data, key=lambda x: int(x[3:]))
``````

Explanation:

This solution uses the key feature of sort to define a function that will be employed for the sorting. Because we know that every data entry is preceded by ‘elm’ the sorting function converts to integer the portion of the string after the 3rd character (i.e. int(x[3:])). If the numerical part of the data is in a different location, then this part of the function would have to change.

I suggest you simply use the `key` keyword argument of `sorted` to achieve your desired list
For example:

``````to_order= [e2,E1,e5,E4,e3]
ordered= sorted(to_order, key= lambda x: x.lower())
# ordered should be [E1,e2,e3,E4,e5]
``````

Based on the answers here, I wrote a `natural_sorted` function that behaves like the built-in function `sorted`:

``````# Copyright (C) 2018, Benjamin Drung <[email protected]>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import re

def natural_sorted(iterable, key=None, reverse=False):
"""Return a new naturally sorted list from the items in *iterable*.

The returned list is in natural sort order. The string is ordered
lexicographically (using the Unicode code point number to order individual
characters), except that multi-digit numbers are ordered as a single
character.

Has two optional arguments which must be specified as keyword arguments.

*key* specifies a function of one argument that is used to extract a
comparison key from each list element: ``key=str.lower``.  The default value
is ``None`` (compare the elements directly).

*reverse* is a boolean value.  If set to ``True``, then the list elements are
sorted as if each comparison were reversed.

The :func:`natural_sorted` function is guaranteed to be stable. A sort is
stable if it guarantees not to change the relative order of elements that
compare equal --- this is helpful for sorting in multiple passes (for
example, sort by department, then by salary grade).
"""
prog = re.compile(r"(d+)")

def alphanum_key(element):
"""Split given key in list of strings and digits"""
return [int(c) if c.isdigit() else c for c in prog.split(key(element)
if key else element)]

return sorted(iterable, key=alphanum_key, reverse=reverse)
``````

The source code is also available in my GitHub snippets repository:
https://github.com/bdrung/snippets/blob/master/natural_sorted.py

# Value Of This Post

My point is to offer a non regex solution that can be applied generally.
I’ll create three functions:

1. `find_first_digit` which I borrowed from @AnuragUniyal. It will find the position of the first digit or non-digit in a string.
2. `split_digits` which is a generator that picks apart a string into digit and non digit chunks. It will also `yield` integers when it is a digit.
3. `natural_key` just wraps `split_digits` into a `tuple`. This is what we use as a key for `sorted`, `max`, `min`.

# Functions

``````def find_first_digit(s, non=False):
for i, x in enumerate(s):
if x.isdigit() ^ non:
return i
return -1

def split_digits(s, case=False):
non = True
while s:
i = find_first_digit(s, non)
if i == 0:
non = not non
elif i == -1:
yield int(s) if s.isdigit() else s if case else s.lower()
s = ''
else:
x, s = s[:i], s[i:]
yield int(x) if x.isdigit() else x if case else x.lower()

def natural_key(s, *args, **kwargs):
return tuple(split_digits(s, *args, **kwargs))
``````

We can see that it is general in that we can have multiple digit chunks:

``````# Note that the key has lower case letters
natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh')

('asl;dkfdfkj:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')
``````

Or leave as case sensitive:

``````natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh', True)

('asl;dkfDFKJ:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')
``````

We can see that it sorts the OP’s list in the appropriate order

``````sorted(
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13'],
key=natural_key
)

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
``````

But it can handle more complicated lists as well:

``````sorted(
['f_1', 'e_1', 'a_2', 'g_0', 'd_0_12:2', 'd_0_1_:2'],
key=natural_key
)

['a_2', 'd_0_1_:2', 'd_0_12:2', 'e_1', 'f_1', 'g_0']
``````

My regex equivalent would be

``````def int_maybe(x):
return int(x) if str(x).isdigit() else x

def split_digits_re(s, case=False):
parts = re.findall('d+|D+', s)
if not case:
return map(int_maybe, (x.lower() for x in parts))
else:
return map(int_maybe, parts)

def natural_key_re(s, *args, **kwargs):
return tuple(split_digits_re(s, *args, **kwargs))
``````
``````a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

sorted = False

while not sorted:
sorted = True
for i in range(length):
sorted = False
a[i], a[i+1] = a[i+1], a[i] #sort the main list based on the integer list index value

for a_string in a: #extract the number in the string character by character
for letter in a_string:
if letter.isdigit():
#print letter
b += letter
c.append(b)
b = ''

print 'Before sorting....'
print a
c = map(int, c) #converting string list into number list
print c
bubble(c)

print 'After sorting....'
print c
print a
``````

Acknowledgments:

Bubble Sort Homework

How to read a string one letter at a time in python

Following @Mark Byers answer, here is an adaptation which accepts the `key` parameter, and is more PEP8-compliant.

``````def natsorted(seq, key=None):
def convert(text):
return int(text) if text.isdigit() else text

def alphanum(obj):
if key is not None:
return [convert(c) for c in re.split(r'([0-9]+)', key(obj))]
return [convert(c) for c in re.split(r'([0-9]+)', obj)]

return sorted(seq, key=alphanum)
``````

An improvement on Claudiu’s improvement on Mark Byers’ answer ðŸ˜‰

``````import re

def natural_sort_key(s, _re=re.compile(r'(d+)')):
return [int(t) if i & 1 else t.lower() for i, t in enumerate(_re.split(s))]

...
my_naturally_sorted_list = sorted(my_list, key=natural_sort_key)
``````

BTW, maybe not everyone remembers that function argument defaults are evaluated at `def` time

Let me submit my own take on this need:

``````from typing import Tuple, Union, Optional, Generator

StrOrInt = Union[str, int]

# On Python 3.6, string concatenation is REALLY fast
# Tested myself, and this fella also tested:
# https://blog.ganssle.io/articles/2019/11/string-concat.html
def griter(s: str) -> Generator[StrOrInt, None, None]:
last_was_digit: Optional[bool] = None
cluster: str = ""
for c in s:
if last_was_digit is None:
last_was_digit = c.isdigit()
cluster += c
continue
if c.isdigit() != last_was_digit:
if last_was_digit:
yield int(cluster)
else:
yield cluster
last_was_digit = c.isdigit()
cluster = ""
cluster += c
if last_was_digit:
yield int(cluster)
else:
yield cluster
return

def grouper(s: str) -> Tuple[StrOrInt, ...]:
return tuple(griter(s))
``````

Now if we have the list like such:

``````filelist = [
'File3', 'File007', 'File3a', 'File10', 'File11', 'File1', 'File4', 'File5',
'File9', 'File8', 'File8b1', 'File8b2', 'File8b11', 'File6'
]
``````

We can simply use the `key=` kwarg to do a natural sort:

``````>>> sorted(filelist, key=grouper)
['File1', 'File3', 'File3a', 'File4', 'File5', 'File6', 'File007', 'File8',
'File8b1', 'File8b2', 'File8b11', 'File9', 'File10', 'File11']
``````

The drawback here is of course, as it is now, the function will sort uppercase letters before lowercase letters.

I’ll leave the implementation of a case-insenstive grouper to the reader ðŸ™‚

The algorithm I use is `padzero_with_lower` as defined as:

``````import re

return re.sub(r'd+', lambda m: m.group(0).rjust(10, '0'), s).lower()
``````

The algorithm finds:

• finds and pads numbers of any length, to a large enough length, e.g. 10
• then, it turns the string into lower case

Here’s an example usage:

``````print(padzero_with_lower('file1.txt'))   # file0000000001.txt
``````

With this function tested, we can now use it as our key, i.e.

``````lis = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(lis)
# Output: ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
``````

Just for the records, here is yet another variant of Mark Byers’ simple solution, similar to the one suggested by Walter Tross, which avoids calling `isdigit()`. This not only makes it faster, but also avoids the problems that can occur because `isdigit()` considers more unicode chars as digits than the the regex `d+`.

``````import re
from itertools import cycle

_re_digits = re.compile(r"(d+)")

def natural_comparison_key(key):
return tuple(
int(part) if is_digit else part
for part, is_digit in zip(_re_digits.split(key), cycle((False, True)))
)
``````

Here’s another version of Mark Byers’s answer. This version demonstrates how to pass in an attribute name, that is to be used to evaluate the objects in the list.

``````def natural_sort(l, attrib):
convert = lambda text: int(text) if text.isdigit() else text.lower()
alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key.__dict__[attrib])]
return sorted(l, key=alphanum_key)

results = natural_sort(albums, 'albumid')
``````

Where `albums` is a list of Album instances, and `albumid` is an string attribute that nominally has numbers in it.

A compact solution, based on the transformation of the string into a `List[Tuple(str, int)]`.

## Code

``````def string_to_pairs(s, pairs=re.compile(r"(D*)(d*)").findall):
return [(text.lower(), int(digits or 0)) for (text, digits) in pairs(s)[:-1]]
``````

## Demonstration

``````sorted(['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9'], key=string_to_pairs)
``````

Output:

``````['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
``````

## Tests

### Transformation

``````assert string_to_pairs("") == []
assert string_to_pairs("123") == [("", 123)]
assert string_to_pairs("abc") == [("abc", 0)]
assert string_to_pairs("123abc") == [("", 123), ("abc", 0)]
assert string_to_pairs("abc123") == [("abc", 123)]
assert string_to_pairs("123abc456") == [("", 123), ("abc", 456)]
assert string_to_pairs("abc123efg") == [("abc", 123), ("efg", 0)]
``````

### Sorting

``````# Some extracts from the test suite of the natsort library. Permalink:
# https://github.com/SethMMorton/natsort/blob/e3c32f5638bf3a0e9a23633495269bea0e75d379/tests/test_natsorted.py

sort_data = [
(  # same as test_natsorted_can_sort_as_unsigned_ints_which_is_default()
["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"],
["a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51.", "a-50"],
),
(  # same as test_natsorted_numbers_in_ascending_order()
["a2", "a5", "a9", "a1", "a4", "a10", "a6"],
["a1", "a2", "a4", "a5", "a6", "a9", "a10"],
),
(  # same as test_natsorted_can_sort_as_version_numbers()
["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"],
["1.9.9a", "1.9.9b", "1.10.1", "1.11", "1.11.4"],
),
(  # different from test_natsorted_handles_filesystem_paths()
[
"/p/Folder (10)/file.tar.gz",
"/p/Folder (1)/file (1).tar.gz",
"/p/Folder/file.x1.9.tar.gz",
"/p/Folder (1)/file.tar.gz",
"/p/Folder/file.x1.10.tar.gz",
],
[
"/p/Folder (1)/file (1).tar.gz",
"/p/Folder (1)/file.tar.gz",
"/p/Folder (10)/file.tar.gz",
"/p/Folder/file.x1.9.tar.gz",
"/p/Folder/file.x1.10.tar.gz",
],
),
(  # same as test_natsorted_path_extensions_heuristic()
[
"Try.Me.Bug - 09 - One.Two.Three.[text].mkv",
"Try.Me.Bug - 07 - One.Two.5.[text].mkv",
"Try.Me.Bug - 08 - One.Two.Three[text].mkv",
],
[
"Try.Me.Bug - 07 - One.Two.5.[text].mkv",
"Try.Me.Bug - 08 - One.Two.Three[text].mkv",
"Try.Me.Bug - 09 - One.Two.Three.[text].mkv",
],
),
(  # same as ns.IGNORECASE for test_natsorted_supports_case_handling()
["Apple", "corn", "Corn", "Banana", "apple", "banana"],
["Apple", "apple", "Banana", "banana", "corn", "Corn"],
),

]

for (given, expected) in sort_data:
assert sorted(given, key=string_to_pairs) == expected
``````

## Bonus

If your strings mix non-ascii texts and numbers, you may be interested in composing `string_to_pairs()` with the function `remove_diacritics()` I give elsewhere.

Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.