Case-insensitive list sorting, without lowercasing the result?
Question:
I have a list of strings like this:
['Aden', 'abel']
I want to sort the items, case-insensitive.
So I want to get:
['abel', 'Aden']
But I get the opposite with sorted() or list.sort(), because uppercase appears before lowercase.
How can I ignore the case? I’ve seen solutions that involve lowercasing all list items, but I don’t want to change the case of the list items.
Answers:
>>> x = ['Aden', 'abel']
>>> sorted(x, key=str.lower) # Or unicode.lower if all items are unicode
['abel', 'Aden']
In Python 3 str is unicode, but in Python 2 you can use this more general approach, which works for both str and unicode:
>>> sorted(x, key=lambda s: s.lower())
['abel', 'Aden']
In Python 3.3+ there is the str.casefold method, which is specifically designed for caseless matching:
sorted_list = sorted(unsorted_list, key=str.casefold)
In Python 2 use lower():
sorted_list = sorted(unsorted_list, key=lambda s: s.lower())
It works for both normal and unicode strings, since they both have a lower method.
In Python 2 it works for a mix of normal and unicode strings, since values of the two types can be compared with each other. Python 3 doesn’t work like that, though: you can’t compare a byte string and a unicode string, so in Python 3 you should do the sane thing and only sort lists of one type of string.
>>> lst = ['Aden', u'abe1']
>>> sorted(lst)
['Aden', u'abe1']
>>> sorted(lst, key=lambda s: s.lower())
[u'abe1', 'Aden']
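For comparison, here is a minimal Python 3 sketch of the same kind of mixed list. It assumes the byte strings are UTF-8 encoded, and the helper name as_text is made up for illustration. The sort fails until everything is normalized to one type:

```python
def as_text(item, encoding="utf-8"):
    """Return item as str, decoding bytes (hypothetical helper; UTF-8 is an assumption)."""
    return item.decode(encoding) if isinstance(item, bytes) else item


mixed = [b"Aden", "abel"]

try:
    sorted(mixed, key=lambda s: s.lower())
    mixed_sort_failed = False
except TypeError:
    # Python 3 refuses to order bytes against str.
    mixed_sort_failed = True

# Normalize to a single string type first, then sort case-insensitively.
texts = [as_text(item) for item in mixed]
result = sorted(texts, key=str.lower)
print(mixed_sort_failed, result)
```

Decoding up front keeps the key function simple and makes the choice of encoding explicit in one place.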
You can also try this to sort the list in-place:
>>> x = ['Aden', 'abel']
>>> x.sort(key=lambda y: y.lower())
>>> x
['abel', 'Aden']
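The difference between the two calls can be sketched as follows (Python 3; the variable names are only for illustration). sorted() returns a new list and leaves the original alone, list.sort() reorders the list in place and returns None, and in both cases the items keep their original case:

```python
names = ["Aden", "abel"]

# sorted() builds a new list; the original order is untouched.
new_list = sorted(names, key=str.lower)
untouched = list(names)

# list.sort() reorders the same list object and returns None.
returned = names.sort(key=str.lower)

print(new_list, untouched, names, returned)
```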
Try this:
def cSort(inlist, minisort=True):
    sortlist = []   # lowercase keys (or raw non-string entries) to sort
    newlist = []    # final result
    sortdict = {}   # lowercase key -> original strings sharing that key
    for entry in inlist:
        try:
            lentry = entry.lower()
        except AttributeError:
            # Not a string: sort the entry itself
            sortlist.append(entry)
        else:
            try:
                sortdict[lentry].append(entry)
            except KeyError:
                sortdict[lentry] = [entry]
                sortlist.append(lentry)
    sortlist.sort()
    for entry in sortlist:
        try:
            thislist = sortdict[entry]
            # Optionally order strings that differ only by case
            if minisort:
                thislist.sort()
            newlist = newlist + thislist
        except KeyError:
            newlist.append(entry)
    return newlist
lst = ['Aden', 'abel']
print(cSort(lst))
Output:
['abel', 'Aden']
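The "minisort" idea above (ordering strings that differ only by case) can also be sketched with a tuple key. This is a simplified alternative, not the same function: it assumes every item is a string, and the name caseless_sorted is invented for the example.

```python
def caseless_sorted(items):
    """Sort case-insensitively; break ties between equal keys by the original string."""
    return sorted(items, key=lambda s: (s.lower(), s))


data = ["b", "A", "a", "B"]
with_tiebreak = caseless_sorted(data)
# Without the tie-breaker, the stable sort keeps equal keys in input order.
without_tiebreak = sorted(data, key=str.lower)
print(with_tiebreak)     # ['A', 'a', 'B', 'b']
print(without_tiebreak)  # ['A', 'a', 'b', 'B']
```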
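I did it this way for Python 3.3:
def sortCaseIns(lst):
    # Build [lowercase, original] pairs, one per item
    lst2 = [[x for x in range(0, 2)] for y in range(0, len(lst))]
    for i in range(0, len(lst)):
        lst2[i][0] = lst[i].lower()
        lst2[i][1] = lst[i]
    # Sorting the pairs orders them by the lowercase form first
    lst2.sort()
    # Write the originals back, so lst is sorted in place
    for i in range(0, len(lst)):
        lst[i] = lst2[i][1]
Then you can just call this function: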
sortCaseIns(yourListToSort)
In Python 3 you can use:
list1.sort(key=lambda x: x.lower())  # case-insensitive
list1.sort()  # case-sensitive
This works in Python 3 and does not involve lowercasing the result (!):
values.sort(key=str.lower)
Case-insensitive sort, sorting the list in place, in Python 2 OR 3 (tested in Python 2.7.17 and Python 3.6.9):
>>> x = ["aa", "A", "bb", "B", "cc", "C"]
>>> x.sort()
>>> x
['A', 'B', 'C', 'aa', 'bb', 'cc']
>>> x.sort(key=str.lower) # <===== there it is!
>>> x
['A', 'aa', 'B', 'bb', 'C', 'cc']
The key is key=str.lower. Here’s what those commands look like on their own, for easy copy-pasting so you can test them:
x = ["aa", "A", "bb", "B", "cc", "C"]
x.sort()
x
x.sort(key=str.lower)
x
Note, however, that if your strings are unicode strings (like u'some string'), then in Python 2 only (NOT in Python 3) the above x.sort(key=str.lower) command will fail with the following error:
TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode'
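If you get this error, then either upgrade to Python 3, where str is already unicode so str.lower applies, or convert your unicode strings to ASCII strings first, using a list comprehension, like this (note that in Python 2 str(element) itself raises UnicodeEncodeError if an element contains non-ASCII characters):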
# for Python2, ensure all elements are ASCII (NOT unicode) strings first
x = [str(element) for element in x]
# for Python2, this sort will only work on ASCII (NOT unicode) strings
x.sort(key=str.lower)
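Python 3 has an analogous pitfall with bytes: str.lower is tied to the str type, so it rejects any other string-like type, whereas a lambda calls whatever lower method the object has. A small sketch, assuming a list that contains only bytes:

```python
data = [b"bb", b"A", b"aa", b"B"]

try:
    sorted(data, key=str.lower)
    unbound_failed = False
except TypeError:
    # str.lower only accepts str instances, not bytes.
    unbound_failed = True

# The lambda dispatches to bytes.lower(), so it works for bytes too.
result = sorted(data, key=lambda s: s.lower())
print(unbound_failed, result)
```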
Python3:
Sorting is discussed in other answers but here is what is going on behind the scenes with the sort options.
Say we would like to sort the following list case-insensitively. We can use key=:
strs = ['aa', 'BB', 'zz', 'CC']
strs_sorted = sorted(strs,key=str.lower)
print(strs_sorted)
['aa', 'BB', 'CC', 'zz']
What is happening here?
The key tells the sort to use "proxy" values. key= transforms each element before comparison. The key function takes in one value and returns one value, and the returned "proxy" value is used for the comparisons within the sort.
Hence we use str.lower to make all of our proxy values lowercase, which eliminates the case differences. The list is effectively ordered by its lowercase forms, while the original strings are what gets returned.
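To make the proxy idea concrete, here is a small sketch. The logging wrapper logged_lower and the manual "decorate, sort, undecorate" version are illustrations only, not how sorted() is implemented internally:

```python
strs = ['aa', 'BB', 'zz', 'CC']

calls = []

def logged_lower(s):
    """Key function that records each element it is asked about."""
    calls.append(s)
    return s.lower()

result = sorted(strs, key=logged_lower)

# Manual equivalent: pair each proxy with its position and original value,
# sort the triples, then throw the proxies away again.
decorated = sorted((s.lower(), i, s) for i, s in enumerate(strs))
manual = [s for _, _, s in decorated]

print(result)      # ['aa', 'BB', 'CC', 'zz']
print(manual)      # ['aa', 'BB', 'CC', 'zz']
print(len(calls))  # 4: the key is computed once per element
```

The position index in each triple keeps elements with equal proxies in their input order, which mirrors the stability of the built-in sort.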
str.lower vs str.casefold
As mentioned in other posts, you can also use str.casefold as the key, or any other function (for example, len to sort by length). The casefold() method is a more aggressive version of lower() that converts strings to case-folded strings for caseless matching.
sorted(strs,key=str.casefold)
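A short sketch of where the two keys differ, using the German "ß" (the word list is made up for the example). lower() leaves "ß" alone, while casefold() expands it to "ss":

```python
words = ["straße", "Strasse", "strata"]

lower_keys = [w.lower() for w in words]
fold_keys = [w.casefold() for w in words]

by_lower = sorted(words, key=str.lower)
by_fold = sorted(words, key=str.casefold)

print(lower_keys)  # ['straße', 'strasse', 'strata']
print(fold_keys)   # ['strasse', 'strasse', 'strata']
print(by_lower)    # ['Strasse', 'strata', 'straße']
print(by_fold)     # ['straße', 'Strasse', 'strata']
```

With str.lower, "straße" sorts after "strata" because "ß" compares greater than "t". With str.casefold, "straße" and "Strasse" get the same key and stay in their input order.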
What about creating my own sort function?
Generally speaking, it is best to use the built-in functions for sorting unless there is an extreme need not to. The built-in functions have been thoroughly tested and will most likely be the most reliable.
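If a custom comparison really is needed, one option that still relies on the built-in sort is functools.cmp_to_key. The comparison function caseless_cmp below is a hypothetical example, shown only to confirm that it gives the same result as the plain key:

```python
from functools import cmp_to_key


def caseless_cmp(a, b):
    """Return a negative, zero, or positive number, comparing a and b without regard to case."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


strs = ['aa', 'BB', 'zz', 'CC']
custom = sorted(strs, key=cmp_to_key(caseless_cmp))
builtin = sorted(strs, key=str.casefold)
print(custom, builtin)
```

The plain key version is shorter and computes each key only once, which is why it is usually preferred.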
Python2:
Similar principle:
sorted_list = sorted(strs, key=lambda s: s.lower())