Sort a list of dictionaries numerically by a key with alphanumeric data

Question:

I have a (Python) list of dictionaries that looks like this:

[
    {
        "data": "somedata1",
        "name": "prefix1.7.9"
    },
    {
        "data": "somedata2",
        "name": "prefix1.7.90"
    },
    {
        "data": "somedata3",
        "name": "prefix1.1.1"
    },
    {
        "data": "somedata4",
        "name": "prefix4.1.1"
    },
    {
        "data": "somedata5",
        "name": "prefix4.1.2"
    },
    {
        "data": "somedata5",
        "name": "other 123"
    },
    {
        "data": "somedata6",
        "name": "different"
    },  
    {
        "data": "somedata7",
        "name": "prefix1.7.11"
    },
    {
        "data": "somedata7",
        "name": "prefix1.11.9"
    },
    {
        "data": "somedata7",
        "name": "prefix1.17.9"
    }   
]

Now I want to sort it by the "name" key.
If the postfix consists of numbers (separated by dots), I want to sort it numerically,
e.g. with a resulting order:

different
other 123
prefix1.1.1
prefix1.7.9
prefix1.7.11
prefix1.7.90
prefix1.11.9
prefix1.17.9
prefix4.1.1
prefix4.1.2

Do you have an idea how to do this in a short and efficient way?
The only idea I had was to build a completely new list, but could this possibly also be done using a lambda function?

Asked By: kruemel4

||

Answers:

You need a way of extracting the prefix and the postfix from the 'name' values. This can be achieved using something like:

def extract_prefix(s: str) -> str:
    # everything before the first dot, e.g. 'prefix1' for 'prefix1.7.9'
    return s.split('.')[0]


def extract_postfix(s: str) -> tuple:
    try:
        # compare the dotted parts as integers, so that 9 < 11 < 90
        return tuple(int(p) for p in s.split('.')[1:])
    except ValueError:
        # a non-numeric postfix is treated like a missing one: the empty tuple
        # sorts before any numeric postfix with the same prefix
        return ()


arr = [{'data': 'somedata1', 'name': 'prefix1.7.9'},
 {'data': 'somedata2', 'name': 'prefix1.7.90'},
 {'data': 'somedata3', 'name': 'prefix1.1.1'},
 {'data': 'somedata4', 'name': 'prefix4.1.1'},
 {'data': 'somedata5', 'name': 'prefix4.1.2'},
 {'data': 'somedata5', 'name': 'other 123'},
 {'data': 'somedata6', 'name': 'different'},
 {'data': 'somedata7', 'name': 'prefix1.7.11'},
 {'data': 'somedata7', 'name': 'prefix1.11.9'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'}]


# sort by prefix first, then by the numeric postfix
result = sorted(arr, key=lambda d: (extract_prefix(d['name']), extract_postfix(d['name'])))

result:

[{'data': 'somedata6', 'name': 'different'},
 {'data': 'somedata5', 'name': 'other 123'},
 {'data': 'somedata3', 'name': 'prefix1.1.1'},
 {'data': 'somedata1', 'name': 'prefix1.7.9'},
 {'data': 'somedata7', 'name': 'prefix1.7.11'},
 {'data': 'somedata2', 'name': 'prefix1.7.90'},
 {'data': 'somedata7', 'name': 'prefix1.11.9'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'},
 {'data': 'somedata4', 'name': 'prefix4.1.1'},
 {'data': 'somedata5', 'name': 'prefix4.1.2'}]
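
As a side note on comparing dotted postfixes: a single float cannot tell '7.9' from '7.90' and ranks '7.11' below '7.9', whereas a tuple of integers compares each part as a whole number. A minimal sketch of the difference (the parts list is made up for illustration):

```python
# hypothetical postfix strings, as they appear after the first dot of a name
parts = ['7.9', '7.90', '7.11', '11.9']

# as floats: 7.11 < 7.9 == 7.90, so '7.11' ends up first
as_float = sorted(parts, key=float)

# as tuples of ints: (7, 9) < (7, 11) < (7, 90) < (11, 9)
as_ints = sorted(parts, key=lambda p: tuple(int(x) for x in p.split('.')))

print(as_float)  # ['7.11', '7.9', '7.90', '11.9']
print(as_ints)   # ['7.9', '7.11', '7.90', '11.9']
```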
Answered By: Abirbhav G.

Since you want to sort numerically you will need a helper function:

def split_name(s):
    nameparts = s.split('.')
    for i,p in enumerate(nameparts):
        if p.isdigit():
            nameparts[i] = int(p)
    return nameparts

# obj is the list of dictionaries; list.sort() sorts in place and returns None
obj.sort(key=lambda x: split_name(x['name']))
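
A minimal usage sketch of this helper on a few of the names from the question (the names list is only for illustration). It also shows a caveat: if one name has a number where another has text at the same position, Python 3 cannot compare the two and raises a TypeError.

```python
def split_name(s):
    # split on dots and turn purely numeric parts into ints
    nameparts = s.split('.')
    for i, p in enumerate(nameparts):
        if p.isdigit():
            nameparts[i] = int(p)
    return nameparts

names = ['prefix1.7.90', 'prefix1.7.9', 'prefix1.7.11', 'other 123', 'different']
ordered = sorted(names, key=split_name)
print(ordered)
# ['different', 'other 123', 'prefix1.7.9', 'prefix1.7.11', 'prefix1.7.90']

# caveat: 'hello' and 1 sit at the same position and cannot be compared
try:
    sorted(['prefix1.hello', 'prefix1.1.1'], key=split_name)
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```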
Answered By: gimix

You can use re.findall with a regex that extracts either non-numerical words or digits from each name, and convert those that are digits to integers for numeric comparisons. To avoid comparisons between strings and integers, make the key a tuple where the first item is a Boolean of whether the token is numeric and the second item is the actual key for comparison:

import re

# initialize your input list as the lst variable
lst.sort(
    key=lambda d: [
        (s.isdigit(), int(s) if s.isdigit() else s)
        for s in re.findall(r'[^\W\d]+|\d+', d['name'])
    ]
)

Demo: https://replit.com/@blhsing/ToughWholeInformationtechnology
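
To make the shape of the key visible, here is a small sketch that wraps the same expression in a function and prints the key for one name. The function name natural_key and the short names list are assumptions made for illustration, not part of the answer above.

```python
import re

def natural_key(name):
    # hypothetical helper: one (is_numeric, value) tuple per token, so that
    # ints are only ever compared with ints and strings with strings
    return [
        (s.isdigit(), int(s) if s.isdigit() else s)
        for s in re.findall(r'[^\W\d]+|\d+', name)
    ]

names = ['prefix1.7.90', 'prefix1.7.9', 'prefix1.7.11', 'other 123', 'different']

print(natural_key('prefix1.7.9'))
# [(False, 'prefix'), (True, 1), (True, 7), (True, 9)]

print(sorted(names, key=natural_key))
# ['different', 'other 123', 'prefix1.7.9', 'prefix1.7.11', 'prefix1.7.90']
```

Because the Boolean comes first in each tuple, a text token and a numeric token at the same position are ordered by that flag alone, and the mixed values are never compared with each other.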

Answered By: blhsing

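Here I first sort the names by version and store them in another list called rank. This list then gives each name its ranking position for the custom sort. Note that parse_version only accepts arbitrary names such as 'different' in older setuptools releases; newer releases raise InvalidVersion for strings that are not valid versions.

Code using pkg_resources: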

from pkg_resources import parse_version

rank=sorted([v['name'] for v in Mydata], key=parse_version)

or

rank = sorted(sorted([v['name'] for v in Mydata], key=parse_version), key=lambda s: s[:3] == 'pre')  # names starting with 'pre' are placed after all other names
sorted(Mydata, key = lambda x: rank.index(x['name']))

Output:

[{'data': 'somedata6', 'name': 'different'},
 {'data': 'somedata5', 'name': 'other 123'},
 {'data': 'somedata3', 'name': 'prefix1.1.1'},
 {'data': 'somedata1', 'name': 'prefix1.7.9'},
 {'data': 'somedata7', 'name': 'prefix1.7.11'},
 {'data': 'somedata2', 'name': 'prefix1.7.90'},
 {'data': 'somedata7', 'name': 'prefix1.11.9'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'},
 {'data': 'somedata4', 'name': 'prefix4.1.1'},
 {'data': 'somedata5', 'name': 'prefix4.1.2'}]

With other inputs:

[{'data': 'somedata6', 'name': 'Aop'},
 {'data': 'somedata6', 'name': 'different'},
 {'data': 'somedata5', 'name': 'other 123'},
 {'data': 'somedata7', 'name': 'pop'},
 {'data': 'somedata3', 'name': 'prefix1.hello'},
 {'data': 'somedata3', 'name': 'prefix1.1.1'},
 {'data': 'somedata4', 'name': 'prefix1.2.hello'},
 {'data': 'somedata1', 'name': 'prefix1.7.9'},
 {'data': 'somedata7', 'name': 'prefix1.7.11'},
 {'data': 'somedata2', 'name': 'prefix1.7.90'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'},
 {'data': 'somedata5', 'name': 'prefix4.1.2'},
 {'data': 'somedata7', 'name': 'prefix9.1.1'},
 {'data': 'somedata7', 'name': 'prefix10.11.9'}] 
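
One possible refinement, sketched under the assumption that rank has already been computed as above: rank.index(...) scans the list once for every element, so a dictionary of positions avoids the repeated scans. The short rank and Mydata values below are stand-ins for illustration.

```python
# stand-in for the already sorted list of names produced by the code above
rank = ['different', 'other 123', 'prefix1.1.1', 'prefix1.7.9', 'prefix1.7.11']

# stand-in input data
Mydata = [
    {'data': 'somedata7', 'name': 'prefix1.7.11'},
    {'data': 'somedata6', 'name': 'different'},
    {'data': 'somedata1', 'name': 'prefix1.7.9'},
]

# map each name to its position once, instead of calling rank.index() per element
position = {name: i for i, name in enumerate(rank)}

result = sorted(Mydata, key=lambda x: position[x['name']])
print([d['name'] for d in result])
# ['different', 'prefix1.7.9', 'prefix1.7.11']
```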
Answered By: R. Baraiya