How to split and sort the contents of a list in Python

Question:

I have the following list:

list1 = ['# Heading', '200: Stop Engine', '', '20: Start Engine', '400: Do xy']

and I want to get:

list2 = ['20: Start Engine', '200: Stop Engine', '400: Do xy']

So the empty list item and the ones starting with # should be deleted or ignored and the rest should be sorted by the number. I tried to use split() to extract the numbers and the #:

list2 = [i.split() for i in list1]

but then I get a list of lists, which brings other problems (to sort by the number I need to convert it to an int, which only works if I have a string rather than a list). The output would be:

list2 = [['#', 'Heading'], ['200:', 'Stop', 'Engine'], [], ['20:', 'Start', 'Engine'], ['400:', 'Do', 'xy']]

and if I split(':'), I can’t delete the #.
For the sorting I tried:

list2.sort(key = lambda x: x[0])

to sort the items by the number. This only works if I can delete the # and the empty item and convert the string to an int. I hope someone can help me! Thanks in advance!

Asked By: Markus


Answers:

First you can filter out unwanted items from the list using a list comprehension and then sort it:

list1 = ["# Heading", "200: Stop Engine", "", "20: Start Engine", "400: Do xy"]

out = sorted(
    [s for s in list1 if s.split(":")[0].isdigit()],
    key=lambda s: int(s.split(":")[0]),
)
print(out)

Prints:

['20: Start Engine', '200: Stop Engine', '400: Do xy']
Answered By: Andrej Kesely

Just do all the things you say:

Ignore all the items which don’t start with a number, then sort by the number before the colon delimiter:

def FilterAndSort(items):
    items = [item for item in items if item and item[0].isdigit()]
    return sorted(items, key=lambda item:int(item.split(':')[0]))


print(FilterAndSort(list1))

Output as requested
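
A self-contained sketch of the same approach, with one assumption added: the filter tests the whole prefix before the colon rather than only the first character, so a hypothetical item such as '2nd gear: shift' is skipped instead of raising ValueError in int(). The name filter_and_sort and the extra sample item are illustrative, not from the question.

```python
def filter_and_sort(items):
    """Keep items shaped like '<digits>: text' and sort them by the number."""
    def number(item):
        # Text before the first colon; partition never raises, even without ':'.
        return item.partition(':')[0]

    kept = [item for item in items if number(item).isdigit()]
    return sorted(kept, key=lambda item: int(number(item)))


# '2nd gear: shift' is a hypothetical malformed item added for illustration.
sample = ['# Heading', '200: Stop Engine', '', '20: Start Engine',
          '2nd gear: shift', '400: Do xy']
print(filter_and_sort(sample))
```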

Answered By: quamrana

You can use a list comprehension and the .sort() method for this:

list1 = ['# Heading', '200: Stop Engine', '', '20: Start Engine', '400: Do xy']

# `el and ...` skips empty strings, where el[0] would raise IndexError
out = [el for el in list1 if el and el[0].isdigit()]

out.sort(key=lambda el: int(el.split(':')[0]))

print(out)

output:

['20: Start Engine', '200: Stop Engine', '400: Do xy']
Answered By: mrCopiCat

Remove the unwanted items, then sort the remaining ones with sorted() and a key function.

list1 = ['# Heading', '200: Stop Engine', '', '20: Start Engine', '400: Do xy']

list1 = sorted([s for s in list1 if s and not s.startswith('#')], key=lambda x: int(x.split(':')[0]))
print(list1)
Answered By: eatmeimadanish

You’re on the right track — you first need to filter out the unwanted elements:

new_list1 = [el for el in list1 if el != '' and el[0].isdigit()]

And then sort the list using the integers prior to ":" as a key:

new_list1.sort(key = lambda x: int(x.split(":")[0]))
new_list1

['20: Start Engine', '200: Stop Engine', '400: Do xy']
Answered By: Dina Jankovic

Another approach would be using startswith like so:

>>> list1 = [i for i in list1 if i and not i.startswith('#')]
>>> list1.sort(key = lambda x: int(x.split(':')[0]))
>>> list1
['20: Start Engine', '200: Stop Engine', '400: Do xy']
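
The key function decides the ordering: comparing the prefixes as strings goes character by character, so a prefix like '100' sorts before '20'. A small sketch with a hypothetical extra item ('100: Idle', not in the question) shows the difference between a string key and an int key.

```python
# '100: Idle' is a hypothetical item added to expose the difference.
items = ['200: Stop Engine', '20: Start Engine', '100: Idle', '400: Do xy']

by_text = sorted(items, key=lambda x: x.split(':')[0])
by_number = sorted(items, key=lambda x: int(x.split(':')[0]))

print(by_text)    # string comparison: '100' < '20' < '200' < '400'
print(by_number)  # numeric comparison: 20 < 100 < 200 < 400
```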
Answered By: game0ver

Let’s take advantage of the great existing Python modules:

import re
from natsort import natsorted # pip install natsort

natsorted(s for s in list1 if re.match(r'\d+:', s))

Output:

['20: Start Engine', '200: Stop Engine', '400: Do xy']

Used input:

list1 = ['# Heading', '200: Stop Engine', '', '20: Start Engine', '400: Do xy']

Alternative with only re:

import re

pattern = re.compile(r'(\d+):')

sorted((s for s in list1 if pattern.match(s)),
       key=lambda s: int(pattern.match(s).group(1)))

NB. the key repeats the match for each item instead of reusing a walrus-assigned m from the generator: sorted() consumes the whole generator before calling key, so m would only ever hold the last match.
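
A sketch that keeps the match and the sort key together: each string is matched once and paired with its parsed number, the pairs are sorted by that number, and the strings are then unpacked again. The names NUMBERED and sort_numbered are illustrative, and the walrus operator assumes Python 3.8 or later.

```python
import re

NUMBERED = re.compile(r'(\d+):')  # digits followed by a colon at the start


def sort_numbered(items):
    # Match each item once and keep (number, item) pairs for the matches.
    pairs = [(int(m.group(1)), s) for s in items if (m := NUMBERED.match(s))]
    # Sort on the number only, so ties keep their original order.
    pairs.sort(key=lambda pair: pair[0])
    return [s for _, s in pairs]


list1 = ['# Heading', '200: Stop Engine', '', '20: Start Engine', '400: Do xy']
print(sort_numbered(list1))
```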

Answered By: mozway