Sorting and Grouping Nested Lists in Python

Question:

I have the following data structure (a list of lists):

[
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

I would like to be able to:

  1. Use a function to reorder the list so that I can group by any field of the inner lists. For example, I’d like to be able to group by the second column (so that all the ‘21’ rows are together).

  2. Use a function to display only certain values from each inner list. For example, I’d like to reduce this list to only the rows whose 4th field is ‘2somename’,

so the list would look like this:

[
     ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
     ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]
Asked By: m3clov3n


Answers:

If I understand your question correctly, the following code should do the job:

l = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

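from functools import cmp_to_key

def compareField(field):
    # Build a comparator that orders two rows by the given field only
    def c(l1, l2):
        return (l1[field] > l2[field]) - (l1[field] < l2[field])
    return c

# Use compareField(1) as the ordering criterion, i.e. sort only with
# respect to the 2nd field (cmp_to_key adapts the comparator for Python 3)
l.sort(key=cmp_to_key(compareField(1)))
for row in l:
    print(row)

print()
# Select only those sublists for which 4th field == '2somename'
l2somename = [row for row in l if row[3] == '2somename']
for row in l2somename:
    print(row)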

Output:

['4', '21', '1', '14', '2008-10-24 15:42:58']
['5', '21', '3', '19', '2008-10-24 15:45:45']
['6', '21', '1', '1somename', '2008-10-24 15:45:49']
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']

['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
Answered By: Federico A. Ramponi

If you assigned the list to a variable “a”:

Python 2.x:

#1:

a.sort(lambda x,y: cmp(x[1], y[1]))

#2:

filter(lambda x: x[3]=="2somename", a)

Python 3:

#1:

a.sort(key=lambda x: x[1])
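The Python 3 part above only covers #1. A sketch of #2, assuming the same list a: in Python 3, filter() returns a lazy iterator rather than a list, so wrap it in list() (or use the equivalent list comprehension) to get a list back.

```python
a = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# filter() is lazy in Python 3, so materialize the result with list()
matches = list(filter(lambda x: x[3] == "2somename", a))

# Equivalent list comprehension
matches_lc = [x for x in a if x[3] == "2somename"]

for row in matches:
    print(row)
```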
Answered By: Jimmy2Times

Use a function to reorder the list so that I can group by each item in the list. For example, I’d like to be able to group by the second column (so that all the 21’s are together)

Lists have a built-in sort method, and you can pass it a key function that extracts the sort key from each element.

>>> import pprint
>>> l.sort(key = lambda ll: ll[1])
>>> pprint.pprint(l)
[['4', '21', '1', '14', '2008-10-24 15:42:58'],
 ['5', '21', '3', '19', '2008-10-24 15:45:45'],
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]
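One caveat worth checking: the fields are strings, so the key above compares them lexicographically. That works for ‘21’ and ‘22’, but a value like ‘9’ would sort after ‘21’. A sketch, assuming every value in the second column is an integer literal, that converts it in the key; returning a tuple also adds a tie-breaker (here the timestamp). The extra row with ‘9’ is hypothetical, added only to show the difference.

```python
rows = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['9', '9', '2', 'extra', '2008-10-24 15:50:00'],  # hypothetical extra row
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# String key: '9' sorts after '21' and '22' because comparison is character by character
by_text = sorted(rows, key=lambda r: r[1])

# Numeric key with the timestamp (5th field) as a tie-breaker;
# assumes the 2nd field always holds an integer literal
by_number = sorted(rows, key=lambda r: (int(r[1]), r[4]))

print([r[1] for r in by_text])    # ['21', '22', '22', '9']
print([r[1] for r in by_number])  # ['9', '21', '22', '22']
```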

Use a function to only display certain values from each inner list. For example, I’d like to reduce this list to only contain the 4th field value of ‘2somename’

This looks like a job for a list comprehension. For example, to pull out the 4th field of each inner list:

>>> [ll[3] for ll in l]
['14', '19', '1somename', '2somename', '2somename']
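A single comprehension can also filter and project at the same time, which covers both readings of the second request. A sketch, assuming you want only the 1st and 5th fields of the ‘2somename’ rows (the choice of fields is purely illustrative):

```python
l = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# Keep rows whose 4th field is '2somename', and show only their 1st and 5th fields
picked = [(ll[0], ll[4]) for ll in l if ll[3] == '2somename']
print(picked)
# [('3', '2008-10-24 15:22:03'), ('7', '2008-10-24 15:45:51')]
```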
Answered By: Aaron Maenpaa

For the first question, the first thing you should do is sort the list by the second field using itemgetter from the operator module:

x = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

from operator import itemgetter

x.sort(key=itemgetter(1))

Then you can use itertools’ groupby function:

from itertools import groupby
y = groupby(x, itemgetter(1))

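Now y is an iterator that yields (key, group) pairs, where group is itself an iterator over the consecutive items that share that key. (groupby only merges adjacent items, which is why the list is sorted first.) It’s easier to show this in code than to explain it: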

for elt, items in groupby(x, itemgetter(1)):
    print(elt, items)
    for i in items:
        print(i)

Which prints:

21 <itertools._grouper object at 0x511a0>
['4', '21', '1', '14', '2008-10-24 15:42:58']
['5', '21', '3', '19', '2008-10-24 15:45:45']
['6', '21', '1', '1somename', '2008-10-24 15:45:49']
22 <itertools._grouper object at 0x51170>
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
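Each group iterator is only valid until groupby advances to the next key, so a common pattern is to copy every group into a list right away. A sketch that builds a dict of lists keyed on the second column:

```python
from itertools import groupby
from operator import itemgetter

x = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# groupby only merges *adjacent* equal keys, so sort on the same key first
x.sort(key=itemgetter(1))

# Materialize each group immediately; a group iterator is no longer usable
# once groupby has moved on to the next key
groups = {key: list(items) for key, items in groupby(x, itemgetter(1))}

for key, rows in groups.items():
    print(key, len(rows))
```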

For the second part, use a list comprehension, as others have already suggested:

from pprint import pprint as pp
pp([y for y in x if y[3] == '2somename'])

Which prints:

[['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]
Answered By: llimllib

If you’ll be doing a lot of sorting and filtering, you may find some helper functions useful.

m = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

# Sort and filter helpers.
sort_on   = lambda pos:     lambda x: x[pos]
filter_on = lambda pos,val: lambda l: l[pos] == val

# Sort by second column
m = sorted(m, key=sort_on(1))

# Filter on 4th column, where value = '2somename'.
# In Python 3, filter() returns a lazy iterator, so wrap it in list().
m = list(filter(filter_on(3, '2somename'), m))
Answered By: Kenan Banks

It looks a lot like you’re trying to use a list as a database.

Python ships the sqlite3 module in its standard library. If you don’t need persistence, it’s easy to create an in-memory SQLite database (see How do I create a sqllite3 in-memory database?).

Then you can use SQL statements to do all this sorting and filtering without having to reinvent the wheel.
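A minimal sketch of that approach with the standard-library sqlite3 module and an in-memory database. The table name records and the column names c1 to c5 are made up for illustration, since the question does not name the fields.

```python
import sqlite3

data = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute("CREATE TABLE records (c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT, c5 TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)", data)

# Request 1: order by the second column (ties broken by the first column)
ordered = conn.execute("SELECT * FROM records ORDER BY c2, c1").fetchall()

# Request 2: only the rows whose fourth column is '2somename'
matches = conn.execute(
    "SELECT * FROM records WHERE c4 = ? ORDER BY c1", ("2somename",)
).fetchall()

for row in matches:
    print(row)  # rows come back as tuples
conn.close()
```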

Answered By: Kamil Kisiel

For part (2), with x being your list, I think you want:

[y for y in x if y[3] == '2somename']

This returns only the inner lists whose fourth value is ‘2somename’. That said, Kamil’s suggestion of using SQL may be the better approach.

Answered By: Adrian Bool

You’re simply creating indexes on your structure, right?

>>> from collections import defaultdict
>>> def indexOn( things, pos ):
...     inx= defaultdict(list)
...     for t in things:
...             inx[t[pos]].append(t)
...     return inx
... 
>>> a=[
...  ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
...  ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
...  ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
...  ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
...  ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
... ]

Here’s your first request, grouped by position 1.

>>> import pprint
>>> pprint.pprint( dict(indexOn(a,1)) )
{'21': [['4', '21', '1', '14', '2008-10-24 15:42:58'],
        ['5', '21', '3', '19', '2008-10-24 15:45:45'],
        ['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
 '22': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
        ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]}

Here’s your second request, grouped by position 3.

>>> dict(indexOn(a,3))
{'19': [['5', '21', '3', '19', '2008-10-24 15:45:45']], '14': [['4', '21', '1', '14', '2008-10-24 15:42:58']], '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'], ['7', '22', '3', '2somename', '2008-10-24 15:45:51']], '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']]}
>>> pprint.pprint(_)
{'14': [['4', '21', '1', '14', '2008-10-24 15:42:58']],
 '19': [['5', '21', '3', '19', '2008-10-24 15:45:45']],
 '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
 '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
               ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]} 
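Once built, an index answers the second request with a plain dictionary lookup. A sketch reusing indexOn from above (redefined so the snippet runs on its own); .get() with a default avoids both a KeyError and the defaultdict side effect of inserting an empty list for a missing key.

```python
from collections import defaultdict

def indexOn(things, pos):
    # Map each distinct value at `pos` to the list of rows that contain it
    inx = defaultdict(list)
    for t in things:
        inx[t[pos]].append(t)
    return inx

a = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

by_name = indexOn(a, 3)

# Request 2 is now a single lookup; .get() does not insert missing keys
print(by_name.get('2somename', []))
print(by_name.get('nosuchname', []))
```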
Answered By: S.Lott

You can use a for loop to search the nested list for the rows whose chosen column matches a value. The code will be:

l = [['3', '21', '1', '14', '2008-10-24 15:42:58'], 
['4', '22', '4','2somename','2008-10-24 15:22:03'], 
['5', '21', '3', '19', '2008-10-24 15:45:45'], 
['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
['7', '35', '3','2somename', '2008-10-24 15:45:51']]
col = int(input("Enter the column to search(1-5):"))
val = input("Enter the element to group by:")
matches = []
print('Searching...')
for x in l:
    # Columns are entered 1-based; Python lists are 0-based
    if x[col - 1] == val:
        matches.append(x)
        print(x)
if not matches:
    print('No search result. Please Try Again!!')

The output would look like this:

Enter the column to search(1-5):4
Enter the element to group by:2somename
Searching...
['4', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '35', '3', '2somename', '2008-10-24 15:45:51']
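Reading from input() makes the code above hard to reuse or test. A sketch of the same search as a plain function; the name find_rows is hypothetical, and col stays 1-based to match the prompt.

```python
def find_rows(rows, col, val):
    """Return every row whose column `col` (1-based) equals `val`."""
    return [row for row in rows if row[col - 1] == val]

l = [['3', '21', '1', '14', '2008-10-24 15:42:58'],
     ['4', '22', '4', '2somename', '2008-10-24 15:22:03'],
     ['5', '21', '3', '19', '2008-10-24 15:45:45'],
     ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
     ['7', '35', '3', '2somename', '2008-10-24 15:45:51']]

result = find_rows(l, 4, '2somename')
if result:
    for row in result:
        print(row)
else:
    print('No search result. Please try again!')
```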
Answered By: Adarsh Shukla