Python – Which data structure to use?

Question:

I have a large array of numerical data that I need to sort, insert, and move values backwards and forwards in the sorted order. I was previously using a simple array. Now each value has to be linked with an id (a unique int, just along for the ride).

Can I extend the array class, or do I need to use a list of tuples? What is my best option?

Asked By: user1012037

||

Answers:

why not using a dictionary, with the key as the item of original array, and value is the id related to the key.

of course you could access it in sorted order, like this:

a = {'key':'id'}

keys = a.keys()

keys.sort()

for k in keys:

    print a[key]
Answered By: Zhenyu Li

You can just use a list for the sake of having a sorted – well – list. If you want to associate additional data, you could either use a tuple to store the data, or even create a custom object for it that stores the id in an additional field.

You shouldn’t need to extend the list for that, you can just put any object into a list. For example this would be easily possible:

>>> lst = [ ( 132, 'foobar' ), ( 58, 'other value' ) ]
>>> lst.append( ( 70, 'some data value' ) )
>>> lst
[(132, 'foobar'), (58, 'other value'), (70, 'some data value')]
>>> lst.sort( key=lambda x: x[0] )
>>> lst
[(58, 'other value'), (70, 'some data value'), (132, 'foobar')]
>>> lst.sort( key=lambda x: x[1] )
>>> lst
[(132, 'foobar'), (58, 'other value'), (70, 'some data value')]

edit:

In case you are using Python 3.1+, you could also use the collections.OrderedDict type. It is an extension to the normal dict which maintains the order just like list does.

Answered By: poke

Similar to poke’s answer, you can use a 2d array — but if the arrays are big, NumPy is generally a good bet for any kind of numerical data in Python. Just make a 2D array that looks like

[ [1  614.124]
  [2  621236.139]
  [3  1243.612] ]

and then you can sort with .sort().

Answered By: Danica

Using lists or arrays is problematic when you need to do insertions or deletions — these are O(n) operations which can be devastatingly slow with large datasets.

Consider using blist which has a list-like API but affords O(lg N) insertion and deletion.

Answered By: Raymond Hettinger
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.