Java's TreeSet equivalent in Python?

Question:

I recently came across some Java code that simply put some strings into a Java TreeSet, implemented a distance based comparator for it, and then made its merry way into the sunset to compute a given score to solve the given problem.

My questions,

• Is there an equivalent data structure available for Python?

• The Java TreeSet basically looks like an ordered dictionary that uses a comparator of some sort to achieve its ordering.
• I see there’s a PEP for Py3K for an OrderedDict, but I’m using 2.6.x. There are a bunch of ordered dict implementations out there – anyone in particular that can be recommended?

PS, Just to add – I could probably import DictMixin or UserDict and implement my own sorted/ordered dictionary, AND make it happen through a comparator function – but that seems to be overkill.

Thanks.

Update. Thanks for the answers. To elaborate a bit, let's say I’ve got a compare function that's defined like this (given a particular value ln),

``````def mycmp(x1, y1, ln):
    a = abs(x1-ln)
    b = abs(y1-ln)
    if a<b:
        return -1
    elif a>b:
        return 1
    else:
        return 0
``````

I’m a bit unsure about how I’d integrate this into the ordering given in the ordered dict link given here...

Something like,

``````OrderedDict(sorted(d.items(), cmp=mycmp(len)))
``````

Ideas would be welcome.

Answers:

1.
I don’t think Python has a built-in sorted set.
How about something like this?

``````letters = ['w', 'Z', 'Q', 'B', 'C', 'A']
for l in sorted(set(letters)):
    print(l)
``````
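
If the ordering should follow a custom rule rather than natural order, `sorted` also accepts a `key` function. Here is a minimal sketch, assuming the goal is to order strings by how far their length is from some target value; the names `words` and `target` are made up for illustration:

```python
words = ['pear', 'fig', 'banana', 'kiwi', 'fig']
target = 4  # hypothetical reference length

# Deduplicate with set(), then order by distance from the target length.
# The string itself is the tie-breaker, so the result is deterministic.
ordered = sorted(set(words), key=lambda w: (abs(len(w) - target), w))
print(ordered)
```

Note that this produces a sorted list at one point in time; it does not stay sorted when the underlying set changes.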

2. Java's `TreeSet` is an implementation of the abstraction called `SortedSet`. Basic types are sorted in their natural order. A `TreeSet` instance performs all key comparisons using the elements' `compareTo` method (or a comparator's `compare` method), so your custom keys should implement a proper `compareTo`.

The Python 2.7 docs for `collections.OrderedDict` have a link to an OrderedDict recipe that runs on Python 2.4 or better.

Edit: In regard to sorting: use `key=` rather than `cmp=`. It tends to lead to faster code, and moreover the `cmp=` keyword has been eliminated in Python 3.

``````d={5:6,7:8,100:101,1:2,3:4}
print(d.items())
# [(1, 2), (3, 4), (100, 101), (5, 6), (7, 8)]
``````

The code you posted for `mycmp` doesn’t make it clear what you want passed as `x1`. Below, I assume x1 is supposed to be the value in each key-value pair. If so, you could do something like this:

``````length=4
print(sorted(d.items(),key=lambda item: abs(item[1]-length) ))
# [(3, 4), (1, 2), (5, 6), (7, 8), (100, 101)]
``````

`key=...` is passed a function, `lambda item: abs(item[1]-length)`.
For each `item` in `d.items()`, the lambda function returns the number `abs(item[1]-length)`. This number acts as a proxy for the item as far as sorting is concerned. See this essay for more information on sorting idioms in Python.

PS. `len` is a Python builtin function. So as to not clobber that `len`, I’ve changed the variable name to `length`.
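
If you would rather keep the three-argument `mycmp` from the question, one option is `functools.cmp_to_key`, which turns a comparison function into a key function. It is only available from Python 2.7 and 3.2 onward, so it would not help on 2.6 as is. A rough sketch, again assuming the comparison applies to the value of each key-value pair:

```python
from functools import cmp_to_key  # Python 2.7 / 3.2 and later

def mycmp(x1, y1, ln):
    # Same logic as in the question: compare distances from ln.
    a = abs(x1 - ln)
    b = abs(y1 - ln)
    return (a > b) - (a < b)  # -1, 0 or 1

d = {5: 6, 7: 8, 100: 101, 1: 2, 3: 4}
length = 4

# Bind the extra argument with a lambda and compare items by their values.
by_distance = cmp_to_key(lambda p, q: mycmp(p[1], q[1], length))

# Sorting by key first makes the tie order deterministic (sorted is stable).
result = sorted(sorted(d.items()), key=by_distance)
print(result)
```

The plain `key=lambda item: abs(item[1]-length)` version above does the same job with less machinery, so this is mostly useful when the comparison logic cannot easily be expressed as a key.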

If what you want is a set that always iterates in sorted order, this might get you most of the way there:

``````def invalidate_sorted(f):
    def wrapper(self, *args, **kwargs):
        # Any mutation makes the cached ordering stale.
        self._sort_cache = None
        return f(self, *args, **kwargs)
    return wrapper

class SortedSet(set):
    _sort_cache = None

    # Mutating set methods that must invalidate the cached ordering.
    _invalidate_sort_methods = """
        add clear difference_update discard intersection_update
        symmetric_difference_update pop remove update
        __iand__ __ior__ __isub__ __ixor__
        """.split()

    def __iter__(self):
        # Re-sort only when the cache has been invalidated.
        if self._sort_cache is None:
            self._sort_cache = sorted(set.__iter__(self))
        for item in self._sort_cache:
            yield item

    def __repr__(self):
        return '%s(%r)' % (type(self).__name__, list(self))

    # Wrap each mutating method so it clears the cache before running.
    for methodname in _invalidate_sort_methods:
        locals()[methodname] = invalidate_sorted(getattr(set, methodname))
``````

I’d need to see some example data, but if you’re just trying to do a weighted sort, then the built-in Python `sorted()` can do it, in two ways.

With well-ordered tuples and a `key` function:

``````def cost_per_page(book):
    title, pagecount, cost = book
    return float(cost)/pagecount

booklist = [
    ("Grey's Anatomy", 3000, 200),
    ('The Hobbit', 300, 7.25),
    ('Moby Dick', 4000, 4.75),
]
for book in sorted(booklist, key=cost_per_page):
    print(book)
``````

Or with a class that defines a `__cmp__` method (Python 2 only):

``````class Book(object):
    def __init__(self, title, pagecount, cost):
        self.title = title
        self.pagecount = pagecount
        self.cost = cost
    def pagecost(self):
        return float(self.cost)/self.pagecount
    def __cmp__(self, other):
        'only comparable with other books'
        return cmp(self.pagecost(), other.pagecost())
    def __str__(self):
        return str((self.title, self.pagecount, self.cost))

booklist = [
    Book("Grey's Anatomy", 3000, 200),
    Book('The Hobbit', 300, 7.25),
    Book('Moby Dick', 4000, 4.75),
]
for book in sorted(booklist):
    print(book)
``````

Both of these return the same output:

``````('Moby Dick', 4000, 4.75)
('The Hobbit', 300, 7.25)
("Grey's Anatomy", 3000, 200)
``````
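
On Python 3, where `__cmp__` and `cmp()` no longer exist, the class-based variant could be sketched with rich comparison methods instead. The following assumes `functools.total_ordering` is available (Python 2.7 / 3.2 and later), which fills in the remaining comparison operators from `__eq__` and `__lt__`:

```python
from functools import total_ordering

@total_ordering
class Book(object):
    def __init__(self, title, pagecount, cost):
        self.title = title
        self.pagecount = pagecount
        self.cost = cost

    def pagecost(self):
        return float(self.cost) / self.pagecount

    # Books compare by cost per page; only comparable with other books.
    def __eq__(self, other):
        return self.pagecost() == other.pagecost()

    def __lt__(self, other):
        return self.pagecost() < other.pagecost()

    def __str__(self):
        return str((self.title, self.pagecount, self.cost))

booklist = [
    Book("Grey's Anatomy", 3000, 200),
    Book('The Hobbit', 300, 7.25),
    Book('Moby Dick', 4000, 4.75),
]
titles = [book.title for book in sorted(booklist)]
print(titles)
```

The ordering is the same as in the output above: cheapest per page first.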

I recently implemented a TreeSet for Python using the bisect module.

https://github.com/fukatani/TreeSet

Its usage is similar to Java’s TreeSet.

Example:

``````from treeset import TreeSet
ts = TreeSet([3,7,2,7,1,3])
print(ts)
# [1, 2, 3, 7]

ts.add(4)
print(ts)
# [1, 2, 3, 4, 7]

ts.remove(7)
print(ts)
# [1, 2, 3, 4]

print(ts[2])
# 3
``````
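
To give an idea of how a bisect-backed sorted set can work, here is a minimal sketch. It is not the code of the library linked above; the class name `SimpleSortedSet` and its methods are hypothetical and only cover the operations used in the example:

```python
import bisect

class SimpleSortedSet(object):
    """Hypothetical minimal sorted set backed by a sorted list."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            self.add(item)

    def _find(self, item):
        # Return (index, found) for item using binary search.
        i = bisect.bisect_left(self._items, item)
        return i, i < len(self._items) and self._items[i] == item

    def add(self, item):
        i, found = self._find(item)
        if not found:  # skip duplicates, as a set would
            self._items.insert(i, item)

    def remove(self, item):
        i, found = self._find(item)
        if not found:
            raise KeyError(item)
        del self._items[i]

    def __contains__(self, item):
        return self._find(item)[1]

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __repr__(self):
        return repr(self._items)

ts = SimpleSortedSet([3, 7, 2, 7, 1, 3])
ts.add(4)
ts.remove(7)
print(ts)     # [1, 2, 3, 4]
print(ts[2])  # 3
```

Lookups are O(log n) thanks to the binary search, but inserting into or deleting from a Python list is O(n), unlike the balanced tree behind Java's `TreeSet`.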

If you are coming from Java's TreeSet:

``````import java.util.*;

class Main {
    public static void main(String[] args) {
        TreeSet<Integer> tr = new TreeSet<>();
        tr.add(3);
        tr.add(5);
        tr.add(7);
        tr.add(6);
        tr.add(3);
        tr.add(8);

        // TreeSet has no get(int) method, so iterate in sorted order instead.
        for (int x : tr) {
            System.out.print(x + " ");
        }
    }
}
// Output: 3 5 6 7 8
``````

The same in Python:

``````from treeset import TreeSet
tr = TreeSet([1,2,2,7,4,3])
print(tr)
# [1, 2, 3, 4, 7]
``````