Haskell equivalent of data constructors in Python?
Question:
In Haskell, I can define a binary tree as follows:
data Bint a = Leaf a | Branch a (Bint a) (Bint a)
then I can some operations on it as follows:
height (Leaf a) = 1
height (Branch a l r) = 1 + (max (height l) (height r))
count (Leaf a) = 1
count (Branch a l r) = 1 + (count l) + (count r)
I know Python doesn’t has equivalent of data
in Haskell. If it has, please tell.
So, how does one define a binary tree in Python and how to implement the above two functions in it?
Answers:
Python does not have the concept of “data constructor” as Haskell have. You can create classes, like most other OOP languages. Alternatively you can always represent your data type with built-ins and only define the functions to handle them(this is the approach used to implement heaps in the heapq
built-in module).
The differences between python and haskell are huge, so it’s better to avoid making tight comparisons between syntax/features of haskell and python, otherwise you’ll end up writing non-pythonic and inefficient python code.
Even if python does have some functional features, it is not a functional language, so you have to completely change the paradigm of your programs to obtain readable, pythonic and efficient programs.
A possible implementation using classes could be:
class Bint(object):
def __init__(self, value, left=None, right=None):
self.a = value
self.left = left
self.right = right
def height(self):
left_height = self.left.height() if self.left else 0
right_height = self.right.height() if self.right else 0
return 1 + max(left_height, right_height)
def count(self):
left_count = self.left.count() if self.left else 0
right_height = self.right.count() if self.right else 0
return 1 + left_count + right_count
The code may be simplified a bit providing a “smarter” default value for left
and right
:
class Nil(object):
def height(self):
return 0
count = height
nil = Nil()
class Bint(object):
def __init__(self, value, left=nil, right=nil):
self.value = value
self.left = left
self.right = right
def height(self):
return 1 + max(self.left.height(), self.right.height())
def count(self):
return 1 + self.left.count() + self.right.count()
Note that these implementations allow nodes with only one children.
However you do not have to use classes to define a data type.
For example you can say that a Bint
can be a list of a single element, the value of the root, or a list with three elements: the value, the left child and the right child.
In this case you can define the functions as:
def height(bint):
if len(bint) == 1:
return 1
return 1 + max(height(bint[1]), height(bint[2]))
def count(bint):
if len(bint) == 1:
return 1 + count(bint[1]) + count(bint[2])
Yet another approach would be to use namedtuple
s:
from collections import namedtuple
Leaf = namedtuple('Leaf', 'value')
Branch = namedtuple('Branch', 'value left right')
def height(bint):
if len(bint) == 1: # or isinstance(bint, Leaf)
return 1
return 1 + max(height(bint.left), height(bint.right))
def count(bint):
if len(bint) == 1: # or isinstance(bint, Leaf)
return 1 + count(bint.left) + count(bint.right)
The closest thing would probably be classes with methods:
class Leaf:
def __init__(self, value):
self.value = value
def height(self):
return 1
def count(self):
return 1
class Branch:
def __init__(self, left, right):
self.left = left
self.right = right
def height(self):
return 1 + max(self.left.height(), self.right.height())
def count(self):
return 1 + self.left.count() + self.right.count()
While this is somewhat idiomatic and has its own share of upsides, it also lacks some qualities of the Haskell version. Most importantly, the function definitions have to be defined along with the class, in the same module. You can use single-dispatch generic functions instead of methods to get that back. The result is more open-world-y than both methods and Haskell functions, and allows spreading the definition across multiple modules when beneficial.
@singledispatch
def height(_):
raise NotImplementedError()
@singledispatch
def count(_):
raise NotImplementedError()
class Leaf:
def __init__(self, value):
self.value = value
@height.register(Leaf)
def height_leaf(leaf):
return 1
@height.register(Leaf)
def count_leaf(leaf):
return 1
class Branch:
def __init__(self, left, right):
self.left = left
self.right = right
@height.register(Branch)
def height_branch(b):
return 1 + max(b.left.height(), b.right.height())
@count.register(Branch)
def count_branch(b):
return 1 + b.left.count() + b.right.count()
I am going for a close analogue to Haskell an functional programming here. This is not very “pythonic” in a sense. Especially, it’s not object oriented. It’s still useful and clean, though.
A datatype is a class. A datatype with multiple data constructors is a class with extra information about how it is constructed. And of course, it needs some data. Use the constructor to assure that it all trees are legal:
class BinTree (object):
def __init__(self, value=None, left=None, right=None):
if left == None and right == None and value != None:
self.isLeaf = True
self.value = value
elif left != None and right != None and value == None:
self.isLeaf = False
self.value = (left, right)
else:
raise ArgumentError("some help message")
This constructor is a bit inconvenient to call, so have some smart constructors that are easy to use:
def leaf(value):
return BinTree(value=value)
def branch(left, right):
return BinTree(left=left, right=right)
How do we get the values out? Let’s make some helpers for that, too:
def left(tree):
if tree.isLeaf:
raise ArgumentError ("tree is leaf")
else:
return tree.value[0]
def right(tree):
if tree.isLeaf:
raise ArgumentError ("tree is leaf")
else:
return tree.value[1]
def value(tree):
if not tree.isLeaf:
raise ArgumentError ("tree is branch")
else:
return tree.value
That’s it. You got a pure “algebraic” data type which can be accessed with functions:
def count(bin_tree):
if bin_tree.isLeaf:
return 1
else:
return count(left(bin_tree))+count(right(bin_tree))
Five years since last update, but here is my answer.
to make data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show)
to python…
Without type check (more like python)
class Tree:
def __init__(self):
pass
def height(self):
pass
def count(self):
pass
class Leaf(Tree):
def __init__(self, value):
self.value = value
def __str__(self):
return("Leaf " + str(self.value))
def height(self):
return 1
def count(self):
return 1
class Branch(Tree):
def __init__(self, left, right):
if isinstance(left, Tree) and isinstance(right, Tree):
self.left = left
self.right = right
else:
raise ValueError
def __str__(self):
return("Branch (" + str(self.left) + " " +
str(self.right) + ")")
def height(self):
return 1 + max(self.left.height(), self.right.height())
def count(self):
return 1 + self.left.count() + self.right.count()
#usage : Branch(Leaf(5), Branch(Leaf(3), Leaf(2)))
With type check (more like haskell)
height
, count
method at Tree class
class Tree:
def __init__(self, tree, type_):
def typecheck(subtree):
if isinstance(subtree, Leaf):
if not isinstance(subtree.value, type_):
print(subtree.value)
raise ValueError
elif isinstance(subtree, Branch):
typecheck(subtree.left)
typecheck(subtree.right)
else:
raise ValueError
typecheck(tree)
self.tree = tree
self.type_ = type_
def __str__(self):
return ("Tree " + self.type_.__name__ + "n" + str(self.tree))
def height(self):
if isinstance(self, Leaf):
return 1
elif isinstance(self, Branch):
return 1 + max(self.left.height(), self.right.height())
else:
return self.tree.height()
def count(self):
if isinstance(self, Leaf):
return 1
elif isinstance(self, Branch):
return 1 + self.left.count() + self.right.count()
else:
return self.tree.count()
class Leaf(Tree):
def __init__(self, value):
self.value = value
def __str__(self):
return("Leaf " + str(self.value))
class Branch(Tree):
def __init__(self, left, right):
if isinstance(left, Tree) and isinstance(right, Tree):
self.left = left
self.right = right
else:
raise ValueError
def __str__(self):
return("Branch (" + str(self.left) + " " +
str(self.right) + ")")
#usage tree1 = Tree(Branch(Leaf(5), Branch(Leaf(3), Leaf(2))), int)
#usage tree1.height() -> 3
#usage tree1.count() -> 5
height
, count
method at Leaf and Branch class
class Tree:
def __init__(self, tree, type_):
def typecheck(subtree):
if isinstance(subtree, Leaf):
if not isinstance(subtree.value, type_):
print(subtree.value)
raise ValueError
elif isinstance(subtree, Branch):
typecheck(subtree.left)
typecheck(subtree.right)
else:
raise ValueError
typecheck(tree)
self.tree = tree
self.type_ = type_
def __str__(self):
return ("Tree " + self.type_.__name__ + "n" + str(self.tree))
def height(self):
return self.tree.height()
def count(self):
return self.tree.count()
class Leaf(Tree):
def __init__(self, value):
self.value = value
def __str__(self):
return("Leaf " + str(self.value))
def height(self):
return 1
def count(self):
return 1
class Branch(Tree):
def __init__(self, left, right):
if isinstance(left, Tree) and isinstance(right, Tree):
self.left = left
self.right = right
else:
raise ValueError
def __str__(self):
return("Branch (" + str(self.left) + " " +
str(self.right) + ")")
def height(self):
return 1 + max(self.left.height(), self.right.height())
def count(self):
return 1 + self.left.count() + self.right.count()
#usage Tree(Branch(Leaf(5), Branch(Leaf(3), Leaf(2))), int)
#usage tree1.height() -> 3
#usage tree1.count() -> 5
height
, count
method outside classes (most like haskell)
class Tree:
def __init__(self, tree, type_):
def typecheck(subtree):
if isinstance(subtree, Leaf):
if not isinstance(subtree.value, type_):
print(subtree.value)
raise ValueError
elif isinstance(subtree, Branch):
typecheck(subtree.left)
typecheck(subtree.right)
else:
raise ValueError
typecheck(tree)
self.tree = tree
self.type_ = type_
def __str__(self):
return ("Tree " + self.type_.__name__ + "n" + str(self.tree))
class Leaf(Tree):
def __init__(self, value):
self.value = value
def __str__(self):
return("Leaf " + str(self.value))
class Branch(Tree):
def __init__(self, left, right):
if isinstance(left, Tree) and isinstance(right, Tree):
self.left = left
self.right = right
else:
raise ValueError
def __str__(self):
return("Branch (" + str(self.left) + " " +
str(self.right) + ")")
def height(tree):
if not isinstance(tree, Tree):
raise ValueError
if isinstance(tree, Leaf):
return 1
elif isinstance(tree, Branch):
return 1 + max(height(tree.left), height(tree.right))
else:
return height(tree.tree)
def count(tree):
if not isinstance(tree, Tree):
raise ValueError
if isinstance(tree, Leaf):
return 1
elif isinstance(tree, Branch):
return 1 + count(tree.left) + count(tree.right)
else:
return count(tree.tree)
#usage tree1 = Tree(Branch(Leaf(5), Branch(Leaf(3), Leaf(2))), int)
#usage height(tree1) -> 3
#usage count(tree1) -> 5
The answer using the most modern python idiom is that you need:
- Type annotations
- Python 3.10+ to support structural pattern matching
- Static type checking tools (mypy, pyright) that supports recursive type alias
- Dataclass with
frozen=True
to define immutability
from __future__ import annotations
from dataclasses import dataclass
from typing import Generic, TypeAlias, TypeVar
A = TypeVar("A")
L = TypeVar("L")
R = TypeVar("R")
@dataclass(frozen=True)
class Leaf(Generic[A]):
value: A
@dataclass(frozen=True)
class Branch(Generic[A, L, R]):
root: A
left: L
right: R
Bint: TypeAlias = Leaf[A] | Branch[A, "Bint[A]", "Bint[A]"]
def height(x: Bint[A]) -> int:
match x:
case Leaf():
return 1
case Branch(_, l, r):
return 1 + max(height(l), height(r))
def count(x: Bint[A]) -> int:
match x:
case Leaf():
return 1
case Branch(_, l, r):
return 1 + count(l) + count(r)
And the advantage of this over other answer is that:
- You can drop the runtime type checks (
isinstance
) entirely, to make code read more simple and efficient
- When the union type is used instead of superclassing
Leaf, Branch
, it is possible to do exhaustiveness checking, so if match
does not warn about typing, your program is correct.
In Haskell, I can define a binary tree as follows:
data Bint a = Leaf a | Branch a (Bint a) (Bint a)
then I can some operations on it as follows:
height (Leaf a) = 1
height (Branch a l r) = 1 + (max (height l) (height r))
count (Leaf a) = 1
count (Branch a l r) = 1 + (count l) + (count r)
I know Python doesn’t has equivalent of data
in Haskell. If it has, please tell.
So, how does one define a binary tree in Python and how to implement the above two functions in it?
Python does not have the concept of “data constructor” as Haskell have. You can create classes, like most other OOP languages. Alternatively you can always represent your data type with built-ins and only define the functions to handle them(this is the approach used to implement heaps in the heapq
built-in module).
The differences between python and haskell are huge, so it’s better to avoid making tight comparisons between syntax/features of haskell and python, otherwise you’ll end up writing non-pythonic and inefficient python code.
Even if python does have some functional features, it is not a functional language, so you have to completely change the paradigm of your programs to obtain readable, pythonic and efficient programs.
A possible implementation using classes could be:
class Bint(object):
def __init__(self, value, left=None, right=None):
self.a = value
self.left = left
self.right = right
def height(self):
left_height = self.left.height() if self.left else 0
right_height = self.right.height() if self.right else 0
return 1 + max(left_height, right_height)
def count(self):
left_count = self.left.count() if self.left else 0
right_height = self.right.count() if self.right else 0
return 1 + left_count + right_count
The code may be simplified a bit providing a “smarter” default value for left
and right
:
class Nil(object):
def height(self):
return 0
count = height
nil = Nil()
class Bint(object):
def __init__(self, value, left=nil, right=nil):
self.value = value
self.left = left
self.right = right
def height(self):
return 1 + max(self.left.height(), self.right.height())
def count(self):
return 1 + self.left.count() + self.right.count()
Note that these implementations allow nodes with only one children.
However you do not have to use classes to define a data type.
For example you can say that a Bint
can be a list of a single element, the value of the root, or a list with three elements: the value, the left child and the right child.
In this case you can define the functions as:
def height(bint):
if len(bint) == 1:
return 1
return 1 + max(height(bint[1]), height(bint[2]))
def count(bint):
if len(bint) == 1:
return 1 + count(bint[1]) + count(bint[2])
Yet another approach would be to use namedtuple
s:
from collections import namedtuple
Leaf = namedtuple('Leaf', 'value')
Branch = namedtuple('Branch', 'value left right')
def height(bint):
if len(bint) == 1: # or isinstance(bint, Leaf)
return 1
return 1 + max(height(bint.left), height(bint.right))
def count(bint):
if len(bint) == 1: # or isinstance(bint, Leaf)
return 1 + count(bint.left) + count(bint.right)
The closest thing would probably be classes with methods:
class Leaf:
def __init__(self, value):
self.value = value
def height(self):
return 1
def count(self):
return 1
class Branch:
def __init__(self, left, right):
self.left = left
self.right = right
def height(self):
return 1 + max(self.left.height(), self.right.height())
def count(self):
return 1 + self.left.count() + self.right.count()
While this is somewhat idiomatic and has its own share of upsides, it also lacks some qualities of the Haskell version. Most importantly, the function definitions have to be defined along with the class, in the same module. You can use single-dispatch generic functions instead of methods to get that back. The result is more open-world-y than both methods and Haskell functions, and allows spreading the definition across multiple modules when beneficial.
@singledispatch
def height(_):
raise NotImplementedError()
@singledispatch
def count(_):
raise NotImplementedError()
class Leaf:
def __init__(self, value):
self.value = value
@height.register(Leaf)
def height_leaf(leaf):
return 1
@height.register(Leaf)
def count_leaf(leaf):
return 1
class Branch:
def __init__(self, left, right):
self.left = left
self.right = right
@height.register(Branch)
def height_branch(b):
return 1 + max(b.left.height(), b.right.height())
@count.register(Branch)
def count_branch(b):
return 1 + b.left.count() + b.right.count()
I am going for a close analogue to Haskell an functional programming here. This is not very “pythonic” in a sense. Especially, it’s not object oriented. It’s still useful and clean, though.
A datatype is a class. A datatype with multiple data constructors is a class with extra information about how it is constructed. And of course, it needs some data. Use the constructor to assure that it all trees are legal:
class BinTree (object):
def __init__(self, value=None, left=None, right=None):
if left == None and right == None and value != None:
self.isLeaf = True
self.value = value
elif left != None and right != None and value == None:
self.isLeaf = False
self.value = (left, right)
else:
raise ArgumentError("some help message")
This constructor is a bit inconvenient to call, so have some smart constructors that are easy to use:
def leaf(value):
return BinTree(value=value)
def branch(left, right):
return BinTree(left=left, right=right)
How do we get the values out? Let’s make some helpers for that, too:
def left(tree):
if tree.isLeaf:
raise ArgumentError ("tree is leaf")
else:
return tree.value[0]
def right(tree):
if tree.isLeaf:
raise ArgumentError ("tree is leaf")
else:
return tree.value[1]
def value(tree):
if not tree.isLeaf:
raise ArgumentError ("tree is branch")
else:
return tree.value
That’s it. You got a pure “algebraic” data type which can be accessed with functions:
def count(bin_tree):
if bin_tree.isLeaf:
return 1
else:
return count(left(bin_tree))+count(right(bin_tree))
Five years since last update, but here is my answer.
to make data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show)
to python…
Without type check (more like python)
class Tree:
def __init__(self):
pass
def height(self):
pass
def count(self):
pass
class Leaf(Tree):
def __init__(self, value):
self.value = value
def __str__(self):
return("Leaf " + str(self.value))
def height(self):
return 1
def count(self):
return 1
class Branch(Tree):
def __init__(self, left, right):
if isinstance(left, Tree) and isinstance(right, Tree):
self.left = left
self.right = right
else:
raise ValueError
def __str__(self):
return("Branch (" + str(self.left) + " " +
str(self.right) + ")")
def height(self):
return 1 + max(self.left.height(), self.right.height())
def count(self):
return 1 + self.left.count() + self.right.count()
#usage : Branch(Leaf(5), Branch(Leaf(3), Leaf(2)))
With type check (more like haskell)
height
, count
method at Tree class
class Tree:
def __init__(self, tree, type_):
def typecheck(subtree):
if isinstance(subtree, Leaf):
if not isinstance(subtree.value, type_):
print(subtree.value)
raise ValueError
elif isinstance(subtree, Branch):
typecheck(subtree.left)
typecheck(subtree.right)
else:
raise ValueError
typecheck(tree)
self.tree = tree
self.type_ = type_
def __str__(self):
return ("Tree " + self.type_.__name__ + "n" + str(self.tree))
def height(self):
if isinstance(self, Leaf):
return 1
elif isinstance(self, Branch):
return 1 + max(self.left.height(), self.right.height())
else:
return self.tree.height()
def count(self):
if isinstance(self, Leaf):
return 1
elif isinstance(self, Branch):
return 1 + self.left.count() + self.right.count()
else:
return self.tree.count()
class Leaf(Tree):
def __init__(self, value):
self.value = value
def __str__(self):
return("Leaf " + str(self.value))
class Branch(Tree):
def __init__(self, left, right):
if isinstance(left, Tree) and isinstance(right, Tree):
self.left = left
self.right = right
else:
raise ValueError
def __str__(self):
return("Branch (" + str(self.left) + " " +
str(self.right) + ")")
#usage tree1 = Tree(Branch(Leaf(5), Branch(Leaf(3), Leaf(2))), int)
#usage tree1.height() -> 3
#usage tree1.count() -> 5
height
, count
method at Leaf and Branch class
class Tree:
def __init__(self, tree, type_):
def typecheck(subtree):
if isinstance(subtree, Leaf):
if not isinstance(subtree.value, type_):
print(subtree.value)
raise ValueError
elif isinstance(subtree, Branch):
typecheck(subtree.left)
typecheck(subtree.right)
else:
raise ValueError
typecheck(tree)
self.tree = tree
self.type_ = type_
def __str__(self):
return ("Tree " + self.type_.__name__ + "n" + str(self.tree))
def height(self):
return self.tree.height()
def count(self):
return self.tree.count()
class Leaf(Tree):
def __init__(self, value):
self.value = value
def __str__(self):
return("Leaf " + str(self.value))
def height(self):
return 1
def count(self):
return 1
class Branch(Tree):
def __init__(self, left, right):
if isinstance(left, Tree) and isinstance(right, Tree):
self.left = left
self.right = right
else:
raise ValueError
def __str__(self):
return("Branch (" + str(self.left) + " " +
str(self.right) + ")")
def height(self):
return 1 + max(self.left.height(), self.right.height())
def count(self):
return 1 + self.left.count() + self.right.count()
#usage Tree(Branch(Leaf(5), Branch(Leaf(3), Leaf(2))), int)
#usage tree1.height() -> 3
#usage tree1.count() -> 5
height
, count
method outside classes (most like haskell)
class Tree:
def __init__(self, tree, type_):
def typecheck(subtree):
if isinstance(subtree, Leaf):
if not isinstance(subtree.value, type_):
print(subtree.value)
raise ValueError
elif isinstance(subtree, Branch):
typecheck(subtree.left)
typecheck(subtree.right)
else:
raise ValueError
typecheck(tree)
self.tree = tree
self.type_ = type_
def __str__(self):
return ("Tree " + self.type_.__name__ + "n" + str(self.tree))
class Leaf(Tree):
def __init__(self, value):
self.value = value
def __str__(self):
return("Leaf " + str(self.value))
class Branch(Tree):
def __init__(self, left, right):
if isinstance(left, Tree) and isinstance(right, Tree):
self.left = left
self.right = right
else:
raise ValueError
def __str__(self):
return("Branch (" + str(self.left) + " " +
str(self.right) + ")")
def height(tree):
if not isinstance(tree, Tree):
raise ValueError
if isinstance(tree, Leaf):
return 1
elif isinstance(tree, Branch):
return 1 + max(height(tree.left), height(tree.right))
else:
return height(tree.tree)
def count(tree):
if not isinstance(tree, Tree):
raise ValueError
if isinstance(tree, Leaf):
return 1
elif isinstance(tree, Branch):
return 1 + count(tree.left) + count(tree.right)
else:
return count(tree.tree)
#usage tree1 = Tree(Branch(Leaf(5), Branch(Leaf(3), Leaf(2))), int)
#usage height(tree1) -> 3
#usage count(tree1) -> 5
The answer using the most modern python idiom is that you need:
- Type annotations
- Python 3.10+ to support structural pattern matching
- Static type checking tools (mypy, pyright) that supports recursive type alias
- Dataclass with
frozen=True
to define immutability
from __future__ import annotations
from dataclasses import dataclass
from typing import Generic, TypeAlias, TypeVar
A = TypeVar("A")
L = TypeVar("L")
R = TypeVar("R")
@dataclass(frozen=True)
class Leaf(Generic[A]):
value: A
@dataclass(frozen=True)
class Branch(Generic[A, L, R]):
root: A
left: L
right: R
Bint: TypeAlias = Leaf[A] | Branch[A, "Bint[A]", "Bint[A]"]
def height(x: Bint[A]) -> int:
match x:
case Leaf():
return 1
case Branch(_, l, r):
return 1 + max(height(l), height(r))
def count(x: Bint[A]) -> int:
match x:
case Leaf():
return 1
case Branch(_, l, r):
return 1 + count(l) + count(r)
And the advantage of this over other answer is that:
- You can drop the runtime type checks (
isinstance
) entirely, to make code read more simple and efficient - When the union type is used instead of superclassing
Leaf, Branch
, it is possible to do exhaustiveness checking, so ifmatch
does not warn about typing, your program is correct.