Haskell equivalent of data constructors in Python?

Question:

In Haskell, I can define a binary tree as follows:

data Bint a = Leaf a | Branch a (Bint a) (Bint a) 

Then I can define some operations on it as follows:

height (Leaf a) = 1
height (Branch a l r) = 1 + (max (height l) (height r))

count (Leaf a) = 1
count (Branch a l r) = 1 + (count l) + (count r) 

I know Python doesn’t have an equivalent of Haskell’s data declarations. If it does, please tell me.

So, how does one define a binary tree in Python, and how does one implement the two functions above for it?

Asked By: Tem Pora


Answers:

Python does not have the concept of a “data constructor” as Haskell does. You can create classes, as in most other OOP languages. Alternatively, you can always represent your data type with built-ins and only define the functions that handle it (this is the approach used to implement heaps in the heapq standard-library module).
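To illustrate the built-ins approach, here is how heapq works: the heap is just an ordinary list, and the module only supplies functions that maintain the heap invariant on it. This is a small sketch of that pattern, not a tree implementation.

```python
import heapq

# The "data type" is a plain list; heapq's functions keep it arranged as a binary min-heap.
heap = []
for x in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, x)

# heap[0] is always the smallest element.
smallest = heap[0]

# Popping repeatedly yields the elements in ascending order.
ordered = [heapq.heappop(heap) for _ in range(len(heap))]
```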

The differences between Python and Haskell are huge, so it’s better to avoid drawing tight comparisons between their syntax and features; otherwise you’ll end up writing unpythonic and inefficient Python code.

Even though Python has some functional features, it is not a functional language, so you have to rethink the structure of your programs completely to get readable, pythonic and efficient code.


A possible implementation using classes could be:

class Bint(object):
    def __init__(self, value, left=None, right=None):
        self.a = value
        self.left = left
        self.right = right

    def height(self):
        left_height = self.left.height() if self.left else 0
        right_height = self.right.height() if self.right else 0
        return 1 + max(left_height, right_height)

    def count(self):
        left_count = self.left.count() if self.left else 0
        right_count = self.right.count() if self.right else 0
        return 1 + left_count + right_count

The code can be simplified a bit by providing a “smarter” default value for left and right:

class Nil(object):
    def height(self):
        return 0
    count = height

nil = Nil()

class Bint(object):
    def __init__(self, value, left=nil, right=nil):
        self.value = value
        self.left = left
        self.right = right
    def height(self):
        return 1 + max(self.left.height(), self.right.height())
    def count(self):
        return 1 + self.left.count() + self.right.count()

Note that these implementations allow nodes with only one child, which the Haskell type does not.
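If you want to rule out one-child nodes, as the Haskell type does, one option is to check in the constructor. This is a sketch built on the Nil version above; the ValueError and its message are my own choices, not part of the original answer.

```python
class Nil(object):
    """Sentinel for a missing subtree; contributes 0 to height and count."""
    def height(self):
        return 0
    count = height

nil = Nil()

class Bint(object):
    def __init__(self, value, left=nil, right=nil):
        # Hypothetical check: a node has either two children (Branch) or none (Leaf),
        # mirroring the two Haskell constructors.
        if (left is nil) != (right is nil):
            raise ValueError("a Bint node needs either zero or two children")
        self.value = value
        self.left = left
        self.right = right

    def height(self):
        return 1 + max(self.left.height(), self.right.height())

    def count(self):
        return 1 + self.left.count() + self.right.count()

# Branch 1 (Leaf 2) (Branch 3 (Leaf 4) (Leaf 5))
tree = Bint(1, Bint(2), Bint(3, Bint(4), Bint(5)))
```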

However, you do not have to use classes to define a data type.
For example, you can say that a Bint is either a list with a single element (the value of the root) or a list with three elements: the value, the left child and the right child.

In this case you can define the functions as:

def height(bint):
    if len(bint) == 1:
        return 1
    return 1 + max(height(bint[1]), height(bint[2]))

def count(bint):
    if len(bint) == 1:
        return 1
    return 1 + count(bint[1]) + count(bint[2])
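As a concrete instance of this encoding, the Haskell value Branch 1 (Leaf 2) (Branch 3 (Leaf 4) (Leaf 5)) becomes nested sequences. The sketch below uses tuples rather than lists (an assumption; both work with len and indexing) and repeats the two functions so it runs on its own.

```python
def height(bint):
    # A 1-element sequence is a leaf: (value,)
    if len(bint) == 1:
        return 1
    # Otherwise it is (value, left, right)
    return 1 + max(height(bint[1]), height(bint[2]))

def count(bint):
    if len(bint) == 1:
        return 1
    return 1 + count(bint[1]) + count(bint[2])

# Branch 1 (Leaf 2) (Branch 3 (Leaf 4) (Leaf 5))
tree = (1, (2,), (3, (4,), (5,)))
```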

Yet another approach would be to use namedtuples:

from collections import namedtuple

Leaf = namedtuple('Leaf', 'value')
Branch = namedtuple('Branch', 'value left right')

def height(bint):
    if len(bint) == 1: # or isinstance(bint, Leaf)
        return 1
    return 1 + max(height(bint.left), height(bint.right))

def count(bint):
    if len(bint) == 1:  # or isinstance(bint, Leaf)
        return 1
    return 1 + count(bint.left) + count(bint.right)
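The comments mention isinstance(bint, Leaf) as an alternative to len(bint) == 1. A sketch of that variant, which reads closer to Haskell pattern matching and does not depend on how many fields each namedtuple has:

```python
from collections import namedtuple

Leaf = namedtuple('Leaf', 'value')
Branch = namedtuple('Branch', 'value left right')

def height(bint):
    # Dispatch on the "constructor" (the namedtuple class) instead of the length.
    if isinstance(bint, Leaf):
        return 1
    return 1 + max(height(bint.left), height(bint.right))

def count(bint):
    if isinstance(bint, Leaf):
        return 1
    return 1 + count(bint.left) + count(bint.right)

# Branch 1 (Leaf 2) (Branch 3 (Leaf 4) (Leaf 5))
tree = Branch(1, Leaf(2), Branch(3, Leaf(4), Leaf(5)))
```

Since namedtuples are tuples, these trees are immutable and compare by value, so Leaf(2) == Leaf(2) is True.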
Answered By: Bakuriu

The closest thing would probably be classes with methods:

class Leaf:
    def __init__(self, value):
        self.value = value

    def height(self):
        return 1

    def count(self):
        return 1


class Branch:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def height(self):
        return 1 + max(self.left.height(), self.right.height())

    def count(self):
        return 1 + self.left.count() + self.right.count()

While this is fairly idiomatic and has its own share of upsides, it also lacks some qualities of the Haskell version. Most importantly, the functions have to be defined along with the class, in the same module. You can use single-dispatch generic functions (functools.singledispatch) instead of methods to get that back. The result is even more open-world than both methods and Haskell functions, and allows spreading the definitions across multiple modules when beneficial.

from functools import singledispatch

@singledispatch
def height(_):
    raise NotImplementedError()

@singledispatch
def count(_):
    raise NotImplementedError()

class Leaf:
    def __init__(self, value):
        self.value = value

@height.register(Leaf)
def height_leaf(leaf):
    return 1

@count.register(Leaf)
def count_leaf(leaf):
    return 1

class Branch:
    def __init__(self, left, right):
        self.left = left
        self.right = right

@height.register(Branch)
def height_branch(b):
    return 1 + max(height(b.left), height(b.right))

@count.register(Branch)
def count_branch(b):
    return 1 + count(b.left) + count(b.right)
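To illustrate the open-world point, a new operation over the same tree can be added later, possibly in another module, without touching Leaf or Branch. The function name leaves is hypothetical, and the sketch repeats the class definitions so it runs on its own.

```python
from functools import singledispatch

class Leaf:
    def __init__(self, value):
        self.value = value

class Branch:
    def __init__(self, left, right):
        self.left = left
        self.right = right

# A new operation defined after, and independently of, the classes.
@singledispatch
def leaves(tree):
    raise NotImplementedError(type(tree).__name__)

@leaves.register(Leaf)
def _(leaf):
    # A leaf contributes its own value.
    return [leaf.value]

@leaves.register(Branch)
def _(b):
    # A branch concatenates the values of both subtrees, left to right.
    return leaves(b.left) + leaves(b.right)

tree = Branch(Leaf(5), Branch(Leaf(3), Leaf(2)))
```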
Answered By: user395760

I am going for a close analogue to Haskell and functional programming here. This is not very “pythonic”; in particular, it’s not object-oriented. It’s still useful and clean, though.

A datatype is a class. A datatype with multiple data constructors is a class with extra information about how it was constructed. And of course, it needs some data. Use the constructor to ensure that all trees are legal:

class BinTree(object):
    def __init__(self, value=None, left=None, right=None):
        if left is None and right is None and value is not None:
            self.isLeaf = True
            self.value = value
        elif left is not None and right is not None and value is None:
            self.isLeaf = False
            self.value = (left, right)
        else:
            # Python has no built-in ArgumentError; ValueError is the usual choice
            raise ValueError("pass either a value or both subtrees")

This constructor is a bit inconvenient to call, so let’s add some smart constructors that are easy to use:

def leaf(value):
    return BinTree(value=value)

def branch(left, right):
    return BinTree(left=left, right=right)

How do we get the values out? Let’s make some helpers for that, too:

def left(tree):
    if tree.isLeaf:
        raise ValueError("tree is leaf")
    else:
        return tree.value[0]

def right(tree):
    if tree.isLeaf:
        raise ValueError("tree is leaf")
    else:
        return tree.value[1]

def value(tree):
    if not tree.isLeaf:
        raise ValueError("tree is branch")
    else:
        return tree.value

That’s it. You now have a pure “algebraic” data type that can be accessed with functions:

def count(bin_tree):
    if bin_tree.isLeaf:
        return 1
    else:
        return count(left(bin_tree))+count(right(bin_tree))
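The question also asks for height. In the same style, a sketch could look like the following; to keep it runnable on its own it repeats a condensed BinTree with the smart constructors and accessors (using ValueError for invalid arguments).

```python
class BinTree(object):
    def __init__(self, value=None, left=None, right=None):
        if left is None and right is None and value is not None:
            self.isLeaf = True
            self.value = value
        elif left is not None and right is not None and value is None:
            self.isLeaf = False
            self.value = (left, right)
        else:
            raise ValueError("pass either a value or both subtrees")

def leaf(value):
    return BinTree(value=value)

def branch(left, right):
    return BinTree(left=left, right=right)

def left(tree):
    if tree.isLeaf:
        raise ValueError("tree is leaf")
    return tree.value[0]

def right(tree):
    if tree.isLeaf:
        raise ValueError("tree is leaf")
    return tree.value[1]

def height(bin_tree):
    # A leaf has height 1; a branch is one taller than its taller subtree.
    if bin_tree.isLeaf:
        return 1
    return 1 + max(height(left(bin_tree)), height(right(bin_tree)))

tree = branch(leaf(5), branch(leaf(3), leaf(2)))
```

Note that count as written in this answer counts only the leaves (3 for this tree), which fits this BinTree, where only leaves hold values; the question’s count also counts branch nodes.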
Answered By: firefrorefiddle

Five years since last update, but here is my answer.

To translate data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show) to Python:

Without type checking (more like Python)

class Tree:
    def __init__(self):
        pass

    def height(self):
        pass

    def count(self):
        pass

class Leaf(Tree):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return("Leaf " + str(self.value))

    def height(self):
        return 1

    def count(self):
        return 1

class Branch(Tree):
    def __init__(self, left, right):
        if isinstance(left, Tree) and isinstance(right, Tree):
            self.left = left
            self.right = right
        else:
            raise ValueError

    def __str__(self):
        return("Branch (" + str(self.left) + " " +
               str(self.right) + ")")

    def height(self):
        return 1 + max(self.left.height(), self.right.height())

    def count(self):
        return 1 + self.left.count() + self.right.count()


#usage : Branch(Leaf(5), Branch(Leaf(3), Leaf(2)))
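The empty height and count methods on Tree only document the interface. If you would rather have Python refuse to instantiate a subclass that forgets one of them, the standard abc module can enforce it. A sketch, assuming abstract methods are acceptable here (the Broken class is a hypothetical counterexample):

```python
from abc import ABC, abstractmethod

class Tree(ABC):
    # Every concrete tree must implement both operations.
    @abstractmethod
    def height(self):
        ...

    @abstractmethod
    def count(self):
        ...

class Leaf(Tree):
    def __init__(self, value):
        self.value = value

    def height(self):
        return 1

    def count(self):
        return 1

class Branch(Tree):
    def __init__(self, left, right):
        if not (isinstance(left, Tree) and isinstance(right, Tree)):
            raise ValueError("both subtrees must be Tree instances")
        self.left = left
        self.right = right

    def height(self):
        return 1 + max(self.left.height(), self.right.height())

    def count(self):
        return 1 + self.left.count() + self.right.count()

class Broken(Tree):
    # count() is missing, so instantiating this class raises TypeError.
    def height(self):
        return 0

tree = Branch(Leaf(5), Branch(Leaf(3), Leaf(2)))
```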

With type checking (more like Haskell)

height and count methods in the Tree class

class Tree:
    def __init__(self, tree, type_):
        def typecheck(subtree):
            if isinstance(subtree, Leaf):
                if not isinstance(subtree.value, type_):
                    print(subtree.value)
                    raise ValueError
            elif isinstance(subtree, Branch):
                typecheck(subtree.left)
                typecheck(subtree.right)
            else:
                raise ValueError
        typecheck(tree)
        self.tree = tree
        self.type_ = type_

    def __str__(self):
        return ("Tree " + self.type_.__name__ + "\n" + str(self.tree))

    def height(self):
        if isinstance(self, Leaf):
            return 1
        elif isinstance(self, Branch):
            return 1 + max(self.left.height(), self.right.height())
        else:
            return self.tree.height()

    def count(self):
        if isinstance(self, Leaf):
            return 1
        elif isinstance(self, Branch):
            return 1 + self.left.count() + self.right.count()
        else:
            return self.tree.count()


class Leaf(Tree):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return("Leaf " + str(self.value))


class Branch(Tree):
    def __init__(self, left, right):
        if isinstance(left, Tree) and isinstance(right, Tree):
            self.left = left
            self.right = right
        else:
            raise ValueError

    def __str__(self):
        return("Branch (" + str(self.left) + " " +
               str(self.right) + ")")


#usage tree1 = Tree(Branch(Leaf(5), Branch(Leaf(3), Leaf(2))), int)
#usage tree1.height() -> 3
#usage tree1.count() -> 5

height and count methods in the Leaf and Branch classes

class Tree:
    def __init__(self, tree, type_):
        def typecheck(subtree):
            if isinstance(subtree, Leaf):
                if not isinstance(subtree.value, type_):
                    print(subtree.value)
                    raise ValueError
            elif isinstance(subtree, Branch):
                typecheck(subtree.left)
                typecheck(subtree.right)
            else:
                raise ValueError
        typecheck(tree)
        self.tree = tree
        self.type_ = type_

    def __str__(self):
        return ("Tree " + self.type_.__name__ + "\n" + str(self.tree))

    def height(self):
        return self.tree.height()

    def count(self):
        return self.tree.count()


class Leaf(Tree):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return("Leaf " + str(self.value))

    def height(self):
        return 1

    def count(self):
        return 1


class Branch(Tree):
    def __init__(self, left, right):
        if isinstance(left, Tree) and isinstance(right, Tree):
            self.left = left
            self.right = right
        else:
            raise ValueError

    def __str__(self):
        return("Branch (" + str(self.left) + " " +
               str(self.right) + ")")

    def height(self):
        return 1 + max(self.left.height(), self.right.height())

    def count(self):
        return 1 + self.left.count() + self.right.count()

#usage tree1 = Tree(Branch(Leaf(5), Branch(Leaf(3), Leaf(2))), int)
#usage tree1.height() -> 3
#usage tree1.count() -> 5

height and count functions outside the classes (most like Haskell)

class Tree:
    def __init__(self, tree, type_):
        def typecheck(subtree):
            if isinstance(subtree, Leaf):
                if not isinstance(subtree.value, type_):
                    print(subtree.value)
                    raise ValueError
            elif isinstance(subtree, Branch):
                typecheck(subtree.left)
                typecheck(subtree.right)
            else:
                raise ValueError
        typecheck(tree)
        self.tree = tree
        self.type_ = type_

    def __str__(self):
        return ("Tree " + self.type_.__name__ + "\n" + str(self.tree))


class Leaf(Tree):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return("Leaf " + str(self.value))


class Branch(Tree):
    def __init__(self, left, right):
        if isinstance(left, Tree) and isinstance(right, Tree):
            self.left = left
            self.right = right
        else:
            raise ValueError

    def __str__(self):
        return("Branch (" + str(self.left) + " " +
               str(self.right) + ")")

def height(tree):
    if not isinstance(tree, Tree):
        raise ValueError
    if isinstance(tree, Leaf):
        return 1
    elif isinstance(tree, Branch):
        return 1 + max(height(tree.left), height(tree.right))
    else:
        return height(tree.tree)

def count(tree):
    if not isinstance(tree, Tree):
        raise ValueError
    if isinstance(tree, Leaf):
        return 1
    elif isinstance(tree, Branch):
        return 1 + count(tree.left) + count(tree.right)
    else:
        return count(tree.tree)

#usage tree1 = Tree(Branch(Leaf(5), Branch(Leaf(3), Leaf(2))), int)
#usage height(tree1) -> 3
#usage count(tree1) -> 5
Answered By: LegenDUST

Using the most modern Python idioms, you need:

  • Type annotations
  • Python 3.10+ to support structural pattern matching
  • Static type checkers (mypy, pyright) that support recursive type aliases
  • Dataclasses with frozen=True for immutability

from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, TypeAlias, TypeVar

A = TypeVar("A")
L = TypeVar("L")
R = TypeVar("R")


@dataclass(frozen=True)
class Leaf(Generic[A]):
    value: A


@dataclass(frozen=True)
class Branch(Generic[A, L, R]):
    root: A
    left: L
    right: R


Bint: TypeAlias = Leaf[A] | Branch[A, "Bint[A]", "Bint[A]"]


def height(x: Bint[A]) -> int:
    match x:
        case Leaf():
            return 1
        case Branch(_, l, r):
            return 1 + max(height(l), height(r))


def count(x: Bint[A]) -> int:
    match x:
        case Leaf():
            return 1
        case Branch(_, l, r):
            return 1 + count(l) + count(r)

The advantages of this over the other answers are:

  • You can drop the explicit runtime type checks (isinstance) entirely, which makes the code simpler to read
  • Because a union type is used instead of a common superclass for Leaf and Branch, static checkers can do exhaustiveness checking: if the type checker does not complain about a match statement, every variant of Bint has been handled.
Answered By: S.Y. Lee