How to implement a Median-heap

Question:

Like a Max-heap and Min-heap, I want to implement a Median-heap to keep track of the median of a given set of integers. The API should have the following three functions:

insert(int)  // should take O(logN)
int median() // will be the topmost element of the heap. O(1)
int delmedian() // should take O(logN)

I want to use an array (a) implementation to implement the heap where the children of array index k are stored in array indices 2*k and 2*k + 1. For convenience, the array starts populating elements from index 1.
This is what I have so far:
The Median-heap will have two integers to keep track of number of integers inserted so far that are > current median (gcm) and < current median (lcm).

if abs(gcm-lcm) >= 2 and gcm > lcm we need to swap a[1] with one of its children. 
The child chosen should be greater than a[1]. If both are greater, 
choose the smaller of two.

Similarly for the other case. I can’t come up with an algorithm for how to sink and swim elements. I think it should take into consideration how close the number is to the median, so something like:

private void swim(int k) {
    while (k > 1 && absless(k, k/2)) {   
        exch(k, k/2);
        k = k/2;
    }
}

I can’t come up with the entire solution though.

Asked By: Bruce

||

Answers:

You need two heaps: one min-heap and one max-heap. Each heap contains about one half of the data. Every element in the min-heap is greater or equal to the median, and every element in the max-heap is less or equal to the median.

When the min-heap contains one more element than the max-heap, the median is in the top of the min-heap. And when the max-heap contains one more element than the min-heap, the median is in the top of the max-heap.

When both heaps contain the same number of elements, the total number of elements is even.
In this case you have to choose according your definition of median: a) the mean of the two middle elements; b) the greater of the two; c) the lesser; d) choose at random any of the two…

Every time you insert, compare the new element with those at the top of the heaps in order to decide where to insert it. If the new element is greater than the current median, it goes to the min-heap. If it is less than the current median, it goes to the max heap. Then you might need to rebalance. If the sizes of the heaps differ by more than one element, extract the min/max from the heap with more elements and insert it into the other heap.

In order to construct the median heap for a list of elements, we should first use a linear time algorithm and find the median. Once the median is known, we can simply add elements to the Min-heap and Max-heap based on the median value. Balancing the heaps isn’t required because the median will split the input list of elements into equal halves.

If you extract an element you might need to compensate the size change by moving one element from one heap to another. This way you ensure that, at all times, both heaps have the same size or differ by just one element.

Answered By: comocomocomocomo

Isn’t a perfectly balanced binary search tree (BST) a median heap? It is true that even red-black BSTs aren’t always perfectly balanced, but it might be close enough for your purposes. And log(n) performance is guaranteed!

AVL trees are more tighly balanced than red-black BSTs so they come even closer to being a true median heap.

Answered By: angelatlarge

Here is a java implementaion of a MedianHeap, developed with the help of above comocomocomocomo ‘s explanation .

import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Scanner;

/**
 *
 * @author BatmanLost
 */
public class MedianHeap {

    //stores all the numbers less than the current median in a maxheap, i.e median is the maximum, at the root
    private PriorityQueue<Integer> maxheap;
    //stores all the numbers greater than the current median in a minheap, i.e median is the minimum, at the root
    private PriorityQueue<Integer> minheap;

    //comparators for PriorityQueue
    private static final maxHeapComparator myMaxHeapComparator = new maxHeapComparator();
    private static final minHeapComparator myMinHeapComparator = new minHeapComparator();

    /**
     * Comparator for the minHeap, smallest number has the highest priority, natural ordering
     */
    private static class minHeapComparator implements Comparator<Integer>{
        @Override
        public int compare(Integer i, Integer j) {
            return i>j ? 1 : i==j ? 0 : -1 ;
        }
    }

    /**
     * Comparator for the maxHeap, largest number has the highest priority
     */
    private static  class maxHeapComparator implements Comparator<Integer>{
        // opposite to minHeapComparator, invert the return values
        @Override
        public int compare(Integer i, Integer j) {
            return i>j ? -1 : i==j ? 0 : 1 ;
        }
    }

    /**
     * Constructor for a MedianHeap, to dynamically generate median.
     */
    public MedianHeap(){
        // initialize maxheap and minheap with appropriate comparators
        maxheap = new PriorityQueue<Integer>(11,myMaxHeapComparator);
        minheap = new PriorityQueue<Integer>(11,myMinHeapComparator);
    }

    /**
     * Returns empty if no median i.e, no input
     * @return
     */
    private boolean isEmpty(){
        return maxheap.size() == 0 && minheap.size() == 0 ;
    }

    /**
     * Inserts into MedianHeap to update the median accordingly
     * @param n
     */
    public void insert(int n){
        // initialize if empty
        if(isEmpty()){ minheap.add(n);}
        else{
            //add to the appropriate heap
            // if n is less than or equal to current median, add to maxheap
            if(Double.compare(n, median()) <= 0){maxheap.add(n);}
            // if n is greater than current median, add to min heap
            else{minheap.add(n);}
        }
        // fix the chaos, if any imbalance occurs in the heap sizes
        //i.e, absolute difference of sizes is greater than one.
        fixChaos();
    }

    /**
     * Re-balances the heap sizes
     */
    private void fixChaos(){
        //if sizes of heaps differ by 2, then it's a chaos, since median must be the middle element
        if( Math.abs( maxheap.size() - minheap.size()) > 1){
            //check which one is the culprit and take action by kicking out the root from culprit into victim
            if(maxheap.size() > minheap.size()){
                minheap.add(maxheap.poll());
            }
            else{ maxheap.add(minheap.poll());}
        }
    }
    /**
     * returns the median of the numbers encountered so far
     * @return
     */
    public double median(){
        //if total size(no. of elements entered) is even, then median iss the average of the 2 middle elements
        //i.e, average of the root's of the heaps.
        if( maxheap.size() == minheap.size()) {
            return ((double)maxheap.peek() + (double)minheap.peek())/2 ;
        }
        //else median is middle element, i.e, root of the heap with one element more
        else if (maxheap.size() > minheap.size()){ return (double)maxheap.peek();}
        else{ return (double)minheap.peek();}

    }
    /**
     * String representation of the numbers and median
     * @return 
     */
    public String toString(){
        StringBuilder sb = new StringBuilder();
        sb.append("n Median for the numbers : " );
        for(int i: maxheap){sb.append(" "+i); }
        for(int i: minheap){sb.append(" "+i); }
        sb.append(" is " + median()+"n");
        return sb.toString();
    }

    /**
     * Adds all the array elements and returns the median.
     * @param array
     * @return
     */
    public double addArray(int[] array){
        for(int i=0; i<array.length ;i++){
            insert(array[i]);
        }
        return median();
    }

    /**
     * Just a test
     * @param N
     */
    public void test(int N){
        int[] array = InputGenerator.randomArray(N);
        System.out.println("Input array: n"+Arrays.toString(array));
        addArray(array);
        System.out.println("Computed Median is :" + median());
        Arrays.sort(array);
        System.out.println("Sorted array: n"+Arrays.toString(array));
        if(N%2==0){ System.out.println("Calculated Median is :" + (array[N/2] + array[(N/2)-1])/2.0);}
        else{System.out.println("Calculated Median is :" + array[N/2] +"n");}
    }

    /**
     * Another testing utility
     */
    public void printInternal(){
        System.out.println("Less than median, max heap:" + maxheap);
        System.out.println("Greater than median, min heap:" + minheap);
    }

    //Inner class to generate input for basic testing
    private static class InputGenerator {

        public static int[] orderedArray(int N){
            int[] array = new int[N];
            for(int i=0; i<N; i++){
                array[i] = i;
            }
            return array;
        }

        public static int[] randomArray(int N){
            int[] array = new int[N];
            for(int i=0; i<N; i++){
                array[i] = (int)(Math.random()*N*N);
            }
            return array;
        }

        public static int readInt(String s){
            System.out.println(s);
            Scanner sc = new Scanner(System.in);
            return sc.nextInt();
        }
    }

    public static void main(String[] args){
        System.out.println("You got to stop the program MANUALLY!!");        
        while(true){
            MedianHeap testObj = new MedianHeap();
            testObj.test(InputGenerator.readInt("Enter size of the array:"));
            System.out.println(testObj);
        }
    }
}
Answered By: Charan

Here is a Scala implementation, following the comocomocomocomo’s idea above.

class MedianHeap(val capacity:Int) {
    private val minHeap = new PriorityQueue[Int](capacity / 2)
    private val maxHeap = new PriorityQueue[Int](capacity / 2, new Comparator[Int] {
      override def compare(o1: Int, o2: Int): Int = Integer.compare(o2, o1)
    })

    def add(x: Int): Unit = {
      if (x > median) {
        minHeap.add(x)
      } else {
        maxHeap.add(x)
      }

      // Re-balance the heaps.
      if (minHeap.size - maxHeap.size > 1) {
        maxHeap.add(minHeap.poll())
      }
      if (maxHeap.size - minHeap.size > 1) {
        minHeap.add(maxHeap.poll)
      }
    }

    def median: Double = {
      if (minHeap.isEmpty && maxHeap.isEmpty)
        return Int.MinValue
      if (minHeap.size == maxHeap.size) {
        return (minHeap.peek+ maxHeap.peek) / 2.0
      }
      if (minHeap.size > maxHeap.size) {
        return minHeap.peek()
      }
      maxHeap.peek
    }
  }
Answered By: Duong Nguyen

Here my code based on the answer provided by comocomocomocomo :

import java.util.PriorityQueue;

public class Median {
private  PriorityQueue<Integer> minHeap = 
    new PriorityQueue<Integer>();
private  PriorityQueue<Integer> maxHeap = 
    new PriorityQueue<Integer>((o1,o2)-> o2-o1);

public float median() {
    int minSize = minHeap.size();
    int maxSize = maxHeap.size();
    if (minSize == 0 && maxSize == 0) {
        return 0;
    }
    if (minSize > maxSize) {
        return minHeap.peek();
    }if (minSize < maxSize) {
        return maxHeap.peek();
    }
    return (minHeap.peek()+maxHeap.peek())/2F;
}

public void insert(int element) {
    float median = median();
    if (element > median) {
        minHeap.offer(element);
    } else {
        maxHeap.offer(element);
    }
    balanceHeap();
}

private void balanceHeap() {
    int minSize = minHeap.size();
    int maxSize = maxHeap.size();
    int tmp = 0;
    if (minSize > maxSize + 1) {
        tmp = minHeap.poll();
        maxHeap.offer(tmp);
    }
    if (maxSize > minSize + 1) {
        tmp = maxHeap.poll();
        minHeap.offer(tmp);
    }
  }
}
Answered By: Enrico Giurin

Another way to do it without using a max-heap and a min-heap would be to use a median-heap right away.

In a max-heap, the parent is greater than the children.
We can have a new type of heap where the parent is in the ‘middle’ of the children – the left child is smaller than the parent and the right child is greater than the parent. All even entries are left children and all odd entries are right children.

The same swim and sink operations which can be performed in a max-heap, can also be performed in this median-heap – with slight modifications. In a typical swim operation in a max-heap, the inserted entry swims up till it is smaller than a parent entry, here in a median-heap, it will swim up till it is lesser than a parent (if it is an odd entry) or greater than a parent (if it is an even entry).

Here’s my implementation for this median-heap. I have used an array of Integers for simplicity.

package priorityQueues;
import edu.princeton.cs.algs4.StdOut;

public class MedianInsertDelete {

    private Integer[] a;
    private int N;

    public MedianInsertDelete(int capacity){

        // accounts for '0' not being used
        this.a = new Integer[capacity+1]; 
        this.N = 0;
    }

    public void insert(int k){

        a[++N] = k;
        swim(N);
    }

    public int delMedian(){

        int median = findMedian();
        exch(1, N--);
        sink(1);
        a[N+1] = null;
        return median;

    }

    public int findMedian(){

        return a[1];


    }

    // entry swims up so that its left child is smaller and right is greater

    private void swim(int k){


        while(even(k) && k>1 && less(k/2,k)){

            exch(k, k/2);

            if ((N > k) && less (k+1, k/2)) exch(k+1, k/2);
            k = k/2;
        }

        while(!even(k) && (k>1 && !less(k/2,k))){

            exch(k, k/2);
            if (!less (k-1, k/2)) exch(k-1, k/2);
            k = k/2;
        }

    }

// if the left child is larger or if the right child is smaller, the entry sinks down
    private void sink (int k){

        while(2*k <= N){
            int j = 2*k;
            if (j < N && less (j, k)) j++;
            if (less(k,j)) break;
            exch(k, j);
            k = j;
        }

    }

    private boolean even(int i){

        if ((i%2) == 0) return true;
        else return false;
    }

    private void exch(int i, int j){

        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    private boolean less(int i, int j){

        if (a[i] <= a[j]) return true;
        else return false;
    }


    public static void main(String[] args) {

        MedianInsertDelete medianInsertDelete = new MedianInsertDelete(10);

        for(int i = 1; i <=10; i++){

            medianInsertDelete.insert(i);
        }

        StdOut.println("The median is: " + medianInsertDelete.findMedian());

        medianInsertDelete.delMedian();


        StdOut.println("Original median deleted. The new median is " + medianInsertDelete.findMedian());




    }
}