Welcome back. This is part two of three in my “Sorting algorithms in Ruby” series. If you haven’t checked out the first one, by all means go take a quick peek: Sorting algorithms in Ruby, Part 1.
By the way, constructive comments are always welcome!
Without further ado here is…
Heapsort is a great recursive sort algorithm. It starts by creating a heap, which is a data structure (more on that shortly).
The heap is created recursively and then used by the sort algorithm to create the new sorted array.
The algorithm has a time complexity of O (n log n). The algorithm can be implemented out of place or in-place. The latter being a little more trickier but requiring less memory to perform the sort. I went with the out of place approach in this post.
While quicksort is generally faster than heapsort, heapsort has a better worst case scenario and requires relatively few memory.
To use heapsort you must first build a heap. I had to go back and read about heaps since I had forgotten most of the details from my university computer science classes. Once I did though, everything started coming back pretty quickly. Heaps are a data structure, more precisely a binary (or ternary) tree structure. Not all trees are heaps, but all heaps are trees (binary trees actually).
A tree is a data structure where nodes have a parent (except for the root node) and can have 0 to n children nodes. In a binary tree, the nodes have between 0 to 2 children.
To qualify as a heap, a binary tree must follow these two rules:
- For a node B with parent node A. The key of B is smaller than the key of A.
- For all levels of the tree except the last one, all levels must be fully-filled. If the last level isn’t fully filled, it must filled from left to right.
Here are some illustrations of these data structures.
Heaps are typically represented in a linear array. A simple function using a formula is used to find the child and parent nodes of a node in array position i.
If the array has a zero-based index, the children of the node at index i are at:
- index = 2 * i + 1
- index = 2 * i + 2
To find the parent of a given node, we can use the formula:
- index = floor((i – 1) / 2)
Which can be implemented in Ruby as : heap[((i – 1) / 2).floor]
Using an array seems counter intuitive at first. When heapsort was first designed, object oriented programming was still in embryonic stages. Also, using clever tricks to keep data storage space at a minimum was the norm. Even today, using an array to store the heap is a good solution on embedded systems with low memory.
To someone used to think in object-oriented principles, like myself, the heap begs to be implemented as a class. For this blog entry I decided to try out both approaches. I will present the more classical array based method. The article initially contained an object-oriented approach but as an astute reader commented the implementation was flawed.
Looking into it I noticed implementing heapsort was quicker and made more sense in the traditional approach.
I started by writing the heap code before the sort algorithm. I began by writing a first test, making it pass and then writing a second test case. Here is my test class:
require 'test/unit' require './heap.rb' class HeapTest < Test::Unit::TestCase @@unsorted = [6, 3, 5, 7, 1, 2, 8, 39, 23, 12, 34, 11, 32, 4, 13] # This is an already generated heap, verified by hand. @@generated_heap = [39, 34, 32, 8, 23, 11, 13, 3, 6, 1, 12, 2, 7, 4, 5] def test_heapify_will_put_greatest_number_at_root_node test_heap = Heap.new test_heap.heapify(@@unsorted) greatest_number = @@unsorted.max assert_equal greatest_number, test_heap.heap.first end # test for heap property def test_heapify_heap_will_follow_heap_property # Heap property: child nodes have keys which are smaller or equal to their parent node's keys test_heap = Heap.new test_heap.heapify(@@unsorted) (test_heap.heap.length - 1).downto(1) do |i| node_value = test_heap.heap[i] parent_node_value = test_heap.heap[((i - 1) / 2).floor] assert_operator node_value, :<=, parent_node_value end end def test_heapify_heap_is_equal_to_manually_calculated_heap test_heap = Heap.new test_heap.heapify(@@unsorted) assert_equal @@generated_heap, test_heap.heap end def test_remove_removes_and_replaces_root_node test_heap = Heap.new test_heap.heapify(@@unsorted) test_heap.remove() assert_not_equal @@generated_heap.first, test_heap.heap.first assert_equal @@generated_heap.length - 1, test_heap.heap.length end def test_remove_resulting_heap_will_have_34_as_root_node test_heap = Heap.new test_heap.heapify(@@unsorted) test_heap.remove() assert_equal 34, test_heap.heap.first end def test_remove_resulting_heap_satisfies_heap_property # Heap property: child nodes have keys which are smaller or equal to their parent node's keys test_heap = Heap.new test_heap.heapify(@@unsorted) test_heap.remove() (test_heap.heap.length - 1).downto(1) do |i| node_value = test_heap.heap[i] parent_node_value = test_heap.heap[((i - 1) / 2).floor] assert_operator node_value, :<=, parent_node_value end end def test_empty_will_return_false_if_heap_contains_elements test_heap = Heap.new test_heap.heapify(@@unsorted) result = test_heap.empty?() assert !result end def test_empty_will_return_true_if_heap_contains_no_elements test_heap = Heap.new test_heap.heapify() test_heap.heap.delete_at(0) result = test_heap.empty? assert result end end
Here is the source code for my heap implementation. This implementation builds a heap contained in an array.
class Heap attr_accessor :heap def initialize() @heap =  end # Creates a max-heap def heapify(data) data.each do |element| @heap.push element up_heap(@heap.length - 1) end end def remove() @heap.delete_at(0) if not empty? last_element = @heap.pop @heap.insert(0, last_element) down_heap(0) end end def empty?() return false if @heap.length > 0 return true end private # method to restore heap property after adding an element to a heap def up_heap(index) if parent_exists?(index) parent_index = ((index - 1) / 2).floor if @heap[parent_index] < @heap[index] swap(index, parent_index) up_heap(parent_index) end end end # method to restore heap property after removing an element from the heap def down_heap(index) if smaller_than_children?(index) index_of_greatest_child = index_of_greatest_child?(index) swap(index, index_of_greatest_child) down_heap(index_of_greatest_child) end end def swap(first_index, second_index) @heap[first_index], @heap[second_index] = @heap[second_index], @heap[first_index] end def parent_exists?(index) return index != 0 end def smaller_than_children?(index) if @heap.length < 2 * index + 2 # element doesn't have any children return false elsif @heap.length == 2 * index + 2 # element only has one child return @heap[index] < @heap[2 * index + 1] else # element has two children return @heap[index] < @heap[2 * index + 1] || @heap[index] < @heap[2 * index + 2] end end # This method assumes there is at least one child for the specified index. def index_of_greatest_child?(index) # We only have one child, it is thus the greatest return 2 * index + 1 if @heap.length == 2 * index + 2 # We have two children if @heap[2 * index + 1] > @heap[2 * index + 2] return 2 * index + 1 else return 2 * index + 2 end end end
The heap class contains the array which is accessible through an accessor.
The heapify method will take an unsorted array and will store it as a heap in the array. The implementation creates a max-heap, meaning a heap where the root node is the greatest element in the heap. The algorithm could also be created with a min-heap.
Remove, on the other hand will remove the first element on a heap (to be used by the sorting algorithm) and then recreate the heap using the recursive down_heap method.
Now for the sort algorithm and it’s test…
For the tests, I decided to create a reusable SortTest class. Heapsort was more complex to implement than my earlier sorts. Because of this, I decided to have a much more rigorous test suite. Since I’m feeling confident I will be able to use the same test cases for quicksort, I decided to put them away in a reusable SortTest class.
require 'test/unit' require './heap_sort.rb' require './sort_test.rb' class HeapSortTest < Test::Unit::TestCase def setup heap_sort = HeapSort.new @sort_test = SortTest.new(heap_sort) end def test_empty_array assert @sort_test.test_empty_array, "Empty array not corectly sorted" end def test_array_with_one_element assert @sort_test.test_array_with_one_element, "Array with one element not correctly sorted" end def test_ordered_array assert @sort_test.test_ordered_array, "Ordered array not correctly sorted" end def test_array_with_even_number_of_elements assert @sort_test.test_array_with_even_number_of_elements, "Array with even number of elements not correctly sorted" end def test_array_with_odd_number_of_elements assert @sort_test.test_array_with_odd_number_of_elements, "Array with odd number of elements not correctly sorted" end def test_medium_sized_array assert @sort_test.test_medium_sized_array, "Medium sized array not correctly sorted" end def test_reversed_sorted_array assert @sort_test.test_reversed_sorted_array, "Reversed sorted array not correctly sorted" end def test_large_size_array assert @sort_test.test_large_sized_array, "Large sized array not correctly sorted" end def test_test_array_with_repeating_elements assert @sort_test.test_array_with_repeating_elements, "Array with repeating elements not correctly sorted" end end
require './heap.rb' class HeapSort def sort(to_sort) heap_to_sort = Heap.new heap_to_sort.heapify(to_sort) sorted_array =  until heap_to_sort.empty? sorted_array.insert(0, heap_to_sort.heap.first) heap_to_sort.remove() end return sorted_array end end
As you can see, most of the logic is in the heap class. The out of place sort algorithm is dead simple. Since the root node of a max-heap will always be it’s greatest element, we iterate over the heap taking the largest element and placing it in our new array.
Then we just have to remove the first element and make sure we reconstruct the heap correctly. After which we start over again.
That is all for now. See you next time for quicksort.