Sorting algorithms in Ruby, Part 2

Welcome back. This is part two of three in my “Sorting algorithms in Ruby” series. If you haven’t checked out the first one, by all means go take a quick peek: Sorting algorithms in Ruby, Part 1.

By the way, constructive comments are always welcome!

Without further ado here is…

Heapsort

Heapsort is a great recursive sort algorithm. It starts by creating a heap, which is a data structure (more on that shortly).

The heap is created recursively and then used by the sort algorithm to create the new sorted array.

The algorithm has a time complexity of O (n log n). The algorithm can be implemented out of place or in-place. The latter being a little more trickier but requiring less memory to perform the sort. I went with the out of place approach in this post.

While quicksort is generally faster than heapsort, heapsort has a better worst case scenario and requires relatively few memory.

To use heapsort you must first build a heap. I had to go back and read about heaps since I had forgotten most of the details from my university computer science classes. Once I did though, everything started coming back pretty quickly. Heaps are a data structure, more precisely a binary (or ternary) tree structure. Not all trees are heaps, but all heaps are trees (binary trees actually).

A tree is a data structure where nodes have a parent (except for the root node) and can have 0 to n children nodes. In a binary tree, the nodes have between 0 to 2 children.

To qualify as a heap, a binary tree must follow these two rules:

  • For a node B with parent node A. The key of B is smaller than the key of A.
  • For all levels of the tree except the last one, all levels must be fully-filled. If the last level isn’t fully filled, it must filled from left to right.

Here are some illustrations of these data structures.

Heaps are typically represented in a linear array. A simple function using a formula is used to find the child and parent nodes of a node in array position i.

If the array has a zero-based index, the children of the node at index i are at:

    index = 2 * i + 1
    index = 2 * i + 2

To find the parent of a given node, we can use the formula:

    index = floor((i – 1) / 2)

Which can be implemented in Ruby as : heap[((i – 1) / 2).floor]

Using an array seems counter intuitive at first. When heapsort was first designed, object oriented programming was still in embryonic stages. Also, using clever tricks to keep data storage space at a minimum was the norm. Even today, using an array to store the heap is a good solution on embedded systems with low memory.

To someone used to think in object-oriented principles, like myself, the heap begs to be implemented as a class. For this blog entry I decided to try out both approaches. I will present the more classical array based method. The article initially contained an object-oriented approach but as an astute reader commented the implementation was flawed.

Looking into it I noticed implementing heapsort was quicker and made more sense in the traditional approach.

I started by writing the heap code before the sort algorithm. I began by writing a first test, making it pass and then writing a second test case. Here is my test class:

require 'test/unit'
require './heap.rb'

class HeapTest < Test::Unit::TestCase
	@@unsorted = [6, 3, 5, 7, 1, 2, 8, 39, 23, 12, 34, 11, 32, 4, 13]
	# This is an already generated heap, verified by hand.
	@@generated_heap = [39, 34, 32, 8, 23, 11, 13, 3, 6, 1, 12, 2, 7, 4, 5]
	
	def test_heapify_will_put_greatest_number_at_root_node
		test_heap = Heap.new
		test_heap.heapify(@@unsorted)
		greatest_number = @@unsorted.max
		
		assert_equal greatest_number, test_heap.heap.first
	end
	
	# test for heap property
	def test_heapify_heap_will_follow_heap_property
		# Heap property: child nodes have keys which are smaller or equal to their parent node's keys
		test_heap = Heap.new
		test_heap.heapify(@@unsorted)
		
		(test_heap.heap.length - 1).downto(1) do |i|
			node_value = test_heap.heap[i]
			parent_node_value = test_heap.heap[((i - 1) / 2).floor]
			
		    assert_operator node_value, :<=, parent_node_value 
		end
	end
	
	def test_heapify_heap_is_equal_to_manually_calculated_heap
		test_heap = Heap.new
		test_heap.heapify(@@unsorted)
		
		assert_equal @@generated_heap, test_heap.heap
	end
	
	def test_remove_removes_and_replaces_root_node
		test_heap = Heap.new
		test_heap.heapify(@@unsorted)
		
		test_heap.remove()
		
		assert_not_equal @@generated_heap.first, test_heap.heap.first
		assert_equal @@generated_heap.length - 1, test_heap.heap.length
	end
	
	def test_remove_resulting_heap_will_have_34_as_root_node
		test_heap = Heap.new
		test_heap.heapify(@@unsorted)
		
		test_heap.remove()
		
		assert_equal 34, test_heap.heap.first
	end
	
	def test_remove_resulting_heap_satisfies_heap_property
		# Heap property: child nodes have keys which are smaller or equal to their parent node's keys
		test_heap = Heap.new
		test_heap.heapify(@@unsorted)
		
		test_heap.remove()
		
		(test_heap.heap.length - 1).downto(1) do |i|
			node_value = test_heap.heap[i]
			parent_node_value = test_heap.heap[((i - 1) / 2).floor]
			
		    assert_operator node_value, :<=, parent_node_value 
		end
	end
	
	def test_empty_will_return_false_if_heap_contains_elements
		test_heap = Heap.new
		test_heap.heapify(@@unsorted)
		
		result = test_heap.empty?()
		
		assert !result
	end
	
	def test_empty_will_return_true_if_heap_contains_no_elements
		test_heap = Heap.new
		test_heap.heapify([1])
		test_heap.heap.delete_at(0)
		
		result = test_heap.empty?
		
		assert result
	end
end

Here is the source code for my heap implementation. This implementation builds a heap contained in an array.

class Heap
	attr_accessor :heap

	def initialize()
		@heap = []
	end

	# Creates a max-heap
	def heapify(data)			
		data.each do |element|
			@heap.push element
			up_heap(@heap.length - 1)						
		end
	end
	
	def remove()
		@heap.delete_at(0)
		
		if not empty?
			last_element = @heap.pop
			@heap.insert(0, last_element)
			down_heap(0)
		end
	end
	
	def empty?()
		return false if @heap.length > 0
		return true
	end
	
private
	# method to restore heap property after adding an element to a heap
	def up_heap(index)
		if parent_exists?(index)
			parent_index = ((index - 1) / 2).floor
			
			if @heap[parent_index] < @heap[index]			
				swap(index, parent_index)
				up_heap(parent_index)
			end		
		end		
	end
	
	# method to restore heap property after removing an element from the heap
	def down_heap(index)
		if smaller_than_children?(index)
			index_of_greatest_child = index_of_greatest_child?(index)
			swap(index, index_of_greatest_child)
			down_heap(index_of_greatest_child)
		end
	end
 
	def swap(first_index, second_index)
		@heap[first_index], @heap[second_index] = @heap[second_index], @heap[first_index]
	end

	def parent_exists?(index)
		return index != 0		
	end
	
	def smaller_than_children?(index)			
		if @heap.length < 2 * index + 2
			# element doesn't have any children
			return false
		elsif @heap.length == 2 * index + 2 
			# element only has one child
			return @heap[index] < @heap[2 * index + 1] 
		else 
			# element has two children			
			return @heap[index] < @heap[2 * index + 1] || @heap[index] < @heap[2 * index + 2] 
		end
	end
	
	# This method assumes there is at least one child for the specified index.
	def index_of_greatest_child?(index)	
		# We only have one child, it is thus the greatest
		return 2 * index + 1 if @heap.length == 2 * index + 2 
		
		# We have two children
		if @heap[2 * index + 1] > @heap[2 * index + 2]
			return 2 * index + 1
		else
			return 2 * index + 2
		end
	end
end

The heap class contains the array which is accessible through an accessor.

The heapify method will take an unsorted array and will store it as a heap in the array. The implementation creates a max-heap, meaning a heap where the root node is the greatest element in the heap. The algorithm could also be created with a min-heap.

Remove, on the other hand will remove the first element on a heap (to be used by the sorting algorithm) and then recreate the heap using the recursive down_heap method.

Now for the sort algorithm and it’s test…

For the tests, I decided to create a reusable SortTest class. Heapsort was more complex to implement than my earlier sorts. Because of this, I decided to have a much more rigorous test suite. Since I’m feeling confident I will be able to use the same test cases for quicksort, I decided to put them away in a reusable SortTest class.

require 'test/unit'
require './heap_sort.rb'
require './sort_test.rb'

class HeapSortTest < Test::Unit::TestCase	
	def setup
		heap_sort = HeapSort.new
		@sort_test = SortTest.new(heap_sort)
	end
	
	def test_empty_array
		assert @sort_test.test_empty_array, "Empty array not corectly sorted"
	end
	
	def test_array_with_one_element
		assert @sort_test.test_array_with_one_element, "Array with one element not correctly sorted"
	end
	
	def test_ordered_array
		assert @sort_test.test_ordered_array, "Ordered array not correctly sorted"
	end
	
	def test_array_with_even_number_of_elements
		assert @sort_test.test_array_with_even_number_of_elements, "Array with even number of elements not correctly sorted"
	end
	
	def test_array_with_odd_number_of_elements
		assert @sort_test.test_array_with_odd_number_of_elements, "Array with odd number of elements not correctly sorted"
	end
	
	def test_medium_sized_array
		assert @sort_test.test_medium_sized_array, "Medium sized array not correctly sorted"
	end
	
	def test_reversed_sorted_array
		assert @sort_test.test_reversed_sorted_array, "Reversed sorted array not correctly sorted"
	end
	
	def test_large_size_array
		assert @sort_test.test_large_sized_array, "Large sized array not correctly sorted"
	end
	
	def test_test_array_with_repeating_elements
		assert @sort_test.test_array_with_repeating_elements, "Array with repeating elements not correctly sorted"
	end
end
require './heap.rb'

class HeapSort
	def sort(to_sort)
		heap_to_sort = Heap.new
		heap_to_sort.heapify(to_sort)
		
		sorted_array = []
		
		until heap_to_sort.empty?
			sorted_array.insert(0, heap_to_sort.heap.first)
			heap_to_sort.remove()
		end
	
		return sorted_array
	end
end

As you can see, most of the logic is in the heap class. The out of place sort algorithm is dead simple. Since the root node of a max-heap will always be it’s greatest element, we iterate over the heap taking the largest element and placing it in our new array.

Then we just have to remove the first element and make sure we reconstruct the heap correctly. After which we start over again.

That is all for now. See you next time for quicksort.

Sorting algorithms in Ruby, Part 1
Sorting algorithms in Ruby, interlude
Sorting algorithms in Ruby, Part 3

2 thoughts on “Sorting algorithms in Ruby, Part 2

  1. Hi,

    Thanks for the detailed explaination and the lucid, self explanatory code.It has helped me a lot while I was revising all the sorting algorithms

    When I tried to dry run the object based HeapSort program above, it struck to me that
    a “last_inserted _node” is always going to have its left and right child nil. And since you check this for the left_child node first
    – “if @last_inserted_node.left_child.nil? ”
    its going to be true always . So , this function never adds a node to the right side. It keeps on adding nodes to the left subtree, thus making it more of a linked list than a tree/heap

    Thanks
    A.

    1. Hi,

      First of all thank you for your great positive comments. Secondly thanks for pointing out my mistake, I am always glad for a chance to improve.

      I will correct the code and post a new version shortly.

      Thanks 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s