PFDS Section 2.2

Posted on May 1, 2012

Section 2.2 presents immutable sets implemented with unbalanced binary search trees, a slightly more complex example of immutable data sharing than the list example in Section 2.1. My first challenge was to reimplement Okasaki’s base implementation of unbalanced binary search tree sets using idiomatic Scala. I had to learn a fair amount more about Scala’s type system to be able to write such an implementation, so I figured I’d write up some of the things I learned about Scala in the process as well as the implementation.

A generic trait for sets

Okasaki’s set interface contains two methods, insert and member. Similar to the generic stack trait implementation that I wrote about last time , it’s easy to create this interface as a generic trait in Scala.

Unbalanced binary search trees

Unbalanced binary search trees are a great choice for a simple implementation of Okasaki’s set interface, since insert and member in an unbalanced binary search tree both take on average O(log n) time in the number of elements in the tree, or O(n) time in the pathological case where each subsequent insert is greater than the last.

I decided to follow Okasaki’s functor-like pattern for the implementation, so I wrote a simple binary tree data type using case classes. This allowed me to use Scala’s pattern matching feature in my implementation of the binary search tree.

Binary search trees rely on the ordering of the type of element that they’re storing, so a generic binary search tree implementation similarly must require that its type parameter has an order. In Scala, this can be accomplished with either of two methods: either by using a view bound on the type parameter to require that it be implicitly convertable to an Ordered type, or by requiring an instance of Ordering, parameterized on the binary tree’s input type, as a value parameter.

In trying to write the most Scala-idiomatic version of the data structure, I ended up investigating both methods before deciding on one, since it wasn’t obvious at first which method would be the simplest. If you’re already familiar with Scala view bounds and type orderings, or if you just want to get to the implementation of the 2.2 data structures already, then you should skip right ahead to the implementation.

View bounds

A view bound is specified by the <% operator. Specifying T <% U in a type parameter adds the restriction that the type parameter T must be implicitly convertible to U, which is true when there exists a method implicit def t2u(t: T): U somewhere in the current scope. In our case, we’re interested in view bounding T by Ordered[T], which will give us access to the comparison methods (<, >, etc) defined on the Ordered[T] type for values of type T.

For example, to implement a generic less-than function lt that utilizes the < method on the input type’s Ordered implementation, we write

def lt[T <% Ordered[T]](x: T, y: T): Boolean = x < y

Without specifying the view bound on T, we would not have access to the < method on x, since it is not defined for any arbitrary type T.

Ordering instances

The other Scala-idiomatic way to provide or access the ordering of a type T is through objects that implement the trait Ordering[T]. An object that implements Ordering[T] provides the compare(x: T, y: T) method, which acts as the underlying implementation for the trait’s other methods, which include lt(x: T, y: T), gt(x: T, y: T), and the like.

For example, to implement a generic less-than function lt that utilizes the lt method on the input type’s Ordering implementation, we could write the curried method

def lt[T](x: T, y: T)(implicit val ordering: Ordering[T]): Boolean = ordering.lt(x, y)

Note that we don’t have to specify a type bound on T, but we still ensure that T is comparable at the type system level by requiring as an argument an ordering object that’s parameterized on T. We make the ordering an implicit and curried argument, so that if T has an implicit conversion to Ordering[T], as most of Scala’s comparable types do, the user doesn’t have to explicitly pass in the ordering.

Using instances of Ordering[T] makes your code slightly less pretty, as you can no longer write x < y, you have to instead write ordering.lt(x, y), but it makes more explicit what’s going on under the hood. It wouldn’t necessarily be obvious that writing x < y where x and y are instances of type T <% Ordered[T] actually invokes the < method on whatever Ordered[T] object x and y are implicitly convertible to. I think I prefer the more explicit version using Ordering[T], but that’s probably because using implicit conversions at all leaves me with a funny taste in my mouth. It seems to me that overusing implicit conversions would lead to a special hell of confusing spaghetti code.

Scala implementation of unbalanced binary tree sets

For the implementation, I wrote the actual insert and member logic as private methods on a utility object. The class that actually implements the Set trait uses the utility object as its underlying implementation. I used Scala’s companion object idiom to structure the whole thing in an elegant way.

Here’s the companion object that contains the underlying implementation logic:

And here’s the class that actually implements the Set interface using the companion object:

You can see the code in its entirety, as well as a small unit test, on GitHub. Next time I’ll implement solutions for section 2.2’s exercises.