Jordan Lewis

Exercises From PFDS Section 2.2

Section 2.2’s exercises define some optimizations to the section’s unbalanced tree set implementation.

Exercise 2.2

The implementation of member that Okasaki gives for binary search trees in section 2.2 performs 2d comparisons in the worst case for a tree of depth d: searching for the rightmost value in a tree whose right spine itself has depth d. This is because every call of member checks whether x < y and, if not, whether y < x. This strategy allows immediate detection of the case in which the value being searched for is contained in the current node, permitting short circuits such as when the searched-for value sits in the root node, at the cost of two comparisons per node when traversing rightward.

Exercise 2.2 proposes a strategy in which the second comparison is delayed until the bottom of the tree. This lowers the worst case from 2d comparisons to d + 1, at the cost of raising the minimum number of comparisons to the number of nodes on the shortest path from the root to a leaf: we can no longer stop early when we hit the element. The implementation is relatively straightforward:

We introduce a new parameter to member, c: Option[T], which is None when there is not yet a candidate for equality, and Some(d) when d is a candidate for equality. Then, when we reach a leaf node, we check to see whether our candidate is equal to the value we’re searching for.
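A sketch of this approach, assuming the BinaryTree case classes described in the PFDS Section 2.2 post below (Leaf is an assumed name for the empty case):

// Exercise 2.2 member sketch: at most d + 1 comparisons for a tree of depth d.
def member[T](tree: BinaryTree[T], x: T, c: Option[T] = None)(implicit ord: Ordering[T]): Boolean =
  tree match {
    // At a leaf, spend the one and only equality check on the candidate, if any.
    case Leaf => c.exists(ord.equiv(_, x))
    case BinaryTreeNode(left, y, right) =>
      if (ord.lt(x, y)) member(left, x, c) // x < y: descend left, candidate unchanged
      else member(right, x, Some(y))       // otherwise descend right; y is the new candidate
  }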

Exercise 2.3

Exercise 2.3 proposes an optimization to insert. The implementation given by Okasaki performs wasteful path copying when the value being inserted is already in the tree. If the value being inserted is found in a node, all the implementation can do is return that node unchanged; the enclosing recursive calls still rebuild every node on the path from the root down to it.

Throwing an exception in the case where the element being inserted already exists avoids the wasteful path copying. Okasaki requires that the function establish only one exception handler per insert, not per function call, so we catch the exception in the helper insert method of the UnbalancedTreeSet class.
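A sketch of the idea, assuming the UnbalancedTreeSet shape from the Section 2.2 post below (ElementExistsException is my own naming):

// Thrown when the element being inserted already exists in the tree.
case object ElementExistsException extends Exception

// Companion-object helper: throws instead of returning the existing node,
// abandoning the partially copied path.
private def insert[T](tree: BinaryTree[T], x: T)(implicit ord: Ordering[T]): BinaryTree[T] =
  tree match {
    case Leaf => BinaryTreeNode(Leaf, x, Leaf)
    case BinaryTreeNode(left, y, right) =>
      if (ord.lt(x, y)) BinaryTreeNode(insert(left, x), y, right)
      else if (ord.lt(y, x)) BinaryTreeNode(left, y, insert(right, x))
      else throw ElementExistsException
  }

// In the UnbalancedTreeSet class: exactly one handler per insert call.
def insert(x: T): UnbalancedTreeSet[T] =
  try new UnbalancedTreeSet(UnbalancedTreeSet.insert(tree, x))
  catch { case ElementExistsException => this }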

Exercise 2.4

Exercise 2.4 is to integrate the improvements to member from exercise 2.2 into the improved version of insert from 2.3. This is straightforward: we follow the pattern established by 2.2, threading an equality candidate through the recursive calls, and checking for equality only at the end. Now insert performs no unnecessary path copying if inserting a pre-existing element, and performs at most d + 1 comparisons along the way.
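A sketch of the combined helper, reusing the assumed names from above (the same single try/catch wrapper applies):

// Exercise 2.4: one comparison per node on the way down, one equality check at
// the leaf, and no path copying when the element already exists.
private def insert[T](tree: BinaryTree[T], x: T, c: Option[T] = None)(implicit ord: Ordering[T]): BinaryTree[T] =
  tree match {
    case Leaf =>
      if (c.exists(ord.equiv(_, x))) throw ElementExistsException // duplicate: abandon the copies
      else BinaryTreeNode(Leaf, x, Leaf)
    case BinaryTreeNode(left, y, right) =>
      if (ord.lt(x, y)) BinaryTreeNode(insert(left, x, c), y, right)
      else BinaryTreeNode(left, y, insert(right, x, Some(y)))
  }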

Exercise 2.5

Exercise 2.5 is about generating balanced binary trees that share as much as possible. The exercise assumes that all of the values in the trees are the same, which doesn’t make much sense for the set abstraction, but ensures that all subtrees of the same size are identical.

For the first part, we implement a function complete that creates a complete tree of d levels. As a binary tree, it should have 2^d - 1 values inside. The implementation is a straightforward recursive function. The base case, a binary tree of 0 levels, is just a leaf node. The recursive case recurses once with one less level and uses the resulting subtree as both children, so the two subtrees are fully shared.
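A sketch of complete, assuming it lives on BinaryTree's companion object (see below):

def complete[T](value: T, depth: Int): BinaryTree[T] =
  if (depth == 0) Leaf
  else {
    // Recurse once and use the result as both children: the subtrees are fully
    // shared, so a tree of 2^d - 1 conceptual nodes takes only O(d) work to build.
    val child = complete(value, depth - 1)
    BinaryTreeNode(child, value, child)
  }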

For the second part, we implement a function balanced that creates a tree of n values that is as balanced as possible: every internal node's subtrees differ in size by at most 1.
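A sketch of balanced, following Okasaki's hint of a helper that builds a pair of maximally shared trees of sizes m and m + 1:

def balanced[T](value: T, n: Int): BinaryTree[T] = {
  // Returns a pair of trees of sizes (m, m + 1) that share as much as possible.
  def create2(m: Int): (BinaryTree[T], BinaryTree[T]) =
    if (m == 0) (Leaf, BinaryTreeNode(Leaf, value, Leaf))
    else if (m % 2 == 1) {
      // m = 2k + 1: the size-m tree has two size-k children; the size-(m + 1)
      // tree has children of sizes k and k + 1.
      val (smaller, larger) = create2((m - 1) / 2)
      (BinaryTreeNode(smaller, value, smaller), BinaryTreeNode(smaller, value, larger))
    } else {
      // m = 2k: children of sizes k - 1 and k; for size m + 1, two size-k children.
      val (smaller, larger) = create2(m / 2 - 1)
      (BinaryTreeNode(smaller, value, larger), BinaryTreeNode(larger, value, larger))
    }
  create2(n)._1
}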

I put both of these methods directly on BinaryTree’s companion object. I think this is idiomatic, but it’s not really clear to me: there are so many different ways to do things in Scala.

I wrote a few tests for the tree generation functions in a normal assertion-based style before remembering the existence of ScalaCheck and its obvious applicability for this kind of thing. I’ve been itching to use a QuickCheck-style testing framework for a while. If you haven’t seen it before, you should definitely check it out. It allows you to write tests in a declarative style, without specifying individual test cases. The framework will generate random test cases for you, either by using built-in random generators for simple cases like arbitrary strings or integers, or by using a user-defined random object generator.

In this case, since I wanted my test cases to vary the number of levels or nodes in a balanced tree, I was able to just use the built-in random integer generator. Writing tests in this way is concise and fun. Here's roughly what they ended up looking like, sketched with an assumed size helper that counts a tree's nodes:
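import org.scalacheck.Gen
import org.scalacheck.Prop.forAll
import org.scalacheck.Properties

object BinaryTreeSpec extends Properties("BinaryTree") {
  // Assumed helper: counts the nodes in a tree.
  def size[T](tree: BinaryTree[T]): Int = tree match {
    case Leaf => 0
    case BinaryTreeNode(left, _, right) => 1 + size(left) + size(right)
  }

  property("complete builds 2^d - 1 nodes") = forAll(Gen.choose(0, 15)) { d =>
    size(BinaryTree.complete(0, d)) == (1 << d) - 1
  }

  property("balanced builds exactly n nodes") = forAll(Gen.choose(0, 1000)) { n =>
    size(BinaryTree.balanced(0, n)) == n
  }
}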

Exercise 2.6

This exercise asks the reader to modify UnbalancedTreeSet so that it represents a map instead of a set. This is very straightforward, if tedious: we must add another data member to BinaryTreeNode that represents the value of an entry in the map; change member to lookup, having it return the value data member instead of true and throw an exception instead of returning false; and finally change insert to bind, giving it an extra value argument and setting the BinaryTreeNode's value field to that argument. I didn't implement this.

As usual, the new code’s on GitHub.

PFDS Section 2.2

Section 2.2 presents immutable sets implemented with unbalanced binary search trees, a slightly more complex example of immutable data sharing than the list example in Section 2.1. My first challenge was to reimplement Okasaki’s base implementation of unbalanced binary search tree sets using idiomatic Scala. I had to learn a fair amount more about Scala’s type system to be able to write such an implementation, so I figured I’d write up some of the things I learned about Scala in the process as well as the implementation.

A generic trait for sets

Okasaki’s set interface contains two methods, insert and member. Similar to the generic stack trait implementation that I wrote about last time , it’s easy to create this interface as a generic trait in Scala.

Unbalanced binary search trees

Unbalanced binary search trees are a great choice for a simple implementation of Okasaki’s set interface, since insert and member in an unbalanced binary search tree both take on average O(log n) time in the number of elements in the tree, or O(n) time in the pathological case where each subsequent insert is greater than the last.

I decided to follow Okasaki’s functor-like pattern for the implementation, so I wrote a simple binary tree data type using case classes. This allowed me to use Scala’s pattern matching feature in my implementation of the binary search tree.

Binary search trees rely on the ordering of the type of element that they're storing, so a generic binary search tree implementation similarly must require that its type parameter has an order. In Scala, this can be accomplished with either of two methods: either by using a view bound on the type parameter to require that it be implicitly convertible to an Ordered type, or by requiring an instance of Ordering, parameterized on the binary tree's element type, as a value parameter.

In trying to write the most Scala-idiomatic version of the data structure, I ended up investigating both methods before deciding on one, since it wasn’t obvious at first which method would be the simplest. If you’re already familiar with Scala view bounds and type orderings, or if you just want to get to the implementation of the 2.2 data structures already, then you should skip right ahead to the implementation.

View bounds

A view bound is specified by the <% operator. Specifying T <% U in a type parameter adds the restriction that the type parameter T must be implicitly convertible to U, which is true when there exists a method

implicit def t2u(t: T): U

somewhere in the current scope. In our case, we’re interested in view bounding T by Ordered[T], which will give us access to the comparison methods (<, >, etc) defined on the Ordered[T] type for values of type T.

For example, to implement a generic less-than function lt that utilizes the < method on the input type’s Ordered implementation, we write

def lt[T <% Ordered[T]](x: T, y: T): Boolean = x < y

Without specifying the view bound on T, we would not have access to the < method on x, since it is not defined for any arbitrary type T.

Ordering instances

The other Scala-idiomatic way to provide or access the ordering of a type T is through objects that implement the trait Ordering[T]. An object that implements Ordering[T] provides the compare(x: T, y: T) method, which acts as the underlying implementation for the trait’s other methods, which include lt(x: T, y: T), gt(x: T, y: T), and the like.

For example, to implement a generic less-than function lt that utilizes the lt method on the input type’s Ordering implementation, we could write the curried method

def lt[T](x: T, y: T)(implicit ordering: Ordering[T]): Boolean = ordering.lt(x, y)

Note that we don’t have to specify a type bound on T, but we still ensure that T is comparable at the type system level by requiring as an argument an ordering object that’s parameterized on T. We make the ordering an implicit and curried argument, so that if T has an implicit conversion to Ordering[T], as most of Scala’s comparable types do, the user doesn’t have to explicitly pass in the ordering.

Using instances of Ordering[T] makes your code slightly less pretty: you can no longer write x < y, and must instead write ordering.lt(x, y). On the other hand, it makes more explicit what's going on under the hood. It wouldn't necessarily be obvious that writing x < y, where x and y are instances of a type T <% Ordered[T], actually invokes the < method on whatever Ordered[T] object x is implicitly converted to. I think I prefer the more explicit version using Ordering[T], but that's probably because using implicit conversions at all leaves me with a funny taste in my mouth. It seems to me that overusing implicit conversions would lead to a special hell of confusing spaghetti code.

Scala implementation of unbalanced binary tree sets

For the implementation, I wrote the actual insert and member logic as private methods on a utility object. The class that actually implements the Set trait uses the utility object as its underlying implementation. I used Scala’s companion object idiom to structure the whole thing in an elegant way.

Here’s the companion object that contains the underlying implementation logic:

And here’s the class that actually implements the Set interface using the companion object:

You can see the code in its entirety, as well as a small unit test, on GitHub. Next time I’ll implement solutions for section 2.2’s exercises.

Abstract Generic Collections: PFDS Section 2.1 Redux

At the end of my last post, I mentioned that I ended up reusing Scala's built-in List collection to implement the exercises instead of writing a generic abstract Stack and sample implementations of it. Since then, I've spent some time learning about how to implement generic collections in Scala. I came up with a Stack trait, a la Okasaki's Stack signature, and three implementations: the first two are straightforward translations of the SML structures given in the book, and the third is a more Scala-idiomatic implementation.

On an organizational note, I’ve also started a PFDScala repository on GitHub for this blog series, so I can have a centralized place to put all of the code I write for the exercises and examples. I’ll still put the relevant snippets in gists so I can embed them here.

A Generic Trait for Stacks

Scala traits are very powerful. They’re basically like mixin classes from Ruby, in that your class can extend from multiple traits without having the troubles that plague C++-style multiple inheritance.

Abstract traits can be used like Java interfaces. This is just what we need to write a generic interface for Stacks:
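// A sketch along the lines of Okasaki's Stack signature; the repo version may differ.
trait Stack[+T] {
  def isEmpty: Boolean
  def cons[U >: T](x: U): Stack[U]
  def head: T
  def tail: Stack[T]
}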

For those of you unfamiliar with Scala, this is similar to a Java interface or a SML signature: we’re defining a type-parameterized trait/interface/signature with a bunch of method signatures that classes which extend this trait must implement themselves. What’s up with the weird type annotations, though?

In Scala, annotating a type parameter with “+” marks it as a covariant type parameter. This means that if we have a type C[+T], a type with a single covariant type parameter, and if type A is a subtype of type B, then type C[A] is a subtype of C[B]. We can similarly annotate a type parameter with “-” for the opposite effect.

The “>:” type operator is the supertype relationship: the type on the left of it has to be a supertype of the type on the right of it.

So why do we need to annotate our types like this here? In Scala, by default, types are invariant in their type parameters. This means that if List's type parameter were invariant, a List[Integer] would not be a subtype of List[Number], even though that seems like it should be fine, since a list of Integers is just a specialized list of Numbers. The same goes for our Stack type, so we mark its type parameter as covariant.

However, to keep compile-time type checking sound, Scala has to impose some restrictions on the types of methods in classes with covariant type parameters, due to the problem of mutability: a mutable array, for example, cannot safely be covariant in its type parameter, because if an Array[String] could be used as an Array[Any], you could legally store an Int into one of its cells through the Array[Any] view and break the Array[String] underneath. Method parameters sit in what's called contravariant position for exactly this reason. So, in our Stack example, we can no longer simply say

def cons(x: T) : Stack[T]

because x appears in a contravariant position: as far as the compiler is concerned, a method could use its parameter in a way that breaks type safety under covariance.

Luckily, we can get around this problem of contravariant position by imposing a bound on the type of cons's parameter: we say

def cons[U >: T](x: U): Stack[U]

to ensure that the input type of cons is a supertype of the stack’s type. This prevents any type safety issues caused by the potential contravariance of the formal parameter, and allows users to generalize the type of a Stack by consing a more general type onto it. For example, one could cons a String onto a Stack[Int] and get out a Stack[Any].

Implementation 1: Using builtin Lists as the backing store

This class uses the builtin List implementation to provide the underlying storage for the Stack. It also uses Scala’s companion object feature to provide a static factory method for creating new ListStacks.
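A sketch of how that might look:

class ListStack[+T] private (backing: List[T]) extends Stack[T] {
  def isEmpty: Boolean = backing.isEmpty
  def cons[U >: T](x: U): ListStack[U] = new ListStack(x :: backing)
  def head: T = backing.head
  def tail: ListStack[T] = new ListStack(backing.tail)
}

object ListStack {
  // The static factory method: ListStack() creates an empty stack.
  def apply[T](): ListStack[T] = new ListStack(Nil)
}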

Implementation 2: Using a custom List case class as the backing store

This class uses a custom implementation of List as the backing store. It’s the same idea as the first one, except we use a set of case classes to match on instead of just wrapping List’s functions.
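A sketch, using the LIST datatype discussed below (the stack class's name here is my own):

sealed abstract class LIST[+T]
case object NIL extends LIST[Nothing]
case class CONS[+T](head: T, tail: LIST[T]) extends LIST[T]

class CustomListStack[+T] private (backing: LIST[T]) extends Stack[T] {
  def isEmpty: Boolean = backing match {
    case NIL => true
    case CONS(_, _) => false
  }
  def cons[U >: T](x: U): CustomListStack[U] = new CustomListStack(CONS(x, backing))
  def head: T = backing match {
    case NIL => throw new NoSuchElementException("head of empty stack")
    case CONS(h, _) => h
  }
  def tail: CustomListStack[T] = backing match {
    case NIL => throw new NoSuchElementException("tail of empty stack")
    case CONS(_, t) => new CustomListStack(t)
  }
}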

This shows off Scala’s case classes a little bit: they’re just like datatypes in SML, except a bit more verbose to specify. The abstract class LIST is like the name of the datatype, and the case classes that inherit from it are like the datatype’s constructors. Making LIST sealed restricts classes from extending it unless they’re in the same file: this allows the compiler to detect non-exhaustive matches.

Implementation 3: Implementing the Stack functions within case classes

This implementation is a bit different from the other two: instead of storing the actual Stack data as a data member inside of a single StackImpl class, this puts the implementation inside of case classes that extend the implementation class, which is made abstract.
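A sketch of that structure (Empty and Cons are assumed names for the cases):

sealed abstract class StackImpl[+T] extends Stack[T] {
  // cons is the same for both cases, so it lives on the abstract class.
  def cons[U >: T](x: U): StackImpl[U] = Cons(x, this)
}

case object Empty extends StackImpl[Nothing] {
  def isEmpty: Boolean = true
  def head: Nothing = throw new NoSuchElementException("head of empty stack")
  def tail: Nothing = throw new NoSuchElementException("tail of empty stack")
}

// The case class's fields directly implement the trait's head and tail.
case class Cons[+T](head: T, tail: StackImpl[T]) extends StackImpl[T] {
  def isEmpty: Boolean = false
}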

This strategy seems the most idiomatic of the implementations I wrote. It defines its own internal datatype like the custom List implementation, without the mess of having to match on each of the cases of the custom List every time it’s necessary to operate on the data. I think it produces the most compact and readable code out of all three implementations.

Conclusion

That was a long-winded post for what was just defining a simple custom collection interface and a few simple implementations. Again, all of this code is available in the PFDScala GitHub repo. Next time we’ll proceed with the next section, 2.2. Hopefully it will be easier to implement the examples and exercise solutions with the improved understanding of Scala’s type system that I gathered by writing this post.

PFDS Section 2.1

This is the inaugural post of the PFDS series.

Section 2.1 discusses the ramifications of implementing lists and stacks in a functional and immutable manner. Using the operation of list catenation as a motivator, Okasaki introduces the idea of data sharing. We see that to catenate two lists, we can share the second list, which doesn’t get modified, but must copy all of the nodes in the first list just to modify the last one.

In Scala, a catenate function on the built-in list type looks something like the following sketch:
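def catenate[T](xs: List[T], ys: List[T]): List[T] = xs match {
  case Nil => ys // share ys in its entirety
  case head :: tail => head :: catenate(tail, ys) // copy each cell of xs
}

This is, conceptually, what the built-in ++ operator does.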

Updating a single element of the list is similar. We have to copy all of the elements in the list up to the element to be updated, and then we can point the tail of the element we updated to the pre-existing tail of the old element, thus sharing as much data as possible.
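A sketch of an update function in the same style:

def update[T](xs: List[T], i: Int, y: T): List[T] = xs match {
  case Nil => throw new IndexOutOfBoundsException(i.toString)
  case head :: tail =>
    if (i == 0) y :: tail // the new cell points at the shared, pre-existing tail
    else head :: update(tail, i - 1, y) // copy the cells before index i
}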

Exercise 2.1

This exercise is straightforward: we must write a function that takes a generic list and returns a list of all of the suffix lists of the input list, from longest to shortest. We must show that this function operates in linear time and linear space with respect to the size of the input list.

Since no elements are being updated, it’s easy to see that all we have to do is return a new list whose elements are every cons cell in the input list. This is O(n) in time, as we are performing one cons operation for each element in the input list, and O(n) in space, as we’re saving one cons cell per cons cell in the input list.
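A sketch of such a suffixes function:

def suffixes[T](xs: List[T]): List[List[T]] = xs match {
  case Nil => List(Nil) // the empty list's only suffix is itself
  case _ :: tail => xs :: suffixes(tail) // one new cons cell per input cell
}

For example, suffixes(List(1, 2, 3)) yields List(List(1, 2, 3), List(2, 3), List(3), Nil).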

This was a pretty simple section, serving mainly as a refresher course in functional programming fundamentals.

For this post, I reused Scala's built-in List type to implement the exercise and example functions. I had intended to define my own abstract generic Stack and show how it can be implemented with either the built-in List type or a set of case classes Nil and Cons, like Okasaki does using Standard ML. However, I'm still a Scala novice, and I ran into some difficulties with the type system that go over my head at this point. I plan to revisit this at a later date once I've learned a little bit more about Scala's type system.

Notes on Purely Functional Data Structures

I heard a lot of good things about Chris Okasaki's Purely Functional Data Structures at UChicago, but didn't ever take the time to check it out. Lately I've missed the heady joy of reading and writing code in a strongly typed functional programming language like Standard ML, so when one of my coworkers at Knewton mentioned he was going to read the book, I decided to get a copy for myself.

I’m going to try to read through the whole book and complete as many of the exercises that I can. To help myself keep the commitment, I’m going to follow in Eli Bendersky’s footsteps and post reading notes and exercise solutions along the way, as he did for SICP.

The notes will be categorized under pfds.

Also, I’ve recently begun to learn Scala, a strongly typed functional language on the JVM with nice features such as algebraic datatypes in the form of case classes, pattern matching, and lazy values. Given the usefulness of these language amenities for exploration of Okasaki’s concepts, I’m going to do the exercises in Scala instead of Standard ML or Haskell.

Hello World

Hi internet! I’ve gotten with the program and bloggified my website with the help of the pretty rad Octopress framework. Hope you enjoy it.