[Tutorial] Implementing a basic type system

Joe

In this post, I want to play with our programming languages and explore how can we implement a minimal type system. In order to do so I will build the system on top of a dynamic language so that the ideas are more easily exposed; for fun and practice, I chose Lua, but any other language can do really (provided it supports higher order functions). Don't worry if you cannot run Lua on your machine, the point is not to give you a useful executable, but code to read and understand the concept. While I assume a minimal level of programming knowledge, you might need to take a few moments to understand how Lua's tables work. Other than that, the code is easy to read, the type system we develop is very basic and works pretty much the way you expect it to.

The problem: Dealing with conflicting representations
Instead of taking the dangerously naive approach of trying to define what a "type system" really means (you can always go check Wikipedia), I will start by exposing a fairly common problem and build a solution from there. How do you deal with conflicting representations of the same data? Imagine you're given the task of merging two (or several) complex systems which offer similar functionality but represent data internally differently. It could be two forum websites representing topics and posts differently or ERP systems using different representations for items? In order to keep this as simple as possible, we will study a textbook math case: Complex numbers.

Complex numbers come in 2 forms
Most of us learned in high school that there are two obvious ways of representing complex numbers:

rectangular form
polar form

If you're not familiar with complex numbers at all, I suggest you take a few moments to read about them (the Wikipedia article I linked to is a start, for more Google is your friend). In any case here's a quick reminder of what you need to know:

Complex numbers are expressed in terms of the imaginary number i, such as i**2 == -1. (Physicist refer to this number as j).

The first form of representing these numbers consists of define a Real Part and an Imaginary Part. This is the "rectangular form". There is another part to represent them which has a lot to do with the way we draw these numbers on a plane and it is by defining a Magnitude and an Angle. Conversion from one form to the other can be done using the following formulas.

Let z be a complex number:

Magnitude(z) == sqrt(RealPart(z)**2 + ImaginaryPart(z)**2)
Angle(z) == arctan(ImaginaryPart(z) / RealPart(z))
RealPart(z) == Magnitude(z) * cos(Angle(z))
ImagPart(z) == Magnitude(z) * sin(Angle(z))

Let's turn this into code
Here's how we're going to implement this. Each representation will define a constructor, a 4 selectors: realPart, imagPart, magnitude, angle. The functions are written according to the formulas defined in the above paragraph.

Rectangular form

  function makeComplex (real, imag) 
     return {realpart=real, imagpart=imag}
  end
  
  function realPart (z)
     return z.realpart
  end
  
  function imagPart (z)
     return z.imagpart
  end
  
  function magnitude(z)
     return math.sqrt((z.realpart ^ 2) + (z.imagpart ^ 2))
  end
  
  function angle(z)
   -- # for simplicity's sake, we're ignoring the case where z.realpart == 0
   -- # which is the case for pure imaginary numbers
     return math.atan(z.imagpart / z.realpart)
  end

Polar form

  function makeComplex (mag, ang)
     return {magnitude=mag, angle=ang}
  end
  
  function realPart (z)
     return z.magnitude * math.cos(z.angle)
  end
  
  function imagPart (z)
     return z.magnitude * math.sin(z.angle)
  end
  
  function magnitude (z)
     return z.magnitude
  end
  
  function angle (z)
     return z.angle
  end

Introducting generic operations
While it could be argued that one representation is /better/ than the other, the reality of the matter might force us to have these 2 representations coexisting in the same system. A common reason would be the merging of two legacy systems into one, for example. How can we use each of the 4 selectors on an object without knowing which representation it's using beforehand?

The simplest solution consists of attaching a "label" (more appropriately called "type") to the object. This can be done in 3 steps:

Modify the constructor to attach a type to the objects.
Modify the selectors' names to indicate on which type of data they operate.
Write generic operators that would call the appropriate selector according to the type.

This is exactly what we're going to do:

Generic operators

  function realPart(z)
     if z.type == "rectangular" then
        return realPartRectangular(z)
     end
  
     else if z.type == "polar" then
           return realPartPolar(z)
          end
        else
           error ()
           end
  end
  
  function imagPart(z)
     if z.type == "rectangular" then
        return imagPartRectangular(z)
     end
  
     else if z.type == "polar" then
           return imagPartPolar(z)
          end
        else
           error ()
           end
  end
  
  function magnitude(z)
     if z.type == "rectangular" then
        return magnitudeRectangular(z)
     end
  
     else if z.type == "polar" then
           return magnitudePolar(z)
          end
        else
           error ()
           end
  end
  
  function angle(z)
     if z.type == "rectangular" then
        return angleRectangular(z)
     end
  
     else if z.type == "polar" then
           return anglePolar(z)
          end
        else
           error ()
           end
  end

New Rectangular Form

  function makeComplexRectangular (real, imag) 
     return {realpart=real, imagpart=imag, type="rectangular"}
  end
  
  function realPartRectangular (z)
     return z.realpart
  end
  
  function imagPartRectangular (z)
     return z.imagpart
  end
  
  function magnitudeRectangular (z)
     return math.sqrt((z.realpart ^ 2) + (z.imagpart ^ 2))
  end
  
  function angleRectangular (z)
     -- # for simplicity's sake, we're ignoring the case where z.realpart == 0
     -- # which is the case for pure imaginary numbers
     return math.atan(z.imagpart / z.realpart)
  end

New Polar form

  function makeComplexPolar (mag, ang)
     return {magnitude=mag, angle=ang}
  end
  
  function realPartPolar (z)
     return z.magnitude * math.cos(z.angle)
  end
  
  function imagPartPolar (z)
     return z.magnitude * math.sin(z.angle)
  end
  
  function magnitudePolar (z)
     return z.magnitude
  end
  
  function anglePolar (z)
     return z.angle
  end

Joe

Improving the solution
The above solution effectively solves our problem. The objects' user (as opposed to the designer) has now access to generic selectors without caring which representation is used internally to hold complex numbers.

However this solutions is far from perfect. It lacks flexibility as it introduces a lot of extra work in order to add a 3rd (hypothetical) representation. To add a new form, we now have to modify each of the generic operations to add extra checks. This introduces accidental complexity we can (and should) easily do without. In order to improve the system, we need to give it the capacity to determine by itself the correct operations to call without having to specify them explicitly.

There are several ways to do this, and I will present the easier one. We will delegate the task of chosing the appropriate selector to another function (we'll call it operate). We will also create a 2d array that holds the needed information. Something like this:

|             | realPart            | imagPart            | magnitude            | angle            |
|-------------+---------------------+---------------------+----------------------+------------------|
| rectangular | realPartRectangular | imagPartRectangular | magnitudeRectangular | angleRectangular |
| polar       | realPartPolar       | imagPartPolar       | magnitudePolar       | anglePolar       |
|-------------+---------------------+---------------------+----------------------+------------------|

I'll leave it as an exercise to the reader to create her own 2d array and create getters and setters, as this falls a little bit outside the scope of this article. The important thing here is to be able to pass functions as parameters to the accessors and hold it as objects inside the matrix.

We now define the operate function which chooses wisely which element of the matrix to get:

  function operate (operation object)
     proc = getFromMatrix(operation, object.type)
  
     -- # Checking for the existence of the desired selector
     if proc then
        return proc(object)
     else
        error()
     end
  end

Again, this function is simply looking up the matrix under the target column and row and returning the function object it finds.

The only thing remaining is to define our 4 selectors in terms of the operate function:

  function realPart(z)
     return operate("realPart", z)
  end
  
  function imagPart(z)
     return operate("imagPart", z)
  end
  
  function magnitude(z)
     return operate("magnitude", z)
  end
  
  function angle(z)
     return operate("angle", z)
  end

Voila! Now we have a dynamic system that can be easily extended to accept more forms. As a matter of facts, adding a new representation for complex numbers is as simple as adding a new row to our matrix. The effects will propagate correctly across the system without any extra efforts.

Parting words
I do not claim any authority over the above material. It was simply something I went through in my study of SICP. If you enjoyed this tutorial I strongly suggest you check out the book and the video lectures. In particular, if you want to continue this particular chapter, you should watch this video. You will see how powerful that operate function really is and how extensible it makes the system, by building an algebraic manipulation package. Honestly, highly recommended.

In the meantime, please use this topic to post any question/feedback/correction/comment you can think off. I really hope it helps :)