We thought that phantom types would be an appropriate topic for our first real post because they are a good example of a powerful and useful feature of OCaml that is little used in practice.
In this post, I'll cover a fairly simple use of phantom types:
enforcing a capability-style access-control policy. In particular,
I'll describe how you can create easy to use read-only handles to a
mutable data structure. We'll explore this using the example of an
int ref. The int ref is a toy example, but the same approach can be
used for more realistic cases, such as a string library or a database
interface.
We'll start by implementing an int ref module on top of OCaml's
built-in ref.
module Ref : sig type t val create : int -> t val set : t -> int -> unit val get : t -> int end = struct type t = int ref let create x = ref x let set t x = t := x let get t = !t end
The simplest way of getting a read-only handle is to create another module with a different, more constrained signature.
module RORef : sig type t val import : Ref.t -> t val get : t-> int end = struct type t = Ref.t let import x = x let get = Ref.get end
An RORef.t is just a Ref.t underneath, but the signature hides
that fact by making the RORef.t abstract. Note that there is a
function for converting Ref.t's to RORef.t's (import), but not
the other way around. This gives you a way to create the read-only
handle, but prevents someone with such a handle from recovering the
underlying read-write handle. The downside to this approach is that it
is impossible to write code that is polymorphic over Ref.t's and
RORef.t's, even if that code only uses the functionality common to
both, i.e., if it only reads.
A better solution is to use a phantom type to encode the access
control rights associated with a particular value. But what is a
phantom type? The definition unfortunately makes it sound more
complicated than it is. A phantom type is a type that is used as a
parameter to another type (like the int in int list), but which is
unused in the actual definition (as in type 'a t = int). The fact
that the phantom parameter is unused gives you the freedom to use it
to encode additional information about your types, which you can then
convince the type checker to keep track of for you. Since the phantom
type isn't really part of the definition of the type, it has no
effect on code-generation and so is completely free at runtime. The
way in which you convince the type-checker to track the information
you're interested in is by constraining the appearance of the phantom
types using a signature.
It's easier to understand once you look at an example.
type readonly type readwrite module PRef : sig type 'a t val create : int -> readwrite t val set : readwrite t -> int -> unit val get : 'a t -> int val readonly : 'a t -> readonly t end = struct type 'a t = Ref.t let create = Ref.create let set = Ref.set let get = Ref.get let readonly x = x end
In the above code, the phantom type tells you what your permissions
are. A readwrite PRef.t can read and write, and a readonly PRef.t
can only read. Note that the get function doesn't pay any attention
to the phantom type, which is why get can be used with both
readwrite and readonly PRef.t's. The only function that can
modify a ref is get, and that requires a readwrite PRef.t.
Note that the types readonly and readwrite have no definitions.
They look like the declaration of an abstract type, except that these
definitions are not in a signature. They're actually examples of
uninhabited types, i.e., types without associated values. The
lack of values presents no problems here, since we're using the types
only as tags.
The great thing about this approach is how seamlessly it works in practice. The user of the library can write things in a natural style, and the type system propagates the access-control constraints as you would expect. For example, the following definitions
let sumrefs reflist = List.fold_left (+) 0 (List.map PRef.get reflist) let increfs reflist = List.iter (fun r -> PRef.set r (PRef.get r + 1)) reflist
will be given the following inferred types
val sumrefs : 'a PRef.t list -> int val increfs : readwrite PRef.t list -> unit
In other words, the first function, which only reads, can operate on
any kind of ref, and the second, which mutates the refs, requires a
readwrite ref.
There is one problem with the access control policy we implemented
above, which is that there is no clean way of guaranteeing that a
given value is immutable. In particular, even if a given value is
readonly, it doesn't preclude the existence of another readwrite
handle to the same object somewhere else in the program. (Obviously,
immutable int refs are not a particularly compelling application, but
having both mutable and immutable versions makes sense for more
complicated data types, such as string or arrays).
But we can get immutable values as well by making the phantom types just slightly more complicated.
type readonly type readwrite type immutable module IRef : sig type 'a t val create_immutable : int -> immutable t val create_readwrite : int -> readwrite t val readonly : 'a t -> readonly t val set : readwrite t -> int -> unit val get : 'a t -> int end = struct type 'a t = Ref.t let create_immutable = Ref.create let create_readwrite = Ref.create let readonly x = x let set = Ref.set let get = Ref.get end
Importantly, there's no way for an IRef.t to become immutable. It
must be immutable from birth.
One thing that's notable about the IRef signature is that there is
no way of creating an actual polymorphic IRef.t. The two creation
functions both create values with specific tags, immutable or
readwrite. These specialized create functions aren't strictly
necessary, though. We could have instead written IRef with the
following signature.
sig type 'a t val create : int -> 'a t val set : readwrite t -> int -> unit val get : 'a t -> int val readonly : 'a t -> readonly t end
The user can force the creation of an immutable or readwrite Ref
by adding a constraint. So, you could get the effect of
let r = IRef.create_immutable 3
by instead writing
let r = (IRef.create 3 : immutable IRef.t)
The advantage of the polymorphic create function is straightforward: it allows you to write functions that are more polymorphic, and therefore more flexible. For instance, you could write a single function that could create, depending on context, an array of readwrite refs, an array of readonly refs, or an array of immutable refs.
The downside is that it may require more type annotations when you do
want to be explicit about the permissions, and it also allows some
weird types to come up. In particular, you can create an IRef.t with
any phantom parameter you want! Nothing stops you from creating a
string IRef.t, even though string doesn't make any sense as an
access-control tag. Interestingly, the signature doesn't actually
make any reference to the immutable type, and in fact, using any
phantom parameter other than readonly and readwrite makes the ref
immutable. The access control restrictions still work in roughly the
way you would expect, but it is still a bit harder to think about than
the original signature.
Comments
A possible pitfall
First of all, great to see your blog guys! I took me a while to notice it, but better late than never. :)
I just wanted to point out something that might be a potential pitfall for people using phantom types in OCaml, especially for beginners that might be working on some complicated production code. That exactly happened to me and I was stuck on this for a while.
The following is the piece of code that follows Ron's examples, and on the first glance should be equivalent to the original code.
This shows important distinction between type definitions and type synonyms (or aliases) in OCaml, the distinction that Haskell for example makes explicit by using different keywords for the two.
Type definitions and type synonyms
Right, this is a complexity that I didn't mention but should have. I do think OCaml would be better if it made the distinction explicit. And it's not just Haskell that gets this right; SML does too.
Another work-around
Another way to get your snippet to compile is to add a coercion.
The way I think about it, the
type 'a tcan be either covariant or contravariant in'a, but OCaml sort of doesn't have a way of expressing this in the type -- there's only+'aand-'a, no+-'a. The coercion somehow tells it to look inside the type and infer the variance from the right hand side oftype 'a t = { mutable r : int }. I don't know how closely this matches the theory or implementation of OCaml's type system, but it matches its behavior as far as I know.