HOWTO: Static access control using phantom types

We thought that phantom types would be an appropriate topic for our first real post because they are a good example of a powerful and useful feature of OCaml that is little used in practice.

In this post, I'll cover a fairly simple use of phantom types: enforcing a capability-style access-control policy. In particular, I'll describe how you can create easy to use read-only handles to a mutable data structure. We'll explore this using the example of an int ref. The int ref is a toy example, but the same approach can be used for more realistic cases, such as a string library or a database interface.

We'll start by implementing an int ref module on top of OCaml's built-in ref.

module Ref : sig
  type t
  val create : int -> t
  val set : t -> int -> unit
  val get : t -> int
end
  = 
struct
  type t = int ref
  let create x = ref x
  let set t x = t := x
  let get t = !t
end

The simplest way of getting a read-only handle is to create another module with a different, more constrained signature.

module RORef : sig
  type t
  val import : Ref.t -> t
  val get : t-> int
end
  =
struct
  type t = Ref.t
  let import x = x
  let get = Ref.get
end

An RORef.t is just a Ref.t underneath, but the signature hides that fact by making the RORef.t abstract. Note that there is a function for converting Ref.t's to RORef.t's (import), but not the other way around. This gives you a way to create the read-only handle, but prevents someone with such a handle from recovering the underlying read-write handle. The downside to this approach is that it is impossible to write code that is polymorphic over Ref.t's and RORef.t's, even if that code only uses the functionality common to both, i.e., if it only reads.

A better solution is to use a phantom type to encode the access control rights associated with a particular value. But what is a phantom type? The definition unfortunately makes it sound more complicated than it is. A phantom type is a type that is used as a parameter to another type (like the int in int list), but which is unused in the actual definition (as in type 'a t = int). The fact that the phantom parameter is unused gives you the freedom to use it to encode additional information about your types, which you can then convince the type checker to keep track of for you. Since the phantom type isn't really part of the definition of the type, it has no effect on code-generation and so is completely free at runtime. The way in which you convince the type-checker to track the information you're interested in is by constraining the appearance of the phantom types using a signature.

It's easier to understand once you look at an example.

type readonly
type readwrite
 
module PRef : sig
  type 'a t
  val create : int -> readwrite t
  val set : readwrite t -> int -> unit
  val get : 'a t -> int
  val readonly : 'a t -> readonly t
end
  =
struct
  type 'a t = Ref.t
  let create = Ref.create
  let set = Ref.set
  let get = Ref.get
  let readonly x = x
end

In the above code, the phantom type tells you what your permissions are. A readwrite PRef.t can read and write, and a readonly PRef.t can only read. Note that the get function doesn't pay any attention to the phantom type, which is why get can be used with both readwrite and readonly PRef.t's. The only function that can modify a ref is get, and that requires a readwrite PRef.t.

Note that the types readonly and readwrite have no definitions. They look like the declaration of an abstract type, except that these definitions are not in a signature. They're actually examples of uninhabited types, i.e., types without associated values. The lack of values presents no problems here, since we're using the types only as tags.

The great thing about this approach is how seamlessly it works in practice. The user of the library can write things in a natural style, and the type system propagates the access-control constraints as you would expect. For example, the following definitions

let sumrefs reflist =
  List.fold_left (+) 0 (List.map PRef.get reflist)
 
let increfs reflist =
  List.iter (fun r -> PRef.set r (PRef.get r + 1)) reflist

will be given the following inferred types

val sumrefs : 'a PRef.t list -> int
val increfs : readwrite PRef.t list -> unit

In other words, the first function, which only reads, can operate on any kind of ref, and the second, which mutates the refs, requires a readwrite ref.

There is one problem with the access control policy we implemented above, which is that there is no clean way of guaranteeing that a given value is immutable. In particular, even if a given value is readonly, it doesn't preclude the existence of another readwrite handle to the same object somewhere else in the program. (Obviously, immutable int refs are not a particularly compelling application, but having both mutable and immutable versions makes sense for more complicated data types, such as string or arrays).

But we can get immutable values as well by making the phantom types just slightly more complicated.

type readonly
type readwrite
type immutable
 
module IRef : sig
  type 'a t
  val create_immutable : int -> immutable t
  val create_readwrite : int -> readwrite t
  val readonly : 'a t -> readonly t
  val set : readwrite t -> int -> unit
  val get : 'a t -> int
end
  =
struct
  type 'a t = Ref.t
  let create_immutable = Ref.create
  let create_readwrite = Ref.create
  let readonly x = x
  let set = Ref.set
  let get = Ref.get
end

Importantly, there's no way for an IRef.t to become immutable. It must be immutable from birth.

Extra credit: Making it more polymorphic

One thing that's notable about the IRef signature is that there is no way of creating an actual polymorphic IRef.t. The two creation functions both create values with specific tags, immutable or readwrite. These specialized create functions aren't strictly necessary, though. We could have instead written IRef with the following signature.

sig
  type 'a t
  val create : int -> 'a t
  val set : readwrite t -> int -> unit
  val get : 'a t -> int
  val readonly : 'a t -> readonly t
end

The user can force the creation of an immutable or readwrite Ref by adding a constraint. So, you could get the effect of

let r = IRef.create_immutable 3

by instead writing

let r = (IRef.create 3 : immutable IRef.t)

The advantage of the polymorphic create function is straightforward: it allows you to write functions that are more polymorphic, and therefore more flexible. For instance, you could write a single function that could create, depending on context, an array of readwrite refs, an array of readonly refs, or an array of immutable refs.

The downside is that it may require more type annotations when you do want to be explicit about the permissions, and it also allows some weird types to come up. In particular, you can create an IRef.t with any phantom parameter you want! Nothing stops you from creating a string IRef.t, even though string doesn't make any sense as an access-control tag. Interestingly, the signature doesn't actually make any reference to the immutable type, and in fact, using any phantom parameter other than readonly and readwrite makes the ref immutable. The access control restrictions still work in roughly the way you would expect, but it is still a bit harder to think about than the original signature.

Comments

A possible pitfall

First of all, great to see your blog guys! I took me a while to notice it, but better late than never. :)

I just wanted to point out something that might be a potential pitfall for people using phantom types in OCaml, especially for beginners that might be working on some complicated production code. That exactly happened to me and I was stuck on this for a while.

The following is the piece of code that follows Ron's examples, and on the first glance should be equivalent to the original code.

type readonly
type readwrite
 
module PRef : sig
  type 'a t
  val create : int -> readwrite t
  val set : readwrite t -> int -> unit
  val get : 'a t -> int
  val readonly : 'a t -> readonly t
end
  =
struct
  type 'a t = { mutable r : int }
  let create x = { r = x }
  let set t x = t.r <- x
  let get t = t.r
  let readonly t = t
end
This code does not compile. The error is
Values do not match:
  val readonly : 'a -> 'a
is not included in
  val readonly : 'a t -> readonly t
The reason for the error is that when OCaml compiler sees
type 'a t = { mutable r : int }
it will create a new polymorphic record type and it will not go into the record type being defined and check to see whether the 'a type variable is actually used. The compiler will not be able to equate any two types 'a t and 'b t as one could expect. The solution is very simple and it is to introduce the phantom type through type aliasing, like the following.
type z = { mutable r : int }
type 'a t = z
In this case the compiler will not introduce a new type but will consider any 'a t as just a synonym for z and it will easily equate any two t types.
This shows important distinction between type definitions and type synonyms (or aliases) in OCaml, the distinction that Haskell for example makes explicit by using different keywords for the two.

Type definitions and type synonyms

Right, this is a complexity that I didn't mention but should have. I do think OCaml would be better if it made the distinction explicit. And it's not just Haskell that gets this right; SML does too.

Another work-around

Another way to get your snippet to compile is to add a coercion.

let readonly t = (t :> readonly t)

The way I think about it, the type 'a t can be either covariant or contravariant in 'a, but OCaml sort of doesn't have a way of expressing this in the type -- there's only +'a and -'a, no +-'a. The coercion somehow tells it to look inside the type and infer the variance from the right hand side of type 'a t = { mutable r : int }. I don't know how closely this matches the theory or implementation of OCaml's type system, but it matches its behavior as far as I know.

Syndicate content