A universal type?

Is it possible in OCaml to implement a universal type, into which any other type can be embedded? More concretely, is possible to implement the following signature?

module type Univ = sig
  type t
  val embed: unit -> ('a -> t) * (t -> 'a option)
end

The idea is that t is the universal type and that embed () returns a pair (inj, prj), which inject to and project from the universal type. Projection is partial (returns an option) because injection is not surjective.

Here is an example of how to use `Univ'.

module Test (U : Univ) = struct
  let (of_int, to_int) = U.embed ()
  let (of_string, to_string) = U.embed ()
  let r : U.t ref = ref (of_int 13)
  let () = begin
    assert (to_int !r = Some 13);
    assert (to_string !r = None);
    r := of_string "foo";
    assert (to_int !r = None);
    assert (to_string !r = Some "foo");
  end
end

Try it for yourself and see if you can implement module Univ : Univ so that Test (Univ) passes. No Obj.magic or other unsafe features allowed!

Comments

A couple of caveats

It is amazing to see you financial people do such theory! I am delighted. This exercise has connection to realizability toposes. I have three remarks.

First, if projection can map into 'a option then it can map into 'a because every type is inhabited, say by a diverging term. So you do not really need the option type. But it is probably better to do it that way, if you can, because returning None is friendlier than diverging.

Second, how is this supposed to play with diverging terms? Are we embedding terms or values? Presumably values. Since Ocaml is call-by-value it might be hard to embed terms that denote diverging values.

Third, what does it mean that injection followed by projection is the identity when we embed higher-order types, such as (int->int)->int? In your example above you indicate how it is supposed to work with int and string. Can you give us an example of how it is supposed to work with a function type, where comparison = does not work?

With reference

Hiding the type in a reference should do the trick (this is not thread-safe, though!):

module U : Univ = struct
  type t = (unit -> unit) * (unit -> unit)
 
  let embed () =
    let r = ref None in
    let put x = (fun () -> r := Some x), (fun () -> r := None) in
    let get (f, g) = f (); let res = !r in g (); res in
    put, get
end

It would be very tempting to use an exception defined in a local module, but unfortunately the type of the exception's argument cannot depend on a type variable...

on universal

I got a plain solution which passed the given tests, but I wouldn't consider it universal, as in my solution an an 'a type value encoded by an 'a type encoder can't be decoded by any other 'a type decoders except the one coming from the same pair. Could you confirm that? or could you give the expected behavior of the following code?

let (of_int, to_int) = U.embed ()
let e = of_int 1 
let (of_int', to_int') = U.embed ()
let e' = of_int' 1
assert (to_int e' = Some 1);
assert (to_int' e = Some 1);

The assertions will fail

And I think (but am not entirely sure) that that is the only behavior that is possible to implement in OCaml.

Universal types...

I am not going to answer, because it would ruin the fun for others. Actually, Stephen was the first to tell me about this little neat trick. Having a universal type turns out to be pretty powerful. You can easily build a type Univ.t list and then you can store arbitrarily typed data into that list.

The consequence is a type safe property list working for any type. When you get the hang of these, many of the usual tricks from dynamically typed languages become possible. But in a type-safe way! The option type on the projection will protect you.

However, I better not ruin Stephens later blog posts. There is more juice to be squeezed from this lemon.

Alain's got it

Andrej, yes, an option is much better than diverging (or raising an exception), because the partiality is expressed in the type system and the user is forced to think about it. And yeah, since OCaml is call by value, embedding values is the only thing that makes sense to me. Embedding higher-order types works just like ground types. No use of polymorphic comparison is required.

d2bg, I don't know how to do it so that different calls to embed at the same type are able to share. And in fact, in the applications I've used this for, one would not want that. So, your example code would raise an assertion failure.

Alain's solution using the ref cell is pretty much what I had in mind. Although I think it can be done a little more efficiently. Alain's solution allocates two closures and a 2-tuple per injection, and Some variant per projection. I think it's possible to allocate one closure and a two-tuple per injection and nothing per projection. Also note how Alain's solution clears the ref cell by calling g() after the value is extracted. This is essential to avoid a potential space leak, and is preserved in my (slightly) more efficient solution.

The other universal type

Alain's solution is very nice. Why do we need two-tuple? It seems to me that unit -> unit is already universal, with one closure and one ref per embed:

module U : Univ = struct
  type t = unit -> unit
 
  let embed () =
    let r = ref None in
    let put x = (fun () -> r := Some x) in
    let get f = f (); let x = !r in r := None; x in
    put, get
end

I thought you were asking for a solution in which different calls to U.embed at the same type share embedded values. That seems a bit harder, especially if we're supposed to do it with a single polymorphic embedding-projection pair. The definition of universal type I am acquainted with asks for a family of embeddings-projections, one for each type, whereas you asked for a single polymorphic embedding-projection pair (and the projection does not return an option). With a family we can do it, of course, as we just copy Alain's solution, separately at each type.

By the way, how is the universal type supposed to behave with respect to polymorphic types? Consider this:

let id x = x ;;
let inj, proj = U.embed () ;;
let Some f = proj (inj id) ;;
let x = f 10 ;;
let Some g = proj (inj id) ;;
let y = g "foo" ;;

this code doesn't implement what I intended

Consider the following.

let i1, p1 = U.embed ()
let i2, p2 = U.embed ()
let u1 = i1 13
let u2 = i2 "foo"
let () = assert (p2 u1 = None)
let () = assert (p1 u2 = None)

With your implementation, I believe the the second assert will fail, because p1 u2 will return Some 13. The problem is that p2 u1 stores in the ref cell for the first embedding (call it r1), but then clears the ref cell for the second embedding (call it r2). Then, p1 u2 stores into r2, fetches from r1 (which still holds Some 13), and clears r1.

Second try

Ok, second try! This version allocates one closure and a Some variant (not a two-tuple) per injection, and nothing per projection.

  type t = bool -> unit
 
  let embed () =
    let r = ref None in
    let put x = let sx = Some x in fun b -> r := if b then sx else None in
    let get f = f true; let res = !r in f false; res in
    put, get

This is nice, possibly better

This is nice, possibly better than what I had in mind (I'll post code tomorrow).

As to the space leak, I agree that your solution requires clearing the cell for correctness. I was mixing it up with my solution, which uses a unique id (unit ref) per embedding and avoids storing in the ref cell unless it dynamically knows that it's the right cell. In that case, the clear is purely to avoid the leak. I do think it would be a leak though, because both the inject and project closures are closed over the ref cell, and should be constant size, while the ref cell could hold a value of arbitrary size (completely unconnected to other values currently live in the program).

Space leak

Yes, of course, you're right about the space leak!

Space leak?

This is essential to avoid a potential space leak...

I don't understand that part of your comment. Clearing the ref cell is necessary to have a correct behavior; otherwise, a failed projection would leave some garbage that could cause a further wrong projection not to fail.

But I don't think that not clearing the ref cell would cause a space leak: the ref cell will not survive any longer than the closure (which must contain the value put into the cell anyway).

my solution

Here is my implementation of Univ. It is very similar to Alain's using the hidden ref cell, except it uses a unique id per embedding and only stores and clears the ref cell if it knows it will succeed.

module Univ : Univ = struct
  type t = { 
    id : unit ref;
    store : unit -> unit; 
  }
 
  let embed () = 
    let id = ref () in
    let r = ref None in
    let put a =
      let o = Some a in
      { id = id; store = (fun () -> r := o); }
    in
    let get t =
      if id == t.id then (t.store (); let a = !r in r := None; a) else None
    in
    (put, get)
end

The tradeoff between this and Alain's solution is that this one allocates an extra tuple per injection, but does fewer tests per projection, and doesn't store if the projection returns None (Alain's solution does two stores in either case.

Another possible difference might be thread safety. In either solution, once can use a single mutex for the entire module to make the mutation thread safe. However, I think my solution may be easier to extend using a mutex per embedding, which would have less contention. The idea is to wrap t.store (); let a = !r in r := None; in a critical section that holds the per-embedding mutex. For my solution, the single mutex suffices because we know that t.store () and r := None deal with the same ref cell. For Alain's solution, one would need to lock the mutex for both embeddings (which could deadlock), at which point one might as well just test whether one has the right embedding, leading naturally back to my solution. Alain, what do you think?

I am not aware of any other way to implement the Univ signature in OCaml (including using polymorphic variants and exceptions), but would be happy to see one. It is possible to implement Univ in SML using exceptions, but unfortunately not in OCaml because it doesn't have local exception declarations. One could use exceptions in OCaml to implement a functor that has a similar flavor to Univ, but as far as I know, it is not possible to meet the Univ signature. One nice thing about using exceptions (in either language) is that one gets thread safety automatically since no references are used.

Polymorphic exceptions would do the job

Maybe just having polymorphic (non-local) exceptions would be sufficient, with something like this:

module type Univ =
sig
  type t
  val embed : unit -> ('a -> t) ( (t -> a' option)
end
 
module U : Univ =
struct
  type t = unit -> unit
  exception Message of (t -> 'a) * 'a
  let embed () =
    let rec put x () = raise (Message (get, x))
    and get f =
      try f () ; None
    with Message (get', y) ->
         if get' == get then Some y else None
    in
      put, get
end
Here get plays a double role, the other one is as a unique token (a new closure is generated for get at every call to embed, so this should work). Since ocaml does not have polymorphic exceptions (does SML?) this won't work. Here is an "approximation" using functors:
module type Univ =
sig
  type t
 
  module Make : functor (Any : sig type t end) ->
    sig
      val embed : unit -> (Any.t -> t) * (t -> Any.t option)
    end
end
 
module U : Univ =
struct
  type t = unit -> unit
 
  module Make (Any : sig type t end) =
  struct
    exception Message of (t -> Any.t option) * Any.t
    let embed () =
      let rec put x () = raise (Message (get, x))
      and get f =
   try
      f () ; None
  with Message (get', y) -> if get' == get then Some y else None
      in
   put, get
  end
end
However, once we have the exception wrapped inside a functor it becomes local anyway and we can use the solution you had in mind.

implementing a universal type with exceptions

SML does have local exceptions.

There is no need to use 'try' and 'raise' to implement a universal type with exceptions. One simply takes the universal type to be exn. In OCaml:

module type Univ' = sig
  type t
 
  module Embed (M : sig type t end) : sig
    val inj : M.t -> t
    val prj : t -> M.t option
  end
end
 
module Univ' : Univ' = struct
  type t = exn 
 
  module Embed (M : sig type t end) = struct
    exception E of M.t
    let inj m = E m
    let prj = function E m -> Some m | _ -> None
  end
end

In SML, one could use the same approach with local exceptions and match the original Univ signature.

This trick seems to have been

This trick seems to have been invented a number of times by different people. Davide Sangiorgi and I discussed a variant here:

http://www.cis.upenn.edu/~bcpierce/papers/polybisim.ps

One interesting twist was that the same trick worked both in polymorphic lambda-calculi with references (ML, etc.) and in the polymorphic pi-calculus (e.g., the Pict language) with channels playing the role of reference cells.

A solution with local exceptions

Here is another solution:

module U : Univ = struct
  type t = exn
 
  let embed () =
    let new type s in
    let module M = struct exception E of s end in
    (fun x -> M.E x), (function M.E x -> Some x | _ -> None)
end 

It relies on the "let new type t in e" construction developed at LexiFi, that I've mentioned somewhere else on this blog. This new construction should appear in a later official version of OCaml.

This solution is thread-safe and it should be quite efficient. One can of course move the allocation from the projection function into the injection function by declaring "exception E of s option".

The construction "let new type t in e" creates a new abstract type "t" visible in the scope of the sub-expression "e". If this sub-expression "e" is well-typed and the type "t" does not escape to the environment, then the type for "let new type t in e" is obtained from the type for "e" by replacing "t" by a fresh type variable.

More modular implementation

Using our first-class modules, one can also write something more modular:

module type S = sig type t exception E of t end
type 'a prop = (module S with type t = 'a)
 
let create (type s) () =
  let module M = struct type t = s exception E of t end in
  (module M : S with type t = s)
 
let inj (type s) p x =
  let module M = (val p : S with type t = s) in
  M.E x
 
let proj (type s) p y =
  let module M = (val p : S with type t = s) in
  match y with M.E x -> Some x | _ -> None
 
let embed () = let p = create () in inj p, proj p

Here, we use first-class modules to represent first-class exception constructors (which can be used in pattern matching).

Syndicate content