Is it possible in OCaml to implement a universal type, into which any other type can be embedded? More concretely, is possible to implement the following signature?
module type Univ = sig type t val embed: unit -> ('a -> t) * (t -> 'a option) end
The idea is that t is the universal type and that embed () returns
a pair (inj, prj), which inject to and project from the universal
type. Projection is partial (returns an option) because injection is
not surjective.
Here is an example of how to use `Univ'.
module Test (U : Univ) = struct let (of_int, to_int) = U.embed () let (of_string, to_string) = U.embed () let r : U.t ref = ref (of_int 13) let () = begin assert (to_int !r = Some 13); assert (to_string !r = None); r := of_string "foo"; assert (to_int !r = None); assert (to_string !r = Some "foo"); end end
Try it for yourself and see if you can implement module Univ : Univ
so that Test (Univ) passes. No Obj.magic or other unsafe features
allowed!
Comments
A couple of caveats
It is amazing to see you financial people do such theory! I am delighted. This exercise has connection to realizability toposes. I have three remarks.
First, if projection can map into 'a option then it can map into 'a because every type is inhabited, say by a diverging term. So you do not really need the option type. But it is probably better to do it that way, if you can, because returning None is friendlier than diverging.
Second, how is this supposed to play with diverging terms? Are we embedding terms or values? Presumably values. Since Ocaml is call-by-value it might be hard to embed terms that denote diverging values.
Third, what does it mean that injection followed by projection is the identity when we embed higher-order types, such as (int->int)->int? In your example above you indicate how it is supposed to work with int and string. Can you give us an example of how it is supposed to work with a function type, where comparison = does not work?
With reference
Hiding the type in a reference should do the trick (this is not thread-safe, though!):
module U : Univ = struct type t = (unit -> unit) * (unit -> unit) let embed () = let r = ref None in let put x = (fun () -> r := Some x), (fun () -> r := None) in let get (f, g) = f (); let res = !r in g (); res in put, get endIt would be very tempting to use an exception defined in a local module, but unfortunately the type of the exception's argument cannot depend on a type variable...
on universal
I got a plain solution which passed the given tests, but I wouldn't consider it universal, as in my solution an an 'a type value encoded by an 'a type encoder can't be decoded by any other 'a type decoders except the one coming from the same pair. Could you confirm that? or could you give the expected behavior of the following code?
The assertions will fail
And I think (but am not entirely sure) that that is the only behavior that is possible to implement in OCaml.
Universal types...
I am not going to answer, because it would ruin the fun for others. Actually, Stephen was the first to tell me about this little neat trick. Having a universal type turns out to be pretty powerful. You can easily build a type Univ.t list and then you can store arbitrarily typed data into that list.
The consequence is a type safe property list working for any type. When you get the hang of these, many of the usual tricks from dynamically typed languages become possible. But in a type-safe way! The option type on the projection will protect you.
However, I better not ruin Stephens later blog posts. There is more juice to be squeezed from this lemon.
Alain's got it
Andrej, yes, an option is much better than diverging (or raising an exception), because the partiality is expressed in the type system and the user is forced to think about it. And yeah, since OCaml is call by value, embedding values is the only thing that makes sense to me. Embedding higher-order types works just like ground types. No use of polymorphic comparison is required.
d2bg, I don't know how to do it so that different calls to embed at the same type are able to share. And in fact, in the applications I've used this for, one would not want that. So, your example code would raise an assertion failure.
Alain's solution using the ref cell is pretty much what I had in mind. Although I think it can be done a little more efficiently. Alain's solution allocates two closures and a 2-tuple per injection, and Some variant per projection. I think it's possible to allocate one closure and a two-tuple per injection and nothing per projection. Also note how Alain's solution clears the ref cell by calling
g()after the value is extracted. This is essential to avoid a potential space leak, and is preserved in my (slightly) more efficient solution.The other universal type
Alain's solution is very nice. Why do we need two-tuple? It seems to me that
unit -> unitis already universal, with one closure and one ref per embed:I thought you were asking for a solution in which different calls to
U.embedat the same type share embedded values. That seems a bit harder, especially if we're supposed to do it with a single polymorphic embedding-projection pair. The definition of universal type I am acquainted with asks for a family of embeddings-projections, one for each type, whereas you asked for a single polymorphic embedding-projection pair (and the projection does not return an option). With a family we can do it, of course, as we just copy Alain's solution, separately at each type.By the way, how is the universal type supposed to behave with respect to polymorphic types? Consider this:
this code doesn't implement what I intended
Consider the following.
With your implementation, I believe the the second assert will fail, because
p1 u2will returnSome 13. The problem is thatp2 u1stores in the ref cell for the first embedding (call it r1), but then clears the ref cell for the second embedding (call it r2). Then,p1 u2stores into r2, fetches from r1 (which still holdsSome 13), and clears r1.Second try
Ok, second try! This version allocates one closure and a Some variant (not a two-tuple) per injection, and nothing per projection.
type t = bool -> unit let embed () = let r = ref None in let put x = let sx = Some x in fun b -> r := if b then sx else None in let get f = f true; let res = !r in f false; res in put, getThis is nice, possibly better
This is nice, possibly better than what I had in mind (I'll post code tomorrow).
As to the space leak, I agree that your solution requires clearing the cell for correctness. I was mixing it up with my solution, which uses a unique id (unit ref) per embedding and avoids storing in the ref cell unless it dynamically knows that it's the right cell. In that case, the clear is purely to avoid the leak. I do think it would be a leak though, because both the inject and project closures are closed over the ref cell, and should be constant size, while the ref cell could hold a value of arbitrary size (completely unconnected to other values currently live in the program).
Space leak
Yes, of course, you're right about the space leak!
Space leak?
This is essential to avoid a potential space leak...
I don't understand that part of your comment. Clearing the ref cell is necessary to have a correct behavior; otherwise, a failed projection would leave some garbage that could cause a further wrong projection not to fail.
But I don't think that not clearing the ref cell would cause a space leak: the ref cell will not survive any longer than the closure (which must contain the value put into the cell anyway).
my solution
Here is my implementation of
Univ. It is very similar to Alain's using the hidden ref cell, except it uses a unique id per embedding and only stores and clears the ref cell if it knows it will succeed.The tradeoff between this and Alain's solution is that this one allocates an extra tuple per injection, but does fewer tests per projection, and doesn't store if the projection returns None (Alain's solution does two stores in either case.
Another possible difference might be thread safety. In either solution, once can use a single mutex for the entire module to make the mutation thread safe. However, I think my solution may be easier to extend using a mutex per embedding, which would have less contention. The idea is to wrap
t.store (); let a = !r in r := None;in a critical section that holds the per-embedding mutex. For my solution, the single mutex suffices because we know thatt.store ()andr := Nonedeal with the same ref cell. For Alain's solution, one would need to lock the mutex for both embeddings (which could deadlock), at which point one might as well just test whether one has the right embedding, leading naturally back to my solution. Alain, what do you think?I am not aware of any other way to implement the
Univsignature in OCaml (including using polymorphic variants and exceptions), but would be happy to see one. It is possible to implementUnivin SML using exceptions, but unfortunately not in OCaml because it doesn't have local exception declarations. One could use exceptions in OCaml to implement a functor that has a similar flavor toUniv, but as far as I know, it is not possible to meet theUnivsignature. One nice thing about using exceptions (in either language) is that one gets thread safety automatically since no references are used.Polymorphic exceptions would do the job
Maybe just having polymorphic (non-local) exceptions would be sufficient, with something like this:
getplays a double role, the other one is as a unique token (a new closure is generated forgetat every call toembed, so this should work). Since ocaml does not have polymorphic exceptions (does SML?) this won't work. Here is an "approximation" using functors:implementing a universal type with exceptions
SML does have local exceptions.
There is no need to use 'try' and 'raise' to implement a universal type with exceptions. One simply takes the universal type to be
exn. In OCaml:In SML, one could use the same approach with local exceptions and match the original
Univsignature.This trick seems to have been
This trick seems to have been invented a number of times by different people. Davide Sangiorgi and I discussed a variant here:
http://www.cis.upenn.edu/~bcpierce/papers/polybisim.ps
One interesting twist was that the same trick worked both in polymorphic lambda-calculi with references (ML, etc.) and in the polymorphic pi-calculus (e.g., the Pict language) with channels playing the role of reference cells.
A solution with local exceptions
Here is another solution:
It relies on the "let new type t in e" construction developed at LexiFi, that I've mentioned somewhere else on this blog. This new construction should appear in a later official version of OCaml.
This solution is thread-safe and it should be quite efficient. One can of course move the allocation from the projection function into the injection function by declaring "exception E of s option".
The construction "let new type t in e" creates a new abstract type "t" visible in the scope of the sub-expression "e". If this sub-expression "e" is well-typed and the type "t" does not escape to the environment, then the type for "let new type t in e" is obtained from the type for "e" by replacing "t" by a fresh type variable.
More modular implementation
Using our first-class modules, one can also write something more modular:
Here, we use first-class modules to represent first-class exception constructors (which can be used in pattern matching).