One of the best features of ML is pattern matching. Pattern matching is essentially a way of writing a case analysis driven by the structure of the data. The thing that makes pattern matching such a phenomenal tool is the type-checking discipline that is associated with it. In particular, the compiler checks that pattern matches are exhaustive and non-redundant. This is helpful when writing a case analysis for the first time, but the value of the technique really shows itself as the code evolves. In the absence of such checks, it's easy for a case analysis to silently become incomplete as the underlying data structures change, thus letting bugs creep in.
Pattern matching makes it easy to make sure that a case analysis is exhaustive, but there are other kinds of exhaustiveness that we might want static checks for where the compiler provides no help. One example we see a lot comes up in the context of validation. In a lot of the code we write, it is necessary to validate a configuration value represented as a record, and we want to make sure that every record field is explicitly checked.
After hearing me complain about how little help the language provided in this case, Steven proposed an elegant solution: use a macro to generate for any record definition a function that folds over the fields in that record. Thus, if you declare a record as follows:
module M = struct type t = { foo: int; bar: float; snoo: (int * string) list; } with field_fold end
you would end up with the following signature
module M : sig type t = { foo: int; bar: float; snoo: (int * string) list; } val field_fold : t -> 'a -> foo : (int -> 'a -> 'a) -> bar : (float -> 'a -> 'a) -> snoo : ((int * string) list -> 'a -> 'a) -> 'a end
By writing your validation function around field_fold, you can make
sure that the validation catches every field in the record, even as
the definition of the record changes over time. But this idiom is not
really specific to the validation case. You can use it just as well
in other situations where exhaustiveness is important, such as writing
an equality function or writing code to serialize a record.
We haven't implemented this yet, but it's coming. The version we're
going to do is a little bit more sophisticated that what I've
described here. We already have a set of macros for generating values
representing each field in a record that can be used for getting and
setting values and which contain a string representation of the name
of the field. I expect our field_fold will be based on these field
representatives rather just on the raw values. I expect this all will
eventually find its way into a public release of core.
Comments
Exhaustive matching for record fields
This is a very interesting topic. At LexiFi, we have faced this situation where some AST's constructor got more and more fields, but we were reluctant to put them in a record instead of keeping them as arguments of an n-ary constructor. The nice thing with tuples and n-ary constructors is that each time you add some field, the compiler will show you every pattern that deconstructs the fields. We are talking about AST's and we really want to keep the ability to pattern match on record fields (as opposed to field folding).
We adopted a different solution and (guess!) patched the compiler. Now we can add some syntactic annotation on a record type definition to tell the compiler to check for exhaustiveness of any pattern of this record type (in the sense that all the fields must appear in the pattern). The annotation can also go on a specific record pattern. The patch is quite simple, but it makes us much more confident when we turn a tuple into a record. While writing this message, I realize we could be more subtle and mark only some fields as being required in any pattern, but we didn't feel this need in practice yet.
Exhaustive handling of fields
Using record pattern matching is a very elegant solution. This of course only helps you when you turn on compiler warnings that make sure that every variable you defined is actually used.
I find it very interesting to see someone else running up against this same need to exhaustively analyze product types. My sense has been that the need to do this is not widely recognized in academic PL circles, which always struck me as a strange oversight, given the well-known utility of having the compiler check the exhaustiveness of the analysis of sum types.
The tradeoff between hacking the compiler and using a camlp4 is always a tough one. camlp4 extensions are easier to share with others (as we have done with the release of binprot and sexplib), but for some things, they are necessarily less elegant. I would like to eventually see some of the better tested and thought-out ideas make their way back into the mainline compiler.
Turning on the compiler
Turning on the compiler warnings about unused variables is a good idea but event without the warnings, the compiler stops us at each record pattern when we add some field to the record type (if it has been declared to have this behavior).
The decision to hack the compiler is quite easy for us since LexiFi has a long experience in doing it and synchronizing with INRIA. We inform OCaml developers about changes that are not specific to our domain and of course we would be happy to see these changes integrated with INRIA's CVS.