Core Principles: uniformity of interface

This is intended to be the first in a series of posts talking about the design principles behind core, Jane Street's alternative to OCaml's standard library.

It's worth noting that we haven't quite fully achieved any of our design goals. Core is at the center of a complicated and evolving software infrastructure, and it takes longer to force changes through that infrastructure than it does to figure out what changes should be made. So these principles serve both as a guide to how the library is currently laid out and as an indication of what kinds of changes are likely to come over the next year or so.

The principle I'm going to talk about in this post is the idea of uniformity of interface. There are a few basic reasons for keeping interfaces uniform: first, to make it easier for people to learn and remember a module's interface; second, to make it easier to use functors to extend a module's functionality; and third, to avoid wasting time on making essentially trivial design decisions over and over. The last one is a bit surprising but is nonetheless real. When you have a significant number of people collaborating on a code base, having standards for how that code is to be written eliminates a lot of pointless decision-making about how things should be done.

Here are a few of the design ideas we've had that we try to apply uniformly:

Types and modules

In core, almost all types have dedicated modules, with the type associated with a module called t. This is not an uncommon pattern in OCaml code in general and in the standard library in particular, but in core, the approach is taken more consistently. Thus, core has modules for float, int, option and bool. This is convenient both because it provides a natural place to put functions and values that otherwise just swim around in Pervasives, and because it makes the naming easier to remember. For instance, the modules Bool, Float and Int all have to_string and of_string functions. Similarly, the Int module has the same basic interface as the Int64, Int32 and Nativeint modules.
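As a rough illustration of this convention, here is a minimal sketch built only on the standard library. The names Stringable, My_bool and My_int are hypothetical and are not core's actual definitions; the point is only that each type gets its own module, the type is called t, and the conversion functions share names across modules.

```ocaml
(* Hypothetical sketch: one module per type, with the type named t.
   These are NOT core's real modules, just an illustration. *)
module type Stringable = sig
  type t
  val to_string : t -> string
  val of_string : string -> t
end

(* Wraps the free-floating string_of_bool / bool_of_string functions. *)
module My_bool : Stringable with type t = bool = struct
  type t = bool
  let to_string = string_of_bool
  let of_string = bool_of_string
end

(* Same interface, different type: nothing new to memorize. *)
module My_int : Stringable with type t = int = struct
  type t = int
  let to_string = string_of_int
  let of_string = int_of_string
end

let () =
  print_endline (My_bool.to_string (My_bool.of_string "true"));
  print_endline (My_int.to_string (My_int.of_string "42"))
```

Because both modules satisfy the same signature, a reader who knows one of them already knows how to convert values of the other.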

t comes first

One choice that you have to make over and over again in any library is the order in which arguments are listed. One thing you could optimize for when making this decision is the ease of use for partial application. This is not a crazy approach, but it's often hard to guess in advance which order will be most useful. There are other things to consider as well: putting a function argument (e.g., the function that you pass to List.map) at the end often increases readability, since the function argument can be quite large and is often awkward sitting in the middle of the argument list. Sadly, this often conflicts with the most useful order for partial application.

Rather than make idiosyncratic choices on a function-by-function basis, we prefer to have clear and unambiguous rules where possible. One such rule we've (mostly) adopted is, within a module whose primary type is t, to put the argument of type t first. Thus, Map.find, Hashtbl.find and Queue.enqueue all take the container type first. This rule doesn't lead to an optimal choice for every function, but it is very convenient, and is simple and easy to apply consistently.
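To make the rule concrete, here is a small sketch of a module that follows it. Int_stack is a hypothetical module invented for this example, not part of core; every function that operates on the stack takes the value of type t as its first argument.

```ocaml
(* Hypothetical module following the "t comes first" rule. *)
module Int_stack : sig
  type t
  val empty : t
  val push : t -> int -> t
  val mem : t -> int -> bool
  val fold : t -> init:'a -> f:('a -> int -> 'a) -> 'a
end = struct
  type t = int list
  let empty = []
  (* The stack comes first, the element second. *)
  let push t x = x :: t
  let mem t x = List.mem x t
  (* The function argument is labeled and listed last. *)
  let fold t ~init ~f = List.fold_left f init t
end

let () =
  let s = Int_stack.push (Int_stack.push Int_stack.empty 1) 2 in
  assert (Int_stack.mem s 2);
  Printf.printf "%d\n" (Int_stack.fold s ~init:0 ~f:( + ))
```

Compare this with the standard library's List.mem, which takes the element first and the list second; under the rule there is nothing to look up.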

Exceptions, options and function names

In core, the default functions only throw exceptions in truly exceptional circumstances. Thus, List.find returns an option rather than throwing Not_found. That said, there are cases where the exception-throwing version of the function is useful as well. The convention we now use is to mark the exception-throwing version of a function with _exn. So, we have Map.find and Map.find_exn, Queue.peek and Queue.peek_exn, and List.nth and List.nth_exn.
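The naming convention can be sketched as follows. Assoc is a hypothetical association-list module written for this example using only the standard library; it is not core's Map. The default find returns an option, and the _exn variant is a thin wrapper that raises.

```ocaml
(* Hypothetical module showing the find / find_exn pairing. *)
module Assoc : sig
  type ('k, 'v) t
  val empty : ('k, 'v) t
  val add : ('k, 'v) t -> key:'k -> data:'v -> ('k, 'v) t
  val find : ('k, 'v) t -> 'k -> 'v option
  val find_exn : ('k, 'v) t -> 'k -> 'v
end = struct
  type ('k, 'v) t = ('k * 'v) list
  let empty = []
  let add t ~key ~data = (key, data) :: t
  (* Default version: absence is an ordinary result, not an exception. *)
  let find t key = List.assoc_opt key t
  (* The _exn suffix warns the caller that this version can raise. *)
  let find_exn t key =
    match find t key with
    | Some v -> v
    | None -> raise Not_found
end

let () =
  let t = Assoc.add Assoc.empty ~key:"a" ~data:1 in
  (match Assoc.find t "b" with
   | Some v -> Printf.printf "found %d\n" v
   | None -> print_endline "not found");
  Printf.printf "%d\n" (Assoc.find_exn t "a")
```

The suffix makes the possibility of an exception visible at every call site, which helps during code review.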

Standardized interface includes

There are a number of standardized interfaces that we use as components of lots of different signatures. Thus, if you had a module representing a type that could be converted back and forth to floats, supports comparison and has its own hash function, you could write the interface as follows:

module M : sig
  type t
  include Floatable with type floatable = t
  include Comparable with type comparable = t
  include Hashable with type hashable = t
end

Making some of core's conventions explicit makes them easier to enforce, so that parallel functions are forced to have the same names and type signatures across many different modules. It also makes it easier to design functors on top of these modules. So, for example, the Piecewise_linear module contains a functor that takes as its input any module which is both floatable and sexpable.
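Here is a self-contained sketch of how such includes and a functor over them might fit together. The contents of Floatable and Comparable below are assumptions made for the example (core's real signatures may differ), and Cents and Midpoint are hypothetical names.

```ocaml
(* Assumed shapes for the standardized interfaces; core's may differ. *)
module type Floatable = sig
  type floatable
  val of_float : float -> floatable
  val to_float : floatable -> float
end

module type Comparable = sig
  type comparable
  val compare : comparable -> comparable -> int
end

(* Hypothetical module: an amount of money stored as whole cents. *)
module Cents : sig
  type t
  include Floatable with type floatable = t
  include Comparable with type comparable = t
end = struct
  type t = int
  type floatable = t
  type comparable = t
  let of_float f = int_of_float (Float.round (f *. 100.))
  let to_float t = float_of_int t /. 100.
  let compare = Int.compare
end

(* Hypothetical functor that works for any Floatable module. *)
module Midpoint (M : Floatable) = struct
  let midpoint a b = M.of_float ((M.to_float a +. M.to_float b) /. 2.)
end

module Cents_mid = Midpoint (Cents)

let () =
  let a = Cents.of_float 1.00 and b = Cents.of_float 2.00 in
  Printf.printf "%.2f\n" (Cents.to_float (Cents_mid.midpoint a b))
```

The functor never mentions Cents; it relies only on the names fixed by the Floatable signature, which is what the shared includes guarantee.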

Comments

Consistent look and feel of client code

t comes first: Was this argued vehemently by sweeks? ;-) Anyway, I have to agree that putting ad hoc lambdas last tends to be more readable. OTOH, as you note, it can reduce the opportunities for point-free programming. It is a funny thing. Doing a little hacking on the MLton compiler, I notice how putting the ad hoc lambda last looks nice, but then I'll soon run into a case where I could have used a trivial point-free definition... It is a difficult trade-off!

Frankly, I'm not convinced that aiming for maximal uniformity (or one size fits all) is the best guiding principle for design. A good IDE with non-intrusive documentation lookup could go a long way towards eliminating the need for uniformity to help memory, for example. Uniformity is really the antithesis of design: when you don't have a unique design in mind, uniformity can be a useful default. OTOH, when you actually do have an original idea for a design, uniformity (at the micro design level) often gets in the way. IMO, consistency (or harmony) within a design (or library) is more important than uniformity across many designs. Also, it is important to keep in mind that some designs, like the concept of Comparable values, apply across many modules. I think that it also often makes sense to have more than one interface (or skin), one for each "mode of use", to a set of functionality.

In my own designs, regardless of language, I usually spend considerable time tweaking function signatures to make client code as nice as possible. When I start designing a library, I often start by writing examples of how I'd like to be able to write programs and then figure out how to make it possible. I usually try to make the client code look like a specification. In fact, I often start a library design, because I have an idea of how to make the interface to a set of functionality better. I'm not really primarily looking for uniformity at the level of listing function arguments in a particular order (although this tends to happen as a natural side-effect), but rather for a consistent look-and-feel of the client code.

Standardized interface includes: Was this inspired by my post on the MLton list? If it wasn't, then I'd love to know where you got the idea. It is such a simple idea that I've wondered why it hasn't been used more often.

Consistency

First, you're not wrong to see Steven's hand in much of this, although the "t comes first" rule was argued by a number of people before I finally caved. I understand where you're coming from on the uniformity question, and that's where I started. But more than a year's worth of exposure to Steven's style has pretty much won me over. It's not that you give up all need to make decisions about interfaces. But a whole class of essentially trivial decisions evaporates, and that's mostly for the good. I agree that there is a tradeoff in terms of partial application. But having seen this tradeoff play out over a fairly large codebase, it seems very worth it. The resulting client code is cleaner, more uniform, and easier to understand.

One thing worth noting: the loss in terms of partial applicability is somewhat smaller than you might think, since when you have a large codebase, it's harder anyway to predict in advance which order for partial application will be the most useful.

t comes first

It is worth noting that the rule "t comes first" has no actual impact when all the other arguments are labeled. A trivial example is List.map: even though it is declared as "let map l ~f = ..." with l coming first, both partial applications "List.map l" and "List.map ~f" are valid. A good chunk of the Core library actually works this way.
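A minimal sketch of this point, using a locally defined map rather than core's List.map (the names doubled_with and double_all are made up for the example):

```ocaml
(* A local map with the list first and a labeled function argument. *)
let map l ~f = List.map f l

(* Partial application on the positional argument: still waiting for ~f. *)
let doubled_with : f:(int -> int) -> int list = map [1; 2; 3]

(* Partial application on the labeled argument: still waiting for the list. *)
let double_all : int list -> int list = map ~f:(fun x -> x * 2)

let () =
  assert (doubled_with ~f:(fun x -> x + 1) = [2; 3; 4]);
  assert (double_all [1; 2; 3] = [2; 4; 6])
```

Both partial applications type-check because the compiler knows the full type of map and can match the labeled argument regardless of its position.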

Consistency

Do you have a list of acceptable exceptions to this rule, especially related to labelled arguments that must not be last? I am asking this because all of your fold and most of your map functions in core do not respect the "t comes first" principle (and the ~init position is not consistent). All these consistently use the label ~f for the function however so in practice it probably makes little difference. I am wondering though whether this is because core predates the explicit statement of the rules you talk about and will eventually be modified to respect them more thoroughly?

Inconsistencies

We definitely expect core to get more consistent over time, for precisely the reason you mention. As for the interaction of labels and the "t comes first" rule, I'm not sure how that will shake out. The practical impact is so small that it's not clear it's worth rewrapping the standard library to fix it. Some of these things will get more uniform as we add interfaces that capture some of the higher order functions like map, iter and fold, which is something we're already doing.

Hi Vesa. Yes, the idea for

Hi Vesa. Yes, the idea for naming the types in abstract interfaces comes from your post. The idea for abstract interfaces has been around for much longer.

Incidentally, OCaml has a better way to do abstract interfaces using signature functors. They have the advantage that one doesn't need the extra type name at all.

As to the "t comes first" style, which I certainly liked before coming to Jane Street, having seen a lot more code here with a lot of different styles, and having gained a much greater appreciation for readability of code (we do a lot of code review at Jane Street), I feel more strongly than ever that the style is a good one. Optimizing the interface of a particular module is the wrong tradeoff. People using a large codebase don't have time to understand lots of different conventions, and the smaller client code tends to become more obfuscated. Readability comes from consistency and explicitness as much as from concision, and when there is a tradeoff, I'd opt for them over concision.

Signature functors

Hmm... Looking at OCaml's docs (module grammar) for a couple of minutes I came up with:

module type T = sig type t end
 
module Ordered (T : T) = struct
  module type Sig = sig
    val (<) : T.t -> T.t -> bool
  end
end
 
module type FOO = sig
  module T : T
  include Ordered (T).Sig
end

Hmm... is there a way to get rid of the FOO.T module spec? Or is there perhaps some even better way?

I don't know of anything better.
