April 10, 2017

Nix is all about records. They are everywhere in nixpkgs, and even the semantics of nix seems to have been designed with this idea of records everywhere as a first class construct. (In fact, I think that it would be more correct to think of nix as "json with lambda-calculus on top of it" rather than "a functional language with lists and records").

Thus we have to type records.

What are records?

Records (or "attribute sets" as they are called in nix) are a data structure associating expressions (of an arbitrary type) to strings. More precisely, a record is a finite map from strings (the "labels" of the record) to the set of all expressions (the "values" of the record).

There are two main uses of records:

Static records

The first use of records is when you manipulate some data with a known structure that you want to store in a structured way. In this case, you know (statically) the labels and the type of the value associated to each label.

For example, the fetchurl function from nixpkgs (defined in pkgs/build-support/fetchurl/default.nix) expects arguments of the form:

{
  url = "some_string"; # Or nothing
  urls = [ "a" "list" "of" "strings"]; # Or nothing
  curlOpts = "some_string"; # Or nothing
  name = "some_string"; # Or nothing
  sha256 = "some_string"; # Or nothing
  executable = some_boolean; # Or nothing
}

(we only show some of the fields as the whole definition doesn't matter here)

For that kind of use, the name of the fields do not carry any semantic, they are just here to help the programmer (and in most compiled languages with static records, the name of the fields is dropped at compile-time).

This is the structure of the nixos options set (except that it is all done at runtime): all the option declarations generate a rigid structure in which the settings must fit. And (except for a few), the actual name of the options doesn't appear in the result of the evaluation.

Dynamic records (aka hashtables)

Another use of records is when we got some arbitrary data that we want to store with the possibility to retrieve it efficiently and easily, to iterate on it, to pass as a whole to another function... In that case, we may use records as hashtables − or as what functional programmers refer to as "maps" (with the restriction that the keys may only be strings). If the programmer doesn't want to shoot itself in the foot, all the fields will likely be of the same type.

This is used in particular for some nixos options (the one of type attrsOf α).

For example, the environment.etc option accepts as an argument a module whose fields will determine which files are copied (or symlinked) to /etc. By default, the label of each field will be used as the file name.

How do we type them?

Those two uses of records are really different, and require different typing (except of course if someone ever manages to provide some typing rules which unify both and are usable in practice).

The problem is that in nix (and in most untyped languages), both are happily mangled. They are first mangled because the same construct is used in both cases, with only some semantic considerations allowing to distinguish the intention of the programmer, and second because the same record is sometimes used as the same time as a static and as a dynamic one.

For example, the derivation builtins expects as an argument a record in which some fields ("name", "builder", "system", ...) are expected and have a special meaning, and all the others are just passed as environment variables.

Fortunately, we got a powerful enough formalism to be able to be still able do some typing (and some useful one!). The exact typing rules are on the github repository, but here is an informal writeup:

The base elements are atomic records (records with only one field), which are combined together to form bigger record. The rules for a basic record r = { e1 = e2; } where e2 has type τ are:

Various combination rules explain how to type the various merges of records, depending on whether we know that the fields are distinct or not.

There also are some (non published yet) rules for several builtins which will require some ad-hoc typing such as attrValues (which applied to a record returns the set of fields defined in it) or listToAttr (which applied to a key-value pairs list returns the corresponding record).

A note about nixos "types"

The nixos module system already has a notion of "types" and this new type system may seem redundant at least for this part of nixpkgs (or may seem to obsolete the old one).

I don't think either of those is likely to happen, given the possible benefits of a static type system and the fact that the runtime type system is at the core of the implementation of the current system. 1

A small explanation of how both can play together may however be quite useful. In general, embedding a type system into a language isn't really nice for the host type-system, because the guest may prove stuff than the host can't prove even with some help, and thus will want to bypass the restrictions it enforces. Fortunately, our type-system should be expressive enough to be able to understand the inner type system, and thus the actual typing of nixos options could be (for the most part) embedded into the new one. And for the cases where it would not be possible, the gradual part of the type system provides a nice way of bypassing the static type-system.


1 I think that if the module system had been implemented on top of a language with a rich enough type system − type classes or equivalent in particular, most of the runtime type system would have been useless (if not all). But the module system being built around it, replacing it would probably not be worth the benefit

Commentaires