Arthur J. O'Dwyer

ocamlmerr, a program for automatically generating error messages for compilers produced using ocamlyacc. Beta version.


Format of ocamlmerr input

The ocamlmerr program takes as input a text file called meta.err. It also interacts with the user's compiler, a binary executable with no default name, and in particular with one OCaml source file belonging to that compiler, merr.ml. The user of ocamlmerr only has to understand the format of the meta.err file and the "glue code" that needs to be inserted in the other files of the project, notably Makefile and *.mly; the merr.ml source file should never be seen or touched by anyone except the ocamlmerr program itself.

The metadata file meta.err has a format designed mainly to be easily parsed. It is divided into blocks, each of which consists of the following two parts:

  1. A header line, which tells what kind of block this is: a comment (%comment), a token-class definition (%classdef), or an error message definition (%token, %notoken, or %class). In the last case, the header line also contains the error message being defined.
  2. A sample program or list of programs, each of which produces a syntax error at the token in question. (A comment block has no second part; it consists of the header line alone.)

The header line can always be identified by its initial sequence, which is the six characters %%MERR.

Here is a complete meta.err specification:

%%MERR %comment This is just an example.
%%MERR %classdef BinOp
+ {var- {var* {/
& | ^ {&& struct|| {^^
%%MERR %token Extraneous "var" in struct field definition
struct foo {var i:
%%MERR %notoken Extraneous material after program's closing brace
{} abc
%%MERR %notoken Expected an operator or semicolon after expression
{ a+b c }
%%MERR %class BinOp Unexpected binary operator
{+

The first line is a block by itself; it is a comment header. Comment headers begin with the identifying sequence %%MERR %comment and then may contain any text the user wants, up to the next line break. At that point, the state of the interpreter goes back to the initial state, in which it is expecting the beginning of a new block.
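
The block structure described above can be sketched in a few lines of Python. This is only an illustration of the file format, not ocamlmerr's own parser: the function name parse_blocks and the dictionary layout are hypothetical, and the sketch assumes that the directive follows %%MERR after whitespace and that blank lines between blocks can be skipped. It also does not enforce that a %comment block has no body.

```python
# Sketch only: split meta.err text into blocks of header + body lines.
# Assumed header layout: "%%MERR <directive> <rest of line>".
HEADER_PREFIX = "%%MERR"
KINDS = {"%comment", "%classdef", "%token", "%notoken", "%class"}

def parse_blocks(text):
    """Return a list of dicts with keys 'kind', 'arg', and 'body'."""
    blocks = []
    for line in text.splitlines():
        if line.startswith(HEADER_PREFIX):
            # A header line starts a new block.
            fields = line[len(HEADER_PREFIX):].split(None, 1)
            if not fields or fields[0] not in KINDS:
                raise ValueError("unknown header: " + line)
            rest = fields[1] if len(fields) > 1 else ""
            blocks.append({"kind": fields[0], "arg": rest, "body": []})
        elif line.strip():
            # Any other non-blank line belongs to the most recent block.
            if not blocks:
                raise ValueError("text before first header: " + line)
            blocks[-1]["body"].append(line)
    return blocks

SAMPLE = """\
%%MERR %comment This is just an example.
%%MERR %classdef BinOp
+ {var- {var* {/
& | ^ {&& struct|| {^^
%%MERR %token Extraneous "var" in struct field definition
struct foo {var i:
"""

for block in parse_blocks(SAMPLE):
    print(block["kind"], repr(block["arg"]), block["body"])
```

For a %class header, the 'arg' field in this sketch still holds both the class name and the message (for example "BinOp Unexpected binary operator"); separating the two is left out.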

The second through fourth lines of the sample specification define a new token class, or tclass, called BinOp and consisting of the tokens +, -, *, /, &, |, ^, &&, ||, and ^^. Notice that the list of whitespace-separated strings below the header line is not actually a list of those tokens; by design, ocamlmerr knows nothing of the grammar of the user's compiled language, and doesn't need to. Those strings are actually programs written in the user's language, each of which when compiled will produce a syntax error at the corresponding token. For example, the compiler will bail out of the program struct|| upon seeing the token ||, and will return the small integer that represents that token in the internals of the user's ocamlyacc-generated grammar.

Note: Since the program strings in a %classdef block are separated by whitespace, they cannot actually contain any whitespace. For example, the specification

%%MERR %classdef Braces
var {; var }; 
will not do what the user is apparently expecting; rather than dealing with the two programs var {; and var };, the ocamlmerr program will try to compile all four whitespace-separated program strings, and will wind up with a few "unexpected end of file" errors rather than the state and token numbers it's expecting. So defining classes sometimes requires a bit of ingenuity on the user's part. This is a prime target for modifications to the program.
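
The whitespace rule can be seen in a small Python sketch. The function name program_strings is hypothetical, and the sketch assumes only what is stated above: every whitespace-separated string in a %classdef body is treated as a separate program.

```python
# Sketch only: how a %classdef body breaks into program strings when
# every run of whitespace is a separator.
def program_strings(body_lines):
    programs = []
    for line in body_lines:
        programs.extend(line.split())
    return programs

binop = program_strings(["+ {var- {var* {/", "& | ^ {&& struct|| {^^"])
braces = program_strings(["var {; var };"])

print(len(binop), binop)    # ten one-token programs, as intended
print(len(braces), braces)  # four fragments, not the two programs intended
```

The Braces body yields the four strings var, {;, var, and }; rather than the two programs the user had in mind.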


Invoking ocamlmerr

ocamlmerr [-?h] [-DdWw] [-m meta.err] [-o merr.ml] compiler-name

The -D option turns on minimal debugging output; namely, when run with this option the program will show which error message from meta.err it's currently trying to produce. This will generate a lot of useless output on most runs of ocamlmerr, so it is not recommended. In fact, it's not terribly useful for debugging in any event; if you do encounter a bug, it will be easier to track it down by strategic recompilations of ocamlmerr than by using any built-in program options.

The -W option tells the program to run a completely automatic wrapper script which initializes merr.ml to a blank template, recompiles the compiler via make ./compile (or whatever name the user supplies), runs ocamlmerr as usual to generate a new merr.ml, and finally recompiles the compiler a second time to incorporate the changes. The ocamlmerr -W invocation can thus be used inside a makefile to rebuild merr.ml based on changes to meta.err.

The -m parameter changes the name of meta.err so that the program reads its specifications from a different file.

The -o parameter changes the name of merr.ml so that the program writes its output (hence -o) to a different file.
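
For scripted builds, the command line can be assembled from the options described above. The following Python sketch is an assumption-laden illustration rather than part of ocamlmerr: the helper name ocamlmerr_argv and the file names errors.err and my_merr.ml are hypothetical, and only the -D, -W, -m, and -o options from the synopsis are used.

```python
# Sketch only: build an ocamlmerr argument list from the documented options.
def ocamlmerr_argv(compiler, meta=None, out=None, debug=False, wrapper=False):
    argv = ["ocamlmerr"]
    if debug:
        argv.append("-D")        # minimal debugging output
    if wrapper:
        argv.append("-W")        # fully automatic wrapper script
    if meta is not None:
        argv += ["-m", meta]     # read specifications from this file
    if out is not None:
        argv += ["-o", out]      # write the generated OCaml to this file
    argv.append(compiler)        # the compiler name comes last
    return argv

print(ocamlmerr_argv("./compile"))
print(ocamlmerr_argv("./compile", meta="errors.err", out="my_merr.ml", wrapper=True))
```

The resulting list can be passed to subprocess.run if the build is driven from Python rather than from make.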


This page was last updated 16 July 2005
This specification was released to the public domain by Arthur O'Dwyer, November 2004.