Modularization: Basic Concepts


 

The fragment language provides support for creating a program as the combination of several independent or interdependent modules. It is a language in its own right, and it could be used for the modularization of any programming language whose syntax can be described by a context-free grammar (well, with the current tools: an LALR(1) grammar).

A gbeta program, as well as a traditional BETA program, is written in a mixture of the fragment language and the (gbeta or BETA) programming language. The fragment language defines the top-level structure, and the programming language syntax occurs as fragments of varying size, inserted into the fragment language constructs at certain places.

The fragment language also defines some bottom-level expressions, i.e. small pieces of syntax that do not contain any syntax from the programming language. These bottom-level expressions are inserted into the programming language syntax as place-holders.

The following sections will add the missing details to this description. The basic concepts can be introduced without considering multi-file programs, so we commence with a one-file situation.

Declarations and applications of SLOTs

First, let us look at the top-level construct, in which the fragment language encloses some programming language syntax (the piece-of-code below is written in the programming language). The basic purpose of this construct is to give a name to a piece-of-code:

(* SLOT declaration *)
-- name:non-terminal:grammar --
piece-of-code

The piece-of-code must be syntactically derived from the given non-terminal. If for example the non-terminal is dopart, then the piece-of-code must be a dopart, i.e. "do" followed by a number of imperatives. Such a named piece of code can then be used (applied) in other places by referring to the name:

...
(* SLOT application *)
<<SLOT name:non-terminal>>
...

This is the bottom-level construct of the fragment language. The <<SLOT ..>> syntax appears in the middle of some programming language syntax.
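To make the two constructs concrete, here is a minimal Python sketch of how a tool might split a one-file program into its named slot declarations. It is not how gbeta or the Mjolner BETA System is implemented. The function name split_fragments, the regular expression, and the assumption that a declaration header sits alone on a line (with the grammar part optional and all names written without hyphens) are illustrative choices made for this sketch only.

```python
import re

# Assumed header shape (an assumption of this sketch):
#   -- name:non-terminal --   or   -- name:non-terminal:grammar --
HEADER = re.compile(r"^--+\s*(\w+)\s*:\s*(\w+)(?:\s*:\s*(\w+))?\s*--+\s*$")


def split_fragments(source):
    """Return {name: (non_terminal, grammar_or_None, code)} for each declaration."""
    fragments = {}
    current = None  # (name, non_terminal, grammar) of the declaration being read
    body = []       # lines of programming language syntax collected so far

    def close():
        # Store the declaration that was being read, if there is one.
        if current is not None:
            name, non_terminal, grammar = current
            fragments[name] = (non_terminal, grammar, "\n".join(body).strip())

    for line in source.splitlines():
        match = HEADER.match(line)
        if match:
            close()
            current = match.groups()
            body = []
        elif current is not None:
            body.append(line)
    close()
    return fragments


EXAMPLE = """\
-- betaenv:descriptor --
(# s: @string
<<SLOT main:dopart>>
#)

-- main:dopart --
do '"s" can be accessed from here'->s
"""

fragments = split_fragments(EXAMPLE)
for name, (non_terminal, grammar, code) in fragments.items():
    print(name, non_terminal, grammar)
```

Note that the sketch only records the text of each piece-of-code. It does not check that the text can actually be derived from the stated non-terminal, which a real system would have to do with a parser for the programming language.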

There is an analogy to constants in ordinary programming languages: the slot declaration says that the given name is declared to be a constant whose value is the associated piece-of-code. The slot application instructs the language processing system (compiler, interpreter, analyzer, ..) to look up the piece of code with the given name and put it right here, instead of the <<SLOT ..>>.

Think of it as a kind of search-and-replace operation which will substitute away slot applications until the entire program is written in the programming language and all of the fragment language syntax has been eliminated.

Here's an example:

-- betaenv:descriptor --
(# s: @string
<<SLOT main:dopart>>
#)

-- main:dopart --
do '"s" can be accessed from here'->s

The betaenv:descriptor slot is special. Since we cannot substitute pieces of code into each other ad infinitum, we must choose a distinguished piece of code to be the root of the system. That is a descriptor with the name betaenv. Later we'll add one more constraint to this.

If we perform the search-and-replace process on the above example, we get the following program:

(# s: @string
do '"s" can be accessed from here'->s
#)
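The search-and-replace reading of this example can be sketched in a few lines of Python. This is only an illustration of the semantics, not the way gbeta works internally. The function name expand, the dictionary layout, and the decisions to raise an error for an undeclared name, for a non-terminal mismatch, and for a slot that (directly or indirectly) applies itself are assumptions of this sketch.

```python
import re

# A slot application as it appears inside programming language syntax.
SLOT_APPLICATION = re.compile(r"<<SLOT\s+(\w+)\s*:\s*(\w+)>>")


def expand(fragments, root="betaenv"):
    """Substitute away all slot applications, starting from the root.

    fragments maps a slot name to a (non_terminal, code) pair.
    """
    def substitute(name, expected, active):
        if name not in fragments:
            # Assumption of this sketch: every applied slot must be declared.
            raise KeyError("no declaration for slot " + name)
        non_terminal, code = fragments[name]
        if expected != non_terminal:
            raise ValueError("slot " + name + " is declared as " + non_terminal
                             + " but applied as " + expected)
        if name in active:
            # Pieces of code cannot be substituted into each other ad infinitum.
            raise ValueError("slot " + name + " applies itself")
        return SLOT_APPLICATION.sub(
            lambda m: substitute(m.group(1), m.group(2), active | {name}),
            code,
        )

    # The distinguished root is a descriptor.
    return substitute(root, "descriptor", frozenset())


fragments = {
    "betaenv": ("descriptor", "(# s: @string\n<<SLOT main:dopart>>\n#)"),
    "main": ("dopart", "do '\"s\" can be accessed from here'->s"),
}

program = expand(fragments)
print(program)
```

Run on the two fragments above, the sketch yields the same three-line program as the manual substitution.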

Separate compilation vs. generality

The semantics of slots is described by the search-and-replace scenario, but a language processing system would normally not be appropriate for real-world use if it actually did modify the source code in such a way. The problem is that compilation of almost any BETA program would imply changes in very basic (highly depended-upon) files, and hence almost any BETA compilation would be a recompilation of "the entire universe" (including various standard libraries).

Luckily, it does not have to be like that. As long as a BETA compiler is able to compile the code in such a way that resulting programs behave as if the search-and-replace operations had taken place, the semantics will be correct, the basic files will remain unaffected, and compilation will take time roughly proportional to the size of the program, not to the size of the "universe."

So, in practice (in particular in the Mjolner BETA System), separate compilation is achieved by supporting only some non-terminals as slots (notably dopart, descriptor, and attributes), namely those for which separate compilation has been implemented, and by rejecting programs that use other non-terminals as slots. Moreover, there are some restrictions on the usage of the supported non-terminals. Look into the standard BETA FAQ for details about this. As practical experience shows, even those few supported non-terminals are sufficient to establish a very expressive and flexible module system.

Nevertheless, it is interesting to be able to experiment with all non-terminals of the grammar, and to be relieved of the restrictions which the Mjolner BETA System puts on the usage of the supported non-terminals. Notably, it is a serious constraint that no substance can be declared in an attributes slot, even though there is a reasonable work-around.

Because of this, and because gbeta (being an interpreter) would look at all of the source code anyway, gbeta was designed in such a way that all non-terminals are supported, and there are no special restrictions on their usage. On the other hand there is no "separate compilation" in gbeta. It is a possible future project to reconcile the generality of the fragment language in gbeta and some notion of separateness and persistence in the gbeta analysis.

How many files?

Everything said so far can be tried out using programs consisting of only one source code file, but it basically also holds in the general case where several files are involved. However, with several files we have to consider visibility ("privateness") issues and the relations between those files. The next section deals with this.

 


Signed by: eernst@cs.auc.dk. Last Modified: 3-Jul-01