Modularization: Using Several Files


 

Fragment groups, forms, and properties

A source file is called a fragment group, because it contains a number of fragment forms. It also contains the fragment properties, which specify how the group relates to other fragment groups:

fragment-group ::= fragment-property* fragment-form*;
fragment-property ::= property-kind group-spec ";";
property-kind ::| "ORIGIN" | "INCLUDE" | "BODY";
group-spec ::= pathname;
fragment-form ::= form-head form-body;
form-head ::= "--" form-name ":" non-terminal-name "--";

This is a simplified grammar for the fragment language, but since it describes a language which is a large subset of the actual fragment language you can use it as it is. Certain non-terminals are left unspecified, namely:
  • non-terminal-name: an identifier which occurs in the grammar of the language as a non-terminal (i.e. at the left hand side of a "::=", "::|", "::+", or "::*" symbol), e.g. "dopart"
  • form-name: an identifier, e.g. "aNameForAPieceOfCode"
  • pathname: the name of a file in the file system, without the extension (such as ".gb") and enclosed in single quotes, e.g. "'private/myImplementation'". A user specification like "~beta/" at the beginning of a path can be treated specially, as specified in some configuration files, but you shouldn't need to worry about this when using gbeta
  • form-body: a sentential form (a syntactic derivation, partial or complete) in the grammar of the programming language, derived from the non-terminal-name of the fragment form. If it is complete (no non-terminals), it is simply a piece of syntax in the programming language; if it still contains non-terminals, they must be written as slot applications:

    "<<" "SLOT" form-name ":" non-terminal-name ">>"

Here is an example of a full-fledged fragment group in gbeta:

ORIGIN 'betaenv';
INCLUDE 'myLib';

-- program:merge --
(# anObject: <<SLOT anObject:staticItem>>
<<SLOT mainProgram:dopart>>
#)

-- anObject:staticItem --
@(# x: @integer #)

-- mainProgram:dopart --
do 2->anObject.x;
   aLibMethod (* presumably declared in 'myLib' *)
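
To make the simplified grammar a little more tangible, here is a minimal Python sketch that splits a fragment group into its properties and fragment forms and lists the slot applications in each form body. It is not part of gbeta: the function name parse_fragment_group and the regular expressions are assumptions made for illustration, and they only cover the simplified grammar above (identifiers are taken to be plain word characters). The input is an abbreviated copy of the example group.

```python
import re

# These patterns follow the simplified grammar only; the real fragment
# language accepts more than this sketch handles (assumption).
PROPERTY_RE = re.compile(r"^(ORIGIN|INCLUDE|BODY)\s+'([^']*)'\s*;\s*$")
FORM_HEAD_RE = re.compile(r"^--\s*(\w+)\s*:\s*(\w+)\s*--\s*$")
SLOT_RE = re.compile(r"<<\s*SLOT\s+(\w+)\s*:\s*(\w+)\s*>>")


def parse_fragment_group(text):
    """Split a fragment group into (properties, forms)."""
    properties = []  # list of (property-kind, pathname)
    forms = []       # one dict per fragment form
    current = None   # the form whose body is being collected
    for line in text.splitlines():
        stripped = line.strip()
        head = FORM_HEAD_RE.match(stripped)
        prop = PROPERTY_RE.match(stripped)
        if head:
            current = {"name": head.group(1),
                       "category": head.group(2),
                       "body": []}
            forms.append(current)
        elif prop and current is None:
            # Properties are only recognized before the first form head.
            properties.append((prop.group(1), prop.group(2)))
        elif current is not None:
            current["body"].append(line)
    for form in forms:
        body = "\n".join(form["body"]).strip()
        form["body"] = body
        form["slots"] = SLOT_RE.findall(body)  # (form-name, non-terminal-name)
    return properties, forms


EXAMPLE = """\
ORIGIN 'betaenv';
INCLUDE 'myLib';

-- program:merge --
(# anObject: <<SLOT anObject:staticItem>>
<<SLOT mainProgram:dopart>>
#)

-- anObject:staticItem --
@(# x: @integer #)

-- mainProgram:dopart --
do 2->anObject.x
"""

properties, forms = parse_fragment_group(EXAMPLE)
for kind, path in properties:
    print(kind, path)
for form in forms:
    print(form["name"], form["category"], form["slots"])
```

The sketch treats everything between two form heads as the body of the first one, which matches the grammar rule "fragment-form ::= form-head form-body".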

Fragment graphs

When we have several fragment groups in a program, we must have a way to specify which groups are included, and what the relations are within this set of fragment groups. This is done by constructing a fragment graph according to the fragment properties of the groups.

The starting point is always one distinguished fragment group, the seed of the fragment graph. This is the fragment group which is specified as "the file to execute" when starting gbeta, probably what you think of as the "main file" of the program. Starting from the seed, we construct two graphs whose nodes are fragment groups and whose edges are properties. The following two paragraphs give a precise definition of the concepts of extent and domain, and the third paragraph gives a more understandable explanation (depending on your attitude to math ;-).

The first graph, the extent graph, contains the groups which are reachable from the seed via some number of ORIGIN, INCLUDE, or BODY links (i.e. all properties are used). It is, hence, the transitive closure of all property links from the seed. This graph determines what fragment groups are included in the program, i.e. it determines the overall "content" of the program.

For any fragment group G in the extent graph, the domain graph of G (with respect to the seed) is the subgraph of the extent that is reachable from G via ORIGIN and INCLUDE links only. The domain graph determines visibility, as explained below.

Another way to explain this is that the extent contains all the source code you can find by looking into fragment groups mentioned in properties of the seed group, or of groups looked up that way, or of groups looked up from there, and so on. When you have outlined the extent, you can determine the domain of each group by a similar, repeated lookup process, but this time you are not allowed to use the BODY properties.
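
The repeated lookup process can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how gbeta implements it: the group names in GRAPH are hypothetical, the graph is written down by hand instead of being read from files, and a group is taken to be a member of its own domain.

```python
from collections import deque

# Hypothetical fragment graph: group -> list of (property-kind, target group).
GRAPH = {
    "main":      [("ORIGIN", "betaenv"), ("INCLUDE", "stack")],
    "stack":     [("ORIGIN", "betaenv"), ("BODY", "stackImpl")],
    "stackImpl": [("ORIGIN", "stack"), ("INCLUDE", "myLib")],
    "myLib":     [("ORIGIN", "betaenv")],
    "betaenv":   [],
}


def reachable(graph, start, kinds):
    """Groups reachable from start using only links whose kind is in kinds."""
    seen = {start}
    queue = deque([start])
    while queue:
        group = queue.popleft()
        for kind, target in graph.get(group, []):
            if kind in kinds and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def extent(graph, seed):
    """Everything the program contains: all property kinds are followed."""
    return reachable(graph, seed, {"ORIGIN", "INCLUDE", "BODY"})


def domain(graph, seed, group):
    """What group can see: the same lookup, but BODY links are not followed."""
    if group not in extent(graph, seed):
        raise ValueError(f"{group!r} is not in the extent of {seed!r}")
    return reachable(graph, group, {"ORIGIN", "INCLUDE"})


print(sorted(extent(GRAPH, "main")))
print(sorted(domain(GRAPH, "main", "main")))
print(sorted(domain(GRAPH, "main", "stackImpl")))
```

In this hypothetical graph, 'stackImpl' and 'myLib' are part of the extent of 'main' but not of its domain, because the only way to reach them from 'main' goes through a BODY link.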

The basic intuition is that you use a "BODY 'aGroup';" property to request that the source code in aGroup is included into the program; and you use an "INCLUDE 'aGroup';" property to request that the source code in aGroup is included into the program and made visible, such that the source code in this fragment group may depend on it. In other words,

INCLUDE means "I need this!"

and:

BODY means "I need this, but I don't want to see it!"

The ORIGIN properties, apart from the fact that they imply the same things as INCLUDE, are used to define the scope rules for form-names. The rule is:

A slot application can be associated with a slot declaration iff they have the same name and there is a path of ORIGIN links from the fragment group that contains the declaration to the fragment group that contains the application.

A useful intuition about this is that an ORIGIN property tells in what direction the fragment forms in this group must travel to find their ultimate destination. This intuition reaches back to the search-and-replace semantics of slots, and when several groups (files) are involved and the search-and-replace process must cross group borders, the source code which gets inserted to replace the <<SLOT ..>> expressions must travel exclusively along ORIGIN edges in the extent graph. In other words,

ORIGIN means INCLUDE, and "Home is this way!"
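
The scope rule for form-names can be written as a small check. The following Python sketch is an illustration only: the function can_bind, the group names, and the form-names are hypothetical, and a path of length zero is assumed to count, since the example group earlier on this page fills its own slots.

```python
# Hypothetical ORIGIN links: group -> list of groups named in its ORIGIN properties.
ORIGIN_LINKS = {
    "main":      ["betaenv"],
    "stack":     ["betaenv"],
    "stackImpl": ["stack"],
}


def can_bind(origin_links, application, declaration):
    """Check the rule: same name, and a path of ORIGIN links that leads
    from the declaring group to the group containing the application.

    application and declaration are (form-name, group) pairs.
    """
    app_name, app_group = application
    decl_name, decl_group = declaration
    if app_name != decl_name:
        return False
    seen = set()
    pending = [decl_group]
    while pending:
        group = pending.pop()
        if group == app_group:
            return True  # the form has found its way "home"
        if group in seen:
            continue
        seen.add(group)
        pending.extend(origin_links.get(group, []))
    return False


# A form declared in 'stackImpl' can fill a slot in 'stack' ...
print(can_bind(ORIGIN_LINKS, ("push", "stack"), ("push", "stackImpl")))
# ... but a form declared in 'main' cannot, since no ORIGIN path leads there.
print(can_bind(ORIGIN_LINKS, ("push", "stack"), ("push", "main")))
```

Only ORIGIN links are followed here; INCLUDE and BODY links play no role in matching slot applications with declarations.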

How to handle this?

The fact that graphs and lots of transitive closures are at the heart of the definition of the module system of gbeta (and traditional BETA as well) may make it seem counter-intuitive and unmanageable in practice. However, having worked with this system for years (as a student programmer using BETA for many tasks), I find it very intuitive and extremely expressive. Moreover, the fragment system is being taught to second-year C.S. students at Århus University every year, so it should be possible to achieve a working knowledge of it in a reasonable time. Looking at concrete examples is surely a good approach.

Expanding a little on the expressiveness: It is trivial, for example, to set up a visibility scheme which hides certain aspects of the implementation of one pattern from its sub-patterns (similar to C++ private), or allows them to depend on those aspects (similar to C++ protected), or allows certain other patterns to see some aspects (similar to the selective export mechanism in Eiffel), or allows certain sub-patterns but not others to see something (not covered in C++ or Eiffel), and so on.

An important difference is that this scheme is much more fine-grained than the others. It is, for example, possible to allow one particular imperative from one particular method of a pattern to depend on (have access to) some subset of another pattern, possibly a super-pattern. And, equally important, since interface and implementation can be kept in different files, patterns depending on the interface of a given pattern P will not need recompilation if (only) the implementation of P changes.
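
One plausible way to read the recompilation remark is in terms of domains: a group only needs attention when something it can see has changed. The Python sketch below follows that reading. It is an assumption made for illustration, not a description of what the gbeta or BETA tools actually do, and the group names are hypothetical.

```python
# Hypothetical fragment graph: group -> list of (property-kind, target group).
GRAPH = {
    "client":    [("ORIGIN", "betaenv"), ("INCLUDE", "stack")],
    "stack":     [("ORIGIN", "betaenv"), ("BODY", "stackImpl")],
    "stackImpl": [("ORIGIN", "stack")],
    "betaenv":   [],
}

ALL_KINDS = {"ORIGIN", "INCLUDE", "BODY"}
VISIBLE_KINDS = {"ORIGIN", "INCLUDE"}  # BODY links do not grant visibility


def reachable(graph, start, kinds):
    """Groups reachable from start using only links whose kind is in kinds."""
    seen = {start}
    pending = [start]
    while pending:
        group = pending.pop()
        for kind, target in graph.get(group, []):
            if kind in kinds and target not in seen:
                seen.add(target)
                pending.append(target)
    return seen


def affected_by(graph, seed, changed):
    """Groups in the extent of seed whose domain contains the changed group."""
    return {group
            for group in reachable(graph, seed, ALL_KINDS)
            if changed in reachable(graph, group, VISIBLE_KINDS)}


# Changing the implementation touches only the implementation group itself.
print(sorted(affected_by(GRAPH, "client", "stackImpl")))
# Changing the interface touches everything that can see it.
print(sorted(affected_by(GRAPH, "client", "stack")))
```

Under this reading, 'client' is unaffected by a change to 'stackImpl' because it reaches the implementation only through the BODY link of 'stack'.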

Another important difference is that access is seized by the party which "depends-on," not granted by the party which is being "depended-on." This means that tool support is needed to detect whether "somebody is looking at something they shouldn't see," but on the other hand there is no need to change the code being "depended-on" when some other code wishes to "depend-on" it. The philosophy is that the vulnerable party (which breaks if the other party changes) declares the vulnerability.

Finally, we're ready to look at a complete, concrete example in the next section.

 


Signed by: eernst@cs.auc.dk. Last Modified: 3-Jul-01