Copyright (c) 2000 Tom Tromey. This material may be distributed only subject to the terms and conditions set forth in the Software Carpentry Open Publication License, which is available at:
http://www.software-carpentry.com/openpub-license.html


Abstract

Author: Tom Tromey <tromey@cygnus.com>

I propose a simpler and more reliable, yet more general, more powerful, and typically more efficient replacement for make, here called ``SC Build''. I draw upon my experiences writing and maintaining Automake, the GNU Makefile generator, my experiences writing vmake, a GUI-based build component for an IDE, the lessons of previous make replacements such as Jam and Odin, and the lessons learned doing toolchain development at Cygnus Solutions. In this paper I also present various examples and problems with alternate approaches, against which I believe other replacement tools should be measured.

Problems With make

The problems with make are legion. Some of them are well known, but some are more obscure, so I list all the important ones here.

make has an obscure syntax. Ordinarily, nice syntax is considered mere sugar, but in make's case the syntax is so bad that it becomes an active impediment to using make. For instance, with common implementations of make there is no way to build a target whose name contains a colon (``:''). Also, since command (``phony'') targets and file targets share a namespace, if the Makefile author follows the convention of having an ``install'' command target, then it becomes difficult to build an executable named ``install'' (this is not academic but an actual case faced by the GNU fileutils). My experience with ``vmake'' showed that this obscurity of syntax made automatically generating a Makefile an arduous task; I've often considered that the number one design flaw in vmake was that we decided to generate Makefiles rather than simply reimplement all of make's functionality in the GUI.

make does not scale very well, by which I mean that one must typically sacrifice one of portability, maintainability, or performance in order to describe large projects with make. This is well documented in ``Recursive Make Considered Harmful'' (the subtext to this document being that they recommend sacrificing portability), and in the original Jam paper (there they demonstrate the typical performance loss; unpublished comments from the GNU libc maintainer corroborate this finding).

The make macro language is quite limited, even in GNU make (widely considered the beefiest make in existence). This lack of power means that some features are typically missing. For instance, good automatic dependency tracking (which is quite different from just any old automatic dependency tracking) is virtually unheard of in the make world; automake's implementation is the first reliable and complete implementation with minimal performance overhead that I know of -- and this implementation took 3 major revisions and several unintuitive hacks to implement. With make there is no way to implement finer-grained dependency tracking even if you wanted to; if your tool can tell you that a change in a central header file affected only the text of a comment, you're still forced to rebuild your entire package.

make typically does not have command dependencies. This means that in some situations, when the rule to build an object changes, you must first make clean and start over. In single-developer projects this is unimportant, but in multi-developer projects it is a critical flaw. Neither does make have tool dependencies; if the compiler changes you're likewise out of luck. Again, this is not an academic concern; in my own work I maintain a library (libgcj, a Java runtime) which is closely coupled to gcj, the Java front end to gcc. For me the compiler changes frequently, and I'm forced to run make clean by hand quite often.

make relies on timestamps for dependency analysis instead of examining file contents. While attractive as an apparent (but, as it turns out, not actual) performance enhancement, this turns out to be a bad idea. Timestamps in a network environment are typically unreliable. For instance, until we added a special sanity check to autoconf, timestamp problems on the local network were the most frequent cause for bug reports by ``configure'' users (that is, the end package builders, not the package developers). Also, the Jam authors' reports about the ``ClearMake problem'' (see the Jam paper) indicate that timestamps can be unreliable in other situations involving sophisticated source control systems. The use of timestamps also leads to ugly workarounds such as the existence of ``stamp files''. These come into play when a central header file is automatically generated (e.g., as ``config.h'' is often generated by ``configure''); the idea is to avoid modifying the header file in order to prevent make from rebuilding the entire project when the contents of the file have not actually changed. Finally, the granularity of timestamps on most Unix systems is one second; this is inadequate given the speed of today's systems.

make works best when all the files are in a single directory. Some versions of make (e.g., GNU make) have VPATH features to allow builds where the source and build trees are not identical. However, in practice, for very large trees, developers often omit intermediate dependencies and instead rely on recursive invocations, and explicit ordering of subdirectory builds, to allow easier Makefile maintenance. This is well discussed in the ``Recursive Make Considered Harmful'' paper.

make doesn't provide true accountability. This hampers the debugging of Makefiles. Many times make will mysteriously decide to rebuild some object, and the developer is left wondering whether it is worth his while to try to track through the many Makefiles in his tree in order to find the single bad dependency.

Finally, make has no way to express rules where the inputs and outputs are related in a dynamic, many-to-many way. For instance, when compiling Java code containing inner classes, you can't know precisely which .class files will be created without parsing the Java source code.

The SC Build Utility

I propose to fix the problems of make, while adding new features and simplifying the syntax, with a new program called SC Build. At the same time I propose a rough design for SC Build which is much different from the design for make, and which will allow for many interesting tools to be built upon the basic SC Build core.

Everything is an object

SC Build, like most build tools, contains at its core a dependency graph. When SC Build notices that a target is out of date with respect to one or more of its sources, SC Build runs an action to regenerate the target.

Unlike most build tools, however, SC Build does not only work on files. Instead, SC Build has a generalized naming scheme for all objects it manipulates, and, while files are intended to be its primary field of operations, SC Build can be extended to operate on any object which can be given a reliable name.

As an example of how this is useful, consider the case of Java. Suppose you want to compile your .java files to .class files, and then compile those .class files down to object files (actually, any further processing on these .class files is sufficient for our purpose). Because of the existence of inner classes and other similar Java features, it isn't possible to statically predict what .class files the compiler might produce in response to a build or a rebuild. Instead we can introduce a new object that represents the output of the particular javac invocation; this object can map to a list of files -- which need not remain constant from run to run.
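
To make this concrete, here is a minimal Python sketch of an object which stands for ``whatever .class files this javac invocation produces''; the class name and methods are purely illustrative, not part of any actual SC Build interface. The point is that the mapping from the object to concrete files is recomputed after each run, so inner classes (Foo$Bar.class) are picked up automatically.

import glob
import os
import subprocess

class JavacOutput:
    """Stands for the (dynamic) set of .class files produced by one
    javac invocation.  The concrete file list is only known after the
    compiler has actually run."""

    def __init__(self, sources, destdir='classes'):
        self.sources = sources
        self.destdir = destdir

    def run(self):
        os.makedirs(self.destdir, exist_ok=True)
        subprocess.check_call(['javac', '-d', self.destdir] + self.sources)

    def files(self):
        # Rescan the output directory; inner classes such as Foo$Bar.class
        # appear here even though no Buildfile ever named them.
        return glob.glob(os.path.join(self.destdir, '**', '*.class'),
                         recursive=True)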

Also, each SC Build name exists in a namespace. Initially there are only four namespaces: ``command'' (for what in make would be called ``phony targets''), ``target'' (any actual nameable object, for instance a file), ``action'' (a named rule which operates on sources to produce targets -- intuitively this represents a tool like a compiler), and ``parameter'' (a parameter encapsulates information used in variant builds). By default the targets the user specifies are assumed to be in the ``command'' namespace.

Separating actions from targets and commands lets us keep the primary Buildfiles declarative. This is an important property for GUI tools which will want to manipulate Buildfiles, especially since Python (as far as I can tell) has no way to serialize a function body in a human-readable way (readability of the underlying files is also a very important consideration).

Each name is an object of a particular class. SC Build provides a default set of classes, but it is possible to implement new classes and use inheritance to avoid reimplementing default behavior. All interesting behavior in SC Build is implemented via documented classes, meaning that everything from dependency computation to the implementation of the derived object cache is changeable.

Each name also has a set of attributes, which are determined by the class. For instance, all existing types of names have a set of dependencies (actions can depend on the tools used by the action, making it possible to track even changes in the compiler -- very important when developing a compiler and a library in parallel). As another example, a target might have an ``install directory'' attribute, which the ``install'' command target would use. Likewise, a ``dist'' rule could query dist-related attributes in order to decide how to create the distribution tarball.

Dependency Tracking

SC Build has a generalized notion of how to determine when an object is out of date. This feature is used to implement command dependencies (a target is out of date if its action has changed). It can also be used to implement fine-grained dependency tracking, should tools to do that become widely available (I believe they are not simply because there is no reason to write them -- make won't use the information in any case). By default, when operating on files, SC Build examines the file contents and not the timestamp (to be more precise, SC Build adopts the optimization that if neither a file's size nor its timestamp has changed, then the file is considered not to have changed).
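
As a rough illustration of this check, the following sketch (in present-day Python; the FileSignature class is illustrative only, not a proposed SC Build interface) records a file's size, timestamp, and content hash, and only rereads the contents when the cheap checks fail:

import hashlib
import os

class FileSignature:
    """Record enough about a file to decide later whether it changed."""

    def __init__(self, path):
        st = os.stat(path)
        self.size = st.st_size
        self.mtime = st.st_mtime
        self.digest = self._hash(path)

    @staticmethod
    def _hash(path):
        h = hashlib.sha1()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                h.update(chunk)
        return h.hexdigest()

    def changed(self, path):
        """Return True if the file at `path` differs from this signature."""
        st = os.stat(path)
        # Optimization: if neither size nor timestamp moved, assume unchanged.
        if st.st_size == self.size and st.st_mtime == self.mtime:
            return False
        # Otherwise fall back to comparing actual contents.
        return self._hash(path) != self.digest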

SC Build automatically constructs full dependencies for a target; each SC Build action has an associated dependency-generating ``method'' (one attribute of an action). My experience with automake leads me to believe that generating dependencies as a side effect of compilation is the most efficient and most reliable method; SC Build has a way of expressing that this is what is occurring in a given tool invocation. Since each tool in SC Build is an object in its own right, it is easy to add custom dependency generation code in the event that the underlying tool (e.g., a lame compiler) does not support dependency generation in the way we'd like.
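
For a compiler like gcc, ``dependencies as a side effect of compilation'' might look roughly like the sketch below. The helper names are hypothetical; the -MD and -MF options are standard gcc options which write a make-style dependency file alongside the object file.

import subprocess

def compile_with_deps(source, obj):
    """Compile `source` to `obj`, asking the compiler to emit dependency
    information as a side effect (here via gcc's -MD/-MF options)."""
    depfile = obj + '.d'
    subprocess.check_call(['gcc', '-c', source, '-o', obj,
                           '-MD', '-MF', depfile])
    return parse_depfile(depfile)

def parse_depfile(depfile):
    """Parse a make-style dependency file into a list of prerequisites."""
    with open(depfile) as f:
        text = f.read()
    # Join continuation lines, drop the 'target:' part, split the rest.
    text = text.replace('\\\n', ' ')
    _, _, deps = text.partition(':')
    return deps.split()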

Since it keeps full dependencies based on file contents, including command and tool dependencies, SC Build can notice when a given action would result in no change, and thus short-circuit that segment of the dependency graph. For instance, if you modify a comment in ``config.h'' (a header included in every source file in your project), and SC Build is running with a source analyzer that understands this difference, then SC Build will realize that nothing must be done, and your build will complete without rebuilding any targets.

Keeping full dependencies based on file contents also allows SC Build to implement a derived object cache. In some cases, this will allow SC Build to bypass a rule and instead simply copy the preexisting object from its cache into the working directory. This can be used to allow builds to be shared by multiple users (for instance users could easily use a nightly build tree as a backing store for the SC Build cache), and can make it simple to switch back and forth between different build variants (assuming you prefer to work in a single build tree). This basic functionality is also what allows SC Build to run actions in parallel, either on a single machine or across the network. Like most things in SC Build, the derived object cache is configurable via loadable SC Build modules.
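
In rough outline, a derived object cache of this kind can be keyed by the content hashes of an action's inputs together with its command line. The sketch below is purely illustrative (class names, layout, and all); a real implementation would also have to address the locking and security issues discussed later.

import hashlib
import os
import shutil

class DerivedObjectCache:
    """A toy content-addressed cache: the key combines the action's
    command line with the content hashes of all of its inputs."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def key(self, command, inputs):
        h = hashlib.sha1()
        h.update(' '.join(command).encode())
        for path in sorted(inputs):
            with open(path, 'rb') as f:
                h.update(hashlib.sha1(f.read()).digest())
        return h.hexdigest()

    def fetch(self, key, target):
        """Copy a cached result into place; return True on a cache hit."""
        cached = os.path.join(self.cache_dir, key)
        if os.path.exists(cached):
            shutil.copyfile(cached, target)
            return True
        return False

    def store(self, key, target):
        shutil.copyfile(target, os.path.join(self.cache_dir, key))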

Separability

Project descriptions, called Buildfiles, have an important property I call ``separability''. One nice feature of the way Makefiles are typically written is that the changes are close to the source -- if you add a file to a given directory, you ordinarily edit the Makefile in the same directory (this is important when considering maintainability). To that end, a Buildfile can import a Buildfile from another directory; in this case SC Build knows to appropriately transform names referred to in the subsidiary Buildfile -- this is a form of scoping. SC Build also knows when to unify names. So, for instance, if ``Buildfile'' refers to an object ``subdir/a.c'' (this is not a filename, but rather the name of an SC Build object), and ``subdir/Buildfile'' refers to ``a.c'', SC Build knows these are the same object. SC Build automatically extends this idea to command targets, so for instance one might run ``build --command subdir/install'' to install objects from ``subdir'' only.
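
The unification described above amounts to normalizing each name relative to the directory of the Buildfile which mentions it. A minimal sketch (the NameTable class is hypothetical) looks like this:

import posixpath

class NameTable:
    """Unify object names mentioned from different Buildfiles by
    normalizing each name relative to the directory of the Buildfile
    that mentions it."""

    def __init__(self):
        self.objects = {}

    def lookup(self, buildfile_dir, name):
        # 'a.c' mentioned in subdir/Buildfile and 'subdir/a.c' mentioned
        # at the top level both normalize to 'subdir/a.c'.
        canonical = posixpath.normpath(posixpath.join(buildfile_dir, name))
        return self.objects.setdefault(canonical, {'name': canonical})

table = NameTable()
assert table.lookup('.', 'subdir/a.c') is table.lookup('subdir', 'a.c')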

Build Variants

It is often necessary to build several variants of some program simultaneously. For instance, for system libraries it is useful to build optimized, debugging, and profiling versions all at once. In the gcc world it is common to ``multilib'' libraries -- build one library for every ABI supported by a particular compiler. In SC Build, these features and others are supported by build variants.

A build variant is stored in the ``parameter'' namespace. Any object which changes depending on the build variant will do this by looking up attributes of the variant object in the parameter namespace. So, for instance, a compiler tool might look up the command-line options or the compiler name itself from a parameter object; or an executable object might look up its list of constituent objects via a parameter.

The default tool and rule classes supplied with SC Build use variants to look up options and commands; if no variants are specified, then the default ``main'' variant is assumed.
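
As a sketch of how a tool might consult a variant, consider the following; the Variant class and the particular parameters are illustrative only, chosen to show the optimized/debugging case mentioned above:

class Variant:
    """A build variant: a named bag of parameters consulted by tools."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params

    def get(self, key, default=None):
        return self.params.get(key, default)

# Hypothetical variants, for illustration only.
MAIN = Variant('main', cflags=['-O2'])
DEBUG = Variant('debug', cflags=['-g', '-O0'], suffix='_g')

def compile_command(source, variant=MAIN):
    """Build the compiler command line for `source` under a variant."""
    obj = source.replace('.c', variant.get('suffix', '') + '.o')
    return ([variant.get('cc', 'gcc'), '-c', source, '-o', obj]
            + variant.get('cflags', []))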

Reporting and Debugging

Since SC Build objects will be mapped onto Python objects, developers can use Python's debugging capabilities to debug their Buildfiles.

This is necessary, but not sufficient for all uses of a build tool. A GUI tool wants to be able to extract even more information from a build system than a user of the command-line. For instance, a GUI tool wants to know when objects are out of date (so it can display them differently), and it wants to be able to precisely associate compiler output with compiler command lines (this can't be done on a purely textual basis, for example by parsing make's output, due to parallel builds).

To this end, SC Build will allow instrumenting of various methods to generate catchable events; event catchers will be able to use this information as appropriate.

In a similar vein, SC Build will log chains of deductions, so the user can ask why a particular rule was invoked. It will also let the user ask questions (e.g., ``What would be rebuilt if I modified this file?'') in order to more clearly understand the build process.

User Interface

The basic SC Build tool is much like make: it is an automation tool intended for use from the command-line. This makes it suitable for use in unattended builds and the like.

However, my intent is that in its fundamental design, SC Build will be an open-ended and extensible system, resembling a library more than a monolithic tool. This makes it possible to integrate with development environments in a more natural way.

For instance, in a typical GNU-ish development environment, the developer commonly makes a TAGS file; this file maps function names onto file names for easy navigation in Emacs. Developers rarely update this file frequently, however, as that involves a rather slow make step, particularly if the project is large. As another example, when writing ``vmake'' I discovered I had to keep track of all the dependency information in my tool, rather than letting make keep track of it, because (1) I wanted command-dependencies, which make didn't have, and (2) I wanted to keep the user apprised of out-of-date derived objects by displaying them in a different color. In both these situations I believe a SC Build server could solve the problem neatly; in both cases the development environment (Emacs, or the IDE) could notify the SC Build server of file modifications, and SC Build would (in the first case) update the TAGS file by running the appropriate rule (without the make-induced overhead of rescanning the Makefile) or (in the second case) communicate dependency information back to the IDE.

This second application of a SC Build server, namely information reporting, bears more examination. SC Build automatically generates reports describing why it makes the decisions it makes. Another tool can be used to display these reports, either directly or in a form suitable for publication on the web. This makes it much easier to debug Buildfiles than Makefiles.

Environment

SC Build has four important sources of input:

Outline Design

In this section I present a strawman design to demonstrate that a tool such as SC Build is in fact buildable.

SC Build is separated into three basic phases:

1. Parse the command line. This is pulled out as a separate phase to emphasize the library-like nature of SC Build. In the SC Build server, or in a GUI written around SC Build, this phase would not exist (at least, not with this implementation).

2. Read the Buildfiles. This phase finds and parses all the Buildfiles. It also computes the data structure which represents the dependency graph.

3. Perform the requested operation. In this phase SC Build does whatever it was told to do; for instance, it might install the appropriate derived objects.
In a server- or GUI-SC Build, step 2 is repeated whenever a Buildfile is modified. In these scenarios, step 3 is also repeated as required.

I believe we can all agree that step 1 is known to be implementable; the important steps are step 2, where the data structures critical to SC Build's operation are built, and step 3, where these data structures are updated as the rules are run. One approach would be to represent each name in a Buildfile as an object. Each object would have a dictionary of attributes, some taken directly from the Buildfile and some from SC Build's own database. Each dependency for a given object would be represented as a link between the object and its dependency. A target object would check its status relative to a dependency by making a method call on the dependency object. This method call would actually use double dispatch, taking into account not only the target and the dependency, but also the dependency-generating method in effect for this pair.

Likewise, when running an action, the target object would invoke a method on the action object, passing the target and the sources as parameters.
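
Putting these pieces together, the strawman might look roughly like the sketch below. Everything here is illustrative: each name is an object carrying an attribute dictionary, dependency links, and possibly an action, and the out-of-date check is routed through the dependency object, which is the first half of the double dispatch described above.

class Name:
    """A named object in the dependency graph, with free-form attributes."""

    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = attributes
        self.dependencies = []
        self.action = attributes.get('action')

    def depend_on(self, other):
        self.dependencies.append(other)

    def out_of_date(self):
        # Ask each dependency whether it invalidates this target; the
        # dependency can in turn consult the dependency-generating
        # method in effect for this particular pair.
        return any(dep.invalidates(self) for dep in self.dependencies)

    def invalidates(self, target):
        # Default policy; subclasses (files, actions, parameters) override.
        return False

    def update(self):
        # Bring dependencies up to date first, then run our own action
        # only if something actually changed.
        for dep in self.dependencies:
            dep.update()
        if self.out_of_date() and self.action is not None:
            self.action.run(target=self, sources=self.dependencies)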

Example 1: Hello, World

This example shows the simple ``Hello, World'' application, which is built by compiling a single C source file.
from SCBuild import *
import SCBuild.C

Command ('all',
  dependencies = File ('hello',
    comment = 'Build the file hello',
    rule = SCBuild.C.Link,
    dependencies = SCBuild.C.Compile ('hello.c')))
There are many interesting things to note in this example:

Example 2: etags

Emacs includes a file, etags.c, which can be built two different ways to yield two different executables: etags and ctags. This is a relatively simple example, but it serves to eliminate simple approaches to choosing intermediate object file names.
from SCBuild import *
import SCBuild.C
import SCBuild.Autoconf

File ('etags',
  rule = SCBuild.C.Link,
  dependencies = SCBuild.C.Object ('etags.o', 'etags.c'))

File ('ctags',
  rule = SCBuild.C.Link,
  dependencies = SCBuild.C.Object ('ctags.o', 'etags.c', flags = '-DCTAGS'))

Command ('all',
  dependencies = SCBuild.Autoconf.value ('all'))
In this example we see how block structuring is not required. We also see how we can use a built-in module to extract information from configure. Here we assume that configure has defined a macro named all which takes as its value the name of the executables which should be built (e.g., ``ctags'').

Example 3: Automake

This example shows how a more complex (and ironic!) case, the top level of the GNU Automake package, could be built using SC Build.
from SCBuild import *
import SCBuild.Autoconf
import SCBuild.Install
import SCBuild.Dist

# We assign a convenience name to the collection of files, but the
# File factory actually creates multiple file objects at once.
File ('data',
      files = ['clean-hdr.am', 'clean-kr.am', 'clean.am', '...more here...'],
      installdir = SCBuild.Autoconf.pkgdatadir,
      dist = 1)

File ('scripts',
      files = ['config.guess', 'config.sub', 'install-sh', '...more here...'],
      installdir = SCBuild.Autoconf.pkgdatadir,
      installmode = '+x',
      dist = 1)

SCBuild.Autoconf.ConfigFile ('executables',
                             files = ['automake', 'aclocal'],
                             installdir = SCBuild.Autoconf.bindir,
                             installmode = '+x')

Command ('all',
  subdirs = ['m4', 'tests'],
  dependencies = ['m4/all', 'tests/all', 'data', 'scripts', 'executables'])

Command ('install',
  dependencies = 'all',
  rule = SCBuild.Install.install)

Command ('dist',
  dependencies = 'all',
  rule = SCBuild.Dist.dist)

Issues

There are some minor issues facing SC Build, and one major one.

One minor implementation issue is finding an efficient method to handle derived object cache locking when faced with parallel builds across a network. A related issue is that of protecting a shared object cache against malicious users; however I see that as largely a matter of site policy to be decided by system administrators.

Another question is whether error output from failed build steps should be stored in the derived object cache. My current opinion is ``yes'', because I anticipate tool-level dependency tracking (that is, if you rebuild the compiler, SC Build should know to try to rebuild the objects).

There is a potential problem that SC Build's object unification scheme when using the separability feature will fail in some cases, particularly those involving ``..'' in filenames in the presence of symlinks. I doubt these problems are very major; either adopting a sensible standard policy or allowing for user control via plugins would probably be acceptable.

Since SC Build doesn't actually exist, it is hard to say whether or not it will require modifications in order to be used ``in anger''. The Jam authors report that Jam required modifications to be used in Sybase builds (which are quite unusual by my reading). I believe this problem will affect any proposed (as opposed to actual) replacement for make.

The biggest problems facing SC Build are not technical, but instead political. These are shared by any potential make replacement.

The first problem is convincing people to use SC Build. Many make replacements have been written, but none have seen wide adoption. Many developers eschew even the use of GNU make and instead rely on the portable subset (and from my automake experience I can tell you it is a very small subset) of make features.

The second problem is one of rollout. For instance, consider gcc development as it occurs inside Cygnus Solutions. My current source tree is approximately 6,000 files in approximately 500 directories, with about 100 Makefiles. Rewriting these Makefiles into Buildfiles, and then running tests to make sure the rewrite is correct, will be quite difficult. This is especially true given that feedback from this ``Buildification'' back into SC Build will probably occur during the course of this project -- in fact, this concern suggests the adoption of an incremental model, whereby we implement an important subset of SC Build (e.g., SC Build without a working object cache), Buildify the existing source base, fix SC Build, and iterate, only coming back to finish SC Build once we are assured that we've reached a stable point. However, this approach also gives me pause, given the widely distributed and highly heterogeneous nature of the gcc development community.

These two problems won't be seen in this form in the "casual developer" community, but make already serves that community quite well (here I'm talking about single-directory projects whose maintainers aren't concerned about portability or even redistributability).

Outline Plan

I believe the current best-of-breed development process for an open source tool is that pioneered by ``egcs'' (now ``gcc'' again) and followed by several other large development projects.

The basics of the plan are:

To this I would add some vital lessons learned from the Tcl community, as well as my own experiences writing automake:

One approach to solving the rollout problem would be to adopt that taken by Autoconf: once it reached a sufficient level of maturity, the maintainer at the time (David Zuhn) took it upon himself to ``autoconfiscate'' a number of projects and furthermore convince the maintainers to accept the changes into the main line. For SC Build this task would be much more difficult, particularly since, as I noted before, maintainers of smallish packages really have no reason to change.


Appendix 1: Man Page

NAME

scbuild - automatic program build tool

SYNOPSIS

scbuild [ -f buildfile ] [ option ] ... [ target ] ...

DESCRIPTION

The scbuild utility is used to automatically derive a set of output files from a set of input files, like make. It keeps track of the relationship between these files, and will automatically rebuild the derived files when necessary.

-d
Enter the debugger.
-f buildfile
Read description from the named file. The default file name is Buildfile.
-k
Keep going even if scbuild encounters an error while building a derived object.
--namespace space
Choose targets from the indicated namespace. By default targets are assumed to be in the command namespace.
target
Build the indicated target.

SEE ALSO

See make(1) for a less functional build program.

NOTES

The scbuild command-line interface is deliberately kept simple. Other command-line options can be added by user modules; for instance, all logging options are handled this way.

Appendix 2: Thanks

I'd like to thank my reviewers: Alexandre Oliva, Jim Blandy, Zack Weinberg, Havoc Pennington, and Anthony Green. I'd also like to thank the Gnits group, plus anybody who ever complained about or contributed to Automake.

