Copyright (c) 2000 Tom Tromey. This material may
be distributed only subject to the terms and conditions set forth in
the Software Carpentry Open Publication License, which is available at:
I propose a simpler and more reliable, yet more general, more powerful, and typically more efficient replacement for make, here called ``SC Build''. I draw upon my experiences writing and maintaining Automake, the GNU Makefile generator, my experiences writing vmake, a GUI-based build component for an IDE, the lessons of previous make replacements such as Jam and Odin, and the lessons learned doing toolchain development at Cygnus Solutions. In this paper I also present various examples and problems with alternate approaches, against which I believe other replacement tools should be measured.
make has an obscure syntax. Ordinarily, nice syntax is considered mere sugar, but in make's case the syntax is so bad that it becomes an active impediment to using make. For instance, with common implementations of make there is no way to build a target whose name contains a colon (``:''). Also, since command (``phony'') targets and file targets share a namespace, if the Makefile author follows the convention of having an ``install'' command target, then it becomes difficult to build an executable named ``install'' (this is not an academic concern but an actual case faced by the GNU fileutils). My experience with ``vmake'' showed that this obscurity of syntax made automatically generating a Makefile an arduous task; I've often considered that the number one design flaw in vmake was that we decided to generate Makefiles rather than simply reimplement all of make's functionality in the GUI.
make does not scale very well, by which I mean that one must typically sacrifice one of portability, maintainability, or performance in order to describe large projects with make. This is well documented in ``Recursive Make Considered Harmful'' (whose authors, in effect, recommend sacrificing portability) and in the original Jam paper (which demonstrates the typical performance loss; unpublished comments from the GNU libc maintainer corroborate this finding).
The make macro language is quite limited, even in GNU make (widely considered the beefiest make in existence). This lack of power means that some features are typically missing. For instance, good automatic dependency tracking (which is quite different from just any old automatic dependency tracking) is virtually unheard of in the make world; automake's implementation is the first reliable and complete implementation with minimal performance overhead that I know of -- and this implementation took 3 major revisions and several unintuitive hacks to implement. With make there is no way to implement finer-grained dependency tracking even if you wanted to; if your tool can tell you that a change in a central header file affected only the text of a comment, you're still forced to rebuild your entire package.
make typically does not have command dependencies. This means that in some situations, when the rule to build an object changes, you must first run make clean and start over. In single-developer projects this is unimportant, but in multi-developer projects it is a critical flaw. Nor does make have tool dependencies; if the compiler changes you're likewise out of luck. Again, this is not an academic concern: in my job I work on a library (libgcj, a Java runtime) which is closely coupled to gcj, the Java front end to gcc. For me the compiler changes frequently, so I find myself running make clean by hand quite often.
make relies on timestamps for dependency analysis instead of examining file contents. Although this is attractive as an apparent performance enhancement (which, as it turns out, is not an actual one), it is a bad idea. Timestamps in a network environment are typically unreliable. For instance, until we added a special sanity check to autoconf, timestamp problems on the local network were the most frequent cause of bug reports by ``configure'' users (that is, the end package builders, not the package developers). Also, the Jam authors' reports about the ``ClearMake problem'' (see the Jam paper) indicate that timestamps can be unreliable in other situations involving sophisticated source control systems. The use of timestamps also leads to ugly workarounds such as ``stamp files''. These come into play when a central header file is automatically generated (e.g., as ``config.h'' is often generated by ``configure''); the idea is to avoid modifying the header file, and so prevent make from rebuilding the entire project, when the contents of the file have not actually changed. Finally, the granularity of timestamps on most Unix systems is one second, which is inadequate given the speed of today's systems.
make works best when all the files are in a single directory. Some versions of make (e.g., GNU make) have VPATH features to allow builds where the source and build trees are not identical. However, in practice, for very large trees, developers often omit intermediate dependencies and instead rely on recursive invocations, and explicit ordering of subdirectory builds, to allow easier Makefile maintenance. This is well discussed in the ``Recursive Make Considered Harmful'' paper.
make doesn't provide true accountability. This hampers the debugging of Makefiles. Many times make will mysteriously decide to rebuild some object, and the developer is left wondering whether it is worth his while to try to track through the many Makefiles in his tree in order to find the single bad dependency.
Finally, make has no way to express rules where the inputs and outputs are related in a dynamic, many-to-many way. For instance, when compiling Java code containing inner classes, you can't know precisely which .class files will be created without parsing the Java source code.
Unlike most build tools, however, SC Build does not work only on files. Instead, SC Build has a generalized naming scheme for all objects it manipulates, and, while files are intended to be its primary field of operations, SC Build can be extended to operate on any object which can be given a reliable name.
As an example of how this is useful, consider the case of Java. Suppose you want to compile your .java files to .class files, and then compile those .class files down to object files (actually, any further processing on these .class files is sufficient for our purpose). Because of the existence of inner classes and other similar Java features, it isn't possible to statically predict what .class files the compiler might produce in response to a build or a rebuild. Instead we can introduce a new object that represents the output of the particular javac invocation; this object can map to a list of files -- which need not remain constant from run to run.
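A minimal Python sketch of such an object follows. It assumes something the proposal does not specify: each javac invocation is given its own output directory (javac -d), so every .class file found there after a run belongs to that invocation. The class name JavacOutput and the helper fake_javac_run are hypothetical, and the helper only simulates a compiler run so that the example is self-contained.

```python
import os


class JavacOutput:
    """Hypothetical name for 'whatever one javac invocation produced'.

    Assumes the invocation writes into its own directory (javac -d out_dir),
    so every .class file found there after the run belongs to this object.
    """

    def __init__(self, name, sources, out_dir):
        self.name = name
        self.sources = list(sources)
        self.out_dir = out_dir
        self.files = []  # unknown until the action has actually run

    def refresh(self):
        """Re-scan out_dir after a run; the list may differ from the last run."""
        found = []
        for root, _dirs, names in os.walk(self.out_dir):
            for n in names:
                if n.endswith(".class"):
                    found.append(os.path.relpath(os.path.join(root, n), self.out_dir))
        self.files = sorted(found)
        return self.files


def fake_javac_run(out_dir, class_files):
    """Hypothetical stand-in for a compile: clear stale outputs, then write new ones.

    javac itself does not delete stale .class files, so a real action would
    need to empty out_dir before each run.
    """
    for root, _dirs, names in os.walk(out_dir):
        for n in names:
            if n.endswith(".class"):
                os.remove(os.path.join(root, n))
    for rel in class_files:
        with open(os.path.join(out_dir, rel), "wb") as f:
            f.write(b"\xca\xfe\xba\xbe")  # class-file magic number as dummy content
```

Downstream actions would depend on the JavacOutput object rather than on individual .class files. Adding an anonymous inner class, which makes javac emit a file such as Outer$1.class, then changes the object's file list without any edit to the Buildfile.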
Also, each SC Build name exists in a namespace. Initially there are only four namespaces: ``command'' (for what in make would be called ``phony targets''), ``target'' (any actual nameable object, for instance a file), ``action'' (a named rule which operates on sources to produce targets -- intuitively this represents a tool like a compiler), and ``parameter'' (a parameter encapsulates information used in variant builds). By default the targets the user specifies are assumed to be in the ``command'' namespace.
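One way to picture the namespaces is as a registry keyed by (namespace, name) pairs instead of bare names. The sketch below is illustrative only; Registry and its methods are hypothetical. It shows how a command named ``install'' and a file target named ``install'' can coexist, which is exactly the GNU fileutils case that make's single namespace makes awkward.

```python
NAMESPACES = ("command", "target", "action", "parameter")


class Registry:
    """Hypothetical store of SC Build-style names, keyed by (namespace, name)."""

    def __init__(self):
        self._objects = {}

    def define(self, namespace, name, obj):
        if namespace not in NAMESPACES:
            raise ValueError(f"unknown namespace: {namespace!r}")
        key = (namespace, name)
        if key in self._objects:
            raise KeyError(f"{namespace}:{name} is already defined")
        self._objects[key] = obj

    def lookup(self, name, namespace="command"):
        # Names typed by the user default to the "command" namespace.
        return self._objects[(namespace, name)]
```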
Separating actions from targets and commands lets us keep the primary Buildfiles declarative. This is an important property for GUI tools which will want to manipulate Buildfiles, especially since Python (as far as I can tell) has no way to serialize a function body in a human-readable way (readability of the underlying files is also a very important consideration).
Each name is an object of a particular class. SC Build provides a default set of classes, but it is possible to implement new classes and use inheritance to avoid reimplementing default behavior. All interesting behavior in SC Build is implemented via documented classes, meaning that everything from dependency computation to the implementation of the derived object cache is changeable.
Each name also has a set of attributes, which are determined by the class. For instance, all existing types of names have a set of dependencies (actions can depend on the tools used by the action, making it possible to track even changes in the compiler -- very important when developing a compiler and a library in parallel). As another example, a target might have an ``install directory'' attribute, which the ``install'' command target would use. Likewise, a ``dist'' rule could query dist-related attributes in order to decide how to create the distribution tarball.
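As a rough illustration of names as objects whose attributes are determined by their class, the following sketch gives every name a dependency list and an attribute dictionary. An action lists the compiler binary among its dependencies, and an install command reads each target's install directory attribute. All class names here are hypothetical and are not SC Build's actual classes.

```python
class Name:
    """Base class for every hypothetical SC Build name."""
    namespace = "target"

    def __init__(self, name, dependencies=(), **attributes):
        self.name = name
        self.dependencies = list(dependencies)
        self.attributes = dict(attributes)

    def get(self, attr, default=None):
        return self.attributes.get(attr, default)


class File(Name):
    """Inherits all default behavior unchanged."""


class Action(Name):
    namespace = "action"


class InstallCommand(Name):
    namespace = "command"

    def plan(self):
        """Pair each dependency that has an installdir attribute with that directory."""
        return [(d.name, d.get("installdir"))
                for d in self.dependencies
                if d.get("installdir") is not None]
```

Because the compiler binary is an ordinary dependency of the action, a change to the compiler can invalidate everything the action built.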
SC Build automatically constructs full dependencies for a target; each SC Build action has an associated dependency-generating ``method'' (one attribute of an action). My experience with automake leads me to believe that generating dependencies as a side effect of compilation is the most efficient and most reliable method; SC Build has a way of expressing that this is what is occurring in a given tool invocation. Since each tool in SC Build is an object in its own right, it is easy to add custom dependency generation code in the event that the underlying tool (e.g., a lame compiler) does not support dependency generation in the way we'd like.
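With gcc, dependencies can be generated as a side effect of compilation by passing the -MD and -MF options, which write a make-style dependency file during compilation. The sketch below builds such a command line and parses the resulting file. The function names are hypothetical. It assumes file names contain no spaces or colons, and it reads only the first rule, ignoring the empty per-header rules that -MP would add.

```python
def compile_command(source, obj):
    """gcc command that compiles source and writes its dependencies to obj + '.d'."""
    return ["gcc", "-c", "-MD", "-MF", obj + ".d", "-o", obj, source]


def parse_depfile(text):
    """Return (target, dependencies) from the first rule of a make-style .d file.

    Assumes no spaces or colons in file names (escaping is not handled).
    """
    joined = text.replace("\\\n", " ")           # undo backslash-newline continuations
    first_rule = joined.strip().splitlines()[0]  # ignore later (e.g. -MP phony) rules
    target, _, deps = first_rule.partition(":")
    return target.strip(), deps.split()
```

Because the dependency list is rewritten on every compilation, it is always exactly as fresh as the object file it describes.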
Since it keeps full dependencies based on file contents, including command and tool dependencies, SC Build can notice when a given action would result in no change, and thus short-circuit that segment of the dependency graph. For instance, if you modify a comment in ``config.h'' (a header included in every source file in your project), and SC Build is running with a source analyzer that understands this difference, then SC Build will realize that nothing must be done, and your build will complete without rebuilding any targets.
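The following is a minimal sketch of content-based checking. It assumes SHA-256 signatures computed over the command line, the bytes of the tool binary, and the inputs as seen by a source analyzer. The normalize_c analyzer is deliberately crude and hypothetical. It drops C comments while keeping line numbers intact, since those end up in debug information, but it does not understand string literals. It is only a stand-in for the smarter analyzers described above. Step is likewise a hypothetical name.

```python
import hashlib
import re


def digest(*parts):
    """SHA-256 over several str/bytes parts, length-prefixed so they cannot run together."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode() if isinstance(part, str) else part
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def normalize_c(source):
    """Crude analyzer: remove /* */ and // comments but keep the line structure."""
    def blank(match):
        # Replace a block comment with a space plus the newlines it spanned.
        return " " + "\n" * match.group(0).count("\n")
    text = re.sub(r"/\*.*?\*/", blank, source, flags=re.S)
    text = re.sub(r"//[^\n]*", "", text)
    return "\n".join(line.rstrip() for line in text.splitlines())


class Step:
    """One action instance; remembers the signature of its last successful run."""

    def __init__(self, command, tool_bytes):
        self.command = command
        self.tool_bytes = tool_bytes
        self.last_signature = None

    def signature(self, inputs):
        return digest(self.command, self.tool_bytes, *(normalize_c(s) for s in inputs))

    def needs_run(self, inputs):
        return self.signature(inputs) != self.last_signature

    def mark_done(self, inputs):
        self.last_signature = self.signature(inputs)
```

The same test applied to an action's outputs gives the short circuit. If a rebuilt object is byte-identical to the previous one, the steps that consume it see an unchanged signature and are skipped.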
Keeping full dependencies based on file contents also allows SC Build to implement a derived object cache. In some cases, this will allow SC Build to bypass a rule and instead simply copy the preexisting object from its cache into the working directory. This can be used to allow builds to be shared by multiple users (for instance users could easily use a nightly build tree as a backing store for the SC Build cache), and can make it simple to switch back and forth between different build variants (assuming you prefer to work in a single build tree). This basic functionality is also what allows SC Build to run actions in parallel, either on a single machine or across the network. Like most things in SC Build, the derived object cache is configurable via loadable SC Build modules.
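Here is a minimal sketch of a derived object cache. It assumes each output is identified by a signature, such as one computed from its command, tool, and input contents. DerivedObjectCache and materialize are hypothetical names. The sketch publishes entries with an atomic rename. It does no locking between concurrent builders, which is one of the open issues discussed later in this paper.

```python
import os
import shutil


class DerivedObjectCache:
    """Hypothetical content-addressed store: signature -> copy of a built file."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.hits = 0
        self.misses = 0

    def materialize(self, signature, dest, build):
        """Place the output for signature at dest, calling build(dest) only on a miss."""
        cached = os.path.join(self.root, signature)
        if os.path.exists(cached):
            self.hits += 1
            shutil.copyfile(cached, dest)
            return
        self.misses += 1
        build(dest)
        tmp = cached + ".tmp"
        shutil.copyfile(dest, tmp)
        os.replace(tmp, cached)  # atomic on one filesystem: readers never see a partial entry
```

Pointing root at a directory filled by a nightly build gives the ``backing store'' arrangement described above. Switching between variants that have both been built then becomes a series of cache hits.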
A build variant is stored in the ``parameter'' namespace. Any object which changes depending on the build variant will do this by looking up attributes of the variant object in the parameter namespace. So, for instance, a compiler tool might look up the command-line options or the compiler name itself from a parameter object; or an executable object might look up its list of constituent objects via a parameter.
The default tool and rule classes supplied with SC Build use variants to look up options and commands; if no variants are specified, then the default ``main'' variant is assumed.
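The next sketch shows variant lookup, with the parameter namespace modeled as a plain dictionary of variant objects. The variant names, the attribute names (cc, cflags, objects), and the functions are all hypothetical. Only two behaviors come from the text: tools read their settings from a variant, and ``main'' is used when no variant is given.

```python
PARAMETERS = {
    "main":  {"cc": "gcc", "cflags": ["-O2"], "objects": ["hello.o"]},
    "debug": {"cc": "gcc", "cflags": ["-g", "-O0"], "objects": ["hello.o", "trace.o"]},
}


def param(attr, variant=None):
    """Look up attr on a variant; None means the default 'main' variant."""
    return PARAMETERS[variant or "main"][attr]


def compile_command(source, obj, variant=None):
    # The compiler name and its options both come from the variant.
    return [param("cc", variant), *param("cflags", variant), "-c", source, "-o", obj]


def link_inputs(variant=None):
    # An executable can take its list of constituent objects from the variant too.
    return list(param("objects", variant))
```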
This is necessary, but not sufficient for all uses of a build tool. A GUI tool wants to be able to extract even more information from a build system than a user of the command-line. For instance, a GUI tool wants to know when objects are out of date (so it can display them differently), and it wants to be able to precisely associate compiler output with compiler command lines (this can't be done on a purely textual basis, for example by parsing make's output, due to parallel builds).
To this end, SC Build will allow instrumenting of various methods to generate catchable events; event catchers will be able to use this information as appropriate.
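One plausible shape for this instrumentation is a decorator that wraps a method and reports start and end events to registered catchers. Everything below is hypothetical: emits, add_catcher, the Compile class, and its canned output. Each event carries the action object itself, so a catcher can tie diagnostic output to the exact command that produced it without parsing interleaved text.

```python
import functools

_catchers = []


def add_catcher(fn):
    """Register fn(event_name, obj, payload) to receive every emitted event."""
    _catchers.append(fn)


def emits(event_name):
    """Decorator: announce '<event>.start' before and '<event>.end' after a method call."""
    def wrap(method):
        @functools.wraps(method)
        def inner(self, *args, **kwargs):
            for catch in list(_catchers):
                catch(event_name + ".start", self, args)
            result = method(self, *args, **kwargs)
            for catch in list(_catchers):
                catch(event_name + ".end", self, result)
            return result
        return inner
    return wrap


class Compile:
    def __init__(self, command):
        self.command = command

    @emits("action.run")
    def run(self):
        # Stand-in for running the tool: return the diagnostics it would print.
        return f"{self.command[-1]}:3: warning: unused variable"
```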
In a similar vein, SC Build will log chains of deductions, so the user can ask why a particular rule was invoked. It will also let the user ask questions (e.g., ``What would be rebuilt if I modified this file?'') in order to more clearly understand the build process.
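The question ``What would be rebuilt if I modified this file?'' amounts to walking the dependency graph backwards from that file. The sketch below assumes the graph is available as a dictionary from each target to its direct dependencies; the function names and the example graph in the usage are hypothetical. It gives an upper bound, since content-based short-circuiting may end up rebuilding fewer targets.

```python
from collections import defaultdict, deque


def reverse_edges(deps):
    """Map each dependency to the set of targets that list it directly."""
    rev = defaultdict(set)
    for target, sources in deps.items():
        for source in sources:
            rev[source].add(target)
    return rev


def affected_by(deps, changed):
    """Every target that depends on changed, directly or transitively, sorted."""
    rev = reverse_edges(deps)
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for target in rev.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen)
```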
However, my intent is that in its fundamental design, SC Build will be an open-ended and extensible system, resembling a library more than a monolithic tool. This makes it possible to integrate with development environments in a more natural way.
For instance, in a typical GNU-ish development environment, the developer commonly makes a TAGS file; this file maps function names onto file names for easy navigation in Emacs. Developers rarely regenerate this file, however, since doing so involves a rather slow make step, particularly if the project is large. As another example, when writing ``vmake'' I discovered I had to keep track of all the dependency information in my tool, rather than letting make keep track of it, because (1) I wanted command-dependencies, which make didn't have, and (2) I wanted to keep the user apprised of out-of-date derived objects by displaying them in a different color. In both these situations I believe a SC Build server could solve the problem neatly; in both cases the development environment (Emacs, or the IDE) could notify the SC Build server of file modifications, and SC Build would (in the first case) update the TAGS file by running the appropriate rule (without the make-induced overhead of rescanning the Makefile) or (in the second case) communicate dependency information back to the IDE.
This second application of a SC Build server, namely information reporting, bears more examination. SC Build automatically generates reports describing why it makes the decisions it makes. Another tool can be used to display these reports, either directly or in a form suitable for publication on the web. This makes it much easier to debug Buildfiles than Makefiles.
SC Build is separated into three basic phases:
I believe we can all agree that step 1 is known to be implementable; the important steps are step 2, where the data structures critical to SC Build's operation are built, and step 3, where these data structures are updated as the rules are run. One approach would be to represent each name in a Buildfile as an object. Each object would have a dictionary of attributes, some taken directly from the Buildfile and some from SC Build's own database. Each dependency for a given object would be represented as a link between the object and its dependency. A target object would check its status relative to a dependency by making a method call on the dependency object. This method call would actually use double dispatch, taking into account not only the target and the dependency, but also the dependency-generating method in effect for this pair.
Likewise, when running an action, the target object would invoke a method on the action object, passing the target and the sources as parameters.
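Here is a minimal sketch of this design. Python has no built-in multiple dispatch, so the sketch keeps a table keyed by the classes of the target, the dependency, and the dependency method. It searches that table along each class's method resolution order, target class first; that search order is an assumption, not part of the proposal. All class and function names are hypothetical, and Action.run only records what it consumed instead of invoking a real tool.

```python
class Name:
    def __init__(self, name):
        self.name = name
        self.recorded = {}  # dependency name -> digest seen at the last build


class File(Name):
    def __init__(self, name, digest=None):
        super().__init__(name)
        self.digest = digest


class Command(Name):
    pass


class ContentMethod:
    """Dependency method: stale when the dependency's digest has changed."""


class AlwaysMethod:
    """Dependency method: always treat the dependency as changed."""


_STALE_RULES = {}


def stale_rule(target_cls, dep_cls, method_cls):
    """Register a staleness check for one (target, dependency, method) class triple."""
    def register(fn):
        _STALE_RULES[(target_cls, dep_cls, method_cls)] = fn
        return fn
    return register


def is_stale(target, dep, method):
    """Dispatch on all three classes; first match in MRO order, target class first."""
    for t in type(target).__mro__:
        for d in type(dep).__mro__:
            for m in type(method).__mro__:
                fn = _STALE_RULES.get((t, d, m))
                if fn is not None:
                    return fn(target, dep, method)
    raise TypeError(f"no rule for {type(target).__name__}, "
                    f"{type(dep).__name__}, {type(method).__name__}")


@stale_rule(Name, File, ContentMethod)
def _by_content(target, dep, method):
    return target.recorded.get(dep.name) != dep.digest


@stale_rule(Name, Name, AlwaysMethod)
def _always(target, dep, method):
    return True


class Action:
    def run(self, target, sources):
        """Stand-in for running a tool: record the digests of the sources consumed."""
        target.recorded = {s.name: s.digest for s in sources}
        return f"built {target.name} from {', '.join(s.name for s in sources)}"
```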
    from SCBuild import *
    import SCBuild.C

    Command ('all',
             dependencies = File ('hello',
                                  comment = 'Build the file hello',
                                  rule = SCBuild.C.Link,
                                  dependencies = SCBuild.C.Compile ('hello.c')))

There are many interesting things to note in this example:
    from SCBuild import *
    import SCBuild.C
    import SCBuild.Autoconf

    File ('etags',
          rule = SCBuild.C.Link,
          dependencies = SCBuild.C.Object ('etags.o', 'etags.c'))
    File ('ctags',
          rule = SCBuild.C.Link,
          dependencies = SCBuild.C.Object ('ctags.o', 'etags.c', flags = '-DCTAGS'))
    Command ('all', dependencies = SCBuild.Autoconf.value ('all'))

In this example we see that block structuring is not required. We also see how we can use a built-in module to extract information from configure. Here we assume that configure has defined a macro named all which takes as its value the names of the executables which should be built (e.g., ``ctags'').
    from SCBuild import *
    import SCBuild.Autoconf
    import SCBuild.Install
    import SCBuild.Dist

    # We assign a convenience name to the collection of files, but the
    # File factory actually creates multiple file objects at once.
    File ('data',
          files = ['clean-hdr.am', 'clean-kr.am', 'clean.am', '...more here...'],
          installdir = SCBuild.Autoconf.pkgdatadir,
          dist = 1)
    File ('scripts',
          files = ['config.guess', 'config.sub', 'install-sh', '...more here...'],
          installdir = SCBuild.Autoconf.pkgdatadir,
          installmode = '+x',
          dist = 1)
    SCBuild.Autoconf.ConfigFile ('executables',
                                 files = ['automake', 'aclocal'],
                                 installdir = SCBuild.Autoconf.bindir,
                                 installmode = '+x')
    Command ('all',
             subdirs = ['m4', 'tests'],
             dependencies = ['m4/all', 'tests/all', 'data', 'scripts', 'executables'])
    Command ('install', dependencies = 'all', rule = SCBuild.Install.install)
    Command ('dist', dependencies = 'all', rule = SCBuild.Dist.dist)
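Because a Buildfile is ordinary Python that only calls factories, a loader can execute it against factory functions that register names instead of doing any work. The sketch below uses hypothetical stand-ins for File and Command. SC Build does not exist, so these are not its API. It shows the behavior described in the example's comment: one File call registers a convenience name and also one target per listed file. The install directory literal is a hypothetical stand-in for SCBuild.Autoconf.pkgdatadir.

```python
def make_loader():
    """Return (globals for exec, registry) with hypothetical stand-in factories."""
    registry = {"command": {}, "target": {}}

    def File(name, files=None, **attributes):
        members = list(files) if files else [name]
        for member in members:
            # Each listed file becomes its own target object, sharing the attributes.
            registry["target"][member] = dict(attributes)
        if files:
            # The convenience name stands for the whole collection.
            registry["target"][name] = {"members": members, **attributes}
        return name

    def Command(name, dependencies=(), **attributes):
        if isinstance(dependencies, str):
            dependencies = [dependencies]
        registry["command"][name] = {"dependencies": list(dependencies), **attributes}
        return name

    return {"File": File, "Command": Command}, registry


BUILDFILE = """
File('data', files=['clean.am', 'clean-kr.am'], installdir='/usr/share/automake', dist=1)
Command('all', dependencies=['data'])
Command('install', dependencies='all')
"""
```

Keeping the Buildfile to plain declarative calls like these is what lets a GUI tool read and rewrite it.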
One minor implementation issue is finding an efficient method to handle derived object cache locking when faced with parallel builds across a network. A related issue is that of protecting a shared object cache against malicious users; however I see that as largely a matter of site policy to be decided by system administrators.
Another question is whether error output from failed build steps should be stored in the derived object cache. My current opinion is ``yes'', because I anticipate tool-level dependency tracking (that is, if you rebuild the compiler, SC Build should know to try to rebuild the objects); a cached failure would therefore be discarded automatically once the tool that produced it changes.
There is a risk that SC Build's object unification scheme will fail in some cases when the separability feature is used, particularly cases involving ``..'' in file names in the presence of symlinks. I doubt these problems are serious; either adopting a sensible standard policy or allowing for user control via plugins would probably be acceptable.
Since SC Build doesn't actually exist, it is hard to say whether or not it will require modifications in order to be used ``in anger''. The Jam authors report that Jam required modifications to be used in Sybase builds (which are quite unusual by my reading). I believe this problem will affect any proposed (as opposed to actual) replacement for make.
The biggest problems facing SC Build are not technical, but instead political. These are shared by any potential make replacement.
The first problem is convincing people to use SC Build. Many make replacements have been written, but none have seen wide adoption. Many developers eschew even the use of GNU make and instead rely on the portable subset (and from my automake experience I can tell you it is a very small subset) of make features.
The second problem is one of rollout. For instance, consider gcc development as it occurs inside Cygnus Solutions. My current source tree is approximately 6,000 files in approximately 500 directories, with about 100 Makefiles. Rewriting these Makefiles into Buildfiles, and then running tests to make sure the rewrite is correct, will be quite difficult. This is especially true given that feedback from this ``Buildification'' back into SC Build will probably occur during the course of this project. In fact, this concern suggests the adoption of an incremental model, whereby we implement an important subset of SC Build (e.g., SC Build without a working object cache), Buildify the existing source base, fix SC Build, and iterate, only coming back to finish SC Build once we are assured that we've reached a stable point. However, this approach also gives me pause, given the widely distributed and highly heterogeneous nature of the gcc development community.
These two problems won't be seen in this form in the ``casual developer'' community, but make already serves that community quite well (here I'm talking about single-directory projects whose maintainers aren't concerned about portability or even redistributability).
The basics of the plan are: