Cmm is a command-line programme that adds support for multimethods to C++. It does this by working as a translator for a C++ language extension, reading in Cmm source code and outputing C++ source code.
Cmm also implements some other C++ extensions - see otherextensions.html for more details.
This file is for users of Cmm. If you want to know how to build Cmm, or are interested in how the Cmm programme works or in the details of the dynamic dispatch code that is generated, please see the Links section below.
The multimethod support uses approximately the syntax suggested in
Stroustrup's Design and Evolution of C++ (page 298). A function
prototype is a multimethod function if one or more of its parameters are
qualified with the keyword virtual
. Implementations of a
multimethod function have the same name plus a trailing underscore, and have
static
in place of virtual
qualifiers.
The parameters that correspond to the virtual parameters in the virtual
function prototype must be derived from the original types, and have the same
modifiers (such as const
).
Virtual function declarations must occur before implementation function definitions/declarations. Otherwise the implementation functions will be treated as conventional functions.
An example should make this clearer. We'll use the classic multimethods
example, an Overlap
function that takes references to a
Shape
base class:
struct Shape {...};
struct Square : Shape {...};
struct Triangle : Shape {...};
bool Overlap( virtual Shape& a, virtual Shape& b);
bool Overlap_( static Square& a, static Triangle& b) {...}
bool Overlap_( static Triangle& a, static Square& b) {...}
bool Overlap_( static Shape& a, static Square& b) {...}
bool Overlap_( static Square& a, static Shape& b) {...}
The foo( virtual Shape& a, virtual Shape& b);
prototype is replaced by a dispatch function with prototype foo(
Shape& a, Shape& b);
. This dispatch function uses C++ RTTI to
choose one of the foo_()
functions, based on the dynamic types
of its parameters.
It is possible that there isn't any matching implementation function for a particular combination of dynamic types. In this case, the generated dispatch function will throw an exception.
It will also throw an exception if there is no clear best implementation for a particular combination of dynamic types. An implementation is considered best if the following 2 conditions apply:
Note that we cannot have duplicate implementations, so the second condition implies that for each other matching implementation X, the best implementation must have at least one parameter type that is more derived than the corresponding parameter type in X.
This is the similar to the lookup rule used by C++ at compile time (apart from C++'s use of conversion operators and special casing of built-in types).
Thus:
Shape& s = *new Square;
Shape& t = *new Triangle;
Overlap( t, t); // Throws - no matching implementation
Overlap( s, t); // Calls Overlap_( static Square&, static Triangle&)
Overlap( t, s); // Calls Overlap_( static Triangle&, static Square&)
Overlap( s, s); // Throws - no best matching implementation
The trailing underscore in implementation function names is ugly, but it allows one to define an implementation that takes the base parameters, which would otherwise clash with the generated dispatch function.
A Cmm file is converted to a C++ file with:
cmm -s <Cmm input filename> <C++ output filename>
For things to work correctly however, the the input C++ file needs to have been pre-processed with the standard C++ preprocessor.
Cmm will insert calls to functions declared in cmm/dispatch.h (which is fairly well commented), so when a programme is built these functions must be available, which is most easily done by adding the file cmm/dispatch.cpp to the programme's build. This file contains implementation of all of the functions defined in cmm/dispatch.h.
To build the following example programmes, see building.html.
The directory examples/overlap/ contains a simple single-file Cmm programme.
The directory examples/overlap2/ is basically the same programme, but split up into several Cmm source files and header files, to demonstrate how Cmm copes with multiple compilation units.
The directory examples/speed/ contains a programme that measures the speed of the virtual function dispatch code that Cmm generates.
cmm -h gives help. This help is also available in the file cmmhelp.text.
building.html contains instructions for building Cmm.
readme.html contains release notes.
cmm/readme.html has a description of the design and implementation of the Cmm programme.
http://www.functionalobjects.com/resources/index.phtml contains lots of information about the Dlyan language, which has built-in support for multimethods.
http://homepage.ntlworld.com/w.weston/multiple_dispatch.html - Bill Weston's within-the-language multimethods system for C++, using templates.
See building.html.
For each virtual function, Cmm also generates an extra function with the
same name but with a _cmm_getimpl
suffix. This function takes
just the virtual parameters, and returns a pointer to the implementation
function that would be called if the real virtual function was called with
these parameters.
This is intended to allow time-critical code that calls a virtual function repeatedly with the same virtual parameters, to do the lookup just once and cache a pointer to the implementation.
For example:
void foo( virtual base&);
...
void my_function( base& b)
{
for ( int i=0; i<1000; ++i)
{
foo( b); // slow, looks up dynamic type of b.
}
typedef void (*implfn)( base&);
implfn fn = (implfn) foo_cmm_getimpl( b);
for ( int i=0; i<1000; ++i)
{
fn( b); // fast, no dispatch overhead.
}
}
The code generated by Cmm is intended to work correctly even when code
containing multimethod implementations in shared libraries is
loaded/unloaded dynamically. It certainly works ok with the
dlopen()
and dlclose()
functions in OpenBSD 3.3 and
various other Unix-like operating systems.
Cmm code registers multimethod implementations by defining a global
instance of the class cmm_implementation_holder
(defined in cmm/dispatch.h and cmm/dispatch.cpp). This class's constructor
registeres the implementation, and its destructor unregisteres the
implementation. This means that you can load/unload shared code containing
implementations of multimethods, and everything will work correctly (assuming
that the code loader/unloader correctly constructs/destructs global
objects).
The default dispatch code in cmm/dispatch.h
and cmm/dispatch.cpp that is called by the
code generated by cmm -s uses a cache to help improve the speed of
dispatches. This cache is a std::map< std::vector<
std::type_info>, cmm_fnptr>
(actually is uses Andrei
Alexandrescu's wrapper for std::type_info
to make it work inside
the std::vector
). This means that dispatch time will be O(
Log N), where N is the number of items in this cache, i.e. the
number of different combinations of dynamic types seen for a particular
multimethod.
A preliminary constant-time dispatch implementation is also available in
cmm/dispatch.cpp. This requires that classes
have unique small integers associated with them, and uses these small
integers to index arrays to give the constant lookup time. Providing unique
small integers and using them in dispatch is done by passing the
-small-ints option to Cmm. Cmm will insert an extra inline virtual
function into all classes that already have at least one virtual function
(except for those in namespace std
). This extra virtual function
contains a local static int, which is set by calling the function
cmm_create_small_integer( std::type_info&)
(see cmm/dispatch.h).
It's rather poor style to use inline virtual function bodies in this way. The different instances of the inline virtual functions in different compilation units should end up being one function in the final executable, but this doesn't happen on some systems such as gcc 2.95.3 and gcc 3.2 on OpenBSD. However, other systems such as cygwin seem to get this right. Cmm's generated code works even with the broken behaviour.
While the dispatch will be constant time, this constant time involves calling one conventional virtual function per virtual parameter. Some very simple tests suggest that the constant time is something like 10 times the speed of a conventional virtual function call (running make test100 top_build=gcc-release will generate timing statistics for the various different dispatch methods; the speedtest-cmm-ss result uses cmm -small-ints).
Ideally the unique small integer would be embedded directly in the vtable so that it could be accessed directly. This would make everything work perfectly, but of course needs to be implemented in the compiler. Without this sort of support, cmm -small-ints is the best I can think of, but it has the following limitations:
#include
<typeinfo>
. This is so that the inline virtual function
inserted by Cmm (which uses std::type_info
) will compile
correctly. Cmm can't safely insert the #include
itself
because it is only involved after preprocessing; inserting it can cause
nasty problems because it may already have been included and expanded by
the preprocessor before being passed to Cmm.
The best solution would probably be to alway insert #include
<typeinfo>
before preprocessing a cmm file, but Cmm has no
control over this.
std
namespace, so Cmm's
-small-ints flag will probably not work properly. However it
works fine with gcc 3.2.
There are also problems when using STLport, because Cmm's name lookup
doesn't handle STLport's use of using namespace _STL_STD
constructs, causing Cmm to try to add a virtual function to
std
classes.
In general, inserting extra virtual methods only works when it is done consistently for all compilation units, so if it is done for classes that are used in pre-compiled libraries, it will give undefined behaviour. (you might be lucky because cmm inserts its virtual method at the end of the class, so the vtable layout might be unchanged for the other virtual functions.)
Cmm supports dispatch on templated types such as smart pointers. For example:
void foo( shared_ptr< virtual base> x);
void foo_( shared_ptr< static derived> x) {...}
This requires three function templates to be provided that behave
similarly to dynamic_cast
, static_cast
and
typeid
, except that they know about particular class templates
and operate on what the class templates are handles for, rather than the
class templates themslves:
cmm_dynamic_test
that is similar to
dynamic_cast
, except that it acts on the instantiation type
rather than the class template. The return value is only evaluated as
true/false, so it can return just a true/false bool
rather
than a full NULL/non-NULL pointer like dynamic_cast
.cmm_static_cast
that is similar to
static_cast
, except that it casts from one instantiation of
a class template to a different instantiation of the same class
template.cmm_typeid
that is similar to
typeid
except that it takes an instance of an instantiation
of a class template and returns the std::type_info
of the
tempate parameter.An example of how this works is examples/template-virtual.cmm.
When static
is used inside the declaration of a function
parameter type, it identifies the class that that parameter should be treated
as when performing dispatch. This means that you can write a whole set of
implementations that take a smart pointer to various derived types, and the
dispatch behaviour will be the same as if you had used plain references
instead of smart pointers.
For example, to make the Boost shared_ptr
work with Cmm, you
would use the following definitions of the three function templates:
#include "boost/smart_ptr.hpp"
template< class T>
const std::type_info&
cmm_typeid( boost::shared_ptr< T> x)
{
return typeid( *x);
}
template< class derived, class base>
bool
cmm_dynamic_test( boost::shared_ptr< base> x)
{
return boost::shared_dynamic_cast< derived>( x).get()
? true : false;
}
template< class derived, class base>
boost::shared_ptr< derived>
cmm_static_cast( boost::shared_ptr< base> x)
{
return boost::shared_static_cast< derived>( x);
}
These template specialisations have to be seen before each multimethod implementation.
Cmm supports an additional way of getting dynamic dispatch: instead of
multimethods being declared prior to use, the function call syntax is
extended to allow virtual
prefixes on parameters. Runtime
dispatch is performed by searching all functions in the programme for the one
that has the same name and best matches the dynamic types of all parameters
prefixed with virtual
and the static types of the other
parameters. The desired return type is specified by prefixing the function
call with virtual
followed by the type, e.g. virtual
void*
would restrict the lookup to those functions that return
void*
; if the desired return type is ommitted, void
is assumed.
To support all of this, the dispatcher needs to know about the inheritance
of any class given just its std::type_info
, and also know about
all functions that are available in the entire programme. Cmm does this by
generating code that registers all function definitions and class definitions
at runtime with a central registry (in cmm/dispatch.h and cmm/dispatch.cpp).
An example programme is:
struct base {...};
struct derived : base {...};
struct unrelated {};
void foo( base&, base&) {...}
void foo( base&, derived&) {...}
void foo( derived&, base&) {...}
int main()
{
base& b = *new base;
base& d = *new derived;
unrelated& u = *new unrelated;
0, foo( virtual b, virtual b); // calls void foo( base&, base&)
0, foo( virtual d, virtual d); // throws cmm_ambiguous
int x = virtual int foo( virtual b, virtual b);
// throws cmm_unmatched, because a foo returning int is required.
(void) foo( u); // throws cmm_unmatched.
return 0;
}
A more detailed example is in the file examples/caller-dispatch.cmm, which is built and run by test107.
Cmm's parser doesn't recognise expressions if they look like declarations
(this requires full name lookup), so if the function's return value isn't
used, it is necessary to force cmm to treat the function call as part of an
expression, using the 0,
prefix.
As an example, the generated code for 0, foo( virtual d, virtual
d);
is:
0,
(( void (*)( base& d, base& d ))
cmm_lookup(
"foo",
&typeid( void ),
&typeid( d),
&typeid( d),
0))
( d, d);
This uses the cmm_lookup()
function to find the best matching
function called foo
that returns void
, and takes
two parameter both with type reference to typeid( d)
.
cmm_lookup()
returns a generic function pointer void
(*)()
, so it is casted to the real type, which takes base
parameters.
Every function definition is augmented with a wrapper function that takes
base parameters, and a global integer that registers this wrapper function
with cmm_lookup()
. It is the wrapper function that wil be
returned by cmm_lookup()
.
The extra code that is generated for a function int foo(
derived& a, derived& b) {...}
looks like:
int cmm_veneer_xyz( base& a, base& b)
{
return foo( static_cast< derived&>( a), static_cast< derived&( b));
}
static int
cmm_veneer_xyz_registration
= cmm_register_fn(
"foo",
(cmm_fnptr) cmm_veneer_xyz,
&typeid( void),
&typeid( derived&),
&typeid( derived&),
0);
Caller-virtual functionality is added to the normal cmm -s
behaviour using the -caller-dispatch
flag. The
-detailedparse
flag is also required.
The implementation in dispatch.cpp uses a std::map
cache,
mapping from a structure encoding the function name, return type and
parameter types, to a function pointer. It would be possible to get constant
time dispatch speed by using small integers inside
std::type_info
and having a local static variable in the calling
function that is initialised to select the function name.