Cmm (C++ with MultiMethods) user documentation

1 Overview

Cmm is a command-line programme that adds support for multimethods to C++. It does this by working as a translator for a C++ language extension, reading in Cmm source code and outputing C++ source code.

Cmm also implements some other C++ extensions - see otherextensions.html for more details.

This file is for users of Cmm. If you want to know how to build Cmm, or are interested in how the Cmm programme works or in the details of the dynamic dispatch code that is generated, please see the Links section below.

2 The Cmm C++ language extension

The multimethod support uses approximately the syntax suggested in Stroustrup's Design and Evolution of C++ (page 298). A function prototype is a multimethod function if one or more of its parameters are qualified with the keyword virtual. Implementations of a multimethod function have the same name plus a trailing underscore, and have static in place of virtual qualifiers.

The parameters that correspond to the virtual parameters in the virtual function prototype must be derived from the original types, and have the same modifiers (such as const).

Virtual function declarations must occur before implementation function definitions/declarations. Otherwise the implementation functions will be treated as conventional functions.

An example should make this clearer. We'll use the classic multimethods example, an Overlap function that takes references to a Shape base class:

struct  Shape            {...};
struct  Square : Shape   {...};
struct  Triangle : Shape {...};

bool    Overlap( virtual Shape& a, virtual Shape& b);
    
bool    Overlap_( static Square&   a, static Triangle& b) {...}
bool    Overlap_( static Triangle& a, static Square&   b) {...}
bool    Overlap_( static Shape&    a, static Square&   b) {...}
bool    Overlap_( static Square&   a, static Shape&    b) {...}

The foo( virtual Shape& a, virtual Shape& b); prototype is replaced by a dispatch function with prototype foo( Shape& a, Shape& b);. This dispatch function uses C++ RTTI to choose one of the foo_() functions, based on the dynamic types of its parameters.

It is possible that there isn't any matching implementation function for a particular combination of dynamic types. In this case, the generated dispatch function will throw an exception.

It will also throw an exception if there is no clear best implementation for a particular combination of dynamic types. An implementation is considered best if the following 2 conditions apply:

  1. All of the best implementation's parameter types match the dynamic types.
  2. Each of the best implementation's parameter types is at least as derived as the corresponding parameter type in any other matching implementation.

Note that we cannot have duplicate implementations, so the second condition implies that for each other matching implementation X, the best implementation must have at least one parameter type that is more derived than the corresponding parameter type in X.

This is the similar to the lookup rule used by C++ at compile time (apart from C++'s use of conversion operators and special casing of built-in types).

Thus:

Shape&  s = *new Square;
Shape&  t = *new Triangle;
Overlap( t, t); // Throws - no matching implementation
Overlap( s, t); // Calls Overlap_( static Square&, static Triangle&)
Overlap( t, s); // Calls Overlap_( static Triangle&, static Square&)
Overlap( s, s); // Throws - no best matching implementation

The trailing underscore in implementation function names is ugly, but it allows one to define an implementation that takes the base parameters, which would otherwise clash with the generated dispatch function.

3 Building with Cmm

A Cmm file is converted to a C++ file with:

cmm -s <Cmm input filename> <C++ output filename>

For things to work correctly however, the the input C++ file needs to have been pre-processed with the standard C++ preprocessor.

Cmm will insert calls to functions declared in cmm/dispatch.h (which is fairly well commented), so when a programme is built these functions must be available, which is most easily done by adding the file cmm/dispatch.cpp to the programme's build. This file contains implementation of all of the functions defined in cmm/dispatch.h.

4 Example programmes

To build the following example programmes, see building.html.

The directory examples/overlap/ contains a simple single-file Cmm programme.

The directory examples/overlap2/ is basically the same programme, but split up into several Cmm source files and header files, to demonstrate how Cmm copes with multiple compilation units.

The directory examples/speed/ contains a programme that measures the speed of the virtual function dispatch code that Cmm generates.

5 Cmm parameters

cmm -h gives help. This help is also available in the file cmmhelp.text.

6 Links

building.html contains instructions for building Cmm.

readme.html contains release notes.

cmm/readme.html has a description of the design and implementation of the Cmm programme.

http://www.functionalobjects.com/resources/index.phtml contains lots of information about the Dlyan language, which has built-in support for multimethods.

http://homepage.ntlworld.com/w.weston/multiple_dispatch.html - Bill Weston's within-the-language multimethods system for C++, using templates.

7 System requirements

See building.html.

8 Limitations of the current Cmm system, and to-do things

9 Advanced usage

9.1 Getting raw implementation function pointer

For each virtual function, Cmm also generates an extra function with the same name but with a _cmm_getimpl suffix. This function takes just the virtual parameters, and returns a pointer to the implementation function that would be called if the real virtual function was called with these parameters.

This is intended to allow time-critical code that calls a virtual function repeatedly with the same virtual parameters, to do the lookup just once and cache a pointer to the implementation.

For example:

void foo( virtual base&);
...
void my_function( base& b)
{
    for ( int i=0; i<1000; ++i)
    {
        foo( b);    // slow, looks up dynamic type of b.
    }
    
    typedef void (*implfn)( base&);
    implfn  fn = (implfn) foo_cmm_getimpl( b);
    for ( int i=0; i<1000; ++i)
    {
        fn( b); // fast, no dispatch overhead.
    }
}

9.2 Use with dynamic libraries

The code generated by Cmm is intended to work correctly even when code containing multimethod implementations in shared libraries is loaded/unloaded dynamically. It certainly works ok with the dlopen() and dlclose() functions in OpenBSD 3.3 and various other Unix-like operating systems.

Cmm code registers multimethod implementations by defining a global instance of the class cmm_implementation_holder (defined in cmm/dispatch.h and cmm/dispatch.cpp). This class's constructor registeres the implementation, and its destructor unregisteres the implementation. This means that you can load/unload shared code containing implementations of multimethods, and everything will work correctly (assuming that the code loader/unloader correctly constructs/destructs global objects).

9.3 Experimental constant-time dispatch speed

The default dispatch code in cmm/dispatch.h and cmm/dispatch.cpp that is called by the code generated by cmm -s uses a cache to help improve the speed of dispatches. This cache is a std::map< std::vector< std::type_info>, cmm_fnptr> (actually is uses Andrei Alexandrescu's wrapper for std::type_info to make it work inside the std::vector). This means that dispatch time will be O( Log N), where N is the number of items in this cache, i.e. the number of different combinations of dynamic types seen for a particular multimethod.

A preliminary constant-time dispatch implementation is also available in cmm/dispatch.cpp. This requires that classes have unique small integers associated with them, and uses these small integers to index arrays to give the constant lookup time. Providing unique small integers and using them in dispatch is done by passing the -small-ints option to Cmm. Cmm will insert an extra inline virtual function into all classes that already have at least one virtual function (except for those in namespace std). This extra virtual function contains a local static int, which is set by calling the function cmm_create_small_integer( std::type_info&) (see cmm/dispatch.h).

It's rather poor style to use inline virtual function bodies in this way. The different instances of the inline virtual functions in different compilation units should end up being one function in the final executable, but this doesn't happen on some systems such as gcc 2.95.3 and gcc 3.2 on OpenBSD. However, other systems such as cygwin seem to get this right. Cmm's generated code works even with the broken behaviour.

While the dispatch will be constant time, this constant time involves calling one conventional virtual function per virtual parameter. Some very simple tests suggest that the constant time is something like 10 times the speed of a conventional virtual function call (running make test100 top_build=gcc-release will generate timing statistics for the various different dispatch methods; the speedtest-cmm-ss result uses cmm -small-ints).

Ideally the unique small integer would be embedded directly in the vtable so that it could be accessed directly. This would make everything work perfectly, but of course needs to be implemented in the compiler. Without this sort of support, cmm -small-ints is the best I can think of, but it has the following limitations:

9.3.1 Caveats with cmm -small-ints

9.4 Use with templated virtual parameters

Cmm supports dispatch on templated types such as smart pointers. For example:

void foo( shared_ptr< virtual base> x);
void foo_( shared_ptr< static derived> x) {...}

This requires three function templates to be provided that behave similarly to dynamic_cast, static_cast and typeid, except that they know about particular class templates and operate on what the class templates are handles for, rather than the class templates themslves:

  1. A function template cmm_dynamic_test that is similar to dynamic_cast, except that it acts on the instantiation type rather than the class template. The return value is only evaluated as true/false, so it can return just a true/false bool rather than a full NULL/non-NULL pointer like dynamic_cast.
  2. A function template cmm_static_cast that is similar to static_cast, except that it casts from one instantiation of a class template to a different instantiation of the same class template.
  3. A function cmm_typeid that is similar to typeid except that it takes an instance of an instantiation of a class template and returns the std::type_info of the tempate parameter.

An example of how this works is examples/template-virtual.cmm.

When static is used inside the declaration of a function parameter type, it identifies the class that that parameter should be treated as when performing dispatch. This means that you can write a whole set of implementations that take a smart pointer to various derived types, and the dispatch behaviour will be the same as if you had used plain references instead of smart pointers.

For example, to make the Boost shared_ptr work with Cmm, you would use the following definitions of the three function templates:

#include "boost/smart_ptr.hpp"

template< class T>
    const std::type_info&
        cmm_typeid( boost::shared_ptr< T> x)
        {
            return typeid( *x);
        }
template< class derived, class base>
    bool
        cmm_dynamic_test( boost::shared_ptr< base> x)
        {
            return boost::shared_dynamic_cast< derived>( x).get()
                ? true : false;
        }
template< class derived, class base>
    boost::shared_ptr< derived>
        cmm_static_cast( boost::shared_ptr< base> x)
        {
            return boost::shared_static_cast< derived>( x);
        }

These template specialisations have to be seen before each multimethod implementation.

10 Caller virtual dispatch (experimental)

Cmm supports an additional way of getting dynamic dispatch: instead of multimethods being declared prior to use, the function call syntax is extended to allow virtual prefixes on parameters. Runtime dispatch is performed by searching all functions in the programme for the one that has the same name and best matches the dynamic types of all parameters prefixed with virtual and the static types of the other parameters. The desired return type is specified by prefixing the function call with virtual followed by the type, e.g. virtual void* would restrict the lookup to those functions that return void*; if the desired return type is ommitted, void is assumed.

To support all of this, the dispatcher needs to know about the inheritance of any class given just its std::type_info , and also know about all functions that are available in the entire programme. Cmm does this by generating code that registers all function definitions and class definitions at runtime with a central registry (in cmm/dispatch.h and cmm/dispatch.cpp).

An example programme is:

struct base {...};
struct derived : base {...};
struct unrelated {};

void foo( base&, base&) {...}
void foo( base&, derived&) {...}
void foo( derived&, base&) {...}

int main()
{
    base&      b = *new base;
    base&      d = *new derived;
    unrelated& u = *new unrelated;

    0, foo( virtual b, virtual b); // calls void foo( base&, base&)
    0, foo( virtual d, virtual d); // throws cmm_ambiguous
    
    int x = virtual int foo( virtual b, virtual b);
    // throws cmm_unmatched, because a foo returning int is required.

    (void) foo( u); // throws cmm_unmatched.

    return 0;
}

A more detailed example is in the file examples/caller-dispatch.cmm, which is built and run by test107.

Cmm's parser doesn't recognise expressions if they look like declarations (this requires full name lookup), so if the function's return value isn't used, it is necessary to force cmm to treat the function call as part of an expression, using the 0, prefix.

As an example, the generated code for 0, foo( virtual d, virtual d); is:

0,
    (( void (*)( base& d, base& d ))
        cmm_lookup(
            "foo", 
            &typeid( void ), 
            &typeid( d), 
            &typeid( d), 
            0))
    ( d, d);

This uses the cmm_lookup() function to find the best matching function called foo that returns void, and takes two parameter both with type reference to typeid( d). cmm_lookup() returns a generic function pointer void (*)(), so it is casted to the real type, which takes base parameters.

Every function definition is augmented with a wrapper function that takes base parameters, and a global integer that registers this wrapper function with cmm_lookup(). It is the wrapper function that wil be returned by cmm_lookup().

The extra code that is generated for a function int foo( derived& a, derived& b) {...} looks like:

int cmm_veneer_xyz( base& a, base& b)
{
    return foo( static_cast< derived&>( a), static_cast< derived&( b));
}
static int
    cmm_veneer_xyz_registration
        = cmm_register_fn(
            "foo",
            (cmm_fnptr) cmm_veneer_xyz,
            &typeid( void), 
            &typeid( derived&), 
            &typeid( derived&),
            0);

Caller-virtual functionality is added to the normal cmm -s behaviour using the -caller-dispatch flag. The -detailedparse flag is also required.

The implementation in dispatch.cpp uses a std::map cache, mapping from a structure encoding the function name, return type and parameter types, to a function pointer. It would be possible to get constant time dispatch speed by using small integers inside std::type_info and having a local static variable in the calling function that is initialised to select the function name.