Cmm - other C++ language extensions

1 Alternative declaration syntax

The -newdeclsyntax switch makes the parser accept the `radical' alternative declaration syntax suggested by Stroustrup in section 2.8.1 (page 46) of The Design and Evolution of C++, and convert it into legal C++.

For example, you can write:

main: ( argc: int, argv: []->char) int

- instead of:

int main( int argc, char* argv[])

A more extreme example is the declaration of ANSI C's signal() function, which is usually written:

void ( *signal( int sig, void (*func)( int sig)))( int sig);

The alternative form is:

signal: ( sig: int, func: ->(sig: int) void) -> (sig: int) void;

The target test17 in makefile.mk tests that the latter is converted into the former.
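
For comparison, standard C++ has its own ways of making this declaration more readable. The following is a minimal sketch in plain C++11, not Cmm input or output. The names handler, my_signal and my_signal_alt are made up for illustration (my_signal avoids clashing with the real signal() in <csignal>). It checks at compile time that the conventional spelling and a left-to-right spelling declare the same function type:

```cpp
#include <type_traits>

// Handler type: pointer to a function taking int and returning void.
using handler = void (*)(int);

// The conventional declaration from the text, under a hypothetical name.
void (*my_signal(int sig, void (*func)(int sig)))(int sig);

// The same declaration in trailing-return form, which reads left to right
// much like the alternative syntax: parameters first, result type last.
auto my_signal_alt(int sig, handler func) -> handler;

// Both spellings declare exactly the same function type.
static_assert(std::is_same<decltype(my_signal), decltype(my_signal_alt)>::value,
              "both spellings declare the same function type");
```

Neither function is defined here; the declarations are only compared inside decltype, so the sketch compiles on its own.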

Note that by default the parser only fully analyses global declarations, so it won't convert declarations that are inside class or function bodies. To convert declarations within functions, use the -detailedparse switch.

The new declaration syntax conflicts with the syntax of goto labels, because both begin with an identifier followed by a colon.

2 Embedded functions

Cmm has a very experimental system that is intended to simplify the use of STL algorithms. This involves allowing whole function definitions inside expressions.

When such an embedded function definition is found, Cmm moves it to above the function that contains the expression, and replaces it with the name of the function. Anonymous functions are given unique names automatically.

This allows one to write code like:

typedef std::vector< int>        container;
container        foo;
...
std::find_if( foo.begin(), foo.end(),
        bool ()( int value)
        { return value < 7;}
);

This is an alternative to using the STL's binder templates - the above example could be written in standard C++ as:

std::find_if( foo.begin(), foo.end(),
        std::bind2nd( std::less<int>(), 7));

In this case, the STL's way of doing things doesn't look too bad, but if you want to do something more complicated, there can be a real mess of bind2nd, ptr_fun etc.

For example:

for_each(M.begin(), M.end(),
        compose1(mem_fun(&Object::Draw),
                select2nd<map<int, Object *>::value_type>()));

- can be replaced with:

for_each( M.begin(), M.end(),
        void ()( Map::value_type& entry) { entry.second->Draw();});
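
As a rough illustration of the conversion described above, the result of hoisting such an embedded function might look like the following plain C++. This is a sketch under assumptions, not actual Cmm output: the name cmm_embedded_fn_0 is hypothetical (Cmm picks its own unique names), Object and Map are stand-in definitions, and the parameter is written as Map::value_type& because that is what std::for_each passes to its callable.

```cpp
#include <algorithm>
#include <map>

// Stand-in class for the example; counts how often Draw() is called.
struct Object {
    int draws = 0;
    void Draw() { ++draws; }
};
typedef std::map<int, Object*> Map;

// The embedded function, moved above the function that used it.
// Hypothetical name: Cmm generates its own unique names.
void cmm_embedded_fn_0(Map::value_type& entry) { entry.second->Draw(); }

void DrawAll(Map& M)
{
    // The embedded definition has been replaced by the function's name.
    std::for_each(M.begin(), M.end(), cmm_embedded_fn_0);
}
```

Calling DrawAll on a map invokes Draw() once on every Object the map points to.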

This extension requires the -detailedparse and -embeddedfns command-line switches. See the target test28 in makefile.mk, which builds the code in examples/embeddedfns/simple.cmm.

It should be straightforward to convert embedded functions into functors, so that the compiler can perform better optimisation, but this hasn't been done yet.

The embedded function will not be able to see any variables or types or typedefs defined inside the function that contains it.
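
For reference, this is roughly what a hand-written functor equivalent looks like in standard C++. It is only a sketch of the idea, not something Cmm generates, and the names LessThan and FirstBelow are hypothetical. The limit is passed to the functor explicitly, which is also the usual way to hand a local value to code that cannot see the enclosing function's variables.

```cpp
#include <algorithm>
#include <vector>

typedef std::vector<int> container;

// Hand-written functor (hypothetical name). The limit is stored as a member
// because out-of-line code cannot see the caller's local variables.
struct LessThan {
    int limit;
    explicit LessThan(int l) : limit(l) {}
    bool operator()(int value) const { return value < limit; }
};

// Returns an iterator to the first element below limit, or foo.end().
container::const_iterator FirstBelow(const container& foo, int limit)
{
    return std::find_if(foo.begin(), foo.end(), LessThan(limit));
}
```

Because operator() is defined inside the class, a compiler can inline the comparison into the std::find_if loop, which is the optimisation benefit mentioned above.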

Embedded functions are converted into conventional out-of-line functions. Their advantage is locality: you can put the code that is called by an algorithm inside the call to the algorithm, which saves you from having to scroll up to see the code.

3 Block structure from indentation

In virtually all C/C++ code, the indentation precisely mirrors the block structure defined by { and } characters. The -autoblocks switch makes Cmm insert { and } characters where the indentation changes, allowing an arguably simpler and cleaner syntax.

See the examples in examples/autoblocks - examples/autoblocks/main.cmm is a Hello World programme, while examples/autoblocks/main2.cmm also uses the alternative declaration syntax described above. The targets test35 and test36 in makefile.mk test that the converted versions of these files compile and run correctly.

The current implementation of -autoblocks knows not to insert { or } within (...), or when the first non-white space character on a line is already { or }, but can still get things wrong, for example:

int x = 4
        + 7;
foo();

This will be converted into:

int x = 4
        {+ 7;
}foo();

Which is less than useful. In future, I might look at changing things so that { and } are only inserted when the grammar allows them.

Tabs are assumed to be every four characters.
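
The failure shown above follows directly from looking at indentation alone. The following is a deliberately naive sketch of that idea in standard C++, not Cmm's actual implementation: it ignores the (...) rule and the check for lines that already start with { or }. The function names indent_column and naive_autoblocks are hypothetical. Tab stops are taken to be every four columns, as stated above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Column of the first non-blank character, with tab stops every four columns.
// Sets 'first' to that character's index; returns npos for a blank line.
std::size_t indent_column(const std::string& line, std::size_t& first)
{
    std::size_t col = 0;
    for (first = 0; first < line.size(); ++first) {
        if (line[first] == ' ')       col += 1;
        else if (line[first] == '\t') col = (col / 4 + 1) * 4;
        else                          return col;
    }
    return std::string::npos;
}

// Naive autoblocks: put '{' in front of a line that is indented deeper than
// the current block, and '}' in front of a line that is indented less.
std::string naive_autoblocks(const std::string& source)
{
    std::istringstream in(source);
    std::ostringstream out;
    std::vector<std::size_t> levels(1, 0);  // stack of open indentation levels
    std::string line;
    while (std::getline(in, line)) {
        std::size_t first = 0;
        const std::size_t col = indent_column(line, first);
        if (col == std::string::npos) {     // blank line: copy unchanged
            out << line << '\n';
            continue;
        }
        std::string braces;
        if (col > levels.back()) {          // deeper: open a block
            levels.push_back(col);
            braces = "{";
        }
        while (col < levels.back()) {       // shallower: close open blocks
            levels.pop_back();
            braces += "}";
        }
        out << line.substr(0, first) << braces << line.substr(first) << '\n';
    }
    // Close any blocks still open at the end of the input.
    for (std::size_t i = 1; i < levels.size(); ++i) out << "}\n";
    return out.str();
}
```

Fed the three-line example above, this sketch produces the same unhelpful result, because the continuation line + 7; is indistinguishable from the start of a new block when only indentation is considered.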

Cmm has a hack to allow autoblocks to be turned on and off inside source files. If, at top-level scope, the variable cmm_pragma_autoblocks_on is declared (as any type, e.g. int cmm_pragma_autoblocks_on;), the declaration is ignored and autoblocks is turned on. Similarly, a declaration of cmm_pragma_autoblocks_off turns autoblocks off.

4 Reflection support

The special keywords @cmm_memberrecursivefn and @cmm_memberreflectfn can be used to automatically generate complete function definitions. They are used as prefixes to a function prototype, the first parameter of which is required to be a class or reference to a class. The generated function body consists of a series of function calls, passing each of the class's member variables in turn. In addition, the class's base classes (if any) are also treated in the same way.

The intention is to automate the generation of serialisation/unserialisation code for classes. The user provides the overloads for basic types (int, std::string etc), which are then used as required for user defined types.

4.1 @cmm_memberrecursivefn

@cmm_memberrecursivefn is the simpler of the two forms, and can be used to serialise a class to a std::ostream in the following way:

struct  Base           { int    a; std::string text; };
struct  Derived : Base { double b; };

@cmm_memberrecursivefn void ToStream( Base& base, std::ostream& s);
@cmm_memberrecursivefn void ToStream( Derived& derived, std::ostream& s);

Cmm expands the last two lines into the following function definitions:

void ToStream( Base& base, std::ostream& s)
{
    ToStream( base.a, s);
    ToStream( base.text, s);
}
void ToStream( Derived& derived, std::ostream& s)
{
    ToStream( static_cast< Base&>( derived), s);
    ToStream( derived.b, s);
}

Similarly, one can generate functions to read from a std::istream:

void    FromStream( int& x, std::istream& s)    { s >> x;    }
void    FromStream( double& x, std::istream& s) { s >> x;    }
    
@cmm_memberrecursivefn void FromStream( Base& base, std::istream& s);
@cmm_memberrecursivefn void FromStream( Derived& derived, std::istream& s);

One doesn't have to use std::istream and std::ostream. It's straightforward to use RPC's XDR functions to allow generic streaming of C++ objects across sockets.

The generated code assumes that a suitable function exists for each member or base class. So if a class contains a member std::string foo, the generated code assumes that a call to a similarly named function passing a std::string is legal.
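
To make that requirement concrete, here is a self-contained sketch in plain C++ showing user-written overloads for the basic types next to hand-written stand-ins for the generated functions. The struct layout (Derived inheriting from Base, with text in Base) and the space-separated output format are assumptions made for this illustration; the two class-level functions are written by hand here and only imitate the shape of what Cmm would generate.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct Base           { int a; std::string text; };
struct Derived : Base { double b; };

// User-supplied overloads, one per basic type that appears as a member.
void ToStream(int x, std::ostream& s)                { s << x << ' '; }
void ToStream(double x, std::ostream& s)             { s << x << ' '; }
void ToStream(const std::string& x, std::ostream& s) { s << x << ' '; }

// Hand-written stand-ins for the generated definitions: one call per member,
// with the base class handled first. Each call resolves to an overload above.
void ToStream(Base& base, std::ostream& s)
{
    ToStream(base.a, s);
    ToStream(base.text, s);
}
void ToStream(Derived& derived, std::ostream& s)
{
    ToStream(static_cast<Base&>(derived), s);
    ToStream(derived.b, s);
}
```

If the std::string overload were removed, the call ToStream(base.text, s) would fail to compile, which is the kind of error to expect when a member type has no matching function.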

4.2 @cmm_memberreflectfn

@cmm_memberreflectfn is a slightly more powerful feature. It behaves in the same way as @cmm_memberrecursivefn, but the generated function definition calls a function with the same name plus the suffix _cmm_reflect (for example, Show_cmm_reflect for a function called Show). Also, the first parameter is replaced by three parameters: a reference to the member (same as for @cmm_memberrecursivefn), a const std::type_info& and a const char*.

The intention is for the user to write a function template that takes the full information generated by @cmm_memberreflectfn, and makes recursive calls for each member variable. For example:

template< class T>
void    Show_cmm_reflect(  T& data, const std::type_info& static_type, const char* name, std::ostream& out)
{
    out << name << "=" << static_type.name() << "={";
    Show( data, out);
    out << "} ";
}
@cmm_memberreflectfn
void Show( const MyClass& c, std::ostream& out);

- and then make calls for a particular type like:

MyClass&    myclass = ...;
Show( myclass, std::cout);

This enables the user to get full information about each member before passing it on to the real output function.
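
The following self-contained sketch shows how these pieces could fit together in plain C++. The body of Show( const MyClass&, ...) is a guess at the shape of the generated code, based only on the description above (member reference, std::type_info, name string); the exact calls Cmm emits may differ. MyClass and its members count and label are hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <typeinfo>

struct MyClass { int count; std::string label; };  // hypothetical class

// User-supplied overloads for the basic member types.
void Show(int x, std::ostream& out)                { out << x; }
void Show(const std::string& x, std::ostream& out) { out << x; }

// The user's reflection hook, as in the example above.
template<class T>
void Show_cmm_reflect(T& data, const std::type_info& static_type,
                      const char* name, std::ostream& out)
{
    out << name << "=" << static_type.name() << "={";
    Show(data, out);
    out << "} ";
}

// Assumed shape of the generated function: one _cmm_reflect call per member,
// passing the member, its static type and its name.
void Show(const MyClass& c, std::ostream& out)
{
    Show_cmm_reflect(c.count, typeid(c.count), "count", out);
    Show_cmm_reflect(c.label, typeid(c.label), "label", out);
}
```

The text printed for each type comes from std::type_info::name(), which is implementation-defined, so the exact output varies between compilers.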

Currently, @cmm_memberreflectfn handles declarations where the first parameter is struct <classname>& as well as <classname>&. @cmm_memberrecursivefn doesn't handle this case. This can be useful for generating (for example) diagnostics for struct stat on Unix systems.