Using GNU G++ for CS 225 MPs

Introduction

What this is all about

The official compiler for CS 225 Machine Problems is Sun's C++ compiler. This compiler is installed on the Sun workstations in the CSIL labs, but in few other places. This makes it hard to work on your MPs from anywhere else. Getting yourself a copy of Sun's compiler will cost you close to $2000 (and you have to be running Solaris). But there is another solution. You can use a different compiler, G++, to develop your MP solution when you don't have access to the CSIL lab. Even when you are in the lab, you might want to give G++ a try anyway, especially if you are struggling to find bugs.

G++ is the C++ part of GCC (the GNU Compiler Collection). It is already installed on the CSIL machines. It comes standard with every Linux and BSD distribution, if you happen to run a Unix-like operating system already. G++ is the primary C++ compiler for OS X (as a part of the OS X Developer Tools). It's even usable on Windows! G++ (and all of GCC) is Free Software. That means that you can freely modify and redistribute it if you want. Oh, it doesn't cost anything either.

It does require some effort to use G++ on your MPs. This is partly because of the design of the MPs and partly due to the design of G++. The MPs expect certain behavior from the C++ compiler, and G++ works differently from the Sun C++ compiler in some ways. This document exists to point out some of the issues and to show step by step how to make it work.

Disclaimer

Please, go back and read the first sentence of this document again. The official C++ compiler for CS 225 is the Sun compiler on the CSIL machines. Jason has stated in past semesters that your grade will depend on your code compiling on the CSIL Sun machines, with the Sun compiler, not with G++. Usually, correct, well-written code will compile and run identically on both, but you can never be totally sure. Always test your code with the Sun compiler before you hand it in!

Contributors

This web page was created by Steven Barker, a former CS 225 student and current Chair of ACM's Special Interest Group for Unix (SIGUnix).

Thanks to Stephen Saville for loaning me his MP2 solution files while I was writing the Makefiles and source code, and for his suggestions on how to improve this doc.

This document is a work in progress. Contributions (and corrections) are welcome!

Switching to G++

Simple Programs

Using G++ to compile simple C++ programs like MP1 is easy. All that needs to be done is to tell make to use G++ instead of Sun's compiler. To do that, you need to edit the Makefile that comes with the MP.

Look for this section of the given Makefile:

#**************************************************************************
# Macros defining the C/C++ compiler and linker.

CC = CC
CCOPTS = -g
LINK = CC
LINKOPTS = -o $(EXENAME)
      
You want to change the lines for CC and LINK to both call g++ instead of CC. Afterwards, it should look like this:
#**************************************************************************
# Macros defining the C/C++ compiler and linker.

CC = g++
CCOPTS = -g
LINK = g++
LINKOPTS = -o $(EXENAME)
      
For MP1, that's all you need to do! Run make clean (to remove any old files compiled by CC), then just make. You should see lines like:
g++ -c -g vehicle.cpp
g++ -c -g main.cpp
g++ -o mp1 vehicle.o main.o
      
That shows G++ compiling your MP1 code successfully.

MP1 is a simple program, and does not make use of the more complex features of C++, so the two compilers both work on it with no other changes.

More Complex Programs

The Trouble with Templates

If you try the above modification of the Makefile for a program that uses templates, like MP2, you will get a bunch of errors from the linker. The errors will look something like:

g++ -o mp2 asserts.o array.o vehicle.o string.o utility225.o  numbered.o maptests.o main.o
Undefined                       first referenced
 symbol                             in file
NumberedItems<String>::~NumberedItems(void)main.o
NumberedItems<Vehicle>::Print(void) main.o
void AddToMap<Vehicle>(NumberedItems<Vehicle> &, int, Vehicle)main.o
...
NumberedItems<Vehicle>::~NumberedItems(void)main.o
ld: fatal: Symbol referencing errors. No output written to mp2
collect2: ld returned 1 exit status
*** Error code 1
make: Fatal error: Command failed for target `mp2'
      
Our challenge is to fix these errors.

The GCC manual has this to say on the subject of templates:

C++ templates are the first language feature to require more intelligence from the environment than one usually finds on a UNIX system. Somehow the compiler and linker have to make sure that each template instance occurs exactly once in the executable if it is needed, and not at all otherwise.

The handling of template instantiation is the biggest place where G++ has different behavior than Sun's compiler. As the MPs for CS 225 make extensive use of templates, this is the primary obstacle to using G++ to compile MPs. This section will explain the problem in some detail, then present two solutions to the problems faced. If you don't care about why the issue exists, you can skip the explanation and get right to the answers.

How Instantiation Happens

The problem is that G++ does not handle template instantiation for you. "What does 'template instantiation' mean?" I hear you say. Well, a function or class template is not a real function or class until it is instantiated. That is, the code

template<class T>
  T Foo(const char *)
{
  ...
}
      
does not really define a function. It defines a function template. That's a small difference, but it's important. The compiler cannot write out machine code to implement that template, because it does not know what kind of object T is. Only when some other code does something like:
Bar foobar = Foo<Bar>("foobar");
      
is a real function, named Foo<Bar>, defined. This is what I mean by "template instantiation". The function Foo<Bar> is an "instance" of the template Foo<T>.
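
To make that concrete, here is a minimal sketch that compiles as a single file. The Bar struct is a hypothetical stand-in made up for illustration (the real Bar could be any type that can be built from a string), and demo_name is just a helper so that there is something to call. Because the template's definition sits in the same file as its use, the compiler can instantiate Foo<Bar> on the spot. The trouble only starts when the definition lives in a different .C file.

```cpp
#include <string>

// Hypothetical type standing in for the Bar used in the text.
struct Bar {
    std::string name;
    explicit Bar(const char *s) : name(s) {}
};

// A function template. No machine code can be written for it yet,
// because the compiler does not know what T is.
template <class T>
T Foo(const char *text)
{
    return T(text);
}

// Using Foo<Bar> here instantiates the template. That works without
// any extra effort only because the definition above is visible in
// this same file.
std::string demo_name()
{
    Bar foobar = Foo<Bar>("foobar");
    return foobar.name;
}
```

Calling demo_name() from a main function returns the string that was passed to Foo<Bar>. Any other type that can be constructed from a const char * (std::string, for instance) would give the compiler a second, separate instance of the same template.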

So how should the machine code for the function Foo<Bar> be made? The template may be declared in a file foo.h and defined in foo.C. But if the instantiation takes place in the file foobar.C, how can the compiler make the right code? It needs to read both the template implementation from foo.C, and the instantiation details (such as the declaration of the type Bar) from foobar.C.

And the truth is that there's not a clear answer. Sun's C++ compiler does a lot of complicated voodoo to instantiate the right templates most of the time (that is what the SunWS_cache directory is used for). G++ does not do that, as it is hard to get right for all cases. Rather, you must do something to ensure any instantiations you need get made. If you don't, the linker will complain about missing symbols.

A Quick and Dirty Solution

The first solution for the template instantiation problem is to put the full definition of the template in the header file. If foo.h holds the full implementation of the Foo template, then when foobar.C does #include "foo.h" it gets all the details it needs to instantiate the template. However, this does not work for code that is already split into separate .h and .C files, like all of the given files for the MPs.

A workaround for that is to hack an #include of the .C file onto the end of the .h file. So the end of the file foo.h would look like:

   ...
   // end of the code

#include "foo.C"

#endif // not defined FOO_H
      
This will work to generate all the needed instantiations of the template, but will cause compiler errors when you attempt to compile foo.C. The reason is that foo.C #includes foo.h, which in turn #includes foo.C, so the file ends up including itself, something the compiler does not appreciate. The easy fix is simply to skip compiling that file. Everything defined in it will already be compiled when other files #include its header, so nothing is lost.
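
As a rough sketch of why the hack works, here is approximately what the compiler sees once the preprocessor has finished with a client file. Counter, counter.h and counter.C are hypothetical names invented for this example, not files from any MP, and everything is pasted into one file so that it can be compiled on its own.

```cpp
// ---- contents of a hypothetical counter.h ----
#ifndef COUNTER_H
#define COUNTER_H

template <class T>
class Counter {
public:
    Counter() : count_(0), last_() {}
    void Add(const T &item);              // only declared in the header
    int Size() const;                     // only declared in the header
    const T &Last() const { return last_; }
private:
    int count_;
    T last_;
};

// ---- what a trailing #include "counter.C" would paste in ----
template <class T>
void Counter<T>::Add(const T &item)
{
    last_ = item;
    ++count_;
}

template <class T>
int Counter<T>::Size() const
{
    return count_;
}

#endif // not defined COUNTER_H

// ---- a client file, after its #include "counter.h" ----
int count_two_items()
{
    Counter<int> c;   // the member bodies are visible, so G++ can
    c.Add(10);        // instantiate Counter<int> right here
    c.Add(20);
    return c.Size();
}
```

The point of the layout is that the member function bodies arrive in every file that includes the header, which is also why each such file pays the cost of compiling them again.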

To do that for one of the MPs, you have to modify the Makefile. Here is a Makefile for MP2 that will work if you put an #include in the given array.h, maptests.h and utility225.h and in your own numbered.h.

That solution is rather inelegant (OK, I'll be honest, it's a gross hack). It requires you to modify the given code of the MP. It also may slow your compile to a crawl as all of the .C files are included over and over. There is a better way.

A Better Solution

The second solution is to explicitly state what instantiations we need for each template. The syntax to do that for function templates is:

template Bar Foo<Bar>(const char *);
      
Class templates are instantiated similarly. For example, one use of the pair class from MP2 could be instantiated with:
template class pair<int, String>;
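
Here is a small self-contained sketch of both forms side by side. Twice and Pair are hypothetical templates written for this example (Pair is not the pair class from MP2), and std::string stands in for the MP's own String class, which is an assumption made only to keep the file compilable on its own.

```cpp
#include <string>

// Hypothetical function template.
template <class T>
T Twice(T value)
{
    return value + value;
}

// Hypothetical class template, loosely modelled on a pair.
template <class A, class B>
class Pair {
public:
    Pair(const A &a, const B &b) : first(a), second(b) {}
    A first;
    B second;
};

// Explicit instantiations: each line tells the compiler to emit code
// for exactly that instance in this file, whether or not anything
// here uses it.
template int Twice<int>(int);
template std::string Twice<std::string>(std::string);
template class Pair<int, std::string>;
```

Note that each explicit instantiation comes after the full definition of the template and after every type it names has been declared. That ordering requirement is what drives the file layout in the rest of this section.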
      

These explicit instantiations must be made where the template's implementation is visible, so foo.C and utility225.C would seem like the best locations for the above examples. But we find another problem: the template parameter types (Bar and String) are not defined in those files.

So we make a new file for each template. For the Foo<T> template, that file is foo_inst.C, and it holds the explicit template instantiations. Its contents are very simple:

#include "foo.C"

#include "bar.h"

template Bar Foo<Bar>(const char *);
      
We can now compile foo_inst.C instead of foo.C, and the compiler will generate all the instantiations we told it to make.

To apply this to MP2, you must make the files array_inst.C, utility225_inst.C, maptests_inst.C and numbered_inst.C. They should #include any definitions necessary to instantiate the template. Finally, you need to modify the Makefile to build the new files. I put together a sample tarball for MP2.

Resources

For more information

For more detailed information about how G++ handles templates, consult the GCC manual. It has some informative discussion of the G++ template instantiation behavior, including command line options that were ignored here for simplicity. You can also read the manual with the info program on most Unix systems. Run info gcc.

Debugging

G++ (and GCC in general) can build executable files with debugging information for the debugger GDB. GDB can show you exactly where a nasty segfault is hiding in your code. It also lets you step through a program one instruction at a time, letting you quickly track down subtle mistakes. More information on GDB is available from its manual (also readable by running info gdb).

There's a nice graphical front end to GDB, called DDD (the Data Display Debugger). It is installed on the CSIL lab machines, and can help you analyze data structures while debugging. See its manual for more information. It is also readable with info (run info ddd).

If It Doesn't Work

Please, don't bother Jason or the CS 225 TAs if these instructions do not work for you. They work extremely hard to ensure that the MPs work on one platform, and they shouldn't need to worry about everything else out there. Rather, contact SIGUnix, either through our newsgroup, or on our email list. We will be happy to help.