#include <rules>

We’re stuck with C++, at least for another console generation. C++ has many quirks that I wish were not there, but there is no real alternative as of today. While modern languages tend to adopt bulk compilation and/or smart linkers, and so can have a proper module system and eat the cake too, C++ is stuck with header files (on the other hand, C++ builds are incremental and almost embarrassingly parallel). While the strategy for dealing with header files and staying sane seems more or less obvious, I’m amazed at how many people still get this wrong. I hope that this post helps to clear the mud somewhat. The post applies to C as well, but is useless for people who are blessed to work with other languages.

The problem with include files is that the preprocessor is usually quite dumb – you tell it to include a file, it includes the entire contents of that file, recursively. If you don’t tell it to include the file but try to use a symbol from that file – you get a compilation error. If you tell it to include too many files, it includes all of them, and the compilation time suffers.

In general, the more a header is included in other files (including transitive inclusion, i.e. A includes B includes C means that A indirectly includes C), the more files you’ll need to recompile once the header changes. Iteration time is very important – which is a topic for another time – so we’d like to minimize the amount of header inclusion. This brings us to the first important rule: Each file should include the minimum amount of files. The rule helps ensure that your code builds fast.

Now, let’s suppose that the header file contains a class declaration. By the nature of C++, a class declaration won’t compile without some other declarations – for example, if a class A inherits from a class B and contains a field of type C, then you have to give the compiler declarations of both B and C in the same translation unit (i.e. in the cpp file that you’re compiling – after the preprocessor has done its work) – before A’s declaration. Now, there are two options here – you can either include the relevant header files in the header with A’s declaration, or force the user to always include B’s and C’s headers manually before A’s. The problem is that sometimes the user does not know about these dependencies (e.g. the field of type C can be private), sometimes the dependencies change, so every time you’re adding declaration dependencies to your types you’re breaking the user’s code, and, since declaration dependencies are transitive, often to include a single header you’d need a dozen or more seemingly unrelated ones. For these reasons, it’s important for all headers to be self-contained – anybody should be able to include any header in any cpp file without compilation errors. Which brings us to the second important rule – each file should include all dependent headers, i.e. for each declaration that’s required by the compiler there should be a corresponding include. This rule helps ensure that the programmers stay sane.
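
To illustrate, here’s a minimal sketch of a self-contained header for such a class A (the class names and file names are hypothetical):

// a.h
// (protection against multiple inclusion omitted here – see below)

#include "b.h" // required: A inherits from B, so the compiler needs B's full definition
#include "c.h" // required: A contains a field of type C, so the compiler needs C's full definition

class A : public B
{
public:
    void update();

private:
    C data;
};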

These two rules together define the algorithm for proper header file authoring: for each required declaration, include a corresponding header in your header file; don’t include more headers than that. In order to guarantee that you did not forget the necessary headers, make sure that your header file is the first #include in the corresponding source file, except the common header, if your codebase has one.
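
And a minimal sketch of the corresponding source file, under the same hypothetical names (common.h stands for the codebase-wide common header discussed below, logger.h for an unrelated header that only the source file needs):

// a.cpp
#include "common.h" // the common header, if your codebase has one, goes first
#include "a.h"      // your own header goes right after it - this verifies that a.h is self-contained

#include "logger.h" // everything else that a.cpp itself needs follows

void A::update()
{
    // ...
}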

Do not include a header for a dependency where a forward declaration will suffice; use forward declarations whenever possible (if you’re not familiar with forward declarations, google them). Sometimes it pays off to go to extra lengths to remove header dependencies, using techniques like pimpl – this depends on the exact situation, but avoid including heavy platform files, like windows.h or d3d9.h, in popular headers (I’ve written about a way to make a slim version of d3d9.h in a blog post, scroll down to the last section).
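
A quick sketch of when a forward declaration suffices (again with hypothetical names) – pointers and references only need the type’s name, not its full definition:

// renderer.h
// (protection against multiple inclusion omitted here - see below)

class Texture; // forward declaration - no need to include texture.h here

class Renderer
{
public:
    void draw(const Texture& texture); // references and pointers only need the name

private:
    Texture* current;
};

// renderer.cpp includes texture.h, since it actually uses Texture's members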

With the rules above, there is only one thing left – since we can include a header twice accidentally (i.e. A depends on B and C, and B depends on C, so C is included twice into A), we’ll need some protection against that. So each header file should be guarded against multiple inclusion. There are two methods for this – either use #pragma once or use header guards. #pragma once is a non-standard technique that tells the preprocessor explicitly “don’t include this file more than once in a single translation unit”. Header guards emulate the same behavior using preprocessor defines:

#ifndef FILE_NAME_H
#define FILE_NAME_H
...
#endif

Many people don’t know this, but #pragma once is widely supported in modern compilers. It’s superior to header guards in two ways: it can be faster (e.g. MSVC does not read a file with #pragma once more than once, but does read a file with header guards several times), and it’s foolproof – you don’t have to invent an identifier for each header, so you can’t screw it up. So use #pragma once if you can, use header guards if you must. If some compilers that you use don’t support #pragma once and you can’t convince the vendors to add the feature, make sure that the header guards are unique using a deterministic generation algorithm. For example, you can use something like “take the list consisting of the name of the project and all components of the relative file path; convert all elements to upper case and join them with underscores”, resulting in identifiers like THEGAME_RENDER_LIGHTING_POINTLIGHT_H. Do not use short file names alone, they are not unique (unless your coding standard requires file names to be unique across the codebase)! Oh, and if you don’t use an autogenerating macro, don’t put a comment after the #endif (i.e. #endif // THEGAME_RENDER_LIGHTING_POINTLIGHT_H) – such comments are only useful as copy-paste history.

While using header guards lets you safely include the same file several times in a single translation unit, it also lets you test whether the file was already included, i.e. #ifdef THEGAME_RENDER_LIGHTING_POINTLIGHT_H. You should never conditionally exclude a section of a header file based on whether some other file was included! Doing this introduces an inclusion order dependency, which is unnatural and hard to debug without preprocessor output. If you’re thinking something like “oh, if the renderer interface was included, I should probably provide a light renderer class, but otherwise it would just add unnecessary clutter”, you should split your header file in two, and the second part should explicitly include the renderer interface, since it depends on it.
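
Here’s a sketch of such a split, with hypothetical names – instead of wrapping the renderer-dependent part in an #ifdef on somebody else’s guard, it moves into its own header that explicitly includes what it needs:

// pointlight.h - the core declaration, no renderer dependency
#pragma once

class PointLight
{
    // ...
};

// pointlight_renderer.h - the renderer-dependent part
#pragma once

#include "pointlight.h"
#include "irenderer.h" // explicit dependency instead of #ifdef IRENDERER_H trickery

class PointLightRenderer
{
    // ...
};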

At least in game development, the language is frequently extended with some generally useful primitives that are used throughout the whole codebase. The most used one is probably an assertion macro (since the standard one sucks, you should have your own), but there are other examples – logging facilities, fixed-size types, min/max functions, various platform/configuration defines (“are we on a big-endian platform?”), memory management-related macros. It’s common practice to put all of those in a single common header file; you should control the size of this file (where by ‘size’ I mean the cumulative size of all headers it includes, of course), and you should make sure that each source file includes the common header before everything else – otherwise you’ll get into trouble (sometimes you’ll spend several hours looking for the reasons – e.g. if you include a header that checks platform endianness before the common file, you’re in a world of hurt).
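
A minimal sketch of what such a common header might contain (the macro and function names are hypothetical, not a prescription):

// common.h - included first in every source file
#pragma once

// fixed-size types
#include <stdint.h>

// platform/configuration detection (heavily simplified)
#if defined(_MSC_VER)
    #define PLATFORM_WINDOWS 1
#endif

// custom assertion macro; a real one would log, break into the debugger, etc.
#ifdef NDEBUG
    #define GAME_ASSERT(expr) ((void)0)
#else
    #define GAME_ASSERT(expr) ((expr) ? (void)0 : game_assert_fail(#expr, __FILE__, __LINE__))
#endif

void game_assert_fail(const char* expr, const char* file, int line);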

Well, I think that’s all about header files; there is also the matter of include paths, though. In order to include a file, you have to specify the path to it – either a path relative to the current file, or a path relative to one of the include directories. There are a few important goals here:

  • If you’re writing a library – a relatively small one, i.e. not a platform like Unreal Engine – the header files should require minimal configuration, so ideally the user does not have to add include directories to compile or use your library. For such projects, consider making all include paths current file-relative.
  • Otherwise, include paths should be easily greppable – the path to the same file should ideally be spelled the same way in all other files. So make all include paths include directory-relative; moreover, try to make sure that include paths are unambiguous – i.e. that you don’t have two different representations for the same file path, such as one spelling with the project folder prefix and one without it from inside the render project (see the sketch after this list).
  • Whatever rule you use, try to make sure it’s consistent between different projects, as much as necessary. Ideally even the include directories should be the same, i.e. include directories for the engine project should be a strict subset of include directories for the game project.
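
A quick sketch of the two styles, assuming a hypothetical layout where render/lighting/pointlight.cpp needs core/logger.h:

// file-relative: compiles without any include directory setup,
// but the spelling of the path depends on where the including file lives
#include "../../core/logger.h"

// include directory-relative: the same header is spelled the same way from every file,
// which makes it easy to grep, but requires the source root on the include path
#include "core/logger.h"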

And as a final piece of advice – learn to use the preprocessor output (cl /E, gcc -E), learn to use the include output (cl /showIncludes, gcc -M), gather codebase statistics (average size after preprocessing, most included header files, header files with the largest payload, etc.) and optimize your codebase by eliminating dependencies and spreading the word. Nothing beats a sub-second iteration time.

Oh, did I mention that good header dependencies decrease the linking time?


23 Responses to #include <rules>

  1. tuan_kuranes says:

    A nice tool that helps as well is cppclean, a static analysis tool focused on header use: http://code.google.com/p/cppclean/

  2. triton says:

    Also I would add a note about precompiled headers. They are essential in any complex project if you want to remain sane! They decreased compilation times a lot when I started using them.

  3. Michal Mocny says:

    Great article! I agree with all of your proposals, and have argued for them myself. (I get a lot of flak for using #pragma once, but strongly believe it to be worthwhile.)

    I would like to mention, however, that I dislike forward declarations at the “client” end. Whenever I need a forward declaration, I (nearly always) make a corresponding *_fwd.hpp file with the *.hpp forward declarations there.

    Also note that sometimes forward declarations are used for more than compile efficiency: for example, to break a circular dependency.

    Also note that sometimes a forward declaration is useless: for example, when forward declaring a derived class, which you then cannot treat polymorphically (since you cannot forward declare inheritance). This always bites me when I use covariant return types in tree structures with references to parent/children and different class structures at each level.

    The one thing I don’t understand is why you favor relative include paths for “small” libraries – I don’t see the benefit. Perhaps it is “easier”, but I don’t find that myself, because I put “external” library headers in a separate src directory hierarchy (I find this easier to organize), which I then add as an include path to my own Make/SConstruct, and so I treat my library headers like any other client would.

    Anyway, thanks for the article, Cheers.

  4. zeuxcg says:

    tuan_kuranes, the project seems interesting but unfortunately also seems dead, and the include cleanup is not implemented; still, even if it were alive – the main goal is to raise awareness and to inspire action; then one can either clean up the mess manually or use automated tools (manual cleanup is, of course, not practical for huge legacy codebases).
    Anyway, thanks for the link!

  5. zeuxcg says:

    triton, if your compiler properly supports precompiled headers – sure, you can use them (the only such compiler I know of is MSVC; GCC support for precompiled headers is somewhat weird). However, there’s an interesting, but obvious, correlation – the better your headers are structured, the less you gain from PCH. For example, for a typical game, you don’t want to include windows.h in your PCH – despite it being precompiled, you still lose compilation time that way, and if your codebase is properly structured, you don’t need it in more than a few percent of files.

    MSVC PCH files have an additional useful property – they help ensure the ‘include common file before everything else’ rule, so one might use them if only to gain that.

  6. zeuxcg says:

    Michal Mocny: Yeah, sometimes you can’t compile your code without forward declarations, though that should be obvious (since the compiler slaps you!).

    As for inheritance, well, I rarely encounter this problem – you only do if you use the covariant return type feature, which is probably rare? At least I don’t use it that often :)

    As for the benefit of file-relative includes – it’s easy to see with third-party libraries. For example, Lua and Zlib use relative include paths, so you can put them anywhere you want, and 1) the library code will compile without include directory tweaking, 2) you can use any include directory rule you like – we used to keep Lua in a lua/ folder at my previous job, and the lua/ folder contained additional useful helpers.

    On the other hand, some arrogant libraries include their own central header via an include directory-relative path, thus forcing me to add their folder to the include paths just to compile them. The ones that use file-relative includes are usually ok, yes.

    The reason I make the small-large distinction here is that if you have a lot of sources, you probably need a hierarchy, and hierarchical file-relative includes are messy (../../core/ilogger.h is just wrong). But if you have a flat hierarchy, there’s no reason not to use file-relative includes.

  7. Sean Barrett says:

    ‘#pragma once’ may not be necessary, but I don’t know for sure where compilers ended up.

    Back in the 90s(?) this was going around on comp.compilers, and it was pointed out that it’s actually pretty straightforward for a compiler to detect if the first non-comment of a header file was ‘#ifndef’, and the last non-comment was the matching ‘#endif’. If a compiler detects this, it can automatically apply pragma-once-like processing to the header file from then on in the compilation.

    In other words, these people pointed out, there was no need to introduce #pragma once, since the compilers could just get the win from the existing standard convention.

    I thought gcc implemented it, but I’m not positive (this was a long time ago), and I have no idea about current compilers. Probably wouldn’t be hard to test, though.

    The reason any of this matters is because #pragma is problematic if you care about maximum portability; the whole #pragma is a fucked-up design because every #pragma needs to be wrapped in an explicit list of #ifdefs for the compilers which support that pragma; you can reduce that list to a single flag of your own that you define somewhere, if it’s something you use repeatedly, but even so you still end up always needing the three-liner #ifdef/#pragma whatever/#endif if you want to do it correctly.

    • zeuxcg says:

      Sean, as far as I know, gcc indeed implements this optimization. However, this is an example of trying to fix an already existing faulty design – clearly, #pragma once is superior because it concisely states the intent.

      The only problem is portability, which is usually not a problem in practice – the only useful recent compiler I know of that does not implement #pragma once is Sun C++; for gamedev-related code, #pragma once should be fine. Oh, and using header guards instead of #pragma once for portability reasons alone – without actual compiler testing! – is useless, because there are many ways to screw up portability with C++.

      • Sam Martin says:

        I think you should prefer the #ifndef approach.

        Firstly, at least in my experience, it is comparable to #pragma once in both gcc and vc (and most likely others), so there’s no benefit to the latter.

        Secondly, as you point out, it isn’t standard. Although I also haven’t seen a compiler that doesn’t support it, it was (and maybe still is) a problem for Incredibuild. I suspect it would also complicate converting to a unity build system, which is pretty standard for large projects these days.

        #ifndef is just guaranteed to work, so if it’s not broke, don’t fix it :)

        Cheers,
        Sam

      • zeuxcg says:

        Sam, the difference in performance should be negligible because of file cache (there is also the preprocessing overhead, but I’d expect it to be comparatively small). I also haven’t seen any problems with #pragma once and IncrediBuild.

        And #ifndef is obviously only guaranteed to work if all guards are unique, which is the problem – you can’t break #pragma once if the compiler supports it :)

      • Sean Barrett says:

        #pragma once may not be a portability problem in practice, but neither are include guards a problem in practice; I can count on zero hands the number of times I’ve seen a conflict in them, and even if you ever had a conflict, it’s not exactly hard to fix.

        “Oh, and using header guards instead of pragma once for portability reasons alone – without actual compiler testing! – is useless, because there are many ways to screw up portability with C++.”

        I’m not really sure what you’re intending to communicate here. (a) If you’re trying to write portable code, you’re trying to write portable code, so obviously you make everything portable; the fact that you will have bugs in your portability doesn’t mean you feel free to use non-portable features – I just can’t make any sense of that; (b) if you’re not trying to write portable code, then yes, whatever. But then you’re not trying to write portable code.

        In fact, that’s why I said “if you care about maximum portability”; I think it should be clear from that phrase that I don’t think everyone (or even most people) would.

        As to how hard it is to write portable code, I have no idea how hard it is for you. I cut my teeth coding C on unix in the late 80s, so lots of habits became ingrained for me in terms of portability (you had to on unix). I’ve never lost those habits, and I’m still a C coder, so most code I write is just automatically portable with no special effort on my part (I still don’t use C99 features in my code in part to avoid losing portability). I’ve never had any issues with C++ portability either, but I don’t use C++ much and I avoid most C++ features so that may be why. (Obviously to interact with a platform, or drop to assembly, you lose portability. That’s neither here nor there.)

      • zeuxcg says:

        Sean, well, I’ve seen the conflict several times – while it’s easy to fix, it’s harder to find – and also, some people tend to abuse include guards (ifdefs in other headers), which leads to further trouble. Not having them is only slightly safer, but it is safer.

        As for portability – using the constructs that you know are not portable (i.e. BSTR for strings :-/) is different from using the constructs that are almost portable (#pragma once). The point is, with some features you never know if you’re portable without testing, so it does not make sense to invest in portability for unknown platforms. Of course, the list of known & tested platforms should be large enough, but that does not save you from having ‘portability bugs’ on untested platforms – and a single bug is enough to break the compilation.

        I consider myself to be rather good at writing portable code. I’m writing C++ however, with namespaces, classes, overloading and occasional use of templates – and while I only use templates in their simplest forms, I’ve had a lot of trouble even with them (i.e. using a templated function with template arguments that don’t change the function signature won’t work on MSVC6/7).

        Hell, I’ve recently had trouble with (anonymous) namespaces – in order to call a function from anonymous namespace from a function from the same namespace, one compiler wanted me to remove explicit :: qualification, and another wanted me to add one.

        And, finally, often in order for the code to be useful you have to rely on non-portable things. Like fixed-size types. Or thread-related stuff (I pity the person who writes portable threading code – I needed an atomic increment/decrement one day but did not find enough strength to hunt it down for every platform, because it would invariably be broken or non-existent on some platform). Or floating point value classification. Or sockets. Or working with the file system (traversing a directory tree). Or precise timing. The list goes on :(

  8. Rémy says:

    And what about using both?

    #pragma once
    #ifndef FOO_H
    #define FOO_H

    class Foo
    {
    };

    #endif

  9. Gregory says:

    If #pragma once can’t improve build times in such a huge code base as Chromium, then why bother?

    I’m with Sean – I’ve rarely seen name clashes with header guards.

    I’m also with Andrei Alexandrescu and Herb Sutter: “External include guards are tedious, are obsolete on today’s compilers, and are fragile with tight coupling because the callers and header must agree on the guard name.” — C++ Coding Standards: 101 Rules, Guidelines, and Best Practices

    • zeuxcg says:

      External include guards are (I hope, obviously) abysmal.

      As for Chromium, the analysis is incomplete. I’m pretty sure IncrediBuild implements its own header caching mechanism, at least I know I would – which means that the builder timing stats are meaningless. Still, I stand by my assertion – #pragma once is slightly better, and therefore it’s worth considering. Obviously, if you have a large codebase, you would not want to switch to #pragma once unless it brings considerable benefits – but if you’re starting fresh, why bother with header guards?

      It’s curious how #pragma once is considered an alternative that is not worth it; I would consider header guards an alternative that is not worth it except in rare circumstances. Still, everybody chooses for himself.

      • zeuxcg says:

        It’s funny how the least important thing (#pragma once vs header guards) generates the most feedback :)

      • Gregory says:

        It does generate feedback because it’s one cool bikeshed we have at our disposal :)

        If I had to start fresh, provided #pragma once improves build times with MSVC 2010, I would probably do both.

        Have a nice day :)

  10. Pingback: #include <rules> / Wut I am

  11. Alexander Poluektov says:

    My 2 cents about precompiled headers.

    I think that using them contradicts the first rule: “Each file should include the minimum amount of files”.
    I agree with you that one should rather use forward declarations and minimal headers.

    But some developers complain that they don’t like spending their time trying to figure out which headers are necessary for compiling their sources. Rather, they would include a “God” header file (for the whole system or for a component), which in that case had better be precompiled.

  12. Pingback: #include / Wut I am

  13. nightvisio says:

    I still don’t get it. Why the hell do you need to _guard_ all your headers at all? One header per project – one _#pragma once_ per project – KISS!

    – nightvisio
