cleeus.de

My Favourite C++ Idiom: Static PIMPL / Fast PIMPL

Kai Dietrich / 2017-03-10T20:00 / filed under: C++ optimization allocation memory

When working in a large C++ codebase, compile times become a major worry. One reason for slow compile times is transitive header dependencies. Every header file should include what it needs, so the preprocessor collects all these files and builds one huge source file (the compilation unit) that is fed to the actual compiler. This creates lots of disk I/O and a ton of C++ code that the compiler frontend must parse and generate code for, most of which the linker then eliminates again.

One way to reduce the includes of a header is the PIMPL idiom. Instead of putting the implementation details of a class in the header, you move them into a source file (compilation unit).

Let's look at a simple example class that wraps stdio.h file handling:

/** \file File.h */

#include <stdio.h>
#include <string>

class File {
public:
  File();
  ~File();

  bool open(std::string path, std::string openmode);

private:
  std::string m_path;
  std::string m_openmode;
  FILE *m_handle;
};

The open(...) method would be implemented in the .cpp file something like this (close() would follow the same pattern):

/** \file File.cpp */

#include <File.h>

bool File::open(std::string path, std::string openmode) {
  if(m_handle != NULL) {
    //close();
  }

  FILE* f = fopen(path.c_str(), openmode.c_str());

  if(f == NULL) {
    //opening failed, leave the object unchanged
    return false;
  }

  m_handle = f;
  m_path = path;
  m_openmode = openmode;
  return true;
}

This pulls in the stdio.h header, and everything that includes your beautiful new File class header will now get its global namespace polluted with C symbols. In this specific case a simple forward declaration of the FILE struct could help, but forward declarations are not always possible or a good solution.

A common pattern to completely hide the implementation of a class is the PIMPL ("private implementation" or "pointer to implementation") idiom. When using the PIMPL idiom, you put a shallow wrapper class in the header file. It has all the methods of the real class, but only one member: a pointer to a forward-declared class. There are variations around the exact implementation technique. I like the following C++11 std::unique_ptr based version the most. For brevity, I will omit the required copy and move constructors and operator= signatures. Since a PIMPL class essentially becomes a container for a single object, you have to add these - see also The Rule of Three/Five. As a side effect, this also makes the object cheap to move: a move only transfers the pointer, giving you fast and exception-safe move semantics.

/** \file File.h */

#include <string>
#include <memory>

class File {
public:
  File();
  ~File();
  //copy- and move- ctors/operator= methods here

  bool open(std::string path, std::string openmode);

private:
  class FilePIMPL;
  std::unique_ptr<FilePIMPL> m_impl;
};

The complete implementation will be moved into a compilation unit.

/** \file File.cpp */

#include <File.h>
#include <stdio.h>

class File::FilePIMPL {
public:
  FilePIMPL() :
    m_handle(NULL)
  {}

  bool open(std::string path, std::string openmode) {
    //... implementation here
  }

private:
  std::string m_path;
  std::string m_openmode;
  FILE *m_handle;
};

File::File() :
  m_impl(new FilePIMPL())
{}

File::~File() {
}

bool File::open(std::string path, std::string openmode) {
  return m_impl->open(path, openmode);
}
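The copy and move operations omitted above could be sketched roughly as follows. This is a single-file sketch under assumptions that are mine, not the original's: the File is made move-only (copying an open FILE handle has no obvious meaning), and is_open() is a hypothetical helper added only so the effect of a move can be observed. The special members are declared in the header part and defined where FilePIMPL is a complete type, which std::unique_ptr needs.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>
#include <utility>

// ---- would live in File.h ----
class File {
public:
  File();
  ~File();
  File(const File&) = delete;             // assumption: a file handle should not be copied
  File& operator=(const File&) = delete;
  File(File&&) noexcept;                  // declared here, defined in File.cpp
  File& operator=(File&&) noexcept;

  bool open(std::string path, std::string openmode);
  bool is_open() const;                   // hypothetical helper, not in the original

private:
  class FilePIMPL;
  std::unique_ptr<FilePIMPL> m_impl;
};

// ---- would live in File.cpp ----
class File::FilePIMPL {
public:
  ~FilePIMPL() { close(); }

  bool open(const std::string& path, const std::string& openmode) {
    close();
    m_handle = std::fopen(path.c_str(), openmode.c_str());
    if (m_handle == nullptr) return false;
    m_path = path;
    m_openmode = openmode;
    return true;
  }
  void close() {
    if (m_handle != nullptr) { std::fclose(m_handle); m_handle = nullptr; }
  }
  bool is_open() const { return m_handle != nullptr; }

private:
  std::string m_path;
  std::string m_openmode;
  std::FILE* m_handle = nullptr;
};

File::File() : m_impl(new FilePIMPL()) {}

// Defaulted here, where FilePIMPL is a complete type.
File::~File() = default;
File::File(File&&) noexcept = default;
File& File::operator=(File&&) noexcept = default;

bool File::open(std::string path, std::string openmode) {
  // A moved-from File has a null m_impl and must not be used for I/O.
  assert(m_impl && "open() called on a moved-from File");
  return m_impl->open(path, openmode);
}

bool File::is_open() const { return m_impl && m_impl->is_open(); }
```

With this in place, File b(std::move(a)); hands the implementation object over to b without touching the heap again, and a is left empty.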

This solves the problem of hiding the complete implementation but introduces a new one: an additional heap allocation (new FilePIMPL()) of the PIMPL class in the default constructor File::File(). Sometimes this is not a big deal, especially when dealing with plumbing code or business logic. But allocations are not free. They take precious clock cycles, may grab a lock on the heap and thus limit parallelization, and can fail and throw exceptions. When writing performance-sensitive code, an additional allocation may be prohibitively expensive.

But there is a solution in one of the more dangerous but magically enabling features of C++: placement new. Instead of constructing the PIMPL class on the heap with the regular new operator, we use the placement new operator and create the object in an embedded buffer inside the File class. There are two important details here: the buffer where our implementation class will be created must be big enough to hold the object, and it must be properly aligned.

Since the compiler cannot see the implementation when it processes the header, it can know neither the size nor the proper alignment, and we must choose both manually.
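For readers who have not used placement new before, here is a minimal sketch of the mechanism in isolation. Widget and placement_new_demo are hypothetical names used only for illustration. In this sketch the type is complete, so sizeof and alignas can be used directly; in the header of a PIMPL class that information is exactly what is missing.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical stand-in for an implementation class.
struct Widget {
  std::string name;
  int value;
  Widget() : name("widget"), value(42) {}
};

// Constructs a Widget inside a local buffer, reads from it, and destroys it.
int placement_new_demo() {
  // The buffer must be large enough and suitably aligned for Widget.
  alignas(Widget) unsigned char buffer[sizeof(Widget)];

  // Placement new: runs the constructor at the given address,
  // without allocating storage for the Widget itself.
  Widget* w = new (buffer) Widget();
  assert(static_cast<void*>(w) == static_cast<void*>(buffer));

  int result = w->value;

  // The object was not created with regular new, so it must not be
  // deleted; the destructor is called explicitly instead.
  w->~Widget();
  return result;
}
```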

/** \file File.h */

#include <string>
#include <cstddef>
#include <type_traits>

class File {
public:
  File();
  ~File();

  bool open(std::string path, std::string openmode);

private:
  static constexpr std::size_t c_impl_size = 128;
  static constexpr std::size_t c_impl_alignment = std::alignment_of<std::max_align_t>::value;
  typedef std::aligned_storage<c_impl_size, c_impl_alignment>::type aligned_storage_type;
  aligned_storage_type m_impl;
};

To make working with placement new a little more pleasant, I will add a few templated wrapper functions:

/** \file placement_new.h */

#include <cstddef>
#include <new> //required for placement new

///create an object of type T at a given address
template<typename T>
void placement_new(void *buffer, std::size_t buffer_size) {
  new(buffer) T();
}

///cast a given address to a pointer to type T
template<typename T>
T* placement_cast(void *buffer) {
  return reinterpret_cast<T*>(buffer);
}

///call the destructor of type T at a given address
template<typename T>
void placement_delete(void *buffer) {
  placement_cast<T>(buffer)->~T();
}

We will now use the placement_* functions to construct the implementation object inside the embedded buffer:

/** \file File.cpp */

#include <File.h>
#include <stdio.h>
#include <placement_new.h>

class FilePIMPL {
  //full implementation in this class
};


File::File() {
  //construct a FilePIMPL instance in m_impl
  placement_new<FilePIMPL>(&m_impl, sizeof(m_impl));
}

File::~File() {
  //destruct the FilePIMPL
  placement_delete<FilePIMPL>(&m_impl);
}

bool File::open(std::string path, std::string openmode) {
  return placement_cast<FilePIMPL>(&m_impl)->open(path, openmode);
}

These are the basic mechanics of what I call the "Static PIMPL" idiom or pattern. I have not seen it used or called like this anywhere else, so I declare myself the inventor. But I'm probably wrong and it's already been used in a lot of places, maybe even inside STL implementations. If it has another name, please drop me a line.

(Update: It turns out Herb Sutter posted something like this as GotW#028 and calls it "Fast PIMPL". If you google for Fast PIMPL, you will find multiple other authors describing more or less exactly what I'm outlining here.)

As said above, the compiler cannot know the size and alignment requirements, so we must choose them manually. For the alignment, you can choose the platform's maximum alignment: std::alignment_of<std::max_align_t>::value. This is safe, but may cost a few bytes through over-alignment. For the size, you can measure the size of the implementation classes on the platforms you support and write some preprocessor logic to define a constant per platform. What also tends to work reasonably well is to measure the size on one platform, divide it by sizeof(void*), and express the size constant as a multiple of sizeof(void*).

When size and/or alignment are wrong, you are doomed and your code will fail in mysterious ways. Effects range from slight performance penalties up to broken atomics (and thus invisible race conditions). Getting alignment wrong is also a common source of CPU exceptions when the compiler generates vectorized code. And if it doesn't happen today, it can happen any time in the future with a new compiler. So it is essential to guarantee size and alignment.
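The "multiple of sizeof(void*)" approach could look roughly like the sketch below. The struct is a hypothetical stand-in with the same members as the implementation class, size_in_pointers is a helper name invented here, and the factor 16 is an illustrative value, not a measurement from any particular platform.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in with the same members as the implementation class.
struct FilePIMPL {
  std::string m_path;
  std::string m_openmode;
  std::FILE* m_handle;
};

// Rounds a measured byte size up to a whole number of pointers.
constexpr std::size_t size_in_pointers(std::size_t bytes) {
  return (bytes + sizeof(void*) - 1) / sizeof(void*);
}

// Assumption: 16 pointers leave some headroom over the measured size.
// Expressing the size in pointers lets the buffer scale between
// 32-bit and 64-bit targets without per-platform constants.
constexpr std::size_t c_impl_size = 16 * sizeof(void*);
constexpr std::size_t c_impl_alignment = alignof(std::max_align_t);

// The guess is only acceptable if the compiler confirms it.
static_assert(size_in_pointers(sizeof(FilePIMPL)) <= 16,
  "FilePIMPL needs more than 16 pointers of storage");
static_assert(alignof(FilePIMPL) <= c_impl_alignment,
  "FilePIMPL is over-aligned for the chosen buffer alignment");
```

To find the factor in the first place, one can temporarily lower it and read the required size from the failing static_assert, or print size_in_pointers(sizeof(FilePIMPL)) once on each supported platform.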

To guarantee that, you have to add a few asserts and static_asserts.

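/** \file placement_new.h */

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

template<typename T>
void placement_new(void *buffer, std::size_t buffer_size) {
  //check the size of the buffer (at runtime, in debug builds)
  assert(sizeof(T) <= buffer_size);
  //check the alignment of the buffer (at runtime, in debug builds);
  //std::align would modify buffer and buffer_size, so test the address directly
  assert(reinterpret_cast<std::uintptr_t>(buffer) % alignof(T) == 0);
  (void)buffer_size; //avoid an unused-parameter warning in release builds
  new(buffer) T();
}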

/** \file File.cpp */
File::File() {

  static_assert(sizeof(m_impl) >= sizeof(FilePIMPL),
    "buffer not big enough to hold FilePIMPL");

  static_assert(
    std::alignment_of<aligned_storage_type>::value
    >=
    std::alignment_of<FilePIMPL>::value,
    "alignment requirements of FilePIMPL not fulfilled");

  //construct a FilePIMPL instance in m_impl
  placement_new<FilePIMPL>(&m_impl, sizeof(m_impl));
}
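Putting the pieces together, a complete single-file sketch might look like this. It deviates from the listings above in a few places, all of which are my assumptions: the buffer is an alignas-qualified byte array instead of std::aligned_storage (which is deprecated since C++23), copying is disabled because a bytewise copy of the buffer would not copy the object correctly, is_open() is a hypothetical helper, and the placement_* wrappers are inlined to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <new>
#include <string>

// ---- would live in File.h ----
class File {
public:
  File();
  ~File();
  File(const File&) = delete;             // assumption: no bytewise copies of the buffer
  File& operator=(const File&) = delete;

  bool open(std::string path, std::string openmode);
  bool is_open() const;                   // hypothetical helper, not in the original

private:
  static constexpr std::size_t c_impl_size = 16 * sizeof(void*);
  static constexpr std::size_t c_impl_alignment = alignof(std::max_align_t);
  alignas(c_impl_alignment) unsigned char m_impl[c_impl_size];
};

// ---- would live in File.cpp ----
class FilePIMPL {
public:
  ~FilePIMPL() { close(); }

  bool open(const std::string& path, const std::string& openmode) {
    close();
    m_handle = std::fopen(path.c_str(), openmode.c_str());
    if (m_handle == nullptr) return false;
    m_path = path;
    m_openmode = openmode;
    return true;
  }
  void close() {
    if (m_handle != nullptr) { std::fclose(m_handle); m_handle = nullptr; }
  }
  bool is_open() const { return m_handle != nullptr; }

private:
  std::string m_path;
  std::string m_openmode;
  std::FILE* m_handle = nullptr;
};

// Treat the raw storage as the implementation object.
// C++17 and later offer std::launder for this cast; omitted to stay within C++11.
static FilePIMPL* impl(void* storage) {
  return static_cast<FilePIMPL*>(storage);
}
static const FilePIMPL* impl(const void* storage) {
  return static_cast<const FilePIMPL*>(storage);
}

File::File() {
  static_assert(sizeof(m_impl) >= sizeof(FilePIMPL),
    "buffer not big enough to hold FilePIMPL");
  static_assert(c_impl_alignment >= alignof(FilePIMPL),
    "alignment requirements of FilePIMPL not fulfilled");
  // Runtime double check of the actual address (debug builds only).
  assert(reinterpret_cast<std::uintptr_t>(m_impl) % alignof(FilePIMPL) == 0);
  new (m_impl) FilePIMPL();
}

File::~File() { impl(m_impl)->~FilePIMPL(); }

bool File::open(std::string path, std::string openmode) {
  return impl(m_impl)->open(path, openmode);
}

bool File::is_open() const { return impl(m_impl)->is_open(); }
```

If FilePIMPL ever grows past the buffer, the build breaks at the static_assert instead of corrupting memory at runtime, which is the whole point of the checks.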

This is not a technique to use lightly. Only use it when you really have performance requirements that trump maintainability concerns. If your implementation is big or changes often, a classic PIMPL might be more appropriate, as adjusting the buffer sizes will become a tedious activity. I used it successfully to implement Mutex classes that are used throughout the codebase and are implemented using each platform's native facilities (Win32 vs. pthread).