<a id="top"></a>
# Authoring benchmarks

> [Introduced](https://github.com/catchorg/Catch2/issues/1616) in Catch 2.9.0.

_Note that benchmarking support is disabled by default. To enable it,
you need to define `CATCH_CONFIG_ENABLE_BENCHMARKING`. For more details,
see the [compile-time configuration documentation](configuration.md#top)._

Writing benchmarks is not easy. Catch simplifies certain aspects, but you
still need to take care with the rest. Understanding a few things about
the way Catch runs your code will be very helpful when writing your benchmarks.

First off, let's go over some terminology that will be used throughout this
guide.

- *User code*: the code that the user provides to be measured.
- *Run*: one run is one execution of the user code.
- *Sample*: one sample is one data point obtained by measuring the time it takes
  to perform a certain number of runs. One sample can consist of more than one
  run if the available clock does not have enough resolution to accurately
  measure a single run. All samples for a given benchmark execution are obtained
  with the same number of runs.

## Execution procedure

With that terminology in place, here is how a benchmark is executed in Catch. There are three main
steps, though the first does not need to be repeated for every benchmark.

1. *Environmental probe*: before any benchmarks can be executed, the clock's
resolution is estimated. A few other environmental artifacts are also estimated
at this point, like the cost of calling the clock function, but they almost
never have any impact on the results.

2. *Estimation*: the user code is executed a few times to obtain an estimate of
the number of runs that should be in each sample. This also has the potential
effect of bringing relevant code and data into the caches before the actual
measurement starts.

3. *Measurement*: all the samples are collected sequentially by performing the
number of runs estimated in the previous step for each sample.
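
As a rough illustration of the estimation step, the sketch below picks a number of runs per sample so that one sample spans many clock ticks. The helper `runs_per_sample` and the factor of 100 are assumptions made for this example; they are not Catch's actual API or tuning.

```c++
#include <cassert>
#include <chrono>
#include <cstdint>

using nanos = std::chrono::nanoseconds;

// Hypothetical helper, not part of Catch: chooses how many runs go into one
// sample so that a sample lasts at least `multiple` clock ticks.
// The default of 100 is an illustrative assumption.
inline std::int64_t runs_per_sample(nanos clock_resolution,
                                    nanos time_per_run,
                                    std::int64_t multiple = 100) {
    assert(multiple > 0);
    const std::int64_t min_sample_ns = clock_resolution.count() * multiple;
    // Treat a run that is too fast to measure as taking 1 ns.
    const std::int64_t run_ns = time_per_run.count() > 0 ? time_per_run.count() : 1;
    // Round up so that the sample never falls short of the minimum duration.
    const std::int64_t runs = (min_sample_ns + run_ns - 1) / run_ns;
    return runs > 0 ? runs : 1;
}
```

With a clock resolution of 100 ns, this sketch puts 1000 runs in each sample for user code that takes about 10 ns, and a single run per sample for user code that takes 1 ms.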

This already gives us one important rule for writing benchmarks for Catch: the
benchmarks must be repeatable. The user code will be executed several times, and
the number of times it will be executed during the estimation step cannot be
known beforehand since it depends on the time it takes to execute the code.
User code that cannot be executed repeatedly will lead to bogus results or
crashes.
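
To make the repeatability rule concrete, here is a small sketch in plain C++ with two hypothetical functions (neither is part of Catch). The first consumes shared state, so only its first run does any real work; the second operates on its own copy and behaves the same on every run.

```c++
#include <cassert>
#include <vector>

// Not repeatable: every call consumes the shared input, so a second run
// sees an empty vector and measures something different from the first.
inline long drain_in_place(std::vector<int>& pending) {
    long total = 0;
    while (!pending.empty()) {
        total += pending.back();
        pending.pop_back();
    }
    return total;
}

// Repeatable: each call works on its own copy of the input.
// Note that the copy then becomes part of what is measured.
inline long drain_copy(const std::vector<int>& pending) {
    assert(!pending.empty());  // an empty input would make the benchmark meaningless
    std::vector<int> local = pending;
    return drain_in_place(local);
}
```

Used as user code, `drain_in_place` on a shared vector would return the real sum once and 0 on every later run, whereas `drain_copy` returns the same value each time.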

## Benchmark specification

Benchmarks can be specified anywhere inside a Catch test case.
There is a simple and a slightly more advanced version of the `BENCHMARK` macro.

Let's look at how a naive Fibonacci implementation could be benchmarked:
```c++
#include <cstdint>

std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}
```
The most straightforward way to benchmark this function is to add a `BENCHMARK` macro to our test case:
```c++
TEST_CASE("Fibonacci") {
    CHECK(Fibonacci(0) == 1);
    // some more asserts..
    CHECK(Fibonacci(5) == 8);
    // some more asserts..

    // now let's benchmark:
    BENCHMARK("Fibonacci 20") {
        return Fibonacci(20);
    };

    BENCHMARK("Fibonacci 25") {
        return Fibonacci(25);
    };

    BENCHMARK("Fibonacci 30") {
        return Fibonacci(30);
    };

    BENCHMARK("Fibonacci 35") {
        return Fibonacci(35);
    };
}
```
There are a few things to note:
- As `BENCHMARK` expands to a lambda expression, it is necessary to add a semicolon after
 the closing brace (as opposed to the first experimental version).
- The `return` is a handy way to avoid the compiler optimizing away the benchmark code.

Running the test case executes the benchmarks and outputs something similar to:
```
-------------------------------------------------------------------------------
Fibonacci
-------------------------------------------------------------------------------
C:\path\to\Catch2\Benchmark.tests.cpp(10)
...............................................................................
benchmark name                                  samples       iterations    estimated
                                                mean          low mean      high mean
                                                std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Fibonacci 20                                            100       416439   83.2878 ms
                                                       2 ns         2 ns         2 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 25                                            100       400776   80.1552 ms
                                                       3 ns         3 ns         3 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 30                                            100       396873   79.3746 ms
                                                      17 ns        17 ns        17 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 35                                            100       145169   87.1014 ms
                                                     468 ns       464 ns       473 ns
                                                      21 ns        15 ns        34 ns
```

### Advanced benchmarking
The simplest use case shown above takes no arguments and just runs the user code that needs to be measured.
However, if you use the `BENCHMARK_ADVANCED` macro and add a `Catch::Benchmark::Chronometer` argument after
the macro, some advanced features become available. The contents of the simple benchmarks are invoked once per run,
while the blocks of the advanced benchmarks are invoked exactly twice:
once during the estimation phase, and another time during the execution phase.

```c++
BENCHMARK("simple"){ return long_computation(); };

BENCHMARK_ADVANCED("advanced")(Catch::Benchmark::Chronometer meter) {
    set_up();
    meter.measure([] { return long_computation(); });
};
```

These advanced benchmarks no longer consist entirely of user code to be measured.
In these cases, the code to be measured is provided via the
`Catch::Benchmark::Chronometer::measure` member function. This allows you to set up any
kind of state that might be required for the benchmark but is not to be included
in the measurements, like making a vector of random integers to feed to a
sorting algorithm.

A single call to `Catch::Benchmark::Chronometer::measure` performs the actual measurements
by invoking the callable object passed in as many times as necessary. Anything
that needs to be done outside the measurement can be done outside the call to
`measure`.

The callable object passed in to `measure` can optionally accept an `int`
parameter.

```c++
meter.measure([](int i) { return long_computation(i); });
```

If it accepts an `int` parameter, the sequence number of each run will be passed
in, starting with 0. This is useful if you want to measure some mutating code,
for example. The number of runs can be known beforehand by calling
`Catch::Benchmark::Chronometer::runs`; with this one can set up a different instance to be
mutated by each run.

```c++
std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
meter.measure([&v](int i) { in_place_escape(v[i]); });
```

Note that it is not possible to simply use the same instance for different runs
and reset it between each run, since that would pollute the measurements with
the resetting code.
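
The sorting scenario mentioned earlier combines both ideas: the random data is built outside the measurement, and each run sorts its own copy. The sketch below shows one way this might look; `make_inputs` and `sort_run` are hypothetical helpers written for this example, and the wiring into `meter.measure` is shown only as a comment.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical setup helper: builds `runs` identical copies of one random
// vector, so that every run sorts fresh, unsorted data.
inline std::vector<std::vector<int>> make_inputs(std::size_t runs,
                                                 std::size_t size,
                                                 unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> base(size);
    for (int& x : base) x = dist(gen);
    return std::vector<std::vector<int>>(runs, base);
}

// The part that would be measured: sorts only the copy that belongs to
// run `i` and returns a value so that the work stays observable.
inline int sort_run(std::vector<std::vector<int>>& inputs, int i) {
    assert(i >= 0 && static_cast<std::size_t>(i) < inputs.size());
    auto& v = inputs[static_cast<std::size_t>(i)];
    std::sort(v.begin(), v.end());
    return v.empty() ? 0 : v.front();
}

// Inside a BENCHMARK_ADVANCED block this might be wired up as:
//   auto inputs = make_inputs(static_cast<std::size_t>(meter.runs()), 1000, 42);
//   meter.measure([&](int i) { return sort_run(inputs, i); });
```

The cost of this approach is memory: one copy of the input per run has to exist before the measurement starts.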

It is also possible to just provide an argument name to the simple `BENCHMARK` macro to get
the same semantics as providing a callable with an `int` argument to `meter.measure`:

```c++
BENCHMARK("indexed", i){ return long_computation(i); };
```

### Constructors and destructors

All of these tools give you a lot of mileage, but there are two things that still
need special handling: constructors and destructors. The problem is that if you
use automatic objects, they get destroyed at the end of the scope, so you end up
measuring the time for construction and destruction together. And if you use
dynamic allocation instead, you end up including the time to allocate memory in
the measurements.

To solve this conundrum, Catch provides class templates that let you manually
construct and destroy objects without dynamic allocation and in a way that lets
you measure construction and destruction separately.

```c++
BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
    meter.measure([&](int i) { storage[i].construct("thing"); });
};

BENCHMARK_ADVANCED("destroy")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs());
    for(auto&& o : storage)
        o.construct("thing");
    meter.measure([&](int i) { storage[i].destruct(); });
};
```

`Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage suitable for `T`
objects. You can use the `Catch::Benchmark::storage_for::construct` member function to call a constructor and
create an object in that storage. So if you want to measure the time it takes
for a certain constructor to run, you can just measure the time it takes to run
this function.

When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, if an actual object was
constructed there it will be automatically destroyed, so nothing leaks.

If you want to measure a destructor, though, you need to use
`Catch::Benchmark::destructable_object<T>`. These objects are similar to
`Catch::Benchmark::storage_for<T>` in that construction of the `T` object is manual, but
they do not destroy anything automatically. Instead, you are required to call
the `Catch::Benchmark::destructable_object::destruct` member function, which is what you
can use to measure the destruction time.
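
To give an intuition for how such raw storage can work without dynamic allocation, here is a minimal sketch built on placement new. The class `manual_lifetime` is hypothetical and written only for illustration; it is not how Catch implements `storage_for` or `destructable_object`. Like `destructable_object`, it destroys nothing automatically.

```c++
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical illustration only: a suitably aligned buffer in which a T
// is constructed and destroyed by hand, with no heap allocation for T itself.
template <typename T>
class manual_lifetime {
public:
    manual_lifetime() = default;
    manual_lifetime(const manual_lifetime&) = delete;
    manual_lifetime& operator=(const manual_lifetime&) = delete;

    // Runs T's constructor inside the buffer (placement new).
    template <typename... Args>
    void construct(Args&&... args) {
        assert(object_ == nullptr);
        object_ = ::new (static_cast<void*>(buffer_)) T(std::forward<Args>(args)...);
    }

    // Runs T's destructor; the buffer itself stays around for reuse.
    void destruct() {
        assert(object_ != nullptr);
        object_->~T();
        object_ = nullptr;
    }

    T& get() {
        assert(object_ != nullptr);
        return *object_;
    }

    bool alive() const { return object_ != nullptr; }

private:
    alignas(T) unsigned char buffer_[sizeof(T)];
    T* object_ = nullptr;
};
```

Because `construct` and `destruct` are separate calls, either one can be timed on its own, which is the property the Catch class templates provide.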

### The optimizer

Sometimes the optimizer will optimize away the very code that you want to
measure. There are several ways to use results that will prevent the optimizer
from removing them. You can use the `volatile` keyword, or you can output the
value to standard output or to a file, both of which force the program to
actually generate the value somehow.

Catch adds a third option. The values returned by any function provided as user
code are guaranteed to be evaluated and not optimized out. This means that if
your user code consists of computing a certain value, you don't need to bother
with using `volatile` or forcing output. Just `return` it from the function.
This keeps the benchmark code looking natural.

Here's an example:

```c++
// may measure nothing at all by skipping the long calculation since its
// result is not used
BENCHMARK("no return"){ long_calculation(); };

// the result of long_calculation() is guaranteed to be computed somehow
BENCHMARK("with return"){ return long_calculation(); };
```

However, there's no other form of control over the optimizer whatsoever. It is
up to you to write a benchmark that actually measures what you want and doesn't
just measure the time to do a whole bunch of nothing.

To sum up, there are two simple rules: whatever you would do in handwritten code
to control optimization still works in Catch; and Catch makes return values
from user code into observable effects that can't be optimized away.
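
For readers wondering how a returned value can become an observable effect, the sketch below shows one handwritten technique: storing each result in a `volatile` object. The function `run_keeping_results` is a hypothetical stand-in written for this example and assumes the user code returns something convertible to `std::uint64_t`; Catch's own mechanism may differ.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical stand-in, not Catch's implementation: runs `user_code`
// `runs` times and keeps every returned value alive.
template <typename Fn>
int run_keeping_results(Fn&& user_code, int runs) {
    assert(runs > 0);
    int executed = 0;
    for (int i = 0; i < runs; ++i) {
        // A write to a volatile object is an observable side effect, so the
        // compiler cannot discard the value returned by user_code.
        volatile std::uint64_t sink = user_code();
        (void)sink;
        ++executed;
    }
    return executed;
}
```

This only guarantees that the value is produced; the compiler is still free to simplify how it is computed, which is why the benchmark itself must be written with care.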

<i>Adapted from nonius' documentation.</i>