Benchmarks

Keywords: benchmarks performance

Liquid ships with a set of benchmarks that measure how fast each signal processing element runs on your machine.

Compiling and Running Benchmarks

You can build the benchmark program with make benchmark, and view the execution options with the -u or -h flag (usage/help information):

$ ./benchmark -h
Usage: benchmark [OPTION]
Execute benchmark scripts for liquid-dsp library.
  -h,-u         display this help and exit
  -v            verbose
  -q            quiet
  -e            estimate cpu clock frequency and exit
  -c            set cpu clock frequency (Hz)
  -n[COUNT]     set number of base trials
  -p[ID]        run specific package
  -b[ID]        run specific benchmark
  -t[SECONDS]   set minimum execution time (s)
  -l            list available packages
  -L            list all available scripts
  -s[STRING]    run all scripts matching search string
  -o[FILENAME]  export output

By default, running "make bench" is equivalent to executing the ./benchmark program, which runs all of the benchmarks sequentially. The tool first estimates the processor's clock frequency; while not necessarily accurate, this estimate is needed to gauge the relative speed at which the benchmarks run. The tool then estimates the number of trials so that each benchmark takes between 50 and 500 ms to run. Listed below is the output of the first several benchmarks:

$ ./benchmark
  estimating cpu clock frequency...
  performed 67108864 trials in 650.0 ms
  estimated clock speed:   2.468 GHz
  setting number of trials to 246754
0: null
    0  : null                  :  23.59 M trials in 220.00 ms (107.212 M t/s,  22.00   cycles/t)
1: agc
    1  : agc_crcf              :   1.92 M trials in 270.00 ms (  7.093 M t/s, 337.50   cycles/t)
    2  : agc_crcf_squelch      :   1.92 M trials in 280.00 ms (  6.840 M t/s, 350.00   cycles/t)
    3  : agc_crcf_locked       :  15.32 M trials in 700.00 ms ( 21.887 M t/s, 109.38   cycles/t)
2: window
    4  : windowcf_n16          :   7.55 M trials in 260.00 ms ( 29.029 M t/s,  81.25   cycles/t)
    5  : windowcf_n32          :   7.55 M trials in 260.00 ms ( 29.029 M t/s,  81.25   cycles/t)
    6  : windowcf_n64          :   7.55 M trials in 270.00 ms ( 27.954 M t/s,  84.38   cycles/t)
    7  : windowcf_n128         :   7.55 M trials in 260.00 ms ( 29.029 M t/s,  81.25   cycles/t)
    8  : windowcf_n256         :   7.55 M trials in 260.00 ms ( 29.029 M t/s,  81.25   cycles/t)
3: dotprod_cccf
    9  : dotprod_cccf_4        :   1.89 M trials in 320.00 ms (  5.897 M t/s, 400.00   cycles/t)
    10 : dotprod_cccf_16       : 471.73 k trials in 320.00 ms (  1.474 M t/s,   1.60 k cycles/t)
    11 : dotprod_cccf_64       : 117.93 k trials in 300.00 ms (393.107 k t/s,   6.00 k cycles/t)
    12 : dotprod_cccf_256      :  29.48 k trials in 300.00 ms ( 98.267 k t/s,  24.00 k cycles/t)
4: dotprod_crcf
    13 : dotprod_crcf_4        :   1.89 M trials in  20.00 ms ( 94.347 M t/s,  25.00   cycles/t)
    14 : dotprod_crcf_16       : 471.73 k trials in  10.00 ms ( 47.173 M t/s,  50.00   cycles/t)
    15 : dotprod_crcf_64       : 117.93 k trials in   0.00 ps (    inf T t/s,   0.00 p cycles/t)
    16 : dotprod_crcf_256      :  29.48 k trials in  20.00 ms (  1.474 M t/s,   1.60 k cycles/t)

For this run the clock speed was estimated to be 2.468 GHz. Benchmarks are subdivided into packages, which group similar signal processing algorithms together. For example, package 3 above benchmarks the dotprod_cccf object, which computes the vector dot product between two \(n\)-point arrays of complex floats. Specifically, benchmark 11 measures the speed of an \(n=64\)-point dot product. In this run the benchmarking tool computed approximately 117,930 64-point complex dot products in 300 ms (about 393,107 trials per second). For the estimated clock rate this means that the algorithm requires approximately 6,000 clock cycles to compute a single 64-point complex vector dot product.
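
The arithmetic in the paragraph above can be reproduced with a short script. This is a minimal sketch, not part of liquid: the function names are hypothetical, and it assumes that throughput is trials divided by elapsed time and that cycles per trial is the clock frequency divided by that throughput.

```python
def trials_per_second(trials: float, seconds: float) -> float:
    """Throughput: completed trials divided by elapsed time."""
    return trials / seconds


def cycles_per_trial(clock_hz: float, trials: float, seconds: float) -> float:
    """Clock cycles elapsed per trial at an assumed clock frequency."""
    return clock_hz * seconds / trials


# Benchmark 11 from the listing above: 117.93 k trials in 300 ms,
# evaluated at the 2.468 GHz estimate printed in the header.
rate = trials_per_second(117.93e3, 0.300)
cycles = cycles_per_trial(2.468e9, 117.93e3, 0.300)
print(f"{rate / 1e3:.1f} k t/s, {cycles / 1e3:.2f} k cycles/t")
```

With the rounded figures from the listing this gives about 393.1 k trials per second and about 6.28 k cycles per trial. The printed 6.00 k cycles/t corresponds to a clock near 2.36 GHz, so the cycle column in that listing was probably computed with a slightly different clock value than the one shown in its header; treat the result as an order-of-magnitude figure.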

Examples

Run all benchmarks that include the string "dotprod":

$ ./benchmark -s dotprod
  estimating cpu clock frequency...
  performed 134217728 trials in 912.3 ms
  estimated clock speed:   3.516 GHz
  setting number of base trials to 351599
running all packages and benchmarks matching 'dotprod'...
5: dotprod_cccf
  16 : dotprod_cccf_4    :  70.32 M trials / 407.52 ms (172.55 M t/s,  20.38   c/t)
  17 : dotprod_cccf_16   :  17.58 M trials / 197.83 ms ( 88.86 M t/s,  39.57   c/t)
  18 : dotprod_cccf_64   :   4.39 M trials / 137.22 ms ( 32.03 M t/s, 109.78   c/t)
  19 : dotprod_cccf_256  :   1.10 M trials / 125.15 ms (  8.78 M t/s, 400.48   c/t)
6: dotprod_crcf
  20 : dotprod_crcf_4    :  70.32 M trials / 376.18 ms (186.93 M t/s,  18.81   c/t)
  21 : dotprod_crcf_16   :  17.58 M trials / 178.55 ms ( 98.46 M t/s,  35.71   c/t)
  22 : dotprod_crcf_64   :   8.79 M trials / 149.02 ms ( 58.99 M t/s,  59.61   c/t)
  23 : dotprod_crcf_256  :   2.20 M trials / 109.61 ms ( 20.05 M t/s, 175.38   c/t)
7: dotprod_rrrf
  24 : dotprod_rrrf_4    :  45.00 M trials / 156.26 ms (288.00 M t/s,  12.21   c/t)
  25 : dotprod_rrrf_16   :  22.50 M trials / 123.14 ms (182.74 M t/s,  19.24   c/t)
  26 : dotprod_rrrf_64   :  11.25 M trials / 112.13 ms (100.34 M t/s,  35.04   c/t)
  27 : dotprod_rrrf_256  :   5.63 M trials / 147.95 ms ( 38.02 M t/s,  92.47   c/t)
running all remaining scripts matching 'dotprod'...
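
For post-processing, result lines in the format shown above can be converted to numbers. The following is a rough sketch under stated assumptions: parse_result is a hypothetical helper rather than a liquid API, the pattern is fitted only to the "trials / ms (t/s, c/t)" layout of this listing, and the G suffix is a guess that does not appear in the sample output.

```python
import re

# Magnitude suffixes: "k" and "M" appear in the listing; "G" is assumed.
_SCALE = {"": 1.0, "k": 1e3, "M": 1e6, "G": 1e9}

# One result line: "<id> : <name> : <trials> trials / <ms> ms (<rate> t/s, <c/t> c/t)"
_LINE = re.compile(
    r"^\s*(?P<id>\d+)\s*:\s*(?P<name>\S+)\s*:\s*"
    r"(?P<trials>[\d.]+)\s*(?P<tscale>[kMG]?)\s*trials\s*/\s*"
    r"(?P<ms>[\d.]+)\s*ms\s*\(\s*"
    r"(?P<rate>[\d.]+)\s*(?P<rscale>[kMG]?)\s*t/s,\s*"
    r"(?P<cpt>[\d.]+)\s*(?P<cscale>[kMG]?)\s*c/t\)"
)


def parse_result(line: str):
    """Return a dict for a benchmark result line, or None for any other line."""
    m = _LINE.match(line)
    if m is None:
        return None  # package headers and status messages end up here
    return {
        "id": int(m["id"]),
        "name": m["name"],
        "trials": float(m["trials"]) * _SCALE[m["tscale"]],
        "seconds": float(m["ms"]) * 1e-3,
        "trials_per_second": float(m["rate"]) * _SCALE[m["rscale"]],
        "cycles_per_trial": float(m["cpt"]) * _SCALE[m["cscale"]],
    }


sample = "  16 : dotprod_cccf_4    :  70.32 M trials / 407.52 ms (172.55 M t/s,  20.38   c/t)"
result = parse_result(sample)
print(result["name"], result["trials_per_second"])
```

Package header lines such as "5: dotprod_cccf" do not match the pattern and return None, so the helper can be applied to every line of captured output.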

Run the benchmarks in package 1 (agc_crcf) while specifying a clock rate of 3.1 GHz:

$ ./benchmark -p1 -c3.1e9
  setting number of base trials to 310000
1: agc_crcf
  1  : agc_crcf          :   9.92 M trials / 113.29 ms ( 87.57 M t/s,  35.40   c/t)
  2  : agc_crcf_squelch  :   9.92 M trials / 135.97 ms ( 72.96 M t/s,  42.49   c/t)
  3  : agc_crcf_locked   :  39.68 M trials / 183.53 ms (216.20 M t/s,  14.34   c/t)
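
In the sample runs on this page, the number of base trials equals the clock frequency divided by 10,000 (3.1 GHz gives 310000; an estimate of 3.516 GHz gives 351599), and each benchmark then runs a multiple of that base count. The sketch below illustrates that observed relationship only: base_trials is a hypothetical name, and both the divisor and the per-benchmark multipliers (32 and 128) are inferred from the printed numbers, not taken from the liquid source.

```python
def base_trials(clock_hz: float) -> int:
    """Base trial count, ASSUMING the clock/10000 pattern seen in the sample runs."""
    return int(clock_hz // 10_000)


base = base_trials(3.1e9)  # clock rate given with -c3.1e9

# Multipliers inferred from the listing above; they are not documented values.
agc_trials = base * 32      # agc_crcf and agc_crcf_squelch: printed as 9.92 M
locked_trials = base * 128  # agc_crcf_locked: printed as 39.68 M

print(base, agc_trials, locked_trials)
```

Under this reading, specifying a higher clock rate with -c raises the trial counts proportionally, while the cycles-per-trial column is scaled by the same clock value.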