Skip to content

Performance

Performance

The performance of the package can be compared with the default implementation of numpy, the intersect1d` method. The ratio of the execution time between sortednp and numpy is shown for various different benchmark tests.

The merge or intersect time can be estimated under two different assumptions. If the arrays, which are merged or intersected, are already sorted, one should not consider the time it takes to sort the random arrays in the benchmark. On the other hand, if one considers a scenario in which the arrays are not sorted, one should take the sorting time into account. The benchmarks here on this page, assume that the arrays are already sorted. If you would like to benchmark the package and include the sorting time, have a look at the methods defined in ci/benchmark.py.

The random scattering of the points indicates the uncertainty caused by random load fluctuations on the benchmark machine (Spikes of serveral orders of magnitude usualy mean that there was a shortage of memory and large chunks had to be moved to SWAP.)

Intersect

The performance of the intersection operation depends on the sparseness of the two arrays. For example, if the first element of one of the arrays is larger than all elements in the other array, only the other array has to be searched (linearly, binarily, or exponentially). Similarly, if the common elements are far apart in the arrays (sparseness), large chunks of the arrays can be skipped. The arrays in the benchmark contain random (unique) integers. The sparseness is defined as the average difference between two consecutive elements in one array.

The first set of tests studies the performance dependence on the size of the arrays. The second set of tests studies the dependence on the sparseness of the array for a fixed size of array. Every shows a color-coded comparison of the performance of intersecting more than two arrays.

Test Simple Search Binary Search Galloping Search
Intersect
Sparseness

Merge

The following chart shows the performance of merging 2 or more arrays as a function of the array size. It is assumed that the arrays are already sorted.