Checking bounds checking in Julia

Ever since my knapsack benchmark I’ve been curious why Julia was still \(\approx 50\%\) slower than either Java or C++ on what essentially were a couple of nested for-loops over a pair of arrays. I’ve done some exploring and now believe that my first guess was correct: much of the remaining Julia overhead is due to its bounds checking.

Eliding bounds checking unconditionally with @inbounds

My first test was to isolate all opt_value() lines with array access into a begin...end block and annotate it with @inbounds:

    function opt_value(W ::Int64, items ::Array{Item}) ::Int64
    ...
    @inbounds(
      begin
          V[items[1].weight:end] .= items[1].value
          ...
          for j in 2 : n
              V, V_prev = V_prev, V
              itemⱼ = items[j]
              for w in 1 : W
                  V_without_itemⱼ = V_prev[w]
              ...
              end
          end
      end
    )
    ...
Here is the data from the original benchmark with added new timing measurements of the above version, labeled julia.ib:
Calculation time as a function of problem size. **julia.ib** labels data obtained from an @inbounds-annotated Julia implementation.

Figure 1: Calculation time as a function of problem size. julia.ib labels data obtained from an @inbounds-annotated Julia implementation.

Nice! With bounds checks elided Julia performance is now in line with Java and C++: its underperformance has decreased to a maximum of \(\approx 15\%\). And Julia is now even the champion for small problem sizes.

Peeking at differences in generated code via @code_native

Another neat facility of Julia allows me to examine JIT-compiled code very easily and right from my REPL session:

julia> d = Knapsack.make_random_data(5_000, 12345)
(5000, Main.Knapsack.Item[Main.Knapsack.Item(609, 6198), ...

julia> @code_native debuginfo=:none Knapsack.opt_value(d[1],d[2])
    .text
    pushq   %rbp
    movq    %rsp, %rbp
    ...
L1165:
    movl    $2, %edx
    jmp L825
    nopw    (%rax,%rax)

I don’t show the full asm listing because that’s not particularly useful unless you’re a low-level programmer. Comparing it with and without @inbounds does confirm, however, that the former version is shorter: about 200 instructions instead of 320.

It is more instructive to look at the compiled version(s) of a simpler function that simply extracts a single slot from an input array:

function getit(a ::Array{Int64})
    return a[123]
end
julia> @code_native getit(rand(Int64, 1000))
    .text
; ┌ @ work.jl:185 within `getit'
; │┌ @ work.jl:185 within `getindex'
    cmpq    $122, 8(%rdi)
    jbe L18
    movq    (%rdi), %rax
    movq    976(%rax), %rax
; │└
    retq
L18:
    pushq   %rbp
    movq    %rsp, %rbp
; │ @ work.jl:185 within `getit'
; │┌ @ array.jl:728 within `getindex'
    movq    %rsp, %rax
    leaq    -16(%rax), %rsi
    movq    %rsi, %rsp
    movq    $123, -16(%rax)
    movabsq $jl_bounds_error_ints, %rax
    movl    $1, %edx
    callq   *%rax
    nopl    (%rax)
; └└

Note the instructions to check the fixed (and 0-based) index against the array input length (cmpq $122, 8(%rdi)) followed by a conditional jump to native C routine jl_bounds_error_ints() that creates and throws a BoundsError exception object. The following version of getit(), with the only array indexing expression annotated with @inbounds, is much shorter:

function getit(a ::Array{Int64})
    return @inbounds a[123]
end
julia> @code_native getit(rand(Int64, 1000))
    .text
; ┌ @ work.jl:185 within `getit'
; │┌ @ work.jl:185 within `getindex'
    movq    (%rdi), %rax
    movq    976(%rax), %rax
; │└
    retq
    nopl    (%rax,%rax)
; └

Disabling bounds checking from command line

It seems that I can get disable bounds checking summarily using –check-bounds=no option:

>julia --check-bounds=no Knapsack.jl

The documentation is somewhat sparse, but it appears that when this option is used it will completely override all @inbounds annotations in the source code, i.e. either all checks are done in a mandatory fashion (“yes”) or none are (“no”). This could be used for “debug” vs “release” calculation runs. However, since bounds checking affects code as it’s being generated, some more documentation is needed to understand how this option works with “pre-compiled” packages and Julia “system” code. I also would like to understand the functionality and use cases for the @propagate_inbounds macro.

Avatar
Vlad Roubtsov
Principal AI Scientist

I am interested in machine learning and stochastic optimization.