wezm.net/v2/content/posts/2023/divmod.md at e00bb7868cf06f8887b85469d6946a21cec7ddb9

mirror of https://github.com/wezm/wezm.net.git synced 2024-11-18 12:52:47 +00:00

Wesley Moore e00bb7868c

Add divmod post

2023-01-11 20:32:56 +10:00


+++
title = "divmod, Rust, x86, and Optimisation"
date = 2023-01-11T19:48:09+10:00

#[extra]
#updated = 2022-04-21T09:07:57+10:00
+++

While reviewing some Rust code that did something like this:

let a = n / d;
let b = n % d;

I lamented the lack of a divmod method in Rust (that would return both the quotient and remainder). My colleague Brendan pointed out that he actually added it back in 2013 but it was moved out of the standard library before the 1.0 release.
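As a rough sketch of what such a method could look like today, an extension trait can add it to `usize`. The `DivMod` trait and `div_mod` method names are hypothetical, chosen for illustration; this is not the API that existed before 1.0.

```rust
/// Hypothetical extension trait; not part of the Rust standard library.
trait DivMod: Sized {
    /// Returns (quotient, remainder). Panics if `d` is zero,
    /// just like the `/` and `%` operators do.
    fn div_mod(self, d: Self) -> (Self, Self);
}

impl DivMod for usize {
    fn div_mod(self, d: Self) -> (Self, Self) {
        (self / d, self % d)
    }
}

fn main() {
    // Destructure the pair instead of writing two separate statements.
    let (a, b) = 17usize.div_mod(5);
    println!("quotient = {a}, remainder = {b}");
}
```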

I also learned that the div instruction on x86 produces the remainder as well as the quotient, so there is potentially some benefit to combining the two operations. I suspected that LLVM would be able to merge the separate operations into a single div, and a trip to the Compiler Explorer confirmed it.

This function:

pub fn divmod(n: usize, d: usize) -> (usize, usize) {
    (n / d, n % d)
}

compiles to the following assembly, which I have annotated with my understanding of each line (note: I'm still learning x86 assembly):

; rdi = numerator, rsi = denominator
example::divmod:
        test    rsi, rsi     ; check for denominator of zero
        je      .LBB0_5      ; jump to div zero panic if zero
        mov     rax, rdi     ; load rax with numerator
        or      rax, rsi     ; or rax with denominator
        shr     rax, 32      ; shift rax right 32-bits
        je      .LBB0_2      ; if the result of the shift sets the zero flag then numerator and
                             ; denominator are 32-bit since none of the upper 32-bits are set.
                             ; jump to 32-bit division implementation
        mov     rax, rdi     ; move numerator into rax
        xor     edx, edx     ; zero edx (I'm not sure why, might be relevant to the calling
                             ; convention and is used by the caller?)
        div     rsi          ; divide rax by rsi
        ret                  ; return, quotient is in rax, remainder in rdx

; 32 bit implementation
.LBB0_2:
        mov     eax, edi     ; move edi to eax (32-bit regs)
        xor     edx, edx     ; zero edx
        div     esi          ; divide eax by esi
        ret

; div zero panic
.LBB0_5:
        push    rax
        lea     rdi, [rip + str.0]
        lea     rdx, [rip + .L__unnamed_1]
        mov     esi, 25
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2

.L__unnamed_2:
        .ascii  "/app/example.rs"

.L__unnamed_1:
        .quad   .L__unnamed_2
        .asciz  "\017\000\000\000\000\000\000\000\002\000\000\000\006\000\000"

str.0:
        .ascii  "attempt to divide by zero"

I found it interesting that after checking for a zero denominator there's an additional check to see if both values fit into 32 bits, and if so it jumps to an instruction sequence that uses 32-bit registers. According to the testing done in this report, 32-bit div has lower latency, particularly on older models.
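The branch structure can be written out by hand in Rust to make the check easier to follow. This is only a sketch of what the generated code does, assuming a 64-bit target (so `u64` stands in for `usize`); the function name `divmod_fast_path` is made up for illustration. The compiler already emits this check itself, so there is no need to write it in real code.

```rust
/// Hand-written model of the compiler's 32-bit fast path (hypothetical name).
/// Panics on a zero denominator, like the original function.
pub fn divmod_fast_path(n: u64, d: u64) -> (u64, u64) {
    // Mirrors `or rax, rsi` followed by `shr rax, 32`: if no bit in the
    // upper half of either operand is set, both values fit in 32 bits.
    if (n | d) >> 32 == 0 {
        // Narrowing is lossless here because the upper 32 bits are zero.
        let (n32, d32) = (n as u32, d as u32);
        ((n32 / d32) as u64, (n32 % d32) as u64)
    } else {
        // At least one operand needs the full 64-bit division.
        (n / d, n % d)
    }
}

fn main() {
    println!("{:?}", divmod_fast_path(17, 5)); // takes the 32-bit path
    println!("{:?}", divmod_fast_path(1 << 40, 3)); // takes the 64-bit path
}
```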

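I wasn't initially able to work out why each implementation zeros edx. The reason is that div takes a double-width dividend: div rsi divides the 128-bit value held in rdx:rax, and div esi divides the 64-bit value held in edx:eax. The upper half therefore has to be cleared before dividing a value that lives in a single register. Writing to edx also clears the upper 32 bits of rdx, so xor edx, edx is enough in both cases.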
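The role of edx/rdx can be modelled in Rust with a 128-bit integer. This is a sketch that assumes the documented behaviour of the 64-bit div instruction: it treats rdx:rax as one 128-bit dividend and faults if the quotient does not fit in 64 bits. The name `div_model` is hypothetical, and `None` stands in for the hardware divide error.

```rust
/// Model of `div r64`: `hi` plays the role of rdx, `lo` the role of rax.
/// Returns Some((quotient, remainder)), or None where the CPU would fault.
fn div_model(hi: u64, lo: u64, d: u64) -> Option<(u64, u64)> {
    if d == 0 {
        return None; // divide by zero
    }
    let dividend = (u128::from(hi) << 64) | u128::from(lo);
    let d = u128::from(d);
    // The quotient must fit in 64 bits, otherwise the real instruction faults.
    let q = u64::try_from(dividend / d).ok()?;
    // The remainder is always smaller than the divisor, so it fits in 64 bits.
    let r = (dividend % d) as u64;
    Some((q, r))
}

fn main() {
    // With the upper half zeroed the result matches n / d and n % d.
    println!("{:?}", div_model(0, 17, 5));
    // A stale value in the upper half silently changes the dividend.
    println!("{:?}", div_model(1, 17, 5));
    // A large enough stale value makes the quotient overflow.
    println!("{:?}", div_model(5, 0, 5));
}
```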

View the Example on Compiler Explorer
