mirror of
https://github.com/wezm/wezm.net.git
synced 2024-11-10 01:42:32 +00:00
Add divmod post
parent
7f389d2055
commit
e00bb7868c
2 changed files with 104 additions and 0 deletions
6
v2/content/posts/2023/_index.md
Normal file
@ -0,0 +1,6 @
+++
title = "2023"
sort_by = "date"
paginate_by = 5
transparent = true
+++
98
v2/content/posts/2023/divmod.md
Normal file
@ -0,0 +1,98 @
+++
title = "divmod, Rust, x86, and Optimisation"
date = 2023-01-11T19:48:09+10:00

#[extra]
#updated = 2022-04-21T09:07:57+10:00
+++

While reviewing some Rust code that did something like this:

```rust
let a = n / d;
let b = n % d;
```

I lamented the lack of a `divmod` method in Rust (that would return both the
quotient and remainder). My colleague [Brendan] pointed out that he actually
[added it][rust-div-mod] back in 2013 but it was moved out of the standard
library before the 1.0 release.

<!-- more -->

I also learned that the [`div` instruction on x86][div] provides the remainder
so there is potentially some benefit to combining the operation. I suspected
that LLVM was probably able to optimise the separate operations and a trip to
[the Compiler Explorer][compiler-explorer] confirmed it.

This function:

```rust
pub fn divmod(n: usize, d: usize) -> (usize, usize) {
    (n / d, n % d)
}
```

Compiles to the following assembly, which I have annotated with my
understanding of each line (Note: I'm still learning x86 assembly):

```asm
; rdi = numerator, rsi = denominator
example::divmod:
        test    rsi, rsi    ; check for denominator of zero
        je      .LBB0_5     ; jump to div zero panic if zero
        mov     rax, rdi    ; load rax with numerator
        or      rax, rsi    ; or rax with denominator
        shr     rax, 32     ; shift rax right 32 bits
        je      .LBB0_2     ; if the result of the shift sets the zero flag then numerator and
                            ; denominator are 32-bit since none of the upper 32 bits are set.
                            ; jump to 32-bit division implementation
        mov     rax, rdi    ; move numerator into rax
        xor     edx, edx    ; zero edx (I'm not sure why, might be relevant to the calling
                            ; convention and is used by the caller?)
        div     rsi         ; divide rax by rsi
        ret                 ; return, quotient is in rax, remainder in rdx

; 32 bit implementation
.LBB0_2:
        mov     eax, edi    ; move edi to eax (32-bit regs)
        xor     edx, edx    ; zero edx
        div     esi         ; divide eax by esi
        ret

; div zero panic
.LBB0_5:
        push    rax
        lea     rdi, [rip + str.0]
        lea     rdx, [rip + .L__unnamed_1]
        mov     esi, 25
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2

.L__unnamed_2:
        .ascii  "/app/example.rs"

.L__unnamed_1:
        .quad   .L__unnamed_2
        .asciz  "\017\000\000\000\000\000\000\000\002\000\000\000\006\000\000"

str.0:
        .ascii  "attempt to divide by zero"
```
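
The first two instructions and the `.LBB0_5` block in the listing exist only to panic on a zero denominator. As a rough sketch that is not part of the original experiment, the divisor can be typed as `std::num::NonZeroUsize`, which rules out zero before the division happens. The name `divmod_nonzero` is hypothetical, and I have not checked the assembly this produces, so treat the idea that the panic branch disappears as an expectation rather than a measured result.

```rust
use std::num::NonZeroUsize;

/// Hypothetical variant of `divmod` whose divisor cannot be zero.
/// `usize / NonZeroUsize` and `usize % NonZeroUsize` are provided by the
/// standard library, and neither of these operations can panic.
pub fn divmod_nonzero(n: usize, d: NonZeroUsize) -> (usize, usize) {
    (n / d, n % d)
}
```

A caller builds the divisor with `NonZeroUsize::new(d)`, which returns `None` for zero, so the zero case is handled once at the point where the value is created.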

I found it interesting that after checking for a zero denominator there's an
additional check to see if both values fit into 32 bits, and if so it jumps to an
instruction sequence that uses 32-bit registers. According to [the testing done
in this report][timing] 32-bit `div` has lower latency, particularly on older
models.
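
To make the branch structure easier to follow, here is a minimal Rust sketch that spells out the same decisions by hand. The function name `divmod_dispatch` is hypothetical, and it uses `u64` on the assumption that `usize` is 64 bits wide, as it is on x86-64. It illustrates what the generated code does; it is not something you would need to write yourself, since the compiler already does this.

```rust
/// Hypothetical hand-written version of the checks in the generated assembly.
pub fn divmod_dispatch(n: u64, d: u64) -> (u64, u64) {
    // Mirrors `test rsi, rsi` / `je .LBB0_5`: panic on a zero denominator.
    assert!(d != 0, "attempt to divide by zero");

    // Mirrors `or rax, rsi` / `shr rax, 32`: if no bit in the upper half of
    // either operand is set, both values fit in 32 bits.
    if (n | d) >> 32 == 0 {
        // Narrowing is lossless here because the upper 32 bits are zero.
        let (n32, d32) = (n as u32, d as u32);
        ((n32 / d32) as u64, (n32 % d32) as u64)
    } else {
        // At least one operand needs the full 64-bit division.
        (n / d, n % d)
    }
}
```

Both paths return the same result as `(n / d, n % d)`; the only difference is which width of division gets executed.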
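
I initially wasn't able to work out why each implementation zeros `edx`. The
reason is that `div` takes its dividend from a register pair: `rdx:rax` for the
64-bit form and `edx:eax` for the 32-bit form. The upper half has to be cleared
before dividing a value that lives only in the lower register, and writing to
`edx` also clears the upper 32 bits of `rdx`, so `xor edx, edx` covers both cases.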
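
The role of `edx` is easier to see with a small model of what the 64-bit `div` computes. The instruction divides the 128-bit value held in `rdx:rax` (`rdx` is the upper half) and leaves the quotient in `rax` and the remainder in `rdx`. The sketch below models that with `u128`; the function name `div64` is hypothetical, and `None` stands in for the divide-error exception the CPU raises on a zero divisor or a quotient that does not fit in 64 bits.

```rust
/// Hypothetical model of the 64-bit x86 `div` instruction.
/// Returns `(quotient, remainder)`, or `None` where the CPU would fault.
pub fn div64(rdx: u64, rax: u64, divisor: u64) -> Option<(u64, u64)> {
    if divisor == 0 {
        return None;
    }
    // The dividend is the 128-bit value rdx:rax, with rdx as the upper half.
    let dividend = ((rdx as u128) << 64) | (rax as u128);
    let quotient = dividend / (divisor as u128);
    let remainder = dividend % (divisor as u128);
    // The quotient must fit back into rax.
    if quotient > u64::MAX as u128 {
        return None;
    }
    // The remainder is always smaller than the divisor, so it fits in 64 bits.
    Some((quotient as u64, remainder as u64))
}
```

With `rdx` set to zero, `div64(0, 17, 5)` gives the expected `Some((3, 2))`. With stale data left in `rdx` the dividend is a different number, so the result is either wrong or a fault, which is why the compiler clears the register first.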

[View the Example on Compiler Explorer](https://rust.godbolt.org/z/hj9rb4Txa)

[Brendan]: https://github.com/brendanzab
[rust-div-mod]: https://github.com/rust-lang/rust/commit/f39152e07baf03fc1ff4c8b2c1678ac857b4a512
[div]: https://www.felixcloutier.com/x86/div
[compiler-explorer]: https://rust.godbolt.org/
[timing]: https://gmplib.org/~tege/x86-timing.pdf