Abstract
Efficient thread synchronization primitives are crucial in modern computer systems for the performant execution of interdependent code segments. In Linux, the futex() syscall is used to construct blocking synchronization primitives such as mutexes or condition variables. With a futex, the uncontended case is handled efficiently and entirely in user space. Under contention, the kernel is called to put the waiting thread to sleep until the state of the primitive changes to uncontended. The kernel must be notified of this change by a futex() syscall so that it can wake up the sleeping thread. This syscall must be issued by the thread that changes the primitive, which places a significant burden on that thread. To remove this burden, we introduce HW-FUTEX, which offloads the futex wake functionality to a hardware unit (HW Unit) that asynchronously initiates wake-ups of the sleeping threads. This reduces the time required to issue the futex wake functionality by at least 90%, to 350 cycles, with no additional overhead in the uncontended case.
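The software baseline described in the abstract can be illustrated with a minimal sketch of a futex-based mutex in C. It is not the HW-FUTEX design and is not taken from the paper; it follows the commonly used three-state scheme (0 = free, 1 = locked, 2 = locked with possible waiters). The names `futex_mutex`, `futex_mutex_lock`, `futex_mutex_unlock`, and `run_demo` are hypothetical and chosen for illustration only. The sketch assumes Linux with a C11 compiler.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical illustration type: 0 = free, 1 = locked, 2 = locked, maybe waiters. */
typedef struct { _Atomic uint32_t state; } futex_mutex;

/* Thin wrapper around the raw futex() syscall (glibc exposes no wrapper). */
static long futex(_Atomic uint32_t *addr, int op, uint32_t val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void futex_mutex_lock(futex_mutex *m) {
    uint32_t seen = 0;
    /* Uncontended case: a single compare-and-swap in user space, no syscall. */
    if (atomic_compare_exchange_strong(&m->state, &seen, 1))
        return;
    /* Contended case: mark the lock as having waiters, then sleep in the kernel. */
    if (seen != 2)
        seen = atomic_exchange(&m->state, 2);
    while (seen != 0) {
        /* Sleeps only if the state is still 2; otherwise returns immediately. */
        futex(&m->state, FUTEX_WAIT_PRIVATE, 2);
        seen = atomic_exchange(&m->state, 2);
    }
}

static void futex_mutex_unlock(futex_mutex *m) {
    uint32_t prev = atomic_exchange(&m->state, 0);
    assert(prev != 0); /* unlocking a free mutex is a caller bug */
    /* Only when waiters may exist does the unlocking thread pay for the
       wake syscall; this is the cost the abstract describes as a burden. */
    if (prev == 2)
        futex(&m->state, FUTEX_WAKE_PRIVATE, 1);
}

/* Small contention demo: four threads increment a shared counter. */
static futex_mutex demo_lock;
static long demo_counter;

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        futex_mutex_lock(&demo_lock);
        demo_counter++;
        futex_mutex_unlock(&demo_lock);
    }
    return NULL;
}

/* Returns the final counter value after all workers have finished. */
static long run_demo(void) {
    pthread_t threads[4];
    demo_counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, demo_worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return demo_counter;
}
```

In this sketch, the `FUTEX_WAKE_PRIVATE` call in `futex_mutex_unlock` is the step that runs on the thread changing the primitive; the uncontended lock and unlock paths never enter the kernel.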
Original language | English |
---|---|
Pages (from-to) | 16-29 |
Number of pages | 14 |
Journal | IEEE Transactions on Very Large Scale Integration (VLSI) Systems |
Volume | 32 |
Issue number | 1 |
State | Published - 1 Jan 2024 |
Keywords
- Futex
- Gem5
- Linux
- hardware
- tracing