Abstract
Most database systems delegate scheduling decisions to the operating system. While such an approach simplifies the overall database design, it also entails problems. Adaptive resource allocation becomes hard in the face of concurrent queries. Furthermore, incorporating domain knowledge to improve query scheduling is difficult. To mitigate these problems, many modern systems employ forms of task-based parallelism. The execution of a single query is broken up into small, independent chunks of work (tasks). Fine-grained scheduling decisions based on these tasks then become the responsibility of the database system. Although this execution model is commonplace, little work has focused on the opportunities arising from it. In this paper, we show how task-based scheduling in database systems opens up new areas for optimization. We present a novel lock-free, self-tuning stride scheduler that optimizes query latencies for analytical workloads. By adaptively managing query priorities and task granularity, we provide high scheduling elasticity. By incorporating domain knowledge into the scheduling decisions, our system is able to cope with workloads that other systems struggle with. Even at high load, we retain near-optimal latencies for short-running queries. Compared to traditional database systems, our design often improves tail latencies by more than 10x.
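For readers unfamiliar with the term, the following is a minimal sketch of classic stride scheduling, the general technique the abstract builds on. It is not the paper's scheduler: it is single-threaded rather than lock-free, it does no self-tuning, and the names `StrideScheduler`, `add_query`, `next_task`, and `STRIDE1` are hypothetical choices made for illustration. Each query receives a stride inversely proportional to its priority, and the scheduler always runs a task from the query with the smallest accumulated pass value.

```python
import heapq

# Large constant from which integer strides are derived (illustrative value).
STRIDE1 = 1 << 20


class StrideScheduler:
    """Textbook stride scheduling; an illustrative sketch only."""

    def __init__(self):
        # Heap entries: (pass_value, insertion_order, query_name, stride).
        # insertion_order breaks ties deterministically.
        self._heap = []
        self._counter = 0

    def add_query(self, name, priority):
        # Higher priority means a smaller stride, so the query is picked more often.
        stride = STRIDE1 // priority
        heapq.heappush(self._heap, (stride, self._counter, name, stride))
        self._counter += 1

    def next_task(self):
        # Pick the query with the smallest pass value, then advance its pass
        # by its stride and put it back.
        pass_value, order, name, stride = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (pass_value + stride, order, name, stride))
        return name


scheduler = StrideScheduler()
scheduler.add_query("short_query", priority=3)
scheduler.add_query("long_query", priority=1)
schedule = [scheduler.next_task() for _ in range(8)]
```

In this sketch, a query with three times the priority receives three times as many task slots over any sufficiently long window. Adjusting priorities and task sizes at runtime, as the abstract describes, would sit on top of this basic mechanism.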
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1879-1891 |
Number of pages | 13 |
Journal | Proceedings of the ACM SIGMOD International Conference on Management of Data |
State | Published - 2021 |
Event | 2021 International Conference on Management of Data, SIGMOD 2021 - Virtual, Online, China. Duration: 20 Jun 2021 → 25 Jun 2021 |
Keywords
- database systems
- parallelism
- query scheduling
- self-tuning