Abstract
Fault-tolerance techniques depend on replication to enhance availability, albeit at the price of higher infrastructure costs. This results in a fundamental trade-off: fault-tolerant services must satisfy given availability and performance constraints while minimising the number of replicated resources. These constraints pose a capacity planning challenge for service operators, who must minimise replication costs without compromising availability. To this end, we present PCRAFT (Performant, Cheap, Reliable and Available Fault Tolerance), a practical process for capacity planning of dependable services. PCRAFT's capacity planning process is based on a hybrid approach that combines empirical performance measurements with probabilistic modelling of availability based on fault injection. In particular, we integrate traditional service-level availability mechanisms (active-route-anywhere and passive-failover) and deployment schemes (cloud and on-premises) to quantify the number of nodes needed to satisfy the given availability and performance constraints. Our evaluation based on real-world applications shows that cloud deployments require fewer nodes than on-premises deployments. Additionally, for on-premises deployments, we show that passive-failover requires fewer nodes than active-route-anywhere. Furthermore, our evaluation quantifies the quality improvement provided by additional integrity mechanisms and how these mechanisms affect the number of nodes needed.
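The general idea of combining a performance constraint with a probabilistic availability constraint can be illustrated with a minimal sketch. This is not PCRAFT's actual model. It makes the following assumptions:

* Node failures are independent.
* All nodes share one steady-state availability `a`.
* The service is up when at least `k` nodes are up, where `k` is the minimum node count that meets the throughput target (a simple k-out-of-n binomial model, closest in spirit to an active-route-anywhere setup).
* Failover time and integrity mechanisms are ignored.

The function names and parameters are hypothetical.

```python
from math import ceil, comb


def availability_k_of_n(n: int, k: int, a: float) -> float:
    """Probability that at least k of n independent nodes are up,
    each with availability a (binomial k-out-of-n model, an assumption)."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def nodes_needed(load: float, per_node_capacity: float,
                 node_availability: float, target_availability: float,
                 max_nodes: int = 1000) -> int:
    """Hypothetical helper: smallest n meeting both constraints.

    Performance: k = ceil(load / per_node_capacity) nodes must be up.
    Availability: P(at least k of n nodes up) >= target_availability.
    """
    k = ceil(load / per_node_capacity)  # minimum nodes for the throughput target
    for n in range(k, max_nodes + 1):   # add spare replicas one at a time
        if availability_k_of_n(n, k, node_availability) >= target_availability:
            return n
    raise ValueError("target availability not reachable within max_nodes")


if __name__ == "__main__":
    # Illustrative numbers only: 1000 req/s load, 400 req/s per node,
    # 99% per-node availability, 99.99% service availability target.
    print(nodes_needed(1000, 400, 0.99, 0.9999))
```

With these illustrative numbers, three nodes are needed for throughput alone. Four nodes give about 0.9994 availability, so a fifth replica is required to reach 0.9999. A passive-failover or cloud deployment would need a different availability model, for example one that accounts for switchover time or provider-level failures.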
Original language | English |
---|---|
Article number | 114126 |
Journal | Theoretical Computer Science |
Volume | 976 |
DOIs | |
State | Published - 17 Oct 2023 |
Keywords
- Capacity planning
- Dependability
- Modelling