TY - JOUR
T1 - Elasticity in cloud databases and their query processing
AU - Graefe, Goetz
AU - Nica, Anisoara
AU - Stolze, Knut
AU - Neumann, Thomas
AU - Eavis, Todd
AU - Petrov, Ilia
AU - Pourabbas, Elaheh
AU - Fekete, David
PY - 2013/4
Y1 - 2013/4
AB - A central promise of cloud services is elastic, on-demand provisioning. The provisioning of data on temporarily available nodes is what makes elastic database services a hard problem. The essential task that enables elastic data services is bringing a node and its data up-to-date. Strategies for high availability do not satisfy the need in this context because they bring nodes online and up-to-date by repeating history, e.g., by log shipping. Nodes must become up-to-date and useful for query processing incrementally by key range. What is wanted is a technique such that in a newly added node, during each short period of time, an additional small key range becomes up-to-date, until eventually the entire dataset becomes up-to-date and useful for query processing, with overall update performance comparable to a traditional high-availability strategy that carries the entire dataset forward without regard to key ranges. Even without the entire dataset being available, the node is productive and participates in query processing tasks. The authors' proposed solution relies on techniques from partitioned B-trees, adaptive merging, deferred maintenance of secondary indexes and of materialized views, and query optimization using materialized views. The paper introduces a family of maintenance strategies for temporarily available copies, the space of possible query execution plans and their cost functions, as well as appropriate query optimization techniques.
KW - Adaptive Merging
KW - Cloud Services
KW - Data Management
KW - Key Range
KW - Nodes
KW - Partitioned B-Trees
KW - Query Optimization Techniques
KW - Query Processing
UR - http://www.scopus.com/inward/record.url?scp=84887425660&partnerID=8YFLogxK
U2 - 10.4018/jdwm.2013040101
DO - 10.4018/jdwm.2013040101
M3 - Article
AN - SCOPUS:84887425660
SN - 1548-3924
VL - 9
SP - 1
EP - 20
JO - International Journal of Data Warehousing and Mining
JF - International Journal of Data Warehousing and Mining
IS - 2
ER -