Abstract
Developers often prefer flexibility over upfront schema design, making semi-structured data formats such as JSON increasingly popular. As a result, large amounts of JSON data are stored and analyzed by relational database systems. In existing systems, however, JSON's lack of a fixed schema results in slow analytics. In this paper, we present JSON tiles, which enables relational systems to perform analytics on JSON data at native speed without sacrificing JSON's flexibility. JSON tiles automatically detects the most important keys and extracts them transparently, often achieving scan performance similar to that of columnar storage. At the same time, JSON tiles can handle heterogeneous and changing data. Furthermore, we automatically collect statistics that enable the query optimizer to find good execution plans. Our experimental evaluation compares our approach against state-of-the-art systems and research proposals and shows that it is both robust and efficient.
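To make the idea of "detecting the most important keys and extracting them" concrete, here is a deliberately simplified sketch. It is not the algorithm from the paper. It assumes records are grouped into fixed-size tiles and that, within each tile, a top-level key is extracted into a column only if it appears in at least a given fraction of records and all of its values share one type. Everything else stays in a per-record remainder. The names `split_into_tiles`, `detect_keys`, `extract_tile`, `TILE_SIZE`, and `FREQUENCY_THRESHOLD` are hypothetical, and so are the parameter values.

```python
from collections import Counter
from typing import Any

Record = dict[str, Any]

# Hypothetical parameters chosen for illustration; not values from the paper.
TILE_SIZE = 4
FREQUENCY_THRESHOLD = 0.6


def split_into_tiles(records: list[Record], tile_size: int = TILE_SIZE) -> list[list[Record]]:
    """Partition the input into consecutive fixed-size tiles (the last may be shorter)."""
    return [records[i:i + tile_size] for i in range(0, len(records), tile_size)]


def detect_keys(tile: list[Record], threshold: float = FREQUENCY_THRESHOLD) -> list[str]:
    """Return top-level keys that are frequent in the tile and have a single value type."""
    counts = Counter(key for record in tile for key in record)
    selected = []
    for key, count in counts.items():
        if count / len(tile) < threshold:
            continue  # too sparse in this tile: leave it in the remainder
        value_types = {type(record[key]) for record in tile if key in record}
        if len(value_types) == 1:
            selected.append(key)  # frequent and type-stable: extract as a column
    return sorted(selected)


def extract_tile(
    tile: list[Record], threshold: float = FREQUENCY_THRESHOLD
) -> tuple[dict[str, list[Any]], list[Record]]:
    """Split a tile into columns for the detected keys plus a per-record remainder."""
    keys = detect_keys(tile, threshold)
    # Missing values become None, analogous to NULLs in a relational column.
    columns = {key: [record.get(key) for record in tile] for key in keys}
    remainder = [{k: v for k, v in record.items() if k not in columns} for record in tile]
    return columns, remainder


records = [
    {"id": 1, "name": "a", "tags": ["x"]},
    {"id": 2, "name": "b"},
    {"id": 3, "name": 7, "extra": True},  # "name" has a mixed type in this tile
    {"id": 4, "name": "d"},
]
columns, remainder = extract_tile(split_into_tiles(records)[0])
print(columns)  # {'id': [1, 2, 3, 4]}
```

Because each tile makes its own decision in this sketch, a key that becomes common later in the data can still be extracted in later tiles. This is one simple way to tolerate heterogeneous and changing input. The per-key counts computed in `detect_keys` also hint at how statistics can be gathered as a by-product of extraction.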
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 445-458 |
Number of pages | 14 |
Journal | Proceedings of the ACM SIGMOD International Conference on Management of Data |
DOIs | |
State | Published - 2021 |
Event | 2021 International Conference on Management of Data, SIGMOD 2021 - Virtual, Online, China |
Duration | 20 Jun 2021 → 25 Jun 2021 |
Keywords
- analytics
- JSON
- JSONB
- OLAP
- scan
- semi-structured data
- storage