JSON Tiles: Fast Analytics on Semi-Structured Data

Dominik Durner, Viktor Leis, Thomas Neumann

Research output: Contribution to journal › Conference article › peer-review

21 Scopus citations

Abstract

Developers often prefer flexibility over upfront schema design, making semi-structured data formats such as JSON increasingly popular. Large amounts of JSON data are therefore stored and analyzed by relational database systems. In existing systems, however, JSON's lack of a fixed schema results in slow analytics. In this paper, we present JSON tiles, which, without losing the flexibility of JSON, enables relational systems to perform analytics on JSON data at native speed. JSON tiles automatically detects the most important keys and extracts them transparently, often achieving scan performance similar to columnar storage. At the same time, JSON tiles is capable of handling heterogeneous and changing data. Furthermore, we automatically collect statistics that enable the query optimizer to find good execution plans. Our experimental evaluation compares against state-of-the-art systems and research proposals and shows that our approach is both robust and efficient.

Original language: English
Pages (from-to): 445-458
Number of pages: 14
Journal: Proceedings of the ACM SIGMOD International Conference on Management of Data
State: Published - 2021
Event: 2021 International Conference on Management of Data, SIGMOD 2021 - Virtual, Online, China
Duration: 20 Jun 2021 to 25 Jun 2021

Keywords

  • analytics
  • json
  • jsonb
  • olap
  • scan
  • semi-structured data
  • storage
