Welcome to DBTest


About DBTest

With the ever-increasing amount of data stored and processed, there is an ongoing need to test not only database management systems but also data-intensive systems in general. Specifically, emerging technologies such as non-volatile memory impose new challenges (e.g., avoiding persistent memory leaks and partial writes), and novel system designs involving FPGAs, GPUs, and RDMA call for additional attention and sophistication.

Building on the success of the seven previous workshops, the goal of DBTest 2020 is to bring researchers and practitioners from academia and industry together to discuss key problems and ideas related to testing database systems and applications. The long-term objective is to reduce the cost and time required to test and tune data management and processing products, so that users and vendors can spend more time and energy on actual innovations.

Topics of Interest

  • Testing of database systems, storage services, and database applications
  • Testing of database systems using novel hardware and software technology (non-volatile memory, hardware transactional memory, …)
  • Testing heterogeneous systems with hardware accelerators (GPUs, FPGAs, ASICs, …)
  • Testing distributed and big data systems
  • Testing machine learning systems
  • Specific challenges of testing and quality assurance for cloud-based systems
  • War stories and lessons learned
  • Performance and scalability testing
  • Testing the reliability and availability of database systems
  • Algorithms and techniques for automatic program verification
  • Maximizing code coverage during testing of database systems and applications
  • Generation of synthetic data for test databases
  • Testing the effectiveness of adaptive policies and components
  • Tools for analyzing database management systems (e.g., profilers, debuggers)
  • Workload characterization with respect to performance metrics and engine components
  • Metrics for test quality, robustness, efficiency, and effectiveness
  • Operational aspects such as continuous integration and delivery pipelines
  • Security and vulnerability testing
  • Experimental reproduction of benchmark results
  • Functional and performance testing of interactive data exploration systems
  • Traceability, reproducibility, and reasoning for ML-based systems


Program Schedule and Recordings

Like most workshops associated with this year's SIGMOD (and SIGMOD 2020 itself), DBTest will be held as a virtual workshop. We will use the Zoom video conferencing platform to stream the presentations and to hold an interactive discussion about the papers and topics presented at the workshop.

The program for this year features three keynotes, five full papers, and two short papers, and is structured as follows (all times are PST):

  • Keynote 1: From HyPer to Hyper: Integrating an academic DBMS into a leading analytics and business intelligence platform
    Tobias Mühlbauer and Jan Finis
    Recording: https://youtu.be/iAlkUq411mc (slides available)
  • SparkFuzz: Searching Correctness Regressions in Modern Query Engines
    Bogdan Ghit, Nicolas Poggi, Josh Rosen, Reynold Xin and Peter Boncz
    Recording: https://youtu.be/l90E5EUCKQA (slides available)
  • On Another Level: How to Debug Compiling Query Engines
    Timo Kersten and Thomas Neumann
    Recording: https://youtu.be/vIcoM2cyLKs (slides available)
  • Keynote 2: Benchmark(et)ing an LSM-Tree vs a B-Tree
    Mark Callaghan
    Recording: https://youtu.be/vyLZHx9aZQc (slides available)
  • Automated System Performance Testing at MongoDB
    Henrik Ingo and David Daly
    Recording: https://youtu.be/FI5BrYkWNvg (slides available)
  • CoreBigBench: Benchmarking Big Data Core Operations
    Todor Ivanov, Ahmad Ghazal, Alain Crolotte, Pekka Kostamaa and Yoseph Ghazal
    Recording: https://youtu.be/V8SdCOl7VyM (slides available)
  • FacetE: Exploiting Web Tables for Domain-Specific Word Embedding Evaluation
    Michael Günther, Paul Sikorski, Maik Thiele and Wolfgang Lehner
    Recording: https://youtu.be/UE3ZW5HIGDw (slides available)
  • Keynote 3: How to clear your backlog of failing tests and make your test suite All Green
    Greg Law
    Recording: https://youtu.be/WH5Gu9RC6to
  • Testing Query Execution Engines with Mutations
    Xinyue Chen, Chenglong Wang and Alvin Cheung
    Slides available (recording not yet available)
  • Workload Merging Potential in SAP Hybris
    Robin Rehrmann, Martin Keppner, Wolfgang Lehner, Carsten Binnig and Arne Schwarz
    Recording: https://youtu.be/eUfnuvsy-Mw (slides available)

We are still collecting and uploading the remaining videos and slides that are not yet available above; please stay tuned.



DBTest '20: Proceedings of the Workshop on Testing Database Systems

Full Citation in the ACM Digital Library

SparkFuzz: searching correctness regressions in modern query engines

  • Bogdan Ghit
  • Nicolas Poggi
  • Josh Rosen
  • Reynold Xin
  • Peter Boncz

With more than 1,200 contributors, Apache Spark is one of the most actively developed open-source projects. At this scale and pace of development, mistakes are bound to happen. In this paper we present SparkFuzz, a toolkit we developed at Databricks for uncovering correctness errors in the Spark SQL engine. To guard the system against correctness errors, SparkFuzz takes a fuzzing approach to testing by generating random data and queries. SparkFuzz executes the generated queries on a reference database system such as PostgreSQL, which is then used as a test oracle to verify the results returned by Spark SQL. We explain our approach to data and query generation and analyze the coverage of SparkFuzz. We show that SparkFuzz achieves its current maximum coverage relatively fast by generating a small number of queries.
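
The oracle-based fuzzing loop described in the abstract can be sketched as follows. This is a deliberately minimal illustration, not the Databricks implementation: function names such as `random_query` and `differential_test` are invented for the sketch, and an in-memory SQLite database stands in for both the reference system and the engine under test, so no mismatches are expected here.

```python
import random
import sqlite3

def random_query(columns, rng):
    # Pick a random projection and a random comparison predicate.
    projection = rng.sample(columns, rng.randint(1, len(columns)))
    col = rng.choice(columns)
    op = rng.choice(["<", "<=", "=", ">=", ">"])
    return f"SELECT {', '.join(projection)} FROM t WHERE {col} {op} {rng.randint(0, 100)}"

def differential_test(reference, under_test, n_queries=100, seed=7):
    # Run each generated query on both systems; the reference acts as oracle.
    rng = random.Random(seed)
    mismatches = []
    for _ in range(n_queries):
        q = random_query(["a", "b", "c"], rng)
        ref = sorted(reference.execute(q).fetchall())
        got = sorted(under_test.execute(q).fetchall())
        if ref != got:
            mismatches.append(q)
    return mismatches

def make_db(rng):
    # Populate an in-memory table with random rows (the "generated data").
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
    rows = [(rng.randint(0, 100), rng.randint(0, 100), rng.randint(0, 100))
            for _ in range(50)]
    con.executemany("INSERT INT" + "O t VALUES (?, ?, ?)", rows)
    return con

# Both "engines" hold identical data here, so the oracle check passes.
ref_db = make_db(random.Random(1))
test_db = make_db(random.Random(1))
print(differential_test(ref_db, test_db))  # → []
```

In a real setup the two connections would point at different engines (e.g., PostgreSQL as the oracle and the system under test), and any query collected in `mismatches` would be minimized into a bug report.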

On another level: how to debug compiling query engines

  • Timo Kersten
  • Thomas Neumann

Compilation-based query engines generate and compile code at runtime, which is then run to produce the query result. In this process there are two levels of source code involved: the code of the code generator itself and the code that is generated at runtime. This can make debugging quite indirect, as a fault in the generated code may have been caused by an error in the generator. To find the error, we have to look at both the generated code and the code that generated it. Current debugging technology is not equipped to handle this situation. For example, GNU's gdb only offers facilities to inspect one source level, but not multiple source levels. Also, current debuggers are not able to reconstruct additional program state for further source levels; thus, context is missing during debugging. In this paper, we show how to build a multi-level debugger for generated queries that solves these issues. We propose to use a time-travelling debugger to provide context information for both compile time and runtime, thus providing full interactive debugging capabilities for every source level. We also present how to build such a debugger with low engineering effort by combining existing tool chains.

Automated system performance testing at MongoDB

  • Henrik Ingo
  • David Daly

Distributed Systems Infrastructure (DSI) is MongoDB's framework for running fully automated system performance tests in our Continuous Integration (CI) environment. To run in CI, it needs to automate everything end-to-end: provisioning and deploying multi-node clusters, executing tests, tuning the system for repeatable results, and collecting and analyzing the results. Today DSI is MongoDB's most used and most useful performance testing tool. It runs almost 200 different benchmarks in daily CI, and we also use it for manual performance investigations. As we can alert the responsible engineer in a timely fashion, all but one of the major regressions were fixed before the 4.2.0 release. We are also able to catch net new improvements, of which DSI caught 17. We open sourced DSI in March 2020.
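
The core idea of alerting on performance regressions in CI can be sketched with a toy detector. DSI's actual analysis is considerably more sophisticated; the function below (the name `flag_regression` and the thresholds are invented for this sketch) only illustrates the basic principle of comparing a new benchmark result against a trailing baseline.

```python
def flag_regression(history, latest, window=5, tolerance=0.05):
    """Flag `latest` if its throughput falls more than `tolerance`
    below the mean of the last `window` earlier runs."""
    if len(history) < window:
        return False  # not enough data to establish a baseline
    baseline = sum(history[-window:]) / window
    return latest < baseline * (1 - tolerance)

# Trailing runs with ~1000 ops/s; a 7% drop is flagged, a 1.5% dip is not.
runs = [1000, 1010, 995, 1005, 990]
print(flag_regression(runs, 930))  # → True
print(flag_regression(runs, 985))  # → False
```

A real pipeline would also account for run-to-run noise (e.g., via repeated trials or change point detection) before paging an engineer, since a fixed percentage threshold either misses slow drifts or fires on noisy benchmarks.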

CoreBigBench: Benchmarking big data core operations

  • Todor Ivanov
  • Ahmad Ghazal
  • Alain Crolotte
  • Pekka Kostamaa
  • Yoseph Ghazal

Significant effort has been put into big data benchmarking, with a focus on end-to-end applications. While such benchmarks cover basic functionalities implicitly, the individual contributions to overall performance remain hidden. As a result, end-to-end benchmarks can be biased toward certain basic functions. Micro-benchmarks are more explicit at covering basic functionalities, but they usually target highly specialized functions. In this paper we present CoreBigBench, a benchmark that focuses on the most common functionalities of big data engines/platforms, such as scans, two-way joins, and common UDF execution. These common functionalities are benchmarked over relational and key-value data models, which covers the majority of data models. The benchmark consists of 22 queries applied to sales data and key-value web logs covering the basic functionalities. We ran CoreBigBench on Hive as a proof of concept, verified that the benchmark is easy to deploy, and collected performance data. Finally, we believe that CoreBigBench is a good fit for performance testing of commercial big data engines focused on basic engine functionalities not covered by end-to-end benchmarks.

FacetE: exploiting web tables for domain-specific word embedding evaluation

  • Michael Günther
  • Paul Sikorski
  • Maik Thiele
  • Wolfgang Lehner

Today's natural language processing and information retrieval systems heavily depend on word embedding techniques to represent text values. However, given a specific task, deciding on a word embedding dataset is not trivial. Current word embedding evaluation methods mostly provide only a one-dimensional quality measure, which does not express how knowledge from different domains is represented in word embedding models. To overcome this limitation, we provide a new evaluation dataset called FacetE, derived from 125M Web tables, enabling domain-sensitive evaluation. We show that FacetE can effectively be used to evaluate word embedding models. The evaluation of common general-purpose word embedding models suggests that there is currently no best word embedding for every domain.

Testing query execution engines with mutations

  • Xinyue Chen
  • Chenglong Wang
  • Alvin Cheung

The query execution engine plays an important role in modern database systems. However, due to its complex nature, validating the correctness of a query execution engine is inherently challenging. In particular, the high cost of testing query execution engines often prevents developers from iterating quickly during the development process, which can lengthen the development cycle or lead to production-level bugs. To address this challenge, we propose MutaSQL, a tool that can quickly discover correctness bugs in SQL execution engines. MutaSQL generates test cases by mutating a query Q over a database D into a query Q′ that should evaluate to the same result as Q on D. MutaSQL then checks the execution results of Q′ and Q on the engine under test. We evaluated MutaSQL on previous SQLite versions with known bugs as well as on the newest SQLite release. The results show that MutaSQL can effectively reproduce 34 bugs in previous versions and discovered a new bug in the current SQLite release.
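
The mutation idea from the abstract can be illustrated in a few lines: rewrite a query into a form that must return the same result, then compare. This is not MutaSQL itself; the helper names (`mutants`, `check_engine`) and the two trivial rewrites are invented for the sketch, and SQLite serves as the engine under test.

```python
import sqlite3

def mutants(query):
    # Two equivalence-preserving mutations of a SELECT with a WHERE clause:
    # a conjoined tautology and a disjoined contradiction must both leave
    # the result set unchanged on a correct engine.
    yield query + " AND 1 = 1"
    yield query + " OR 1 = 0"

def check_engine(con, query):
    """Return the mutants whose results differ from the original query's."""
    expected = sorted(con.execute(query).fetchall())
    return [m for m in mutants(query)
            if sorted(con.execute(m).fetchall()) != expected]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
print(check_engine(con, "SELECT a FROM t WHERE a > 5"))  # → [] on a correct engine
```

Any mutant returned by `check_engine` is direct evidence of a correctness bug, with no need for a second reference system; the real tool applies a much richer set of semantics-preserving rewrites.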

Workload merging potential in SAP Hybris

  • Robin Rehrmann
  • Martin Keppner
  • Wolfgang Lehner
  • Carsten Binnig
  • Arne Schwarz

OLTP DBMSs in enterprise scenarios often face the challenge of dealing with workload peaks resulting from events such as Cyber Monday or Black Friday. The traditional solution to avoid running out of resources, and thus to cope with such workload peaks, is significant over-provisioning of the underlying infrastructure. Another direction for coping with such peak scenarios is to apply resource sharing. In a recent work, we showed that merging read statements in OLTP scenarios offers the opportunity to maintain low latency for systems under heavy load without over-provisioning.

In this paper, we analyze a real enterprise OLTP workload --- SAP Hybris --- with respect to statement types, complexity, and hot-spot statements to find potential candidates for workload sharing in OLTP. We additionally share work of the Hybris workload in our system OLTPShare and report on savings in CPU consumption. Another interesting effect we show is that with OLTPShare, we can increase SAP Hybris throughput by 20%.


Program Committee

Anastasia Ailamaki, EPFL, Switzerland
Anisoara Nica, SAP SE, Canada
Anja Grünheid, Google, USA
Artur Andrzejak, University of Heidelberg, Germany
Caetano Sauer, Tableau, Germany
Danica Porobic, Oracle, USA
Eric Lo, Chinese University of Hong Kong, Hong Kong
Eva Sitaridi, Amazon Web Services, USA
Hannes Mühleisen, CWI, The Netherlands
Ioana Manolescu, INRIA, France
Joy Arulraj, Georgia Tech, USA
Julia Stoyanovich, NYU, USA
Leo Giakoumakis, Snowflake, USA
Renata Borovica-Gajic, University of Melbourne, Australia
Wolfram Wingerath, Baqend, Germany
Zsolt Istvan, IMDEA, Spain

Workshop Co-Chairs


ITU Copenhagen, Denmark


SAP SE, Germany

Steering Committee


TU Darmstadt, Germany


SAP SE, Germany


TU Berlin, Germany