Welcome to DBTest

About DBTest

With the ever-increasing amount of data being stored and processed, there is an ongoing need to test database management systems, as well as data-intensive systems in general. Specifically, emerging technologies such as Non-Volatile Memory impose new challenges (e.g., avoiding persistent memory leaks and partial writes), and novel system designs including FPGAs, GPUs, and RDMA call for additional attention and sophistication.

Building on the success of the eight previous workshops, the goal of DBTest 2022 is to bring together researchers and practitioners from academia and industry to discuss key problems and ideas related to testing database systems and applications. The long-term objective is to reduce the cost and time required to test and tune data management and processing products, so that users and vendors can spend more time and energy on actual innovations.

Proceedings

Papers

DBTest '22: Proceedings of the 9th International Workshop on Testing Database Systems

Full Citation in the ACM Digital Library

DeepBench: Benchmarking JSON Document Stores

  • Stefano Belloni
  • Daniel Ritter
  • Marco Schröder
  • Nils Rörup

The growing popularity of JSON as an exchange and storage format in business and analytical applications has led to its rapid dissemination, making the timely storage and processing of JSON documents crucial for organizations. Consequently, specialized JSON document stores are ubiquitously used for diverse domain-specific workloads, while a JSON-specific benchmark is missing.

In this work, we specify DeepBench, an extensible, scalable benchmark that addresses nested JSON data as well as queries over JSON documents. DeepBench features configurable domain-independent scale levels (e.g., varying document sizes, concurrent users) and JSON-specific scale levels (e.g., object and array nesting). The evaluation of well-known document stores with a prototypical DeepBench implementation shows its versatility and gives new insights into potential weaknesses that were not found by existing, non-JSON benchmarks.
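To illustrate the kind of JSON-specific scale levels the abstract refers to, the sketch below is a hypothetical illustration, not part of DeepBench itself; the generate_document helper and its parameters are assumptions made here. It generates synthetic documents whose object-nesting depth and array lengths can be varied independently:

import json
import random
import string

def random_string(length=8):
    """Return a random lowercase string used as a field value."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def generate_document(object_depth, array_length, fields_per_level=3):
    """Generate one synthetic JSON document whose object nesting depth and
    array length are controlled by the given scale parameters."""
    if object_depth == 0:
        # Leaf level: plain scalar fields plus an array of scalars.
        return {
            **{f"field_{i}": random_string() for i in range(fields_per_level)},
            "values": [random.randint(0, 1000) for _ in range(array_length)],
        }
    return {
        **{f"field_{i}": random_string() for i in range(fields_per_level)},
        # Recurse to create one more level of object nesting.
        "nested": generate_document(object_depth - 1, array_length, fields_per_level),
    }

if __name__ == "__main__":
    # Two scale levels: shallow documents vs. deeply nested ones.
    shallow = generate_document(object_depth=1, array_length=4)
    deep = generate_document(object_depth=6, array_length=16)
    print(json.dumps(shallow, indent=2))
    print(len(json.dumps(deep)), "bytes in the deeper document")

Varying object_depth and array_length independently mirrors the benchmark's idea of scaling JSON-specific dimensions separately from domain-independent ones such as document count or concurrent users.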

Journey of Migrating Millions of Queries on The Cloud

  • Taro L. Saito
  • Naoki Takezoe
  • Yukihiro Okada
  • Takako Shimamoto
  • Dongmin Yu
  • Suprith Chandrashekharachar
  • Kai Sasaki
  • Shohei Okumiya
  • Yan Wang
  • Takashi Kurihara
  • Ryu Kobayashi
  • Keisuke Suzuki
  • Zhenghong Yang
  • Makoto Onizuka

Treasure Data processes millions of distributed SQL queries every day on the cloud. Upgrading the query engine service at this scale is challenging because we need to migrate all of the customers' production queries to a new version while preserving the correctness and performance of the data processing pipelines. To ensure the quality of the query engines, we utilize our query logs to build customer-specific benchmarks and replay these queries with real customer data in a secure pre-production environment. To simulate millions of queries, we need effective minimization of test query sets and better reporting of the simulation results to proactively find incompatible changes and performance regressions in the new version. This paper describes the overall design of our system and shares various challenges in maintaining the quality of the query engine service on the cloud.
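As a rough illustration of the replay-and-compare idea described in the abstract (not Treasure Data's actual system; SQLite stands in for the distributed query engine, and the file and function names are invented for this sketch), a query-log-based regression check might look like this in Python:

import sqlite3

def run_query(db_path, sql):
    """Execute one logged query against a given engine version and return sorted rows."""
    with sqlite3.connect(db_path) as conn:
        return sorted(conn.execute(sql).fetchall())

def replay_and_compare(query_log, baseline_db, candidate_db):
    """Replay each logged query on both versions and collect incompatibilities."""
    regressions = []
    for query_id, sql in query_log:
        try:
            expected = run_query(baseline_db, sql)
            actual = run_query(candidate_db, sql)
            if expected != actual:
                regressions.append((query_id, "result mismatch"))
        except Exception as exc:  # e.g., syntax no longer accepted by the new version
            regressions.append((query_id, f"error: {exc}"))
    return regressions

if __name__ == "__main__":
    # Hypothetical query log entries and database files for illustration only.
    log = [("q1", "SELECT count(*) FROM events"),
           ("q2", "SELECT user_id, sum(amount) FROM events GROUP BY user_id")]
    for query_id, reason in replay_and_compare(log, "engine_v1.db", "engine_v2.db"):
        print(query_id, reason)

In practice the hard parts are exactly what the paper addresses: minimizing the test query set so that millions of logged queries need not be replayed verbatim, and reporting results so that incompatible changes and performance regressions surface proactively.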

FuzzyData: A Scalable Workload Generator for Testing Dataframe Workflow Systems

  • Mohammed Suhail Rehman
  • Aaron Elmore

Dataframes have become a popular means to represent, transform, and analyze data. This approach has gained traction and a large user base among data science practitioners, resulting in a new wave of systems that implement a dataframe API but add performance, efficiency, and distributed/parallel extensions to systems such as R and pandas. However, unlike relational databases and NoSQL systems, which have a variety of benchmarking, testing, and workload generation suites, there is an acute lack of similar tools for dataframe-based systems. This paper presents fuzzydata, a first step in providing an extensible workflow generation system that targets dataframe-based APIs. We present an abstract data processing workflow model, random table and workflow generators, and three clients implemented using our model. Using fuzzydata, we can encode a real-world workflow or randomly generate workflows using various parameters. These workflows can be scaled and replayed on multiple systems to provide stress testing, performance evaluation, and a breakdown of performance bottlenecks present in popular dataframe systems.
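To give a flavor of random workflow generation over a dataframe API (a minimal sketch inspired by the abstract, not fuzzydata's actual model; the operation pool, table schema, and function names are assumptions made here), consider the following Python/pandas example:

import numpy as np
import pandas as pd

# A tiny pool of single-step dataframe operations; each takes and returns a DataFrame.
OPERATIONS = [
    lambda df, rng: df.assign(derived=df.select_dtypes("number").iloc[:, 0] * 2),
    lambda df, rng: df.sample(frac=0.5, random_state=int(rng.integers(1 << 16))),
    lambda df, rng: df.sort_values(df.columns[0]),
    lambda df, rng: df.groupby(df.columns[0], as_index=False).size(),
]

def random_table(rows, rng):
    """Generate a random base table with one string and two numeric columns."""
    return pd.DataFrame({
        "key": rng.choice(list("abcde"), size=rows),
        "x": rng.integers(0, 100, size=rows),
        "y": rng.random(size=rows),
    })

def random_workflow(df, steps, rng):
    """Apply a random sequence of operations, recording each step for replay."""
    trace = []
    for _ in range(steps):
        op = OPERATIONS[int(rng.integers(len(OPERATIONS)))]
        df = op(df, rng)
        trace.append(op)
    return df, trace

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    result, trace = random_workflow(random_table(1000, rng), steps=5, rng=rng)
    print(result.head())

Recording the chosen operations as a trace is what makes such randomly generated workflows replayable on multiple dataframe systems for stress testing and performance comparison.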

Topics Of Interest

  • Testing of database systems, storage services, and database applications
  • Testing of database systems using novel hardware and software technology (non-volatile memory, hardware transactional memory, …)
  • Testing heterogeneous systems with hardware accelerators (GPUs, FPGAs, ASICs, …)
  • Testing distributed and big data systems
  • Testing machine learning systems
  • Specific challenges of testing and quality assurance for cloud-based systems
  • War stories and lessons learned
  • Performance and scalability testing
  • Testing the reliability and availability of database systems
  • Algorithms and techniques for automatic program verification
  • Maximizing code coverage during testing of database systems and applications
  • Generation of synthetic data for test databases
  • Testing the effectiveness of adaptive policies and components
  • Tools for analyzing database management systems (e.g., profilers, debuggers)
  • Workload characterization with respect to performance metrics and engine components
  • Metrics for test quality, robustness, efficiency, and effectiveness
  • Operational aspects such as continuous integration and delivery pipelines
  • Security and vulnerability testing
  • Experimental reproduction of benchmark results
  • Functional and performance testing of interactive data exploration systems
  • Traceability, reproducibility, and reasoning for ML-based systems

Details

Paper Submission

Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.

Submission Guidelines

HotCRP

Timeline

  • Paper Submission: March 11, 2022 / 11:59 PM US PST
  • Notification of Outcome: April 4, 2022 / 11:59 PM US PST
  • Camera-Ready Copy: April 17, 2022 / 11:59 PM US PST

Schedule

Program Schedule and Recordings

DBTest will be held as a hybrid workshop this year. While there will be a physical presence, we will also use the Zoom video conferencing platform to stream the presentations and to host interactive discussion of the papers and topics presented at the workshop.

The program for this year features two keynotes, three full papers, one short presentation, and a panel discussion, and is structured as follows (all times are EST):

Start Time (EST) | Title | Presenter | Mode | Recording | Slides
9:30 AM | Welcome | Manuel Rigger and Pınar Tözün | Remote | YouTube |
9:45 AM | Journey of Migrating Millions of Queries on The Cloud | T. Saito, N. Takezoe, Y. Okada, T. Shimamoto, D. Yu, S. Chandrashekharachar, K. Sasaki, S. Okumiya, Y. Wang, T. Kurihara, R. Kobayashi, K. Suzuki, Z. Yang, and M. Onizuka | Remote | YouTube | Slides
10:15 AM | Benchbot: Benchmark as a Service for TiDB | Yuying Song and Huansheng Chen | Remote | YouTube | Slides
10:30 AM | Break | | | |
11:00 AM | Keynote 1: DuckDB Testing - Present and Future | Mark Raasveldt | Remote | YouTube | Slides
12:00 PM | DeepBench: Benchmarking JSON Document Stores | Stefano Belloni, Daniel Ritter, Marco Schröder, and Nils Rörup | Remote | YouTube | Slides
12:30 PM | Lunch | | | |
2:00 PM | Keynote 2: Tackling performance and correctness problems in database-backed web applications | Shan Lu | Remote | YouTube | Slides
3:00 PM | FuzzyData: A Scalable Workload Generator for Testing Dataframe Workflow Systems | Mohammed Suhail Rehman and Aaron Elmore | In-person | YouTube | Slides
3:30 PM | Break (Poster Session) | | | |
4:30 PM | Panel Discussion | Greg Law, Allison Lee, Abdul Quamar, and Yingjun Wu | Hybrid | YouTube |
5:25 PM | Closing | Manuel Rigger and Pınar Tözün | Remote | |

Organization

Program Committee

Anisoara Nica (SAP SE)
Anja Grünheid (Microsoft)
Chee-Yong Chan (National University of Singapore)
Danica Porobic (Oracle)
Daniel Ritter (HPI)
Jayant R. Haritsa (Indian Institute of Science)
Joy Arulraj (Georgia Tech)
Junwen Yang (University of Chicago)
Muhammad Ali Gulzar (Virginia Tech)
Numair Mansur (MPI-SWS)
Renata Borovica-Gajic (University of Melbourne)
Shuai Wang (HKUST)
S. Sudarshan (IIT Bombay)
Stefania Dumbrava (ENSIIE)
Utku Sirin (Harvard University)

Workshop Co-Chairs

Manuel Rigger (ETH Zurich, Switzerland)
Pınar Tözün (ITU Copenhagen, Denmark)

Steering Committee

Carsten Binnig (TU Darmstadt, Germany)
Alexander Böhm (Google, Germany)
Tilmann Rabl (TU Berlin, Germany)