Welcome to DBTest

About DBTest

With the ever-increasing amount of data being stored and processed, there is an ongoing need to test database management systems, as well as data-intensive systems in general. Specifically, emerging technologies such as Non-Volatile Memory impose new challenges (e.g., avoiding persistent memory leaks and partial writes), and novel system designs including FPGAs, GPUs, and RDMA call for additional attention and sophistication.

Building on the success of the eight previous workshops, the goal of DBTest 2022 is to bring together researchers and practitioners from academia and industry to discuss key problems and ideas related to testing database systems and applications. The long-term objective is to reduce the cost and time required to test and tune data management and processing products, so that users and vendors can spend more time and energy on actual innovations.

Proceedings

Papers

DBTest '22: Proceedings of the 9th International Workshop on Testing Database Systems

Full Citation in the ACM Digital Library

DeepBench: Benchmarking JSON Document Stores

  • Stefano Belloni
  • Daniel Ritter
  • Marco Schröder
  • Nils Rörup

The growing popularity of JSON as an exchange and storage format in business and analytical applications has led to its rapid dissemination, making the timely storage and processing of JSON documents crucial for organizations. Consequently, specialized JSON document stores are ubiquitously used for diverse domain-specific workloads, while a JSON-specific benchmark is missing.

In this work, we specify DeepBench, an extensible, scalable benchmark that addresses nested JSON data as well as queries over JSON documents. DeepBench features configurable domain-independent scale levels (e.g., varying document sizes, concurrent users) and JSON-specific scale levels (e.g., object and array nesting). The evaluation of well-known document stores with a prototypical DeepBench implementation shows its versatility and gives new insights into potential weaknesses that were not found by existing, non-JSON benchmarks.
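To illustrate the kind of JSON-specific scale levels the abstract refers to, the sketch below is a hypothetical illustration, not part of DeepBench itself; the generate_document helper and its parameters are assumptions made here. It generates synthetic documents whose object-nesting depth and array lengths can be varied independently:

import json
import random
import string

def random_string(length=8):
    """Return a random lowercase string used as a field value."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def generate_document(object_depth, array_length, fields_per_level=3):
    """Generate one synthetic JSON document whose object nesting depth and
    array length are controlled by the given scale parameters."""
    if object_depth == 0:
        # Leaf level: plain scalar fields plus an array of scalars.
        return {
            **{f"field_{i}": random_string() for i in range(fields_per_level)},
            "values": [random.randint(0, 1000) for _ in range(array_length)],
        }
    return {
        **{f"field_{i}": random_string() for i in range(fields_per_level)},
        # Recurse to create one more level of object nesting.
        "nested": generate_document(object_depth - 1, array_length, fields_per_level),
    }

if __name__ == "__main__":
    # Two scale levels: shallow documents vs. deeply nested ones.
    shallow = generate_document(object_depth=1, array_length=4)
    deep = generate_document(object_depth=6, array_length=16)
    print(json.dumps(shallow, indent=2))
    print(len(json.dumps(deep)), "bytes in the deeper document")

Varying object_depth and array_length independently mirrors the benchmark's idea of scaling JSON-specific dimensions separately from domain-independent ones such as document count or concurrent users.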

Journey of Migrating Millions of Queries on The Cloud

  • Taro L. Saito
  • Naoki Takezoe
  • Yukihiro Okada
  • Takako Shimamoto
  • Dongmin Yu
  • Suprith Chandrashekharachar
  • Kai Sasaki
  • Shohei Okumiya
  • Yan Wang
  • Takashi Kurihara
  • Ryu Kobayashi
  • Keisuke Suzuki
  • Zhenghong Yang
  • Makoto Onizuka

Treasure Data processes millions of distributed SQL queries every day on the cloud. Upgrading the query engine service at this scale is challenging because we need to migrate all of the customers' production queries to a new version while preserving the correctness and performance of the data processing pipelines. To ensure the quality of the query engines, we utilize our query logs to build customer-specific benchmarks and replay these queries with real customer data in a secure pre-production environment. To simulate millions of queries, we need effective minimization of test query sets and better reporting of the simulation results to proactively find incompatible changes and performance regressions in the new version. This paper describes the overall design of our system and shares various challenges in maintaining the quality of the query engine service on the cloud.
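As a rough illustration of the replay-and-compare idea described in the abstract (not Treasure Data's actual system; SQLite stands in for the distributed query engine, and the file and function names are invented for this sketch), a query-log-based regression check might look like this in Python:

import sqlite3

def run_query(db_path, sql):
    """Execute one logged query against a given engine version and return sorted rows."""
    with sqlite3.connect(db_path) as conn:
        return sorted(conn.execute(sql).fetchall())

def replay_and_compare(query_log, baseline_db, candidate_db):
    """Replay each logged query on both versions and collect incompatibilities."""
    regressions = []
    for query_id, sql in query_log:
        try:
            expected = run_query(baseline_db, sql)
            actual = run_query(candidate_db, sql)
            if expected != actual:
                regressions.append((query_id, "result mismatch"))
        except Exception as exc:  # e.g., syntax no longer accepted by the new version
            regressions.append((query_id, f"error: {exc}"))
    return regressions

if __name__ == "__main__":
    # Hypothetical query log entries and database files for illustration only.
    log = [("q1", "SELECT count(*) FROM events"),
           ("q2", "SELECT user_id, sum(amount) FROM events GROUP BY user_id")]
    for query_id, reason in replay_and_compare(log, "engine_v1.db", "engine_v2.db"):
        print(query_id, reason)

In practice the hard parts are exactly what the paper addresses: minimizing the test query set so that millions of logged queries need not be replayed verbatim, and reporting results so that incompatible changes and performance regressions surface proactively.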

FuzzyData: A Scalable Workload Generator for Testing Dataframe Workflow Systems

  • Mohammed Suhail Rehman
  • Aaron Elmore

Dataframes have become a popular means to represent, transform, and analyze data. This approach has gained traction and a large user base among data science practitioners, resulting in a new wave of systems that implement a dataframe API but add performance, efficiency, and distributed/parallel extensions to systems such as R and pandas. However, unlike relational databases and NoSQL systems, which have a variety of benchmarking, testing, and workload generation suites, there is an acute lack of similar tools for dataframe-based systems. This paper presents fuzzydata, a first step in providing an extensible workflow generation system that targets dataframe-based APIs. We present an abstract data processing workflow model, random table and workflow generators, and three clients implemented using our model. Using fuzzydata, we can encode a real-world workflow or randomly generate workflows using various parameters. These workflows can be scaled and replayed on multiple systems to provide stress testing, performance evaluation, and a breakdown of performance bottlenecks present in popular dataframe systems.
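To give a flavor of random workflow generation over a dataframe API (a minimal sketch inspired by the abstract, not fuzzydata's actual model; the operation pool, table schema, and function names are assumptions made here), consider the following Python/pandas example:

import numpy as np
import pandas as pd

# A tiny pool of single-step dataframe operations; each takes and returns a DataFrame.
OPERATIONS = [
    lambda df, rng: df.assign(derived=df.select_dtypes("number").iloc[:, 0] * 2),
    lambda df, rng: df.sample(frac=0.5, random_state=int(rng.integers(1 << 16))),
    lambda df, rng: df.sort_values(df.columns[0]),
    lambda df, rng: df.groupby(df.columns[0], as_index=False).size(),
]

def random_table(rows, rng):
    """Generate a random base table with one string and two numeric columns."""
    return pd.DataFrame({
        "key": rng.choice(list("abcde"), size=rows),
        "x": rng.integers(0, 100, size=rows),
        "y": rng.random(size=rows),
    })

def random_workflow(df, steps, rng):
    """Apply a random sequence of operations, recording each step for replay."""
    trace = []
    for _ in range(steps):
        op = OPERATIONS[int(rng.integers(len(OPERATIONS)))]
        df = op(df, rng)
        trace.append(op)
    return df, trace

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    result, trace = random_workflow(random_table(1000, rng), steps=5, rng=rng)
    print(result.head())

Recording the chosen operations as a trace is what makes such randomly generated workflows replayable on multiple dataframe systems for stress testing and performance comparison.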

Topics Of Interest

  • Testing of database systems, storage services, and database applications
  • Testing of database systems using novel hardware and software technology (non-volatile memory, hardware transactional memory, …)
  • Testing heterogeneous systems with hardware accelerators (GPUs, FPGAs, ASICs, …)
  • Testing distributed and big data systems
  • Testing machine learning systems
  • Specific challenges of testing and quality assurance for cloud-based systems
  • War stories and lessons learned
  • Performance and scalability testing
  • Testing the reliability and availability of database systems
  • Algorithms and techniques for automatic program verification
  • Maximizing code coverage during testing of database systems and applications
  • Generation of synthetic data for test databases
  • Testing the effectiveness of adaptive policies and components
  • Tools for analyzing database management systems (e.g., profilers, debuggers)
  • Workload characterization with respect to performance metrics and engine components
  • Metrics for test quality, robustness, efficiency, and effectiveness
  • Operational aspects such as continuous integration and delivery pipelines
  • Security and vulnerability testing
  • Experimental reproduction of benchmark results
  • Functional and performance testing of interactive data exploration systems
  • Traceability, reproducibility, and reasoning for ML-based systems

Details

Paper Submission

Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.

Submission Guidelines

HotCRP

Timeline

  • Paper Submission: March 11, 2022 / 11:59 PM US PST
  • Notification of Outcome: April 4, 2022 / 11:59 PM US PST
  • Camera-Ready Copy: April 17, 2022 / 11:59 PM US PST

Schedule

Program Schedule and Recordings

DBTest will be held as a hybrid workshop this year. While there will be a physical presence, we will also use the Zoom video conferencing platform to stream the presentations and to host interactive discussion of the papers and topics presented at the workshop.

The program for this year features two keynotes, three full papers, one short presentation, and a panel discussion, and is structured as follows (all times are EST):

Start Time (EST) | Title | Presenter | Mode | Recording | Slides
9:30 AM | Welcome | Manuel Rigger and Pınar Tözün | Remote | YouTube |
9:45 AM | Journey of Migrating Millions of Queries on The Cloud | T. Saito, N. Takezoe, Y. Okada, T. Shimamoto, D. Yu, S. Chandrashekharachar, K. Sasaki, S. Okumiya, Y. Wang, T. Kurihara, R. Kobayashi, K. Suzuki, Z. Yang, and M. Onizuka | Remote | YouTube | Slides
10:15 AM | Benchbot: Benchmark as a Service for TiDB | Yuying Song and Huansheng Chen | Remote | YouTube | Slides
10:30 AM | Break | | | |
11:00 AM | Keynote 1: DuckDB Testing - Present and Future | Mark Raasveldt | Remote | YouTube | Slides
12:00 PM | DeepBench: Benchmarking JSON Document Stores | Stefano Belloni, Daniel Ritter, Marco Schröder, and Nils Rörup | Remote | YouTube | Slides
12:30 PM | Lunch | | | |
2:00 PM | Keynote 2: Tackling performance and correctness problems in database-backed web applications | Shan Lu | Remote | YouTube | Slides
3:00 PM | FuzzyData: A Scalable Workload Generator for Testing Dataframe Workflow Systems | Mohammed Suhail Rehman and Aaron Elmore | In-person | YouTube | Slides
3:30 PM | Break (Poster Session) | | | |
4:30 PM | Panel Discussion | Greg Law, Allison Lee, Abdul Quamar, and Yingjun Wu | Hybrid | YouTube |
5:25 PM | Closing | Manuel Rigger and Pınar Tözün | Remote | |

Organization

Program Committee

Anisoara Nica (SAP SE)
Anja Grünheid (Microsoft)
Chee-Yong Chan (National University of Singapore)
Danica Porobic (Oracle)
Daniel Ritter (HPI)
Jayant R. Haritsa (Indian Institute of Science)
Joy Arulraj (Georgia Tech)
Junwen Yang (University of Chicago)
Muhammad Ali Gulzar (Virginia Tech)
Numair Mansur (MPI-SWS)
Renata Borovica-Gajic (University of Melbourne)
Shuai Wang (HKUST)
S. Sudarshan (IIT Bombay)
Stefania Dumbrava (ENSIIE)
Utku Sirin (Harvard University)

Workshop Co-Chairs

Manuel Rigger (ETH Zurich, Switzerland)
Pınar Tözün (ITU Copenhagen, Denmark)

Steering Committee

Carsten Binnig (TU Darmstadt, Germany)
Alexander Böhm (Google, Germany)
Tilmann Rabl (TU Berlin, Germany)