Welcome to DBTest

About DBTest

Over the last few years, we have seen an increase in academic and industry work on benchmarking and performance evaluation in novel cloud settings, driven both by a growing number and diversity of workloads and by new hardware such as FPGAs and GPUs. Moreover, new classes of data-driven applications (e.g., machine learning and big data scenarios) now need to be considered while, at the same time, SQL and NoSQL engines alike are evolving and new engine concepts such as unified engines are emerging. Consequently, special testing efforts and rigor are crucial to ensure classical database strengths such as reliability, integrity, and performance in these novel system architectures and designs.

The goal of DBTest 2024 is to bring together researchers and practitioners from academia and industry to discuss challenges, approaches, and open problems around these issues.

DBTest '24: Proceedings of the Tenth International Workshop on Testing Database Systems


Surprise Benchmarking: The Why, What, and How

  • Lawrence Benson
  • Carsten Binnig
  • Jan-Micha Bodensohn
  • Federico Lorenzi
  • Jigao Luo
  • Danica Porobic
  • Tilmann Rabl
  • Anupam Sanghi
  • Russell Sears
  • Pinar Tözün
  • Tobias Ziegler

Standardized benchmarks are crucial to ensure a fair comparison of performance across systems. While extremely valuable, these benchmarks all use a setup in which the workload is well-defined and known in advance. Unfortunately, this has led to over-tuning data management systems for particular benchmark workloads such as TPC-H or TPC-C. As a result, benchmarking results frequently do not reflect the behavior of these systems in many real-world settings, since real workloads often vary significantly from the "known" benchmarking workloads. To address this issue, we present surprise benchmarking, a complementary approach to current standardized benchmarking in which "unknown" queries are exercised during the evaluation.

KVBench: A Key-Value Benchmarking Suite

  • Zichen Zhu
  • Arpita Saha
  • Manos Athanassoulis
  • Subhadeep Sarkar

Key-value stores are at the core of several modern NoSQL-based data systems, and thus, a comprehensive benchmark tool is of paramount importance in evaluating their performance under different workloads. Prior research reveals that real-world workloads have a diverse range of characteristics, such as the fraction of point queries that target non-existing keys, point and range deletes, as well as different distributions for queries and updates, all of which have very different performance implications. State-of-the-art key-value workload generators, such as YCSB and db_bench, fail to generate workloads that emulate these practical workloads, limiting the dimensions on which we can benchmark the systems' performance.

In this paper, we present KVBench, a novel synthetic workload generator that fills the gap between classical key-value workload generators and more complex real-life workloads. KVBench supports a wide range of operations, including point queries, range queries, inserts, updates, deletes, and range deletes; among these, inserts, queries, and updates can follow customizable distributions. Compared to state-of-the-art key-value workload generators, KVBench offers a richer array of knobs, including the proportion of empty point queries, customized distributions for updates and queries, and range deletes with specific selectivity, constituting a significantly more flexible framework that can better emulate real-world workloads.
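As a rough illustration of how such knobs interact (this is not KVBench's actual interface; the operation mix, miss rate, and selectivity below are made-up placeholders), a minimal C++ generator for a mixed key-value workload might look like this:

    // Hypothetical sketch of a configurable key-value workload mix;
    // not KVBench's real API: all parameters are illustrative.
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        std::mt19937_64 rng(42);
        std::vector<uint64_t> existing_keys;                    // keys inserted so far
        // Operation mix: 50% inserts, 30% point queries, 15% updates, 5% range deletes.
        std::discrete_distribution<int> op_mix({50, 30, 15, 5});
        std::bernoulli_distribution empty_point_query(0.2);     // 20% of point queries miss
        uint64_t next_key = 0;

        for (int i = 0; i < 1000; ++i) {
            switch (op_mix(rng)) {
                case 0:                                          // insert a fresh key
                    existing_keys.push_back(next_key);
                    std::printf("INSERT %llu\n", (unsigned long long)next_key++);
                    break;
                case 1:                                          // point query, possibly on a missing key
                    if (existing_keys.empty() || empty_point_query(rng)) {
                        std::printf("GET %llu\n", (unsigned long long)(next_key + rng() % 1000));
                    } else {
                        std::uniform_int_distribution<size_t> pick(0, existing_keys.size() - 1);
                        std::printf("GET %llu\n", (unsigned long long)existing_keys[pick(rng)]);
                    }
                    break;
                case 2:                                          // update an existing key (uniform pick here)
                    if (!existing_keys.empty()) {
                        std::uniform_int_distribution<size_t> pick(0, existing_keys.size() - 1);
                        std::printf("UPDATE %llu\n", (unsigned long long)existing_keys[pick(rng)]);
                    }
                    break;
                default: {                                       // range delete covering roughly 1% of the key space
                    uint64_t start = rng() % (next_key + 1);
                    std::printf("RANGE_DELETE %llu %llu\n", (unsigned long long)start,
                                (unsigned long long)(start + next_key / 100 + 1));
                }
            }
        }
        return 0;
    }

Swapping the uniform picks for skewed distributions, or changing the miss probability and range-delete selectivity, is exactly the kind of dimension along which such a generator can be varied.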

Benchmarking Learned and LSM Indexes for Data Sortedness

  • Aneesh Raman
  • Andy Huynh
  • Jinqi Lu
  • Manos Athanassoulis

Database systems use indexes on frequently accessed attributes to accelerate query and transaction processing. This requires paying the cost of maintaining and updating those indexes, which can be thought of as the process of adding structure (e.g., sort order) to an otherwise unstructured data collection. The indexing cost is redundant when data arrives pre-sorted, even if only to some degree. While recent work has studied how classical indexes like B+-trees cannot fully exploit near-sortedness during ingestion, such an exploration is lacking for other index designs like read-optimized learned indexes or write-optimized LSM-trees.

In this paper, we bridge this gap by conducting the first-ever study on the behavior of learned indexes and LSM-trees when varying the data sortedness in an ingestion workload. Specifically, we build on prior work on benchmarking data sortedness on B+-trees and we expand the scope to benchmark: (i) ALEX and LIPP, which are updatable learned index designs; and (ii) the LSM-tree engine offered by RocksDB. We present our evaluation framework and detail key insights on the performance of the index designs when varying data sortedness. Our observations indicate that learned indexes exhibit unpredictable performance when ingesting differently sorted data, while LSM-trees can benefit from sortedness-aware optimizations. We highlight the potential headroom for improvement and lay the groundwork for further research in this domain.
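To make concrete what "varying data sortedness" means for an ingestion workload, the following minimal C++ sketch (our own simplification for illustration, not the authors' benchmark generator; k and max_displacement are hypothetical knobs) produces a nearly sorted key stream by displacing a configurable fraction of entries by a bounded distance:

    // Hypothetical sketch: build a near-sorted ingestion stream by displacing
    // a fraction k of the entries by at most max_displacement positions.
    #include <algorithm>
    #include <cstdio>
    #include <random>
    #include <vector>

    std::vector<int> near_sorted_stream(int n, double k, int max_displacement, unsigned seed) {
        std::vector<int> keys(n);
        for (int i = 0; i < n; ++i) keys[i] = i;              // fully sorted baseline
        std::mt19937 rng(seed);
        std::bernoulli_distribution displace(k);              // each position is perturbed with probability k
        std::uniform_int_distribution<int> offset(1, max_displacement);
        for (int i = 0; i < n; ++i) {
            if (displace(rng)) {
                int j = std::min(n - 1, i + offset(rng));     // swap with a nearby later position
                std::swap(keys[i], keys[j]);
            }
        }
        return keys;
    }

    int main() {
        // 25% of positions perturbed, by at most 3 slots; feed the stream to the index under test.
        for (int key : near_sorted_stream(20, 0.25, 3, 7)) std::printf("%d ", key);
        std::printf("\n");
        return 0;
    }

Sweeping k from 0 (fully sorted) toward 1 (heavily scrambled) yields the spectrum of ingestion orders over which such index designs can be compared.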

Anatomy of the LSM Memory Buffer: Insights & Implications

  • Shubham Kaushik
  • Subhadeep Sarkar

The log-structured merge (LSM) tree is an ingestion-optimized data structure that is widely used in modern NoSQL key-value stores. To support high write throughput, LSM-trees maintain an in-memory buffer that absorbs incoming entries before writing them to slower secondary storage. We point out that the choice of data structure and implementation for the memory buffer has a significant impact on the overall performance of LSM-based storage engines. In fact, even with the same implementation of the buffer, the performance of a storage engine can vary by up to several orders of magnitude if there is a shift in the input workload.

In this paper, we benchmark the performance of LSM-based storage engines with different memory buffer implementations and under different workload characteristics. We experiment with four buffer implementations, namely, (i) vector, (ii) skip-list, (iii) hash skip-list, and (iv) hash linked-list, and for each implementation, we vary its key design choices (such as the bucket count in a hash skip-list and the prefix length in a hash linked-list). We present a comprehensive performance benchmark for each buffer configuration, and highlight how the relative performance of the different buffer implementations varies with a shift in the input workload. Lastly, we present a guideline for selecting the appropriate buffer implementation for a given workload and performance goal.
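To make the buffer dimension concrete: in RocksDB, the memory buffer (memtable) implementation is selected through the memtable factory, and the hash-based variants additionally require a prefix extractor and disable concurrent memtable writes. The snippet below is a minimal sketch using RocksDB's public C++ options API; the bucket count, prefix length, and database path are illustrative placeholders, not the settings used in the paper.

    // Minimal sketch: choosing a RocksDB memtable implementation.
    // Bucket count, prefix length, and path are illustrative placeholders.
    #include <cassert>
    #include <rocksdb/db.h>
    #include <rocksdb/memtablerep.h>
    #include <rocksdb/slice_transform.h>

    int main() {
        rocksdb::Options options;
        options.create_if_missing = true;

        // The default memtable is a skip-list. Alternatives include:
        //   options.memtable_factory.reset(new rocksdb::VectorRepFactory());           // vector
        //   options.memtable_factory.reset(rocksdb::NewHashLinkListRepFactory(50000)); // hash linked-list
        options.memtable_factory.reset(
            rocksdb::NewHashSkipListRepFactory(/*bucket_count=*/1000000));              // hash skip-list

        // Hash-based memtables bucket entries by key prefix and do not support
        // concurrent writes, so both settings below accompany them.
        options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(/*prefix_len=*/8));
        options.allow_concurrent_memtable_write = false;

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/memtable_sketch", &db);
        assert(s.ok());
        db->Put(rocksdb::WriteOptions(), "key00001", "value");                          // lands in the memtable
        delete db;
        return 0;
    }

Which factory (and which bucket count or prefix length) performs best depends heavily on the workload, which is precisely the sensitivity the paper benchmarks.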

Performance Truthfulness of Differential Privacy for DB Testing

  • Eric Enzler
  • Olivier Goerens
  • Gustavo Alonso
  • Anja Gruenheid

Performance benchmarking is a crucial tool to evaluate whether performance degradation occurs when a system is modified, allowing cloud providers to plan the provisioning of and changes to their resources. This is most commonly realized through the execution of standardized industry benchmarks. However, these benchmarks oftentimes do not capture all facets of actual customer workloads. To mitigate this issue, one can use an anonymized version of the customer's workload through algorithms such as hashing, k-anonymity, or differential privacy, allowing for targeted and customized performance benchmarking. In this paper, we focus on one of these techniques, namely differential privacy, and examine its impact on benchmarking performance, i.e., whether differential privacy can maintain the same performance characteristics as the original workload. We discuss several challenges specific to differential privacy algorithms such as the handling of unique column values, queries that span multiple tables, and continuous values, as well as how effectively it can be deployed. Our examination shows that differential privacy is a promising technique in this space but has practical limitations such as scaling problems and, for some queries, a distortion of the performance characteristics.

Analysis of Geospatial Data Loading

  • Aske Wachs
  • Eleni Tzirita Zacharatou

The rate at which applications gather geospatial data today has turned data loading into a critical component of data analysis pipelines. However, users are confronted with multiple file formats for storing geospatial data and an array of systems for processing it. To shed light on how the choice of file format and system affects performance, this paper explores the performance of loading geospatial data stored in diverse file formats using different libraries. It aims to study the impact of different file formats, compare loading throughput across spatial libraries, and examine the microarchitectural behavior of geospatial data loading. Our findings show that GeoParquet files provide the highest loading throughput across all benchmarked libraries. Furthermore, we note that the more spatial features per byte a file format can store, the higher the data loading throughput. Our micro-architectural analysis reveals high instructions per cycle (IPC) during spatial data loading for most libraries and formats. Additionally, our experiments show that instruction misses dominate L1 cache misses, except for GeoParquet files, where data misses take over.

Topics Of Interest

  • Reproducibility of database research (new!)
  • Testing and benchmarking of learning-based database systems (new!)
  • Testing of database systems, storage services, and database applications
  • Testing of database systems using novel hardware and software technology (non-volatile memory, hardware transactional memory, …)
  • Testing heterogeneous systems with hardware accelerators (GPUs, FPGAs, ASICs, …)
  • Testing distributed and big data systems
  • Testing machine learning systems
  • Specific challenges of testing and quality assurance for cloud-based systems
  • War stories and lessons learned
  • Performance and scalability testing
  • Testing the reliability and availability of database systems
  • Algorithms and techniques for automatic program verification
  • Maximizing code coverage during testing of database systems and applications
  • Generation of synthetic data for test databases
  • Testing the effectiveness of adaptive policies and components
  • Tools for analyzing database management systems (e.g., profilers, debuggers)
  • Workload characterization with respect to performance metrics and engine components
  • Metrics for test quality, robustness, efficiency, and effectiveness
  • Operational aspects such as continuous integration and delivery pipelines
  • Security and vulnerability testing
  • Experimental reproduction of benchmark results
  • Functional and performance testing of interactive data exploration systems
  • Traceability, reproducibility, and reasoning for ML-based systems

Keynote Speakers

Everett (Ev) Maus
Staff Software Engineer, Google

Ev has a passion for teaching computers to find bugs – ideally without making other developers too unhappy.

He has been on the Spanner Engineering Productivity team at Google since 2018, where he's building tooling to sustainably ensure the correctness and reliability of Spanner. He was an invited industry attendee at the 2023 Dagstuhl Seminar on Ensuring the Reliability and Robustness of Database Management Systems.

Prior to Google, he worked at Microsoft from 2014 to 2018 on scaling static analysis and security tooling. While there, he contributed to the SARIF standard for tooling results and spoke at BlueHat about building security tooling.

When he isn’t trying to teach computers to break software (usually other people’s), he collects fountain pens, mixes a decent Manhattan, plays violin, and recently adopted a puppy.

He graduated from the University of Virginia with a BA in Computer Science and Mathematics in December 2013.

Jesús Camacho Rodríguez
Principal Research SDE Manager, Gray Systems Lab (GSL)

Jesús is a Principal Research SDE Manager at the Gray Systems Lab (GSL), the applied research group within Azure Data. His research focuses broadly on optimizing the performance and efficiency of data systems, in close collaboration with product and engineering teams.

Before joining Microsoft, Jesús held various engineering positions at Cloudera, where he worked on query processing and optimization in Cloudera’s SQL data warehouse engines. He has also been actively involved with open-source projects such as Apache Calcite and Apache Hive.

Jesús earned his PhD from the University of Paris-Sud and Inria, France, and holds a degree in Computer Science and Engineering from the University of Almería, Spain.



Call for Contributions

Research or Experience Papers

Authors are invited to submit original, unpublished research papers or experience papers that describe the implementation of testing-related solutions in applications and products. Submissions must pertain to one of DBTest's topics of interest listed above and must not be under consideration for publication in any other forum.

Submissions are expected to be 4 to 6 pages long, excluding references and appendix.

Accepted research submissions will be published in the ACM DL.

Talks

Authors can submit a talk proposal for previously published but relevant content, as well as for new ideas within the scope of DBTest's topics of interest on which they would like feedback from the community.

The submission should consist of 1 page, including references and appendix. Talk proposals must be marked with the suffix [Talk Proposal] in the submission's title.

Accepted talk proposals will be listed on the DBTest homepage.



Guidelines

Authors are required to follow the current ACM Proceedings Format.

Submissions will be handled through HotCRP.



Timeline

Paper Submission: March 15**, 2024, 11:59 PM US PST
Notification of Outcome: April 11, 2024, 11:59 PM US PST
Camera-Ready Copy: May 6, 2024, 11:59 PM US PST

** Extended from the original deadline of March 8, 2024.

Schedule

Program Schedule

Start Time | Title | Presenter
8:30 AM | Keynote 1: Transactions on Exabytes: What we've learned while testing Spanner | Everett (Ev) Maus
10:00 AM | Break
10:30 AM | Surprise Benchmarking: The Why, What, and How | Lawrence Benson, Carsten Binnig, Jan-Micha Bodensohn, Federico Lorenzi, Jigao Luo, Danica Porobic, Tilmann Rabl, Anupam Sanghi, Russell Sears, Pinar Tözün, Tobias Ziegler
11:00 AM | KVBench: A Key-Value Benchmarking Suite | Zichen Zhu, Arpita Saha, Manos Athanassoulis, Subhadeep Sarkar
11:30 AM | Benchmarking Learned and LSM Indexes for Data Sortedness | Aneesh Raman, Andy Huynh, Jinqi Lu, Manos Athanassoulis
12:00 PM | Lunch
2:00 PM | Keynote 2: From Paper to Practice: Leveraging LST-Bench to Evaluate Lake-Centric Data Platforms | Jesús Camacho Rodríguez
3:30 PM | Break
4:00 PM | Anatomy of the LSM Memory Buffer: Insights & Implications | Shubham Kaushik, Subhadeep Sarkar
4:30 PM | Performance Truthfulness of Differential Privacy for DB Testing | Eric Enzler, Olivier Goerens, Gustavo Alonso, Anja Gruenheid
5:00 PM | Utilizing Query Plans for Testing | Jinsheng Ba
5:30 PM | Analysis of Geospatial Data Loading | Aske Wachs, Eleni Tzirita Zacharatou

Organization

Program Committee

Adam Dickinson (Snowflake)
Amarnadh Sai Eluri (Google)
Anupam Sanghi (IBM Research)
Brian Kroth (Microsoft GSL)
Danica Porobic (Oracle)
Daniel Ritter (SAP)
Jinsheng Ba (NUS)
Renata Borovica-Gajic (University of Melbourne)
Russell Sears (CrystalDB)
Stefania Dumbrava (ENSIIE)
Wensheng Dou (University of Chinese Academy of Sciences)
Xiu Tang (Zhejiang University)
Yannis Chronis (Google Research)
Zuming Jiang (ETH)

Workshop Co-Chairs

Anja Gruenheid (Microsoft, Switzerland)
Manuel Rigger (National University of Singapore)

Steering Committee

Carsten Binnig (TU Darmstadt, Germany)
Alexander Böhm (Google, Germany)
Tilmann Rabl (TU Berlin, Germany)