Over the last few years, we have seen a growing body of academic and industry work on benchmarking and performance evaluation in novel cloud settings, driven both by increasingly diverse workloads and by new hardware such as FPGAs and GPUs. Moreover, new classes of data-driven applications (e.g., machine learning and big data scenarios) now need to be considered, while at the same time SQL and NoSQL engines alike keep evolving and new engine concepts such as unified engines are emerging. Consequently, special testing efforts and rigor are crucial to ensure classical database strengths such as reliability, integrity, and performance in these novel system architectures and designs.
The goal of DBTest 2024 is to bring together researchers and practitioners from academia and industry to discuss challenges, approaches, and open problems around these issues.
Standardized benchmarks are crucial to ensure a fair comparison of performance across systems. While extremely valuable, these benchmarks all use a setup in which the workload is well-defined and known in advance. Unfortunately, this has led to over-tuning data management systems for particular benchmark workloads such as TPC-H or TPC-C. As a result, benchmarking results frequently do not reflect the behavior of these systems in real-world settings, since real workloads often differ significantly from the "known" benchmark workloads. To address this issue, we present surprise benchmarking, a complementary approach to current standardized benchmarking in which "unknown" queries are exercised during the evaluation.
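The abstract does not spell out how the "unknown" queries are produced, so the following is purely an illustrative sketch (not the authors' method): one simple way to realize the idea is to hold out part of a parameterized query pool, so that a system can only ever be tuned against the published part. All names below are assumptions.

```python
import random

def split_templates(templates, surprise_fraction=0.5, seed=None):
    """Publish only the 'known' part of a query-template pool ahead of time;
    evaluate (also) on the held-out 'surprise' part that the system under test
    could not have been tuned against."""
    rng = random.Random(seed)
    pool = list(templates)
    rng.shuffle(pool)
    cut = int(len(pool) * (1 - surprise_fraction))
    return pool[:cut], pool[cut:]   # (known, surprise)

# Hypothetical pool of 22 TPC-H-style template names.
known, surprise = split_templates([f"Q{i}" for i in range(1, 23)], seed=7)
```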
Key-value stores are at the core of several modern NoSQL-based data systems, and thus a comprehensive benchmark tool is of paramount importance in evaluating their performance under different workloads. Prior research reveals that real-world workloads have a diverse range of characteristics, such as the fraction of point queries that target non-existing keys, point and range deletes, as well as different distributions for queries and updates, all of which have very different performance implications. State-of-the-art key-value workload generators, such as YCSB and db_bench, fail to generate workloads that emulate these practical workloads, limiting the dimensions along which we can benchmark the systems' performance.
In this paper, we present KVBench, a novel synthetic workload generator that fills the gap between classical key-value workload generators and more complex real-life workloads. KVBench supports a wide range of operations, including point queries, range queries, inserts, updates, deletes, and range deletes; among these, inserts, queries, and updates can each follow customizable distributions. Compared to state-of-the-art key-value workload generators, KVBench offers a richer array of knobs, including the proportion of empty point queries, customized distributions for updates and queries, and range deletes with a specific selectivity, constituting a significantly more flexible framework that can better emulate real-world workloads.
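The abstract names KVBench's knobs (operation mix, fraction of empty point queries, skewed query/update distributions, range-delete selectivity) but not its interface. The sketch below is a hypothetical, heavily simplified Python generator in that spirit, not KVBench's actual API; all parameter names are illustrative assumptions.

```python
import random

def generate_workload(num_ops=1_000,
                      num_keys=10_000,
                      # insert, point query, update, range query, range delete
                      op_mix=(0.3, 0.3, 0.2, 0.1, 0.1),
                      empty_point_query_fraction=0.2,   # point queries on non-existing keys
                      range_selectivity=0.001,          # fraction of the key domain per range op
                      zipf_s=1.1,                       # skew of the query/update distribution
                      seed=0):
    """Emit a list of (operation, key[, end_key]) tuples over a pre-loaded,
    contiguous key domain [0, num_keys)."""
    rng = random.Random(seed)
    keys = list(range(num_keys))
    zipf_weights = [1.0 / (i + 1) ** zipf_s for i in range(num_keys)]
    kinds = ("insert", "point_query", "update", "range_query", "range_delete")
    span = max(1, int(range_selectivity * num_keys))
    next_new_key = num_keys
    ops = []
    for _ in range(num_ops):
        kind = rng.choices(kinds, weights=op_mix)[0]
        if kind == "insert":
            ops.append(("insert", next_new_key))
            next_new_key += 1
        elif kind == "point_query" and rng.random() < empty_point_query_fraction:
            ops.append(("point_query", -rng.randrange(1, num_keys)))  # negative key: guaranteed miss
        elif kind in ("point_query", "update"):
            ops.append((kind, rng.choices(keys, weights=zipf_weights)[0]))
        else:  # range_query or range_delete over a contiguous key span
            start = rng.randrange(num_keys - span)
            ops.append((kind, start, start + span))
    return ops

workload = generate_workload()
```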
Database systems use indexes on frequently accessed attributes to accelerate query and transaction processing. This requires paying the cost of maintaining and updating those indexes, which can be thought of as the process of adding structure (e.g., sort order) to an otherwise unstructured data collection. The indexing cost is redundant when data arrives pre-sorted, even if only to some degree. While recent work has studied how classical indexes like B+-trees fail to fully exploit near-sortedness during ingestion, such an exploration is missing for other index designs like read-optimized learned indexes or write-optimized LSM-trees.
In this paper, we bridge this gap by conducting the first-ever study of the behavior of learned indexes and LSM-trees when varying the data sortedness of an ingestion workload. Specifically, we build on prior work on benchmarking data sortedness in B+-trees and expand its scope to benchmark: (i) ALEX and LIPP, two updatable learned index designs; and (ii) the LSM-tree engine offered by RocksDB. We present our evaluation framework and detail key insights into the performance of these index designs under varying data sortedness. Our observations indicate that learned indexes exhibit unpredictable performance when ingesting differently sorted data, while LSM-trees can benefit from sortedness-aware optimizations. We highlight the potential headroom for improvement and lay the groundwork for further research in this domain.
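Prior B+-tree work parameterizes sortedness roughly by how many entries are out of place and how far they stray from their sorted position; the exact parameterization used here is not given in the abstract, so the following is only an assumed, simplified way to produce near-sorted ingestion streams for such an experiment.

```python
import random

def near_sorted_stream(n, frac_displaced=0.1, max_displacement=100, seed=0):
    """Start from fully sorted keys, then swap roughly frac_displaced * n entries
    with an entry up to max_displacement positions away. frac_displaced = 0 yields
    a fully sorted stream; larger fractions and displacements approach a fully
    scrambled stream."""
    rng = random.Random(seed)
    keys = list(range(n))
    for _ in range(int(frac_displaced * n)):
        i = rng.randrange(n)
        j = min(n - 1, max(0, i + rng.randint(-max_displacement, max_displacement)))
        keys[i], keys[j] = keys[j], keys[i]
    return keys

# Sweep the sortedness knob and ingest each stream into the index under test.
for frac in (0.0, 0.05, 0.25, 0.5):
    stream = near_sorted_stream(1_000_000, frac_displaced=frac)
    # ingest `stream` into the index under test (e.g., ALEX, LIPP, RocksDB) and time it
```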
The log-structured merge (LSM) tree is an ingestion-optimized data structure that is widely used in modern NoSQL key-value stores. To support a high write throughput, LSM-trees maintain an in-memory buffer that absorbs incoming entries before writing them to slower secondary storage. We point out that the choice of data structure and implementation for this memory buffer has a significant impact on the overall performance of LSM-based storage engines. In fact, even with the same buffer implementation, the performance of a storage engine can vary by up to several orders of magnitude when the input workload shifts.
In this paper, we benchmark the performance of LSM-based storage engines with different memory buffer implementations and under different workload characteristics. We experiment with four buffer implementations, namely, (i) vector, (ii) skip-list, (iii) hash skip-list, and (iv) hash linked-list, and for each implementation, we vary key design choices (such as the bucket count in a hash skip-list and the prefix length in a hash linked-list). We present a comprehensive performance benchmark for each buffer configuration, and highlight how the relative performance of the different buffer implementations varies as the input workload shifts. Lastly, we present a guideline for selecting the appropriate buffer implementation for a given workload and performance goal.
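In RocksDB the four variants are selected through the engine's memtable factory option; rather than reproduce that configuration, the stand-alone Python stand-ins below (illustrative assumptions, not RocksDB's actual memtable implementations) sketch why the choice matters: an append-only vector makes writes cheap and point reads expensive, an ordered buffer does the opposite, and hash partitioning by key prefix cheapens point operations at the cost of range scans.

```python
import bisect
from collections import defaultdict

class VectorBuffer:
    """Append-only vector: O(1) inserts, but a point lookup must scan the
    buffer (newest entry wins), so reads get slower as the buffer fills."""
    def __init__(self):
        self.entries = []
    def put(self, key, value):
        self.entries.append((key, value))
    def get(self, key):
        for k, v in reversed(self.entries):   # newest to oldest
            if k == key:
                return v
        return None

class SortedBuffer:
    """Ordered buffer (stand-in for a skip-list): point lookups are logarithmic,
    but every insert pays to keep the entries sorted."""
    def __init__(self):
        self.keys, self.values = [], []
    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value            # in-place update of an existing key
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)
    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

class HashPartitionedBuffer:
    """Prefix-hashed buffer (stand-in for a hash skip-list / hash linked-list):
    operations touch only one small bucket, but range scans must merge buckets."""
    def __init__(self, prefix_len=4):
        self.prefix_len = prefix_len
        self.buckets = defaultdict(SortedBuffer)
    def put(self, key, value):
        self.buckets[key[:self.prefix_len]].put(key, value)
    def get(self, key):
        return self.buckets[key[:self.prefix_len]].get(key)

buf = HashPartitionedBuffer(prefix_len=2)
buf.put("user42", 1)
assert buf.get("user42") == 1
```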
Performance benchmarking is a crucial tool to evaluate whether performance degradation occurs when a system is modified, allowing cloud providers to plan the provisioning of and changes to their resources. This is most commonly realized through the execution of standardized industry benchmarks. However, these benchmarks oftentimes do not capture all facets of actual customer workloads. To mitigate this issue, one can use an anonymized version of the customer's workload, produced through techniques such as hashing, k-anonymity, or differential privacy, allowing for targeted and customized performance benchmarking. In this paper, we focus on one of these techniques, namely differential privacy, and examine its impact on benchmarking, i.e., whether a differentially private workload maintains the same performance characteristics as the original workload. We discuss several challenges specific to differential privacy, such as the handling of unique column values, queries that span multiple tables, and continuous values, as well as how effectively it can be deployed. Our examination shows that differential privacy is a promising technique in this space but has practical limitations, such as scaling problems and, for some queries, a distortion of the performance characteristics.
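The abstract does not describe the exact mechanism used, so as a loose illustration only: the classic Laplace mechanism adds noise scaled to sensitivity/epsilon, for example to per-bucket counts of a column before those counts drive synthetic workload generation. The unique-value and multi-table challenges mentioned above are exactly where such a simple recipe starts to break down; all names and numbers below are placeholders.

```python
import numpy as np

def laplace_release(value, sensitivity, epsilon, rng):
    """Classic Laplace mechanism: epsilon-DP release of a numeric statistic whose
    sensitivity (maximum change caused by one individual) is `sensitivity`."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(0)

# Privatize the histogram of a continuous column (counting queries have
# sensitivity 1) before using it to generate a synthetic benchmark workload.
true_counts = np.array([120, 45, 3, 980, 210], dtype=float)
noisy_counts = np.array(
    [laplace_release(c, sensitivity=1.0, epsilon=0.5, rng=rng) for c in true_counts]
)
print(noisy_counts.round(1))
```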
The rate at which applications gather geospatial data today has turned data loading into a critical component of data analysis pipelines. However, users are confronted with multiple file formats for storing geospatial data and an array of systems for processing it. To shed light on how the choice of file format and system affects performance, this paper explores the performance of loading geospatial data stored in diverse file formats using different libraries. It aims to study the impact of different file formats, compare loading throughput across spatial libraries, and examine the microarchitectural behavior of geospatial data loading. Our findings show that GeoParquet files provide the highest loading throughput across all benchmarked libraries. Furthermore, we note that the more spatial features per byte a file format can store, the higher the data loading throughput. Our microarchitectural analysis reveals high instructions per cycle (IPC) during spatial data loading for most libraries and formats. Additionally, our experiments show that instruction misses dominate L1 cache misses, except for GeoParquet files, where data misses take over.
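As a small example of the kind of loading being measured (the paper's actual library set, datasets, and harness are not specified here; GeoPandas and the file paths below are assumptions), a minimal throughput comparison might look like this:

```python
import time
import geopandas as gpd

def time_load(path, reader):
    """Time a single end-to-end load of a geospatial file into a GeoDataFrame."""
    start = time.perf_counter()
    gdf = reader(path)
    return len(gdf), time.perf_counter() - start

# Columnar GeoParquet vs. a text-based format such as GeoJSON (paths are placeholders).
n, secs = time_load("buildings.parquet", gpd.read_parquet)
print(f"GeoParquet: {n} features in {secs:.2f}s ({n / secs:,.0f} features/s)")

n, secs = time_load("buildings.geojson", gpd.read_file)
print(f"GeoJSON:    {n} features in {secs:.2f}s ({n / secs:,.0f} features/s)")
```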
Ev has a passion for teaching computers to find bugs – ideally without making other developers too unhappy.
He has been on the Spanner Engineering Productivity team at Google since 2018, where he builds tooling to sustainably ensure the correctness and reliability of Spanner. He was an invited industry attendee at the 2023 Dagstuhl Seminar on Ensuring the Reliability and Robustness of Database Management Systems.
Prior to Google, he worked at Microsoft from 2014 to 2018 on scaling static analysis and security tooling. While there, he contributed to the SARIF standard for tooling results and spoke at BlueHat about building security tooling.
When he isn’t trying to teach computers to break software (usually other people’s), he collects fountain pens, mixes a decent manhattan, plays violin, and recently adopted a puppy.
He graduated from the University of Virginia with a BA in Computer Science and Mathematics in December 2013.
Jesús is a Principal Research SDE Manager at the Gray Systems Lab (GSL), the applied research group within Azure Data. His research focuses broadly on optimizing the performance and efficiency of data systems, in close collaboration with product and engineering teams.
Before joining Microsoft, Jesús held various engineering positions at Cloudera, where he worked on query processing and optimization in Cloudera’s SQL data warehouse engines. He has also been actively involved with open-source projects such as Apache Calcite and Apache Hive.
Jesús earned his PhD from the University of Paris-Sud and Inria, France, and holds a degree in Computer Science and Engineering from the University of Almería, Spain.
Authors are invited to submit original, unpublished research papers, or experience papers that describe the implementation of testing-related solutions in applications and products, pertaining to any of DBTest's topics of interest listed above. Submissions must not be under consideration for publication in any other forum.
Submissions are expected to be 4 to 6 pages long, excluding references and appendix.
Accepted research submissions will be published in the ACM DL.
Authors can also submit a talk proposal, either for previously published but relevant content or for new ideas within the scope of DBTest's topics of interest on which they would like feedback from the community.
The submission should consist of 1 page, including references and appendix. Talk proposals must be marked with the suffix [Talk Proposal] in the submission's title.
Accepted talk proposals will be listed on the DBTest homepage.
Authors are required to follow the current ACM Proceedings Format.
Submissions will be handled through HotCRP.
March 15, 2024 / 11:59PM US PST
April 11, 2024 / 11:59PM US PST
May 6, 2024 / 11:59PM US PST