YCSB
YCSB (Yahoo!® Cloud Serving Benchmark) is an industry-standard benchmark for databases and storage engines supporting key-value workloads. The YCSB implementation is open source, and includes an extensible framework for integrating different data stores.
We integrated HSE with YCSB to make it easy to compare its performance and scalability to that of other storage engines for YCSB workloads. The hse-ycsb repo is a fork of YCSB that adds support for HSE.
In the sections that follow, the reader is assumed to be familiar with configuring and running the YCSB benchmark. The information provided here is specific to using YCSB with HSE.
Install HSE and Create a KVDB
Review the getting started section of this documentation. Then, following those instructions and examples, start by
- Installing HSE and mpool from packages or source
- Configuring an mpool for HSE storage with appropriate permissions
- Creating an HSE KVDB in that mpool to store the YCSB data
In the examples below, we assume the name of the mpool is ycsbData, which is also the name of the KVDB.
Install YCSB Dependencies
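Install the dependencies for your platform. Run only the command that matches your package manager.
dnf-based platforms such as RHEL 8:
$ sudo dnf install maven rpm-build
apt-based platforms such as Ubuntu:
$ sudo apt-get install maven
yum-based platforms such as RHEL 7:
$ sudo yum install rh-maven36 java-1.8.0-openjdk java-1.8.0-openjdk-devel rpm-build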
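The choice among the three commands above can be scripted. The following is a minimal sketch, not part of hse-ycsb: the function name deps_cmd is hypothetical, and the script only prints the matching command instead of running it. It checks for dnf before yum because some dnf-based systems also provide a yum command.

```shell
# Hypothetical helper: print the dependency install command for a package manager.
deps_cmd() {
    case $1 in
        dnf)     echo "sudo dnf install maven rpm-build" ;;
        apt-get) echo "sudo apt-get install maven" ;;
        yum)     echo "sudo yum install rh-maven36 java-1.8.0-openjdk java-1.8.0-openjdk-devel rpm-build" ;;
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Print the command for the first package manager found on this system.
for pm in dnf apt-get yum; do
    if command -v "$pm" >/dev/null 2>&1; then
        deps_cmd "$pm"
        break
    fi
done
```

Review the printed command before running it; nothing is installed by the sketch itself.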
Install YCSB with HSE from Packages
YCSB with HSE can be installed from release packages.
Download and install the latest hse-ycsb package for your platform.
Package names start with hse-ycsb*A.B.C.D.E-X.Y.Z, where
- A.B.C is the YCSB version (e.g., 0.17.0)
- D.E is our YCSB integration version
- X.Y.Z is the minimum HSE release version required
Tip
An example of a specific prefix is hse-ycsb-0.17.0.2.1-1.8.0.
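To make the naming scheme concrete, the sketch below splits the example prefix from the Tip into its three components using POSIX parameter expansion and cut. The variable names are illustrative only, and the sketch assumes the prefix uses a hyphen after hse-ycsb, as in the example.

```shell
prefix="hse-ycsb-0.17.0.2.1-1.8.0"    # example prefix from the Tip above

versions="${prefix#hse-ycsb-}"        # strip the package name: A.B.C.D.E-X.Y.Z
hse_min_version="${versions##*-}"     # text after the last hyphen: X.Y.Z
combined="${versions%-*}"             # text before the last hyphen: A.B.C.D.E

ycsb_version=$(echo "$combined" | cut -d. -f1-3)          # A.B.C
integration_version=$(echo "$combined" | cut -d. -f4-5)   # D.E

echo "YCSB version:         $ycsb_version"
echo "Integration version:  $integration_version"
echo "Minimum HSE version:  $hse_min_version"
```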
Install the package for your platform as follows.
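dnf-based platforms such as RHEL 8:
$ sudo dnf install ./hse-ycsb-A.B.C.D.E-X.Y.Z*.rpm
apt-based platforms such as Ubuntu:
$ sudo apt-get install ./hse-ycsb_A.B.C.D.E-X.Y.Z*.deb
yum-based platforms such as RHEL 7:
$ sudo yum install ./hse-ycsb-A.B.C.D.E-X.Y.Z*.rpm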
Install YCSB with HSE from Source
YCSB with HSE can also be built and installed from source.
Clone the latest release tag from the hse-ycsb repo.
Releases are named rA.B.C.D.E-hse-X.Y.Z, where
- A.B.C is the YCSB version (e.g., 0.17.0)
- D.E is our YCSB integration version
- X.Y.Z is the minimum HSE release version required
Tip
An example of a specific release tag is r0.17.0.2.1-hse-1.8.0.
For example
$ git clone https://github.com/hse-project/hse-ycsb.git
$ cd hse-ycsb
$ git checkout rA.B.C.D.E-hse-X.Y.Z
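The tag passed to git checkout can be assembled from the three version components. This is a small sketch using the values from the example release tag; the variable names are illustrative, and the command is printed instead of executed.

```shell
ycsb_version=0.17.0        # A.B.C
integration_version=2.1    # D.E
hse_min_version=1.8.0      # X.Y.Z

# Release tags follow the pattern rA.B.C.D.E-hse-X.Y.Z.
tag="r${ycsb_version}.${integration_version}-hse-${hse_min_version}"
echo "git checkout $tag"
```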
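Build the YCSB package for your platform as follows. The find command after each build locates the resulting package.
To build an RPM package on dnf-based platforms such as RHEL 8:
$ make package
$ find /tmp/$(id -un)/ -name 'hse-ycsb*.rpm'
Note
For r0.17.0.2.0-hse-1.7.1, use make rpm instead of make package.
To build a DEB package on apt-based platforms such as Ubuntu:
$ make package
$ find /tmp/$(id -un)/ -name 'hse-ycsb*.deb'
To build an RPM package on yum-based platforms such as RHEL 7:
$ scl enable rh-maven36 "make rpm"
$ find /tmp/$(id -un)/ -name 'hse-ycsb*.rpm'
Install the resulting package as described above for the release package for your platform.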
New YCSB Options
The build of YCSB with HSE adds the following options.
- hse.mpool_name is the name of the mpool (KVDB) storing the YCSB data
- hse.params is a string of comma-separated HSE parameters; e.g., "kvdb.dur_intvl_ms=1000,kvdb.log_lvl=6"
- hse.config_path is the path to an HSE configuration file
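Because hse.params takes a single comma-separated string, it can be convenient to build that string from individual key=value settings. A minimal sketch follows; the function name join_params is hypothetical, and the two parameters are the ones from the example above.

```shell
# Hypothetical helper: join its arguments with commas.
# The function body runs in a subshell, so the IFS change does not leak.
join_params() (
    IFS=,
    printf '%s\n' "$*"    # "$*" joins arguments using the first character of IFS
)

hse_params=$(join_params kvdb.dur_intvl_ms=1000 kvdb.log_lvl=6)
echo "-p hse.params=$hse_params"
```

The printed option can then be appended to a ycsb load or run command line.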
HSE also installs configuration files in /opt/hse/config for use with YCSB. There are separate configuration files for the load and run phases. Furthermore, YCSB Workload E has its own pair of HSE configuration files.
Run YCSB with HSE
The following executes YCSB Workload A with HSE:
$ cd /opt/hse-ycsb/
$ sudo alternatives --set python /usr/bin/python2
$ ./bin/ycsb load hse -threads 96 -P workloads/workloada -p recordcount=200000 -p hse.mpool_name=ycsbData -p hse.config_path=/opt/hse/config/native_ycsb_abcdf_load.yml
$ ./bin/ycsb run hse -threads 96 -P workloads/workloada -p recordcount=200000 -p operationcount=200000 -p hse.mpool_name=ycsbData -p hse.config_path=/opt/hse/config/native_ycsb_abcdf_run.yml
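The load and run commands above differ only in the phase name, the operationcount property, and the configuration file. The sketch below parameterizes them so the values can be changed in one place. The function name ycsb_cmd and the variable names are hypothetical, the values are copied from the example, and the function prints each command line instead of executing it.

```shell
# Values copied from the example above; adjust for your system.
MPOOL=ycsbData
THREADS=96
RECORDS=200000
OPS=200000
CONFIG_DIR=/opt/hse/config

# Hypothetical helper: print the ycsb command line for a phase ("load" or "run").
ycsb_cmd() {
    phase=$1
    cmd="./bin/ycsb $phase hse -threads $THREADS -P workloads/workloada -p recordcount=$RECORDS"
    if [ "$phase" = run ]; then
        # Only the run phase takes an operation count.
        cmd="$cmd -p operationcount=$OPS"
    fi
    cmd="$cmd -p hse.mpool_name=$MPOOL"
    cmd="$cmd -p hse.config_path=$CONFIG_DIR/native_ycsb_abcdf_$phase.yml"
    printf '%s\n' "$cmd"
}

ycsb_cmd load
ycsb_cmd run
```

Run the printed commands from /opt/hse-ycsb/, as in the example.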
The owner (UID) and group owner (GID) of the mpool that ycsb will use to store data should match the user account running ycsb.
See the release notes for any limitations or known issues for a particular release of YCSB with HSE.
YCSB Performance Results
Below are results from running YCSB with HSE. For comparison, we include results from running YCSB with RocksDB, a popular and widely-deployed key-value store.
Server configuration:
- 2-socket Intel® Xeon® CPU E5-2690 v4
- 256GB DRAM
- 4x Micron® 9300 NVMe SSDs 3.2TB in an LVM striped logical volume
- RHEL 8.1
- HSE 1.7.0
- RocksDB 6.6.4
YCSB benchmark configuration:
- 2TB dataset consisting of 2-billion 1,000-byte records
- 96 client threads
- 2 billion operations per workload
- YCSB 0.17.0
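As a quick check of the dataset size stated above, the arithmetic can be reproduced in the shell. This sketch counts only the record payload (2 billion records of 1,000 bytes each), uses decimal terabytes, and assumes a shell with 64-bit integer arithmetic.

```shell
records=2000000000     # 2 billion records
record_bytes=1000      # 1,000 bytes per record

dataset_bytes=$((records * record_bytes))
dataset_tb=$((dataset_bytes / 1000000000000))    # decimal TB (10^12 bytes)

echo "$dataset_bytes bytes = $dataset_tb TB"
```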
For HSE, we specified the configuration files /opt/hse/config/native_ycsb_abcdf_load.yml and /opt/hse/config/native_ycsb_abcdf_run.yml for the load and run phases of YCSB, respectively. These are installed with HSE.
The following table summarizes the YCSB workloads presented here. The application examples come from the YCSB documentation.
YCSB Workload | Operations | Application Example |
---|---|---|
A | 50% Read; 50% Update | Session store recording user-session activity |
B | 95% Read; 5% Update | Photo tagging |
C | 100% Read | User profile cache |
D | 95% Read; 5% Insert | User status updates |
Load Phase
YCSB starts by populating the dataset (database) to the size specified. This is a 100% Insert workload. Load phase statistics are presented in the following table. For this workload, HSE delivered more than 8x the throughput of RocksDB, reducing total duration by 88%.
Load Metric | HSE | RocksDB |
---|---|---|
Duration (minutes) | 34 | 271 |
Inserts / second | 986,009 | 122,899 |
Insert 99.9% latency (ms) | 1.6 | 19.5 |
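The throughput ratio and the reduction in load time can be derived from the insert rates in the table. The sketch below uses awk for the floating-point arithmetic and computes the time reduction from the throughputs, since the durations are reported in whole minutes. The variable names are illustrative.

```shell
hse_ips=986009        # HSE inserts / second, from the table
rocksdb_ips=122899    # RocksDB inserts / second, from the table

# Ratio of HSE throughput to RocksDB throughput.
speedup=$(awk -v a="$hse_ips" -v b="$rocksdb_ips" 'BEGIN { printf "%.2f", a / b }')

# For a fixed number of inserts, time scales as 1 / throughput.
reduction=$(awk -v a="$hse_ips" -v b="$rocksdb_ips" 'BEGIN { printf "%.1f", (1 - b / a) * 100 }')

echo "Throughput ratio (HSE / RocksDB): ${speedup}x"
echo "Implied reduction in load time: ${reduction}%"
```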
Run Phase
Run phase throughputs for HSE and RocksDB are shown in the following chart. For these YCSB workloads, HSE delivered up to nearly 6x more throughput than RocksDB.
In delivering high throughput, HSE also demonstrated good 99.9% tail latency, as shown below. For these YCSB workloads, HSE reduced read tail latency by up to 42%, and write (insert or update) tail latency by as much as 91%.
Finally, we measured the amount of data written to and read from the drives in the course of executing each workload. Reducing writes is important for SSDs because it translates to increased endurance. Reducing both writes and reads is important for networked storage to reduce load on the fabric.
The following chart shows the total bytes of data written to or read from the drives during workload execution. For these YCSB workloads, HSE reduced bytes read by up to 64%, and bytes written by as much as 85%.