Sketches for Size of Join Estimation
This package contains the implementation in C++ of the sketching algorithms for size of join estimation. This code has been used in the Oracle database query optimizer. The content of the package can be divided into the following main categories:
-
Schemes for generating pseudo-random numbers
+/-1 random variables and b-valued hashes are at the core of all the sketching techniques. The package contains efficient implementations of the generating schemes based on BCH codes, Reed-Muller codes, and Carter-Wegman hashes. Implementations for fast range-summation and dyadic mappings of the +/-1 random variables are also provided.
For details see the papers:
-
Fast Range-Summable Random Variables for Efficient Aggregate Estimation
by F. Rusu and A. Dobra
-
Pseudo-Random Number Generation for Sketch-Based Estimations
by F. Rusu and A. Dobra
Files:
-
Sketches for size of join estimation
The package contains the implementation of the known sketches for size of join and second frequency moment estimation. It includes the original AGMS sketches and the improved versions Fast-AGMS sketches, Fast-Count sketches, and Count-Min sketches.
For details consult the papers:
-
Statistical Analysis of Sketch Estimators
by F. Rusu and A. Dobra
-
Sketches for Size of Join Estimation
by F. Rusu and A. Dobra
Files:
-
Sketches over sampled data
The update performance of sketches can be significantly improved if only a sample of the data is sketched, without significant degradation in the accuracy. Simple sampling algorithms like Bernoulli sampling, sampling without replacement, and sampling with replacement, are also included in the package.
For details see the paper:
-
Sketching Sampled Data Streams
by F. Rusu and A. Dobra
Files:
-
Synthetic data generator
The package contains a simple data generator for different distributions with customizable parameters. This allows the testing of the implemented algorithms.
Files:
-
Examples
Two usage examples are included in the package. The first example generates size of join estimations with all the implemented sketching techniques. The second example is for sketching Bernoulli samples.
Files:
Requirements
In order to compile the package, the GSL development library has to be installed.
Download
The package can be found here.
UC Merced |
EECS |
Home
Last updated: Wednesday, December 13, 2017