Sketches for Size of Join Estimation
This package contains the implementation in C++ of the sketching algorithms for size of join estimation. This code has been used in the Oracle database query optimizer. The content of the package can be divided into the following main categories:

Schemes for generating pseudorandom numbers
+/1 random variables and bvalued hashes are at the core of all the sketching techniques. The package contains efficient implementations of the generating schemes based on BCH codes, ReedMuller codes, and CarterWegman hashes. Implementations for fast rangesummation and dyadic mappings of the +/1 random variables are also provided.
For details see the papers:

Fast RangeSummable Random Variables for Efficient Aggregate Estimation
by F. Rusu and A. Dobra

PseudoRandom Number Generation for SketchBased Estimations
by F. Rusu and A. Dobra
Files:

Sketches for size of join estimation
The package contains the implementation of the known sketches for size of join and second frequency moment estimation. It includes the original AGMS sketches and the improved versions FastAGMS sketches, FastCount sketches, and CountMin sketches.
For details consult the papers:

Statistical Analysis of Sketch Estimators
by F. Rusu and A. Dobra

Sketches for Size of Join Estimation
by F. Rusu and A. Dobra
Files:

Sketches over sampled data
The update performance of sketches can be significantly improved if only a sample of the data is sketched, without significant degradation in the accuracy. Simple sampling algorithms like Bernoulli sampling, sampling without replacement, and sampling with replacement, are also included in the package.
For details see the paper:

Sketching Sampled Data Streams
by F. Rusu and A. Dobra
Files:

Synthetic data generator
The package contains a simple data generator for different distributions with customizable parameters. This allows the testing of the implemented algorithms.
Files:

Examples
Two usage examples are included in the package. The first example generates size of join estimations with all the implemented sketching techniques. The second example is for sketching Bernoulli samples.
Files:
Requirements
In order to compile the package, the GSL development library has to be installed.
Download
The package can be found here.
UC Merced 
EECS 
Home
Last updated: Wednesday, December 13, 2017