Database Query Optimization

The query optimizer is in charge of determining the optimal execution plan for a given database query. The subclass of cost-based query optimizers computes the optimal plan by estimating the cardinality of all the relational operators in the query and then selecting the plan that minimizes a cost function. As such, the cardinality of selection predicates, joins, and all the other relational operators has to be estimated. When available, this is done with precomputed synopses, e.g., histograms, samples, sketches, stored in the metadata catalog. Otherwise, an arbitrary guess is used, e.g., for user-defined functions or set operators. Synopses are typically built for a single attribute and assume uniformity and/or independence when they are combined across multiple attributes. These are likely to miss correlations between attributes over skewed or sparse data and result in in-accurate estimates which produce highly sub-optimal query execution plans. Thus, despite the large volume of work on the topic, selectivity estimation is still an open problem.

Our initial work on query optimization has focused on designing accurate and efficient synopses for cardinality estimation. The entire PhD dissertation (2009) is dedicated to creating advanced sketches for join size estimation. This work and its corresponding implementation have been used in the Oracle database query optimizer. In a subsequent line of work, we have focused on antijoin cardinality estimation (2015), which is a considerably more complicated problem than the already hard join size estimation. More recently, we have targeted query optimization for modern in-memory databases (2019). We have shown that predicate selectivity can be computed exactly with minimal overhead in the GPU-based MapD database, resulting in better query plans that are executed faster.

UC Merced | EECS | Home

Last updated: Friday, July 26, 2019