论文:2018,Vol:36,Issue(4):768-777
引用本文:
高锦涛, 李战怀, 刘文洁. 一种高效准确的基于查询结果的基数估计策略[J]. 西北工业大学学报
Gao Jintao, Li Zhanhuai, Liu Wenjie. A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result[J]. Northwestern polytechnical university

一种高效准确的基于查询结果的基数估计策略
高锦涛, 李战怀, 刘文洁
西北工业大学 计算机学院, 陕西 西安 710072
摘要:
基数估计是查询优化的重要组成部分,其高效性、准确性直接影响查询优化效果。传统基数估计策略基于原表或原表样本进行统计信息收集,然后利用收集好的统计信息推导出基数。该策略在数据量大时,统计信息收集效率低;统计信息存在延迟,并且基数通过推导得到,准确度无法保证;一些策略通过子查询的反馈信息得到基数,但结果没有保存,基数获取效率低。为解决这些问题,提出了一种高效准确的基于查询结果的基数估计策略(cardinality estimation based on query result,CEQR),特点是统计信息来源为查询执行结果,不需要进行推导,保证基数的准确度,并且收集效率与原表数据量无关;建立一种基数表,保存基本表和中间结果在某种谓词下的统计信息,为后续查询提供服务,并建立基数维护规则,合理管理基数表;建立资源感知策略,将基数项映射到缓存,加快统计信息获取效率。给出了基于CEQR策略的适应性以及误差分析,并通过实验得出CEQR策略在效率上优于传统基数估计策略。
关键词:    大数据    基数估计    查询优化    查询结果    高效    准确   
A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result
Gao Jintao, Li Zhanhuai, Liu Wenjie
School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
Abstract:
Cardinality estimation is an important component of query optimization. Its accuracy and efficiency directly decide effect of query optimization. Traditional cardinality estimation strategy is based on original table or sample to collect statistics, then inferring cardinality by collected statistics. It will be low-efficiency when handling big data; Statistics exist update latency and are gotten by inferring, which can not guarantee correctness; Some strategies can get the actual cardinality by executing some subqueries, but they do not keep the result, leading to low efficiency of fetching statistics. Against these problems, this paper proposes a novel cardinality estimation strategy, called cardinality estimation based on query result(CEQR). For keeping correctness of cardinality, CEQR directly gets statistics from query results, which is not related with data size; we build a cardinality table to store the statistics of basic tables and middle results under specific predicates. Cardinality table can provide cardinality services for subsequent queries, and we build a suit of rules to maintain cardinality table; To improve the efficiency of fetching statistics, we introduce the source aware strategy, which hashes cardinality item to appropriate cache. This paper gives the adaptability and deviation analytic of CEQR, and proves that CEQR is more efficient than traditional cardinality estimation strategy by experiments.
Key words:    big data    cardinality estimation    query optimization    query result    efficient    accurate   
收稿日期: 2017-05-08     修回日期:
DOI:
基金项目: 国家高技术研究发展计划(863)(2015AA015307)、陕西省基础研究计划(2017JM6104)与国家自然科学基金(61732014)资助
通讯作者:     Email:
作者简介: 高锦涛(1986-),西北工业大学博士研究生,主要从事数据库查询优化以及分布式研究。
相关功能
PDF(1914KB) Free
打印本文
把本文推荐给朋友
作者相关文章
高锦涛  在本刊中的所有文章
李战怀  在本刊中的所有文章
刘文洁  在本刊中的所有文章

参考文献:
[1] Ioannidid Y E, Christodoulakis S. On the Propagation of Errors in the Size of Join Results[J]. Acm Sigmod Record, 1991, 20(2):268-277
[2] Melvin C, Chen C N. Adaptive Selectivity Estimation Using Query Feedback[J]. Acm sigmode Recordf, 1994, 23(2):161-172
[3] Stillger M, Lohman G M, Markl V, et al. LEO-DB2's Learning Optimizer[C]//Proceedidngs of the 27th International Conference on Very Large Data Bases, 2001:19-28
[4] PostgreSQL. Postgresql 9.6[EB/OL].[2018-06-28]. https://www.postgresql.org/docs/9.6/static/monitoring.html
[5] Oracle. Oracle DataBase SQL Tuning Guide 12c Release 1[EB/OL].[2013-06-26] http://docs.oracle.com/database/121/index.htm
[6] Teradata. Teradata Performance Management[EB/OL].[2005-01-01] http://www.teradataforum.com/teradata_pdf/035-1097-115a.pdf
[7] Chakkappen S, Cruanes T, Dageville B, et al. Efficient and Scalable Statistics Gathering for Large Databases in Oracle 11g[C]//Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data ACM, 2008:1053-1064
[8] Canonne C L. Are Few Bins Enough:Testing Histogram Distributions[C]//Proceedins of the 35th ACM Sigmod-Sigact-Sigai Symposium on Principles of Database Systems ACM, 2016:455-463
[9] Soliman M A, Antova L, Raghavan V, et al. Orca:A Modular Query Optimizer Architecture for Big Data[C]//Proceedings of the ACM Sigmod International Conference on Management of Data ACM, 2014:337-348
[10] Graefe, Goetz. The Cascades Framework for Query Optimization[J]. Data Engineering Bulletin, 1995, 18(5):19-29
[11] Shankar S, Nehme R, Aguilar-Saborit J, et al. Query Optimization in Microsoft SQL Server PDW[C]//Proceedings of the 2012 ACM Sigmod International Conference on Management of Data ACM, 2012:767-776
[12] Elseidy M, Elguindy A, Vitorovic A, et al. Scalable and Adaptive Online Joins[J]. Proceedings of the VLDB Endowment, 2014, 7(6):441-452
[13] Chen J, Jindel S, Walzer R, et al. The MemSQL Query Optimizer:A Modern Optimizer for Real-Time Analytics in a Distributed Database[J]. Proceedings of the VLDB Endowment, 2016, 9(13):1401-1412
[14] Tian F, DeWitt D J. Tuple Routing Strategies for Distributed Eddies[C]//Proceedings of the 29th International Conference on Very Large Data Bases VLDB Endowment, 2003(29):333-344
[15] Zhou Y, Ooi B C, Tan K L. Dynamic Load Management for Distributed Continuous Query Systems[C]//Proceedings of the 21st International Conference on Data Engineering, 2005:322-323
[16] Yin S, Hameurlain A, Morvan F. Robust Query Optimization Methods with Respect to Estimation Errors:a Survey[J]. ACM Sigmod Record, 2015, 44(3):25-36
[17] Ioannidis Y. The History of Histograms[C]//Proceedings of the 29th International Conference on Very Large Data Bases, VLDB Endowment, 2003(29):19-30
[18] Chaudhuri Surajit. An Overview of Query Optimization in Relational Systems[J]. Rods, 1998, 27(3):34-43
[19] Bouganim L, Florescu D, Valduriez P. Dynamic Load Balancing in Hierarchical Parallel Database Systems[C]//International Conference on Very Larghe Data Bases, 1996:436-447
[20] Das D, Yan J, Zait M, et al. Query Optimization in Oracle 12c Database In-Memory[J]. Proceedings of the VLDB Endowment, 2015, 8(12):1770-1781
[21] Teimouri M, Rezakhah S, Mohammadpour A. U-Statistic for Multivariate Stable Distributions[J]. Journal of Probability and Statistics, 2017(1):1-12
[22] TaoBao. Oceanbase[EB/OL] [2012-09-04]. http://code.taobao.org/p/Oceanbase/src.
[23] 阳振坤. Oceanbase关系数据库架构[J]. 华东师范大学学报, 2014, 2014(5):141-148 Yang Zhenkun. The Architecture of OceanBase Relational Database System[J]. Journal of East China Normal University, 2014, 2014(5):141-148(in Chinese)
[24] TPC. TPC-H.[2018-06-02] http://www.tpc.org/tpc_documents_curre-nt_versions/pdf/tpc-h_v2.17.2.pdf