Article: 2020, Vol. 38, Issue 1: 209-215
Cite this article:
GAO Jintao, LIU Wenjie, LI Zhanhuai. A Strategy of Data Synchronization in Distributed System with Read Separating from Write[J]. Journal of Northwestern Polytechnical University

A Strategy of Data Synchronization in Distributed System with Read Separating from Write
GAO Jintao, LIU Wenjie, LI Zhanhuai
School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
Abstract:
Separating reads from writes is a strategy commonly adopted by NewSQL databases to combine the advantages of traditional relational databases and NoSQL databases. Under this architecture, baseline data is split into multiple partitions stored across different storage nodes, while delta data is stored on a single transaction node. To reduce the pressure on the transaction node and improve query efficiency, delta data must be synchronized to the storage nodes periodically. Current strategies synchronize data at partition granularity, so partitions with no changed data also take part in synchronization, which incurs extra network cost, local I/O cost, memory, and disk space. To improve synchronization efficiency and reduce space consumption, a fine-grained data synchronization strategy is proposed. Fine-grained logical partitions are built on top of the original coarse-grained partitions to provide a more precise unit of synchronization; a delta-sensing strategy is introduced to record the changed partitions and their corresponding delta data; and a delta-publishing mechanism drives synchronization, restricting participation to the partitions that have changed. The strategy is implemented and evaluated on Oceanbase, a distributed database that separates reads from writes, and the results show that it outperforms other strategies in both synchronization efficiency and space usage.
Key words:    distributed database    read-write separation    Oceanbase    data synchronization    fine granularity
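The three mechanisms named in the abstract (fine-grained logical partitions, delta sensing, and publish-driven synchronization) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use any Oceanbase API: the class names `ChangeTracker` and `StorageNode`, the integer key ranges, and the `merges` counter are all hypothetical, chosen only to show how synchronization work can be limited to the logical partitions that actually changed.

```python
from bisect import bisect_right
from collections import defaultdict


class ChangeTracker:
    """Hypothetical delta-sensing structure kept on the transaction node."""

    def __init__(self, boundaries):
        # Sorted start keys of the fine-grained logical partitions.
        # Assumption: every written key is >= boundaries[0].
        self.boundaries = boundaries
        # Logical partition id -> {key: value}; only changed partitions appear.
        self.delta = defaultdict(dict)

    def partition_of(self, key):
        # Index of the last boundary that is <= key.
        return bisect_right(self.boundaries, key) - 1

    def write(self, key, value):
        # Record the change under the logical partition that owns the key.
        self.delta[self.partition_of(key)][key] = value

    def publish(self):
        # Hand over the deltas of changed partitions only, then reset.
        changed, self.delta = dict(self.delta), defaultdict(dict)
        return changed


class StorageNode:
    """Hypothetical storage node holding baseline data per logical partition."""

    def __init__(self, n_partitions):
        self.baseline = {pid: {} for pid in range(n_partitions)}
        self.merges = 0  # Number of partitions touched by synchronization.

    def apply(self, published):
        # Merge deltas into the baseline; unchanged partitions are never visited.
        for pid, rows in published.items():
            self.baseline[pid].update(rows)
            self.merges += 1


tracker = ChangeTracker([0, 100, 200, 300])  # four logical partitions
node = StorageNode(4)
tracker.write(5, "a")
tracker.write(17, "b")
tracker.write(250, "c")
node.apply(tracker.publish())
```

In this sketch, a partition-granularity strategy would visit all four partitions on every synchronization round, whereas the published delta map only names the two partitions that received writes.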
Received: 2019-02-26
DOI: 10.1051/jnwpu/20203810209
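Funding: Supported by the Key R&D Project of the Ministry of Science and Technology (2018YFB1003403) and the Key Program of the National Natural Science Foundation of China (61732014, 61672432, 61672434)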
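About the author: GAO Jintao (1986-), PhD candidate at Northwestern Polytechnical University, whose research focuses on database query optimization and distributed systems.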