Log-structured merge-tree

Type	Hybrid (two tree-like components)
Invented	1996
Invented by	Patrick O'Neil, Edward Cheng, Dieter Gawlick, Elizabeth O'Neil
Time complexity in big O notation
Operation	Average
Insert	O(1) (amortised)
Find-min	O(N)
Delete-min	O(N)

In computer science, the log-structured merge-tree (also known as LSM tree, or LSMT^[1]) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. Like other search trees, LSM trees maintain key-value pairs. They keep data in two or more separate structures, each optimized for its underlying storage medium; data is synchronized between the structures efficiently, in batches.

One simple version of the LSM tree is a two-level LSM tree.^[2] As described by Patrick O'Neil, a two-level LSM tree comprises two tree-like structures, called C₀ and C₁. C₀ is smaller and entirely resident in memory, whereas C₁ is resident on disk. New records are inserted into the memory-resident C₀ component. If an insertion causes C₀ to exceed a certain size threshold, a contiguous segment of entries is removed from C₀ and merged into C₁ on disk. The performance characteristics of LSM trees stem from two facts: each component is tuned to the characteristics of its underlying storage medium, and data is migrated efficiently across media in rolling batches, using an algorithm reminiscent of merge sort. Such tuning involves writing data sequentially rather than as a series of separate random-access requests. This optimization reduces seek time on hard-disk drives (HDDs) and latency on solid-state drives (SSDs).
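
The two-level scheme described above can be illustrated with a minimal in-memory sketch. It is not the algorithm from the original paper: the class name `TwoLevelLSM`, the `c0_limit` threshold, the choice to move the smallest keys as the contiguous segment, and the use of a sorted Python list to stand in for the on-disk C₁ are all assumptions made for illustration.

```python
import bisect

class TwoLevelLSM:
    """Toy two-level LSM tree: C0 in memory, a sorted list standing in for on-disk C1."""

    def __init__(self, c0_limit=4):
        self.c0_limit = c0_limit  # hypothetical size threshold for C0
        self.c0 = {}              # memory-resident component
        self.c1 = []              # sorted list of (key, value); stands in for disk

    def insert(self, key, value):
        self.c0[key] = value
        if len(self.c0) > self.c0_limit:
            self._rolling_merge()

    def _rolling_merge(self):
        # Remove a contiguous segment (here: the smallest keys) from C0 ...
        keys = sorted(self.c0)
        segment = [(k, self.c0.pop(k)) for k in keys[: len(keys) // 2 + 1]]
        # ... and merge it into C1 in one sequential, merge-sort-like pass.
        merged, i, j = [], 0, 0
        while i < len(self.c1) and j < len(segment):
            if self.c1[i][0] < segment[j][0]:
                merged.append(self.c1[i])
                i += 1
            elif self.c1[i][0] > segment[j][0]:
                merged.append(segment[j])
                j += 1
            else:
                merged.append(segment[j])  # the newer value from C0 wins
                i += 1
                j += 1
        merged.extend(self.c1[i:])
        merged.extend(segment[j:])
        self.c1 = merged

    def get(self, key):
        if key in self.c0:  # C0 always holds the most recent value
            return self.c0[key]
        i = bisect.bisect_left(self.c1, (key,))
        if i < len(self.c1) and self.c1[i][0] == key:
            return self.c1[i][1]
        return None
```

In this sketch every write to C₁ happens as part of one pass over sorted data, which is the property that lets a real implementation write sequentially instead of issuing random-access requests.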

Most LSM trees used in practice employ multiple levels. Level 0 is kept in main memory, and might be represented using a tree. The on-disk data is organized into sorted runs, each of which contains data sorted by the index key. A run can be represented on disk as a single file, or alternatively as a collection of files with non-overlapping key ranges. A query for the value associated with a particular key must search the Level 0 tree and also each run. The Stepped-Merge version of the LSM tree^[3] is a variant that supports multiple levels with multiple tree structures at each level.
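
A point query over this layout might look like the following sketch. It assumes, for illustration only, that Level 0 is a plain dictionary, that each run is a sorted list of `(key, value)` pairs, and that `runs` is ordered from newest to oldest; the function name `lookup` is hypothetical.

```python
import bisect

def lookup(key, level0, runs):
    """Return the newest value stored for key, or None if the key is absent.

    level0: dict holding the in-memory data.
    runs:   list of runs, newest first; each run is a sorted list of (key, value).
    """
    if key in level0:
        return level0[key]
    for run in runs:
        # Binary search inside the run, which is sorted by key.
        i = bisect.bisect_left(run, (key,))
        if i < len(run) and run[i][0] == key:
            return run[i][1]
    return None
```

In the worst case (a key that is absent, or present only in the oldest run) the loop visits every run, so the cost of a lookup grows with the number of runs.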

A particular key may appear in several runs, and what that means for a query depends on the application. Some applications simply want the newest key-value pair with a given key. Some applications must combine the values in some way to get the proper aggregate value to return. For example, in Apache Cassandra, each value represents a row in a database, and different versions of the row may have different sets of columns.^[4]
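
The two policies can be contrasted in a short sketch. The timestamps attached to each version and the function names are assumptions made for illustration; this is a simplified picture of combining row versions, not Apache Cassandra's actual API or reconciliation algorithm.

```python
def newest_value(versions):
    """versions: list of (timestamp, value). Return the value with the largest timestamp."""
    return max(versions, key=lambda tv: tv[0])[1]

def merge_row(versions):
    """versions: list of (timestamp, {column: value}).

    Combine all versions of a row; for each column, the newest write wins.
    """
    row = {}
    # Apply versions from oldest to newest so later writes overwrite earlier ones.
    for _, columns in sorted(versions, key=lambda tv: tv[0]):
        row.update(columns)
    return row
```

For example, merging a version that sets `name` and `city` with a later version that sets only `city` yields a row with the original `name` and the updated `city`.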

Because a query may have to search every run, the system must keep the number of runs small to keep query cost down, typically by merging runs together.
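
One common way to reduce the number of runs is to merge several of them into a single sorted run. The sketch below assumes runs are sorted lists of `(key, value)` pairs given newest first, and it ignores deletions and on-disk file handling; the function name `compact` is hypothetical.

```python
import heapq

def compact(runs):
    """Merge sorted runs (newest first) into one sorted run, keeping the newest value per key."""
    # Tag each entry with its run's age so that equal keys sort newest first.
    tagged = (
        [(key, age, value) for key, value in run]
        for age, run in enumerate(runs)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        # The first entry seen for a key comes from the newest run; skip older duplicates.
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged
```

Because `heapq.merge` consumes its already-sorted inputs in order, the merge reads each run and produces the output sequentially, in the spirit of the merge-sort-like migration described earlier.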

Extensions to the 'leveled' method that incorporate B-tree structures have been suggested, for example bLSM^[5] and Diff-Index.^[6]

The LSM tree was originally designed for write-intensive workloads. As read and write workloads increasingly coexist on the same LSM-tree storage structure, reads can suffer high latency and low throughput because LSM-tree compaction operations frequently invalidate cached data in buffer caches. To make buffer caching effective again for fast data access, the Log-Structured buffered-Merge tree (LSbM-tree) has been proposed and implemented.^[7]

References

[1] Zhang, Weitao; Xu, Yinlong; Li, Yongkun; Li, Dinglong (December 2016). "Improving Write Performance of LSMT-Based Key-Value Store". 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS). pp. 553–560. doi:10.1109/ICPADS.2016.0079. ISBN 978-1-5090-4457-3. S2CID 13611447.
[2] O'Neil, Patrick; Cheng, Edward; Gawlick, Dieter; O'Neil, Elizabeth (1996-06-01). "The log-structured merge-tree (LSM-tree)" (PDF). Acta Informatica. 33 (4): 351–385. doi:10.1007/s002360050048. ISSN 1432-0525. S2CID 12627452.
[3] Jagadish, H.V.; Narayan, P.P.S.; Seshadri, S.; Sudarshan, S.; Kanneganti, Rama (1997). "Incremental Organization for Data Recording and Warehousing" (PDF). Proceedings of the VLDB Conference. VLDB Foundation: 16–25.
[4] "Leveled Compaction in Apache Cassandra : DataStax". February 13, 2014. Archived from the original on February 13, 2014.
[5] Sears, Russell; Ramakrishnan, Raghu (2012-05-20). "bLSM: a general purpose log structured merge tree". Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. SIGMOD '12. New York, NY, USA: Association for Computing Machinery. pp. 217–228. doi:10.1145/2213836.2213862. ISBN 978-1-4503-1247-9. S2CID 207194816.
[6] Tan, Wei; Tata, Sandeep; Tang, Yuzhe; Fong, Liana (2014), Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores (PDF), OpenProceedings.org, doi:10.5441/002/edbt.2014.76, retrieved 2022-05-22
[7] Teng, Dejun; Guo, Lei; Lee, Rubao; Chen, Feng; Zhang, Yanfeng; Ma, Siyuan; Zhang, Xiaodong (2018). "A Low-cost Disk Solution Enabling LSM-tree to Achieve High Performance for Mixed Read/Write Workloads". ACM Transactions on Storage. pp. 1–26. doi:10.1145/3162615.

General

O'Neil, Patrick E.; Cheng, Edward; Gawlick, Dieter; O'Neil, Elizabeth (June 1996). "The log-structured merge-tree (LSM-tree)". Acta Informatica. 33 (4): 351–385. CiteSeerX 10.1.1.44.2782. doi:10.1007/s002360050048. S2CID 12627452.
Li, Yinan; He, Bingsheng; Luo, Qiong; Yi, Ke (2009). "Tree Indexing on Flash Disks". 2009 IEEE 25th International Conference on Data Engineering. pp. 1303–6. CiteSeerX 10.1.1.144.6961. doi:10.1109/ICDE.2009.226. ISBN 978-1-4244-3422-0. S2CID 2343303.
Luo, Chen; Carey, Michael J. (July 2019). "LSM-based storage techniques: a survey". The VLDB Journal. 29: 393–418. arXiv:1812.07527. doi:10.1007/s00778-019-00555-y. S2CID 56178614.

External links

An Overview of Log Structured Merge Trees

