Pipelined Architectures for Latency Aware Caching With Delayed Hits

A simulator based on the Caffeine library

Links to the Caffeine library

Abstract

Modern computer systems use caching to speed up access to data and make better use of resources. These systems, especially in storage and cloud environments, are diverse, with different types of storage, locations, and caches, leading to inconsistent access times. Applications running on these systems also have varying access patterns, such as how often they access data or whether they do so in bursts. To handle this complexity, cache systems use sophisticated algorithms that aim to balance conflicting priorities. However, these algorithms are hard to extend, debug, and integrate into software.

Our research introduces a flexible architecture that represents complex algorithms as a pipeline of simple policies. This architecture automatically adjusts the stages of the policy pipeline to maximize the system's performance for the current workload. This means we can add new strategies while preserving the old ones and only use them when they provide value. We have also developed a new cache algorithm called Burst Cache, which identifies items that arrive in bursts and retains them for long periods, even if they are neither recent nor frequent. This is particularly useful for handling brief bursts of data that arrive faster than the cache can process them, which would otherwise lead to delays and increased request times. Integrating this strategy into a cache system without harming performance for non-bursty data is a challenge. Our proposed algorithm selects the best strategy based on the workload's characteristics.

Introduction

Caching is a crucial technique to speed up data retrieval by taking advantage of variations in access times among different types of memory and storage. Caches store a small portion of data in a faster storage medium. For instance, DRAM can act as a cache for SSDs, and SSDs can serve as a cache for HDDs. When you access data stored in a cache, it's called a cache hit, while accessing data not in the cache is a cache miss. Cache hits are generally faster because the data is fetched from a quicker storage medium.

Traditionally, cache algorithms aimed to maximize the hit ratio, which is the percentage of accesses served faster by the cache. This works well because many workloads exhibit patterns like recency (recently accessed items are likely to be accessed again) and frequency (items accessed frequently over time are likely to be accessed again). However, in practice, it's challenging to predict which heuristic will be most beneficial. Therefore, adaptive caches often blend frequency and recency and dynamically allocate resources to each heuristic based on the workload's characteristics. When access times are not uniform, and especially when miss times (the time it takes to retrieve data that is not in the cache) vary, maximizing the cache hit ratio does not necessarily minimize the overall access time. This is particularly relevant in real-world applications such as search engines, data storage systems, mobile apps, DNS, and web access. Non-uniform access times typically occur when a storage system is distributed or employs different storage mediums. Additionally, Non-Uniform Memory Access (NUMA) systems can result in different access times to various parts of the DRAM for different CPU cores.
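
As a toy illustration with hypothetical numbers: suppose a miss served from a local SSD costs 0.1 ms, a miss served from a remote store costs 10 ms, and hits are essentially free. A policy that reaches a 90% hit ratio but always misses on the remote items pays on average 0.1 × 10 ms = 1 ms per access, whereas a policy with only an 80% hit ratio that keeps the remote items cached pays 0.2 × 0.1 ms = 0.02 ms per access. Once miss costs are non-uniform, the policy with the lower hit ratio can be the faster one.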

When focusing on variable access times within the context of cost-aware caching, we observe an important distinction. Accessing an item while it's being added to the cache is neither a cache hit (because the item isn't in the cache) nor precisely a cache miss (because it's being fetched to the cache). These situations are known as delayed hits, common in storage systems with bursts of requests, web queries, DNS resolutions, or file retrievals. Unlike classical cache algorithms like LRU, which assume instant item admission, our approach aims to create a software-friendly delayed hit-aware cache using a pipelined adaptive architecture and dynamic sizing based on workload characteristics, allowing the incorporation of multiple policies and algorithms for improved performance.
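
For readers using the Caffeine API directly, its asynchronous cache already coalesces concurrent requests for a key whose load is still in flight, which is exactly the situation that produces delayed hits. A minimal sketch, assuming a hypothetical slow loader loadGraph and the Graph type from the cache example later in this README:

// Requires com.github.benmanes.caffeine.cache.Caffeine / AsyncLoadingCache
// and java.util.concurrent.CompletableFuture.
AsyncLoadingCache<String, Graph> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .buildAsync(key -> loadGraph(key));   // loadGraph is a hypothetical slow fetch

// Both calls arrive while the first load is still in flight; they share the same
// CompletableFuture, so the second caller waits for the ongoing fetch instead of
// being treated as an ordinary miss.
CompletableFuture<Graph> first  = cache.get("item-42");
CompletableFuture<Graph> second = cache.get("item-42");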

In the context of cost-aware caching, each item in the cache is associated with a certain access cost. These costs can represent various factors such as variations in access times, monetary expenses, power consumption, and so on. The field of cost-aware caching focuses on developing efficient algorithms tailored to this cost-based model. The problem with ad-hoc caching algorithms is that they become outdated as new heuristics and approaches are introduced, and most caching research does not consider the cost-aware model. Consequently, valuable ideas from one field have limited influence on the development of ideas in other fields. Our contribution involves enhancing the design of latency-aware caches. We propose an adaptive cache management algorithm that incorporates three distinct factors: Recency, Frequency, and Burstiness, all while considering latency. We introduce Burstiness as a new criterion for caching policies, separate from Recency and Frequency.
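
To make the cost-based model concrete, the sketch below scores each cached item by how expensive its miss would be; this is an illustration of the general idea only, not the algorithm proposed in this work, and Entry, pickVictim, and the scoring formula are made up for the example.

import java.util.Comparator;
import java.util.List;

class CostAwareEviction {
    // Each cached item carries the latency its miss would incur; the value of keeping it
    // grows with both its popularity and its miss cost.
    record Entry(String key, long missLatencyNanos, long hits) {
        double score() {
            return (double) hits * missLatencyNanos;   // higher score = more valuable to keep
        }
    }

    // Evict the candidate whose absence would cost the least.
    static Entry pickVictim(List<Entry> candidates) {
        return candidates.stream()
            .min(Comparator.comparingDouble(Entry::score))
            .orElseThrow();
    }
}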

Our solution operates as a pipeline of simple cache policies, where the output of one policy feeds into the next. To make it adaptive, we divide the cache into smaller segments and employ a novel ghost entries-based hill climber algorithm to move portions between cache algorithms, adjusting their emphasis on Recency, Frequency, or Burstiness. This adaptability allows us to introduce and fine-tune policies based on workload characteristics.
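
The following is a hedged sketch of that adaptation loop, using illustrative names (Segment, ghostHits, STEP) rather than this repository's actual classes: each segment keeps a ghost list of recently evicted keys, a ghost hit means the segment would have hit had it been larger, and the climber moves space toward the segment whose ghost list was hit most.

import java.util.Comparator;
import java.util.List;

interface Segment {
    long ghostHits();            // hits on this segment's ghost (evicted-key) list since the last step
    void resetGhostHits();
    void growBy(int entries);
    void shrinkBy(int entries);
}

class HillClimber {
    static final int STEP = 64;  // granularity of each resize step (illustrative value)

    static void adapt(List<Segment> pipeline) {
        Segment winner = pipeline.stream().max(Comparator.comparingLong(Segment::ghostHits)).orElseThrow();
        Segment loser  = pipeline.stream().min(Comparator.comparingLong(Segment::ghostHits)).orElseThrow();
        if (winner != loser && winner.ghostHits() > loser.ghostHits()) {
            loser.shrinkBy(STEP);   // take space from the segment that benefits least...
            winner.growBy(STEP);    // ...and give it to the one that missed out the most
        }
        pipeline.forEach(Segment::resetGhostHits);
    }
}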

Through extensive simulations, we demonstrate that our algorithm surpasses various state-of-the-art alternatives, is insensitive to initial conditions, and competes favorably with static configurations, occasionally even outperforming the best static setups. This underscores the effectiveness of our adaptivity technique in identifying and leveraging subtle workload trends. Furthermore, our approach outperforms existing methods, even in scenarios where previous algorithms falter, thanks to our more robust use of ghost entries. While it requires additional metadata, the overhead for a small number of blocks is comparable to other works in the field.

Policies

  • LRU (Least Recently Used) is a cache eviction algorithm that removes the least recently accessed item when the cache is full (a minimal sketch appears after this list).
  • LFU (Least Frequently Used) is a cache eviction algorithm that removes the least frequently accessed item when the cache is full.
  • First-In-First-Out (FIFO): In FIFO, the first item that was added to the cache is the first one to be removed when the cache reaches its limit. It follows a strict queue-like behavior.
  • Random Replacement: This policy selects a random item from the cache to evict when needed. It's simple but lacks any specific strategy.
  • Most Recently Used (MRU): MRU evicts the most recently accessed item when the cache is full. It assumes that the most recently accessed item is the least likely to be needed again soon, as happens under looping or scanning access patterns.
  • Adaptive Replacement Cache (ARC): ARC combines elements of both LRU and LFU to adapt to changing access patterns. It dynamically adjusts its behavior based on recent access history.
  • Two-Queue Algorithm (2Q): 2Q uses two separate queues, one for items that have been seen only recently and another for items that are accessed repeatedly. It aims to balance the recency and frequency aspects of caching.
  • Clock or Second-Chance Algorithm: This policy keeps a circular buffer and marks items as they are accessed. When an item needs to be evicted, it looks for the first unmarked (unused) item in the buffer.
  • LIRS (Low Inter-reference Recency Set): LIRS is designed to improve upon LRU by distinguishing between items with high and low inter-reference recency. It aims to provide better cache hit rates.
  • GDSF (Greedy-Dual-Size with Frequency): GDSF assigns each item a priority that combines its access frequency, its size, and an aging factor, and evicts the item with the lowest priority.
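
As a concrete reference for the first policy in the list above, here is a minimal, generic LRU cache built on java.util.LinkedHashMap; it illustrates the textbook behavior only and is not the implementation used in this project.

import java.util.LinkedHashMap;
import java.util.Map;

// Access-ordered LinkedHashMap that drops the least recently used entry once the cache is full.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);               // accessOrder = true: get() refreshes recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;             // evict the eldest (least recently used) entry
    }
}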

Important

  • Caffeine is a Java rewrite of Guava's cache and will supersede the Guava support in Spring Boot 2.0. If Caffeine is present, a CaffeineCacheManager (provided by the spring-boot-starter-cache 'Starter') is auto-configured. Caches can be created on startup using the spring.cache.cache-names property and customized by one of the following (a small configuration example follows this list):

    1. A cache spec defined by spring.cache.caffeine.spec
    2. A com.github.benmanes.caffeine.cache.CaffeineSpec bean is defined
    3. A com.github.benmanes.caffeine.cache.Caffeine bean is defined
    Below is the GitHub link to download the example source: https://github.com/kishanjavatrainer/SpringBootCaffeineCache
  • DRAM (Dynamic Random-Access Memory) is a type of volatile computer memory commonly used in computers and other electronic devices. It serves as primary (main) memory and temporarily stores the data that the CPU (Central Processing Unit) actively uses during program execution. The statement "DRAM can be a cache for SSDs" means that DRAM can be used as a cache to enhance the performance of Solid-State Drives. SSDs are non-volatile storage that is faster than traditional hard disk drives (HDDs) but still slower than DRAM. Keeping frequently accessed SSD data in DRAM reduces access latency and improves overall system performance: hot data is served from the high-speed DRAM, while less frequently accessed data stays on the slower but higher-capacity SSD, bridging the speed gap between the two.

  • Ad-hoc algorithms refer to algorithms that are specifically designed or tailored for a particular problem or situation without following a general, standardized approach. These algorithms are created on a case-by-case basis to address a specific, often unique, problem or set of conditions. They are typically not part of a broader, established algorithmic framework and are often improvised or custom-built to solve a particular problem at hand.
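
To make the configuration options above concrete, the same spec-string format accepted by spring.cache.caffeine.spec can also be parsed directly with Caffeine's CaffeineSpec; the settings below are arbitrary examples, not recommendations.

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.CaffeineSpec;

// Parse a spec string (the format used by spring.cache.caffeine.spec) and build a cache from it.
CaffeineSpec spec = CaffeineSpec.parse("maximumSize=500,expireAfterAccess=10m");
Cache<String, byte[]> cache = Caffeine.from(spec).build();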

Tools

  • Spring Boot is an open-source Java-based framework used for building production-ready, stand-alone, and web-based applications quickly and with minimal configuration. It is part of the larger Spring ecosystem and is designed to simplify the development of Spring applications by providing a set of pre-configured templates and conventions. Spring Boot makes it easier to create robust, scalable, and maintainable Java applications.

REFERENCES

REFERENCES for the policies

More references that may be used later

Questions we are trying to answer about the library

  • Is it possible to obtain information about latency?

    • The Caffeine library primarily focuses on caching and doesn't directly provide features for measuring or reporting latency. It is designed to efficiently manage in-memory caches and provide fast access to cached data. However, latency and performance can be measured indirectly by implementing your own timing and profiling mechanisms in your code. Here are a few approaches to consider:
      • Instrumentation: Add instrumentation code to the application to measure the time taken by specific operations, including cache access (a small timing sketch follows this section). For example, you can use Java's System.nanoTime() to record timestamps before and after a cache access and calculate the elapsed time.
      • Profiling Tools: Use Java profiling tools like VisualVM or YourKit to analyze the performance of your application, including cache-related operations. These tools can help you identify bottlenecks and performance issues, which indirectly relate to latency.
      • Logging: Implement custom logging to record timestamps and relevant information for cache operations. This can help you analyze the time taken by cache-related activities.
      • External Monitoring: Use external monitoring tools and services that can provide insights into your application's performance, including latency measurements. Tools like Prometheus with Grafana or New Relic can be integrated into your application for performance monitoring.

    Remember that latency measurements can be influenced by various factors, including the hardware, operating system, and JVM configuration. It's essential to consider the specific context and requirements of your application when implementing latency measurement solutions.

    Since software libraries like Caffeine primarily focus on their core functionality (in this case, caching), additional performance monitoring and latency measurement are typically handled at the application level using the approaches mentioned above. Keep in mind that newer versions of libraries and frameworks may introduce additional features, so it's a good practice to check the documentation or release notes for the latest updates and capabilities.
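
    Below is a minimal sketch of the instrumentation approach from the list above, assuming the graphs cache and key from the example later in this README; as a complementary option, a cache built with .recordStats() exposes aggregate figures through stats(), such as averageLoadPenalty().

    // Time a single lookup with System.nanoTime(), as suggested under "Instrumentation".
    long start = System.nanoTime();
    Graph graph = graphs.get(key);                          // may hit, or load on a miss
    long elapsedMicros = (System.nanoTime() - start) / 1_000;
    System.out.println("cache access took " + elapsedMicros + " µs");

    // If the cache was built with .recordStats(), graphs.stats() reports hit/miss counts and
    // the average time spent loading missed entries (averageLoadPenalty(), in nanoseconds).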


Caffeine is a high performance, near optimal caching library. For more details, see our user's guide and browse the API docs for the latest release.

Cache

Caffeine provides an in-memory cache using a Google Guava inspired API. The improvements draw on our experience designing Guava's cache and ConcurrentLinkedHashMap.

LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(Duration.ofMinutes(5))
    .refreshAfterWrite(Duration.ofMinutes(1))
    .build(key -> createExpensiveGraph(key));
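
A lookup then goes through the loader transparently; for a key of the Key type above:

// Returns the cached graph, computing and caching it via createExpensiveGraph on first access.
Graph graph = graphs.get(key);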

Features at a Glance

Caffeine provides flexible construction to create a cache with a combination of the following optional features:

  • automatic loading of entries into the cache, optionally asynchronously
  • size-based eviction when a maximum is exceeded based on frequency and recency
  • time-based expiration of entries, measured since last access or last write
  • asynchronously refresh when the first stale request for an entry occurs
  • keys automatically wrapped in weak references
  • values automatically wrapped in weak or soft references
  • notification of evicted (or otherwise removed) entries
  • writes propagated to an external resource
  • accumulation of cache access statistics

In addition, Caffeine offers the following extensions:

Use Caffeine in a community provided integration:

Powering infrastructure near you:

  • Dropwizard: Ops-friendly, high-performance, RESTful APIs
  • Cassandra: Manage massive amounts of data, fast
  • Coherence: Mission critical in-memory data grid
  • Accumulo: A sorted, distributed key/value store
  • Kafka: A distributed event streaming platform
  • HBase: A distributed, scalable, big data store
  • Apache Solr: Blazingly fast enterprise search
  • Infinispan: Distributed in-memory data grid
  • Redisson: Ultra-fast in-memory data grid
  • OpenWhisk: Serverless cloud platform
  • Corfu: A cluster consistency platform
  • Grails: Groovy-based web framework
  • Finagle: Extensible RPC system
  • Neo4j: Graphs for Everyone
  • Druid: Real-time analytics

In the News

Download

Download from Maven Central or depend via Gradle:

implementation("com.github.ben-manes.caffeine:caffeine:3.1.8")

// Optional extensions
implementation("com.github.ben-manes.caffeine:guava:3.1.8")
implementation("com.github.ben-manes.caffeine:jcache:3.1.8")

For Java 11 or above, use 3.x; otherwise use 2.x.

See the release notes for details of the changes.

Snapshots of the development version are available in Sonatype's snapshots repository.
