Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do you choose a sample size for the cache ? #6

Open
abdulmi opened this issue Jan 31, 2023 · 3 comments
Open

How do you choose a sample size for the cache ? #6

abdulmi opened this issue Jan 31, 2023 · 3 comments

Comments

@abdulmi
Copy link

abdulmi commented Jan 31, 2023

Is there a strategy for choosing the sample size for the cache ? Based on the example provided its the sum of all windows, but just wanted to check if there's a suggested way to choose the sample size, and what are the implications for choosing a bigger or smaller sample size ?

The parameter I am talking about is the one passed to the WTinyLFUCache::with_sizes function

pub fn with_sizes(
    window_cache_size: usize,
    protected_cache_size: usize,
    probationary_cache_size: usize,
    samples: usize
) -> Result<Self, WTinyLFUError>
@al8n
Copy link
Owner

al8n commented Oct 18, 2023

Hi, sorry for the late reply, for the size, highly depends on the content that will be stored. So, unfortunately, there are no suggestions here.

@vlovich
Copy link
Contributor

vlovich commented Oct 29, 2023

I had a similar question and I think the root is that it's unclear how to connect the size sample parameters into an estimate of total memory usage & vice-versa (& just in general what those parameters control). More broadly, it's not even really clear what the size & sample values control. AFAICT:

  • Number of elements in the LRU (window_cache_size)
  • Number of elements in the segmented cache (probationary_cache_size protected_cache_size)
  • Number of elements in the CountMinSketch (somewhere between 2 * (window_cache_size probationary_cache_size protected_cache_size) and 4 * (window_cache_size probationary_cache_size protected_cache_size) if I'm not mistaken because there's 4 filters of length size rounded down to the nearest power of 2 approximately).
  • Number of bits within the bloom filter (derived from samples & false-positive rate although not 100% clear immediately what that relationship is because the logic isn't trivial and there's no helper docs to guide & I'm not doing more than skimming the implementation)

Is that correct? Is there a more elegant way to convert human-friendly requirements (e.g. "I want to use at most 100 MiB of space with a false-positive rate of 1%" and maybe something like "here's the percentage breakdown among the different components within the LFU") into the actual parameters needed?

@vlovich
Copy link
Contributor

vlovich commented Oct 31, 2023

@al8n can you help provide some color commentary on what the different parameters control & how they relate to memory usage / cache hit ratio? I'm not sure I fully understand how to convert target memory usage requirements into the parameters for the cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants