
[draft] Cache collection #92

Draft · wants to merge 3 commits into main

Conversation

shuding
Member

@shuding shuding commented Nov 7, 2019

This PR adds an inactive-key (LRU-based) cache collection implementation.
By default, at most 600 inactive keys will be stored.

However, there are a couple of things we need to explore further before landing this feature.

Performance

For most (~99%) websites, the number of requests per page won't be more than 200. [1]
So for most sites this feature would only add unnecessary work.

One solution is to simply skip cache collection until the cache size nears ~3/4 of the maximum allowed size.
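The deferred-collection idea could be sketched like this (illustrative only, not the code in this PR). A JavaScript `Map` iterates in insertion order, so re-inserting an entry on each access keeps LRU ordering with no extra bookkeeping:

```typescript
// Sketch: size-bounded LRU for inactive keys, with collection deferred
// until the cache nears 3/4 of the limit. Names are illustrative.
class InactiveCache<V> {
  private map = new Map<string, V>();
  constructor(private limit = 600) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Refresh recency by moving the key to the end of iteration order.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    // Defer collection: do nothing at all while the cache is small,
    // so most apps never pay for eviction logic.
    if (this.map.size < (this.limit * 3) / 4) return;
    while (this.map.size > this.limit) {
      // Iteration starts at the least recently used key.
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }

  get size() {
    return this.map.size;
  }
}
```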

API

How does it work with:

  • multilayer cache
  • namespaced cache

How to customize the cache size?

@shuding shuding added the "feature request" and "on hold" labels on Nov 7, 2019
@rostero1

This is probably a little different from what an LRU is trying to achieve, but I really like how react-query manages stale data with a duration.

Demo:
https://codesandbox.io/s/optimistic-feynman-ox9eo

I wanted to share this in case it could be useful in coming up with a design for a flexible caching system.

@shuding
Member Author

shuding commented Nov 11, 2019

@rostero1 it is an LRU (least recently used) model, but size based (not duration based). The reason is that I think this implementation should be more efficient, based on my reading of the implementations of these libs:

The goal of this lib is to be as lightweight and fast as possible, and like I mentioned in this PR, most users don't even need the LRU:

> For most (~99%) websites, the number of requests per page won't be more than 200. [1]
> So for most sites this feature would only add unnecessary work.

And since hooks are called very frequently in the UI, cache collection, especially a duration-based collection process, will slow down rendering a lot. That's why I've kept this PR on hold: I want to maximize its performance before landing it.

So I'm curious about the real-world use case of managing stale data with a duration, because after a long time, what you should do with the cache is revalidate it, not expire it.

@rostero1

Thanks, @quietshu. I'm not sure if this conversation should live somewhere else, but I have some data that appears as a dropdown list in multiple areas: for example, in the left sidebar, and in the right sidebar nested a few tabs deep. Note that because of privacy requirements I cannot use any HTTP caching for my REST APIs.

After thinking about this some more, I find that I'm mostly using staleTime to prevent refetching the data for the entire duration the app is open (or until a root component unmounts, at which point I can reset the cache). Maybe I could achieve this by doing something like:

const fetchMemo = Memoizer.memo(fetch); // don't memo errors
const MyComponent = () => {
  const { data } = useSWR('/api/data', fetchMemo)
  // ...
}

For the app I'm working on, I was trying to come up with a scenario where I actually need to refetch the data after a staleTime, and I can't think of one. I do have a place with a "refresh data" button, but I like that it's explicit.
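For reference, a `Memoizer.memo` along the lines used above could be sketched like this (a hypothetical helper, not a real library): cache the in-flight promise per key, and drop the entry if the promise rejects so errors are never memoized.

```typescript
// Sketch: memoize an async fetcher by key; don't memo errors.
function memo<T>(fetcher: (key: string) => Promise<T>) {
  const cache = new Map<string, Promise<T>>();
  return (key: string): Promise<T> => {
    let p = cache.get(key);
    if (!p) {
      p = fetcher(key).catch((err) => {
        cache.delete(key); // on failure, forget the entry so a retry refetches
        throw err;
      });
      cache.set(key, p);
    }
    return p;
  };
}
```

Because the promise itself is cached, concurrent calls for the same key share a single request.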

@shuding shuding changed the title Cache collection [draft] Cache collection Dec 4, 2019
@sergiodxa
Contributor

@quietshu have you thought more about this? I think it could be implemented manually per project using subscribe and custom hooks, but maybe it could be integrated directly into SWR. Do you mind if I work on this?

@tannerlinsley

> @rostero1 it is an LRU (least recently used) model, but size based (not duration based). The reason is that I think this implementation should be more efficient, based on my reading of the implementations of these libs:
>
> The goal of this lib is to be as lightweight and fast as possible, and like I mentioned in this PR, most users don't even need the LRU:
>
> For most (~99%) websites, the number of requests per page won't be more than 200. [1]
> So for most sites this feature would only add unnecessary work.
>
> And since hooks are called very frequently in the UI, cache collection, especially a duration-based collection process, will slow down rendering a lot. That's why I've kept this PR on hold: I want to maximize its performance before landing it.
>
> So I'm curious about the real-world use case of managing stale data with a duration, because after a long time, what you should do with the cache is revalidate it, not expire it.

I'd like to add some more clarification on the duration-based cache expiration in React Query, to help you make a better decision here. Hopefully it helps.

Duration-based expiration on its own is useless: you would never want to expire something in the cache while it is currently in use, and likewise, if you know a query result will never be used again (or only very infrequently), you don't want to keep it around forever. However, duration-based caching is actually a very good fit for UI and how users interact with it. Users visit different parts of your app more than others, and those frequencies may vary drastically in timing, but they have a very low cardinality (like you said, many apps don't have a lot of unique query cache items).

To handle this in React Query, I track the "active"-ness of a query by how many hook instances are using a query cache value at any given time. That cache can still become stale (staleTime) and get revalidated, but the moment there are 0 active instances on the page for a given query cache item, it gets marked for garbage collection after cacheTime ms. If an instance for that query appears on the page before the cache time is up, the garbage collection timeout is cleared, the query cache is used to immediately display the data, and things move on as normal. If an instance doesn't appear for that query cache item within the cache time, it finally gets removed from the cache.

With that explained, you can see how a duration-based caching system could easily get out of hand if you were firing off thousands upon thousands of unique queries with different query keys that took a while to expire. BUT, like you said, most apps (even larger enterprise-sized apps) don't use that many unique queries (maybe hundreds over a long period of time).

I researched this quite a bit when building React Query, and IMO you guys should go in the direction of a duration-based cache system, not an LRU, for similar reasons. But hey, that's just my two cents! Good luck!
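The subscription-count scheme described above can be sketched roughly as follows (`QueryEntry` and the method names are illustrative, not React Query's actual internals):

```typescript
// Sketch: track active hook instances per cache entry; schedule garbage
// collection only when the count drops to zero, and cancel it on remount.
class QueryEntry<T> {
  instances = 0;
  private gcTimer?: ReturnType<typeof setTimeout>;

  constructor(
    public data: T,
    private cacheTime: number,
    private onExpire: () => void,
  ) {}

  // A hook instance mounts: cancel any pending garbage collection.
  subscribe() {
    this.instances++;
    if (this.gcTimer !== undefined) {
      clearTimeout(this.gcTimer);
      this.gcTimer = undefined;
    }
  }

  // A hook instance unmounts: once nothing uses the entry, schedule removal.
  unsubscribe() {
    this.instances--;
    if (this.instances === 0) {
      this.gcTimer = setTimeout(this.onExpire, this.cacheTime);
    }
  }
}
```

Staleness (staleTime) is a separate, orthogonal check: an entry can be stale and revalidated while it still has active instances.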

@shuding
Member Author

shuding commented Mar 26, 2020

Thank you @tannerlinsley for your great suggestion!

> I track the "active"-ness of a query by how many hook instances are using a query cache value at any given time. That cache can still become stale (staleTime) and get revalidated, but the moment there are 0 active instances on the page for a given query cache item, it gets marked for garbage collection after cacheTime ms.

First of all, this LRU implementation tracks active hook instances as well :D
It just moves all the inactive data into an LRU. But I agree that duration-based caching is closer to real-world application logic and how users interact with it, like you explained.

For sure I can add a timestamp to each item in this LRU cache to make it duration based. It would be just 2 or 3 lines of code, but my biggest concern is still performance. Like I commented a while ago:

> The reason is that I think this implementation should be more efficient, based on my reading of the implementations of these libs: [...]
> The goal of this lib is to be as lightweight and fast as possible

I'll do some benchmarks and test it with real-world scenarios to see:

  • whether we really need cache collection (IMO it should be disabled by default)
  • what the performance diff will be if we enable it

Thanks again! Cheers!
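The timestamp idea mentioned above could look roughly like this (an illustrative sketch, not the PR's code): stamp each entry on write, and treat entries older than a TTL as misses on read.

```typescript
// Sketch: duration-based expiry bolted onto a key/value cache.
interface Entry<V> {
  value: V;
  updatedAt: number; // set to Date.now() whenever the entry is written
}

function getFresh<V>(
  cache: Map<string, Entry<V>>,
  key: string,
  ttl: number,
): V | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (Date.now() - entry.updatedAt > ttl) {
    cache.delete(key); // expired: drop the entry and report a miss
    return undefined;
  }
  return entry.value;
}
```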

@wooki-kim

The requestIdleCallback method is not supported in Safari. Is it okay to use it? Does the SWR core use that method? If so, is it used with a polyfill?

@msdrigg

msdrigg commented Aug 17, 2023

SWR used to use requestIdleCallback but replaced it with requestAnimationFrame in #731
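For illustration, a feature-detecting fallback (a sketch, not SWR's actual code) could look like this:

```typescript
// Sketch: pick an idle-ish scheduler, falling back for browsers like
// Safari that lack requestIdleCallback. The `g` parameter exists only
// to make the detection testable; real code would use globalThis.
type Scheduler = (cb: () => void) => void;

function pickIdleScheduler(g: any = globalThis): Scheduler {
  if (typeof g.requestIdleCallback === "function") {
    return (cb) => g.requestIdleCallback(cb);
  }
  if (typeof g.requestAnimationFrame === "function") {
    return (cb) => g.requestAnimationFrame(cb);
  }
  // Last resort: defer to the macrotask queue.
  return (cb) => setTimeout(cb, 0);
}
```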

@benevbright

benevbright commented Nov 25, 2023

One thing holding SWR back is that the cache doesn't store a timestamp recording when each entry was saved. Without it, features like staleTime are impossible to implement.

Let me know if there is a workaround. @shuding
