
[draft] Cache collection #92

Draft · wants to merge 3 commits into main

Conversation

shuding
Member

@shuding shuding commented Nov 7, 2019

This PR adds an inactive-key (LRU-based) cache collection implementation.
By default, at most 600 inactive keys will be stored.

However, there are a couple of things we need to explore further before landing this feature.

Performance

For most (~99%) websites, the number of requests per page won't be more than 200. [1]
So for most sites this feature would only add unnecessary work.

One solution is to simply skip cache collection until the cache size nears ~3/4 of the maximum allowed size.
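The deferred-collection idea could be sketched like this (illustrative only, not the code in this PR). A JavaScript `Map` iterates in insertion order, so re-inserting an entry on each access keeps LRU ordering with no extra bookkeeping:

```typescript
// Sketch: size-bounded LRU for inactive keys, with collection deferred
// until the cache nears 3/4 of the limit. Names are illustrative.
class InactiveCache<V> {
  private map = new Map<string, V>();
  constructor(private limit = 600) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Refresh recency by moving the key to the end of iteration order.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    // Defer collection: do nothing at all while the cache is small,
    // so most apps never pay for eviction logic.
    if (this.map.size < (this.limit * 3) / 4) return;
    while (this.map.size > this.limit) {
      // Iteration starts at the least recently used key.
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }

  get size() {
    return this.map.size;
  }
}
```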

API

How does it work with:

  • multilayer cache
  • namespaced cache

How to customize the cache size?

@shuding shuding added the "feature request" and "on hold" labels on Nov 7, 2019
@rostero1

This is probably a little different from what an LRU is trying to achieve, but I really like how react-query manages stale data with a duration.

Demo:
https://codesandbox.io/s/optimistic-feynman-ox9eo

I wanted to share this in case it could be useful in coming up with a design for a flexible caching system.

@shuding
Member Author

shuding commented Nov 11, 2019

@rostero1 it is an LRU (least recently used) model, but size based (not duration based). The reason is that I think this implementation should be more efficient, based on my reading of the implementations of these libs:

The goal of this lib is to be as lightweight and fast as possible, and like I mentioned in this PR, most users don't even need the LRU:

> For most (~99%) websites, the number of requests per page won't be more than 200. [1]
> So for most sites this feature would only add unnecessary work.

And since hooks are called very frequently in the UI, cache collection, especially a duration-based collection process, will slow down rendering a lot. That's why I've kept this PR on hold: I want to maximize its performance before landing it.

So I'm curious about the real-world use case of managing stale data with a duration, because after a long time, what you should do with the cache is revalidate it, not expire it.

@rostero1

Thanks, @quietshu. I'm not sure if this conversation should live somewhere else, but I have some data that appears as a dropdown list in multiple areas: for example, in the left sidebar, and in the right sidebar nested a few tabs deep. Note that because of privacy requirements I cannot use any HTTP caching for my REST APIs.

After thinking about this some more, I find that I'm mostly using staleTime to prevent refetching the data for the entire duration the app is open (or until a root component unmounts, at which point I can reset the cache). Maybe I could achieve this by doing something like:

const fetchMemo = Memoizer.memo(fetch); // don't memo errors
const MyComponent = () => {
  const { data } = useSWR('/api/data', fetchMemo)
  // ...
}

For the app I'm working on, I was trying to come up with a scenario where I actually need to refetch the data after a staleTime, and I can't think of one. I do have a place with a "refresh data" button, but I like that it's explicit.
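For reference, a `Memoizer.memo` along the lines used above could be sketched like this (a hypothetical helper, not a real library): cache the in-flight promise per key, and drop the entry if the promise rejects so errors are never memoized.

```typescript
// Sketch: memoize an async fetcher by key; don't memo errors.
function memo<T>(fetcher: (key: string) => Promise<T>) {
  const cache = new Map<string, Promise<T>>();
  return (key: string): Promise<T> => {
    let p = cache.get(key);
    if (!p) {
      p = fetcher(key).catch((err) => {
        cache.delete(key); // on failure, forget the entry so a retry refetches
        throw err;
      });
      cache.set(key, p);
    }
    return p;
  };
}
```

Because the promise itself is cached, concurrent calls for the same key share a single request.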

@shuding shuding changed the title Cache collection [draft] Cache collection Dec 4, 2019
@sergiodxa
Contributor

@quietshu have you thought more about this? I think it could be implemented manually per project using subscribe and custom hooks, but maybe it could be integrated directly into SWR. Do you mind if I work on this?

@tannerlinsley

> @rostero1 it is an LRU (least recently used) model, but size based (not duration based). The reason is that I think this implementation should be more efficient, based on my reading of the implementations of these libs:
>
> The goal of this lib is to be as lightweight and fast as possible, and like I mentioned in this PR, most users don't even need the LRU:
>
> For most (~99%) websites, the number of requests per page won't be more than 200. [1]
> So for most sites this feature would only add unnecessary work.
>
> And since hooks are called very frequently in the UI, cache collection, especially a duration-based collection process, will slow down rendering a lot. That's why I've kept this PR on hold: I want to maximize its performance before landing it.
>
> So I'm curious about the real-world use case of managing stale data with a duration, because after a long time, what you should do with the cache is revalidate it, not expire it.

I'd like to add some more clarification on the duration-based cache expiration in React Query, to help you make a better decision here. Hopefully it helps.

Duration-based expiration on its own is useless: you would never want to expire something in the cache while it is currently in use, and likewise, if you know a query result will never be used again (or only very infrequently), you don't want to keep it around forever. However, duration-based caching is actually a very good fit for UI and how users interact with it. Users visit different parts of your app more than others, and those frequencies may vary drastically in timing, but they have a very low cardinality (like you said, many apps don't have a lot of unique query cache items).

To handle this in React Query, I track the "active"-ness of a query by how many hook instances are using a query cache value at any given time. That cache can still become stale (staleTime) and get revalidated, but the moment there are 0 active instances on the page for a given query cache item, it gets marked for garbage collection after cacheTime ms. If an instance for that query appears on the page before the cache time is up, the garbage collection timeout is cleared, the query cache is used to immediately display the data, and things move on as normal. If an instance doesn't appear for that query cache item within the cache time, it finally gets removed from the cache.

With that explained, you can see how a duration-based caching system could easily get out of hand if you were firing off thousands upon thousands of unique queries with different query keys that took a while to expire. BUT, like you said, most apps (even larger enterprise-sized apps) don't use that many unique queries (maybe hundreds over a long period of time).

I researched this quite a bit when building React Query, and IMO you guys should go in the direction of a duration-based cache system, not an LRU, for similar reasons. But hey, that's just my two cents! Good luck!
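The subscription-count scheme described above can be sketched roughly as follows (`QueryEntry` and the method names are illustrative, not React Query's actual internals):

```typescript
// Sketch: track active hook instances per cache entry; schedule garbage
// collection only when the count drops to zero, and cancel it on remount.
class QueryEntry<T> {
  instances = 0;
  private gcTimer?: ReturnType<typeof setTimeout>;

  constructor(
    public data: T,
    private cacheTime: number,
    private onExpire: () => void,
  ) {}

  // A hook instance mounts: cancel any pending garbage collection.
  subscribe() {
    this.instances++;
    if (this.gcTimer !== undefined) {
      clearTimeout(this.gcTimer);
      this.gcTimer = undefined;
    }
  }

  // A hook instance unmounts: once nothing uses the entry, schedule removal.
  unsubscribe() {
    this.instances--;
    if (this.instances === 0) {
      this.gcTimer = setTimeout(this.onExpire, this.cacheTime);
    }
  }
}
```

Staleness (staleTime) is a separate, orthogonal check: an entry can be stale and revalidated while it still has active instances.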

@shuding
Member Author

shuding commented Mar 26, 2020

Thank you @tannerlinsley for your great suggestion!

> I track the "active"-ness of a query by how many hook instances are using a query cache value at any given time. That cache can still become stale (staleTime) and get revalidated, but the moment there are 0 active instances on the page for a given query cache item, it gets marked for garbage collection after cacheTime ms.

First of all, this LRU implementation tracks active hook instances as well :D
It just moves all the inactive data into an LRU. But I agree that duration-based caching is closer to real-world application logic and how users interact with it, like you explained.

For sure I can add a timestamp to each item in this LRU cache to make it duration based. It would be just 2 or 3 lines of code, but my biggest concern is still performance. Like I commented a while ago:

> The reason is that I think this implementation should be more efficient, based on my reading of the implementations of these libs: [...]
> The goal of this lib is to be as lightweight and fast as possible

I'll do some benchmarks and test it with real-world scenarios to see:

  • whether we really need cache collection (IMO it should be disabled by default)
  • what the performance diff will be if we enable it

Thanks again! Cheers!
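The timestamp idea mentioned above could look roughly like this (an illustrative sketch, not the PR's code): stamp each entry on write, and treat entries older than a TTL as misses on read.

```typescript
// Sketch: duration-based expiry bolted onto a key/value cache.
interface Entry<V> {
  value: V;
  updatedAt: number; // set to Date.now() whenever the entry is written
}

function getFresh<V>(
  cache: Map<string, Entry<V>>,
  key: string,
  ttl: number,
): V | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (Date.now() - entry.updatedAt > ttl) {
    cache.delete(key); // expired: drop the entry and report a miss
    return undefined;
  }
  return entry.value;
}
```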

@wooki-kim

The requestIdleCallback method is not supported in Safari. Is it okay to use it? Does the SWR core use that method? If so, is it used with a polyfill?

@msdrigg

msdrigg commented Aug 17, 2023

SWR used to use requestIdleCallback but replaced it with requestAnimationFrame in #731
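For illustration, a feature-detecting fallback (a sketch, not SWR's actual code) could look like this:

```typescript
// Sketch: pick an idle-ish scheduler, falling back for browsers like
// Safari that lack requestIdleCallback. The `g` parameter exists only
// to make the detection testable; real code would use globalThis.
type Scheduler = (cb: () => void) => void;

function pickIdleScheduler(g: any = globalThis): Scheduler {
  if (typeof g.requestIdleCallback === "function") {
    return (cb) => g.requestIdleCallback(cb);
  }
  if (typeof g.requestAnimationFrame === "function") {
    return (cb) => g.requestAnimationFrame(cb);
  }
  // Last resort: defer to the macrotask queue.
  return (cb) => setTimeout(cb, 0);
}
```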

@benevbright

benevbright commented Nov 25, 2023

One thing holding SWR back is that the cache doesn't store a timestamp recording when each entry was saved. Without it, features like staleTime are impossible to implement.

Let me know if there is a workaround. @shuding
