
Cache calls #159

Open
mpadge opened this issue Nov 26, 2018 · 0 comments

mpadge commented Nov 26, 2018

Instead of repeatedly bombing the overpass API, it'd be pretty easy to implement a local caching system that records each call and stores the pre-processed data returned from the API. Subsequent identical calls would then simply re-load the local data and deliver it anew.

The R.cache package has a hard-coded default that only allows enduring storage in "~/.Rcache/", used in its .onLoad call. This package sticks a few things in options(), but does not use any environment variables.

A bit more flexibility could be added here via environment variables: default to ~/.Rosmdata (or maybe piggyback on ~/.Rcache if it exists?), but allow an override if the environment variable "OSMDATA_CACHE_DIR" is set.
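A minimal sketch of that resolution logic, assuming the proposed (not yet existing) environment variable "OSMDATA_CACHE_DIR" and the suggested default of ~/.Rosmdata; the function name osm_cache_dir is hypothetical:

```r
# Resolve the cache directory: prefer the OSMDATA_CACHE_DIR environment
# variable if set, otherwise fall back to ~/.Rosmdata, creating it if needed.
osm_cache_dir <- function () {
    cache_dir <- Sys.getenv ("OSMDATA_CACHE_DIR", unset = "")
    if (!nzchar (cache_dir))
        cache_dir <- file.path (path.expand ("~"), ".Rosmdata")
    if (!dir.exists (cache_dir))
        dir.create (cache_dir, recursive = TRUE)
    cache_dir
}
```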

cache duration

Because OSM is constantly updated, it will be important to allow control over cache duration, so that local versions are automatically updated at some stage. While this could also be handled via an environment variable, "OSMDATA_CACHE_DURATION", that would need to be explicitly set by a user to work, so would impose an additional burden.

A less burdensome option would be an equivalent function parameter, which would best be placed in overpass_query(), because it's the overpass calls themselves that will actually be cached. The problem there is that that function is not exported. The general workflow is

opq() %>%
    add_osm_feature() %>%
    osmdata_sf/sp/sc/xml/pbf()

A cache_duration parameter could potentially be set in the initial opq() call, but that does not contain the full overpass query, so the parameter would then need to be passed on to any and all subsequent functions. That suggests that the end-point calls are the best place for such a parameter. These currently have only 2 primary parameters (q, doc), so wouldn't suffer from an additional one.

If that is the point at which caching is determined, then it will likely be better to cache the full processed result, rather than just the direct result of the API call. The call itself could be digest-ed, while the cached object would be the final processed end-point. The timestamp could simply be read (file.info()$mtime), and the cache updated if difftime(...) > cache_duration; otherwise the cached version would just be re-loaded.
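The whole scheme could be sketched roughly as follows. This assumes the digest package for hashing (exposed here as a swappable `hash` parameter), a hypothetical `cached_query()` wrapper name, and an `overpass_query()` function that returns the processed result; none of these signatures are final:

```r
# Sketch: key the cache file on a digest of the query, and re-use it only
# while younger than cache_duration (in days).
cached_query <- function (q, cache_duration = 7,
                          cache_dir = file.path (path.expand ("~"), ".Rosmdata"),
                          hash = digest::digest) {
    fname <- file.path (cache_dir, paste0 (hash (q), ".Rds"))
    if (file.exists (fname)) {
        age <- difftime (Sys.time (), file.info (fname)$mtime, units = "days")
        if (age <= cache_duration)
            return (readRDS (fname))        # fresh enough: re-load
    }
    res <- overpass_query (q)               # stale or absent: re-query
    dir.create (cache_dir, recursive = TRUE, showWarnings = FALSE)
    saveRDS (res, fname)                    # store the processed end-point
    res
}
```

Storing the final processed object (rather than the raw API response) means a cache hit skips both the network call and the processing step.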
