cpcache has a successor: flexo. Flexo is much more lightweight than cpcache, so please try it out. I will focus my efforts on flexo, so please don't expect any new functionality to be added to cpcache.
cpcache is a central cache for pacman, the package manager of Arch Linux. It requires little configuration, does not bother you with 404 errors and it allows you to utilize the full bandwidth even as multiple clients download the same file at the same time.
cpcache is an HTTP server that sits between pacman and the remote mirror. To serve HTTP GET requests, it either fetches the file from the local filesystem (using sendfile), or, if the file is not locally available, it establishes a connection to the remote mirror and downloads it from there. In this case, the file will be stored to the local filesystem and simultaneously streamed to the requesting client. This means the download will take just as long as it would have taken without cpcache, with the added bonus that subsequent HTTP requests for this file can henceforth be served from the local filesystem.
cpcache is entirely transparent to pacman: nothing that concerns pacman needs configuration changes (except for changing the mirror, of course). No additional latency is introduced: downloads start immediately, regardless of whether they are served from a remote mirror or the local filesystem.
For database files, no caching is done. cpcache will send a redirect response instead.
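The request handling described above can be sketched roughly as follows. This is an illustrative Python model, not cpcache's actual code (cpcache itself is built with Elixir). The function name handle_get, the dict standing in for the local filesystem, the fetch_remote callback, the mirror URL and the ".db" suffix check used to recognise database files are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple, Union


def handle_get(path: str,
               cache: Dict[str, bytes],
               fetch_remote: Callable[[str], Iterable[bytes]],
               mirror: str) -> Tuple[str, Union[str, Iterator[bytes]]]:
    """Model of the three cases: redirect, cache hit, cache miss."""
    # Assumption: database files are recognised by their ".db" suffix.
    # They are never cached; the client is redirected to the mirror instead.
    if path.endswith(".db"):
        return "redirect", mirror + path

    # Cache hit: serve the file straight from local storage.
    if path in cache:
        return "ok", iter([cache[path]])

    # Cache miss: download from the mirror, handing every chunk to the
    # client as soon as it arrives while also collecting it for the cache.
    def stream_and_store() -> Iterator[bytes]:
        received = bytearray()
        for chunk in fetch_remote(path):
            received.extend(chunk)
            yield chunk
        cache[path] = bytes(received)  # later requests are now cache hits

    return "ok", stream_and_store()
```

Because the miss case yields each chunk before the download has finished, the client does not wait for the whole file to be stored first, which is the property the text above relies on.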
Some examples where you might find cpcache useful:
- You have more than one device in your LAN that runs Arch Linux. Installing cpcache on the device that runs most often and changing the pacman mirror on all other devices allows you to download cached packages with whatever speed your LAN provides.
- You have more than one Arch Linux system running on one physical machine (e.g. Docker, QEMU, …). Installing cpcache on the host and changing the mirror on each client allows your clients to fetch cached packages almost instantaneously.
Most importantly, cpcache allows you to share bandwidth when multiple clients are downloading the same file at the same time. For instance, suppose a second client requests a file that has been downloaded to 30% by the first client. The second client will obtain at least 30% of the file from the cache. Once the cache has been exhausted, we have two clients that require data from the remote mirror. That does not mean that bandwidth is split between the two clients: instead, we continue to maintain only one connection to the remote mirror. Therefore, both clients continue to download the uncached part of the file with the full speed provided by your ISP.
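The idea of sharing one upstream connection between several clients can be modelled in a few lines. The sketch below is a simplified, single-threaded Python illustration and not how cpcache is implemented. The class SharedDownload and the helper simulate_two_clients are hypothetical names, and the "mirror" is just an iterable of byte chunks.

```python
from typing import Iterable, Tuple


class SharedDownload:
    """One upstream connection whose bytes are shared by all clients."""

    def __init__(self, upstream: Iterable[bytes]) -> None:
        self._upstream = iter(upstream)  # the single connection to the mirror
        self._buffer = bytearray()       # everything received (cached) so far
        self.done = False
        self.upstream_bytes = 0          # bytes actually pulled from the mirror

    @property
    def size(self) -> int:
        return len(self._buffer)

    def read(self, offset: int) -> bytes:
        """Return all bytes from offset on; pull a new chunk only if the
        caller has already consumed everything that is cached."""
        if offset >= len(self._buffer) and not self.done:
            try:
                chunk = next(self._upstream)
            except StopIteration:
                self.done = True
            else:
                self._buffer.extend(chunk)
                self.upstream_bytes += len(chunk)
        return bytes(self._buffer[offset:])


def simulate_two_clients(chunks: Iterable[bytes]) -> Tuple[bytes, bytes, int]:
    download = SharedDownload(chunks)
    first, second = bytearray(), bytearray()
    first += download.read(len(first))  # the first client starts alone
    while not (download.done
               and len(first) == download.size
               and len(second) == download.size):
        first += download.read(len(first))
        second += download.read(len(second))  # second client joins mid-download
    return bytes(first), bytes(second), download.upstream_bytes
```

In this model the second client first catches up from the buffer and then receives new chunks at the same moment as the first client, while upstream_bytes shows that every byte was fetched from the mirror only once.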
Let's outline a few more differences by comparing cpcache with the different caching methods listed in the wiki:
- Read-only cache using a web server such as darkhttpd: This is messy since it will return lots of 404s. With cpcache, uncached packages will be downloaded as if you were downloading them directly from the remote mirror, while also storing them in the cache to make them available for subsequent requests.
- Read-write caches such as pacserve or pacredir: pacserve and pacredir are distributed while cpcache is centralized. Distributed solutions have the advantage that they don't rely on a single machine being able to serve the requests from cache. However, if you have one machine that's either always running or almost always running when a second machine is also running, a centralized solution will most likely lead to more cache hits: a package has to be downloaded only once by any arbitrary client in order to be available for all other clients.
- Reverse proxy cache using NGINX: Apart from the fact that cpcache can utilize the full bandwidth even with multiple concurrent downloads, the setup described in the wiki is quite similar to cpcache's approach. However, cpcache provides additional features. For instance, it obtains the most recent list of official mirrors and attempts to choose a fast mirror for you. This means you will not have to maintain a mirror list yourself.
- Proxy cache using squid: The squid approach involves changing your http_proxy variable, which means all HTTP GET requests are routed through that proxy. If the proxy is down (or just inaccessible, think of a laptop that is sometimes used in your LAN and sometimes in remote locations), your HTTP GET requests will fail. You don't have this issue with cpcache because conceptually, cpcache is just another mirror that you add in your mirrorlist: if it's not available, pacman will try the next mirror.
A package for Arch Linux is available on AUR. Install cpcache on your server. Then, start and enable the cpcache service:
systemctl start cpcache.service
systemctl enable cpcache.service
Set the new mirror in /etc/pacman.d/mirrorlist on all clients. For instance, if the server running cpcache can be reached via myhost.local, add the following to the beginning of the mirrorlist file:
Server = http://myhost.local:7070/$repo/os/$arch
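To see which URL a client ends up requesting from cpcache, recall that pacman substitutes $repo and $arch in the Server line. The sketch below mimics that substitution with Python's string.Template. The helper name package_url and the package file name are made up for illustration.

```python
from string import Template


def package_url(server_line: str, repo: str, arch: str, filename: str) -> str:
    """Expand $repo and $arch the way a Server entry is expanded, then
    append the requested file name."""
    base = Template(server_line).substitute(repo=repo, arch=arch)
    return base.rstrip("/") + "/" + filename


print(package_url("http://myhost.local:7070/$repo/os/$arch",
                  "core", "x86_64", "example-1.0-1-x86_64.pkg.tar.zst"))
# prints http://myhost.local:7070/core/os/x86_64/example-1.0-1-x86_64.pkg.tar.zst
```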
cpcache expects a configuration file in /etc/cpcache/cpcache.toml. You can copy the example configuration file from conf/cpcache.toml to /etc/cpcache/cpcache.toml and adapt it as required.
Pacman supports custom package repositories where both the package files and the database files reside on the local filesystem. This functionality is used by tools such as aurto, which allow you to maintain a repository of packages built from AUR. cpcache can be made aware of such local repositories with the localrepos variable in its TOML config: it will then serve all requests to this repository from the local filesystem. So if you want to make a local repository available on your LAN, you may find this setting more convenient than setting up yet another HTTP server.
We'll describe how to use the localrepos setting to make packages built with aurto available on your LAN (although this setting can be used for other local repositories as well). We assume that aurto is installed on the same device that's also running cpcache:
- Edit your /etc/cpcache/cpcache.toml to include the name of the local repository. If the file already contains a variable named localrepos, change it as desired. If no variable named localrepos is included, add the following to the top of the file:
  localrepos = ["aurto"]
- cpcache expects the files of the localrepo in /var/cache/cpcache/pkg/aurto, so let's just symlink the default directory of aurto to this directory:
  sudo ln -s /var/cache/pacman/aurto/ /var/cache/cpcache/pkg/
- Every client needs to be acquainted with the new repository. aurto does this by default by adapting your pacman.conf and creating a /etc/pacman.d/aurto, so no changes are required on the device where aurto is installed. On each client, append the following to pacman.conf:
  Include = /etc/pacman.d/aurto
  and create the file /etc/pacman.d/aurto: it should point to the same host as defined in your mirrorlist, but without the trailing /os/$arch:
  [aurto]
  SigLevel = Optional TrustAll
  Server = http://myhost.local:7070/$repo
Verify your settings by running pacman -Syu: pacman should successfully synchronize all package databases, including aurto.
You can now use aurto to build packages on the cpcache server, and then download them on all clients without having to build them again.
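The relationship between the URLs requested by clients and the directories mentioned above can be illustrated with a small sketch. It assumes, based only on the paths shown in this document, that the request path is appended to the pkg directory below the cache directory. The function name cache_path and the traversal check are additions for the example and do not describe cpcache's real code.

```python
import posixpath


def cache_path(request_path: str, cache_directory: str = "/var/cache/cpcache") -> str:
    """Map the path of an HTTP request to a file below <cache_directory>/pkg."""
    root = posixpath.join(cache_directory, "pkg")
    candidate = posixpath.normpath(
        posixpath.join(root, request_path.lstrip("/")))
    # Refuse paths such as "/../etc/passwd" that would leave the cache.
    if not candidate.startswith(root + "/"):
        raise ValueError("request path escapes the cache directory")
    return candidate
```

Under this assumption, a request for /aurto/example.pkg.tar.zst ends up in /var/cache/cpcache/pkg/aurto/, which is why the symlink created above is sufficient to publish the aurto repository.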
In case you want to use NGINX as a reverse proxy, keep in mind that it buffers responses by default, which will cause timeouts in pacman since downloads then require a few seconds to start. Use proxy_buffering off; to prevent this.
Here's an example config that can be used for NGINX:
server {
    server_name archlinux.myhost.org;
    listen [::]:80;
    listen 80;
    location ~ ^/(core|extra|community|multilib)/ {
        proxy_pass http://127.0.0.1:7070;
        proxy_buffering off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
In case you want to avoid storing packages redundantly (i.e., both on the client and on the server that runs cpcache), you can set the CacheDir in your pacman.conf to a subdirectory of /tmp. /tmp is set to a tmpfs by default, meaning the package cache will be cleared when shutting down.
Simply edit your /etc/pacman.conf and adapt the CacheDir setting. For instance:
CacheDir = /tmp/pacman_cache
Notice that the directory /tmp/pacman_cache does not exist yet, and even if you were to create it, it would not survive a reboot. Pacman will therefore emit a warning:
warning: no /tmp/pacman_cache/ cache exists, creating...
and create the directory for you. You can safely ignore this warning. Alternatively, if you prefer not to have pacman emit this warning, you might consider adapting your /etc/fstab to create a second tmpfs on /var/cache/pacman/pkg.
paccache from pacman-contrib can be used to purge old packages. Install it if you haven't done so already:
sudo pacman -S pacman-contrib
Packages are stored in the directory specified by the cache_directory variable in /etc/cpcache/cpcache.toml. By default, it's /var/cache/cpcache. Use paccache to clean up the subdirectories of this directory. For instance, the following will delete all packages except for the three most recent versions:
for cache_dir in /var/cache/cpcache/pkg/*/os/x86_64/; do
    paccache -r -k3 -c "$cache_dir"
done
Using the PKGBUILD from AUR is probably the easiest way to get cpcache up and running. But if you want to build cpcache on your own machine, you can do so either by using Docker, or by installing Elixir and running cpcache with Elixir's build tool, mix.
Build the image and start the container with:
docker-compose up
Notice that all downloaded files will then be stored inside the container, so if you're using Docker for more than just testing purposes, consider running the docker container with bind-mounts or volumes.
Install the following requirements first:
# pacman -S git elixir sudo
Set up the cpcache user with all required directories and permissions:
# useradd -r -s /bin/bash -m -d /var/lib/cpcache cpcache
# mkdir -p /var/cache/cpcache/pkg/{community,community-staging,community-testing,core,extra,gnome-unstable,kde-unstable,multilib,multilib-testing,staging,testing}/os/x86_64
# mkdir -p /var/cache/cpcache/state
# mkdir /etc/cpcache
# chown -R cpcache:cpcache "/var/cache/cpcache"
Clone the repository and fetch all dependencies:
# sudo -u cpcache -i
$ git clone https://github.com/nroi/cpcache
$ mix local.hex --force
$ mix local.rebar --force
$ cd cpcache
$ mix deps.get
cpcache requires a config file in /etc/cpcache:
# cp /var/lib/cpcache/cpcache/conf/cpcache.toml /etc/cpcache/
Finally, you can run cpcache as its own user (i.e., run sudo -u cpcache -i before running this command):
$ iex -S mix
cpcache runs on all platforms supported by Erlang, which includes x86_64 and most ARM platforms. This also means that cpcache does run on a Raspberry Pi. Only x86_64 clients are supported by cpcache, which means that while cpcache can be installed on an ARM device, it cannot serve files to clients which run anything other than the official Arch Linux distribution.
After each start, cpcache fetches an up-to-date mirrorlist from https://www.archlinux.org/mirrors/status/json/. It then filters the mirrors according to the criteria given in the /etc/cpcache/cpcache.toml configuration file. This should exclude mirrors that are outdated or very unreliable. For all mirrors that match those criteria, it runs a few latency tests and selects a mirror with good latency. The process is described in greater detail in the configuration file.
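As a rough illustration of this filter-then-measure approach, consider the following Python sketch. It is not cpcache's implementation. The function name select_mirror, the thresholds and the measure_latency callback are hypothetical, and the dictionary keys (url, protocol, score) are modelled on the mirror status JSON but should be treated as assumptions. The actual criteria are the ones documented in the configuration file.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence


def select_mirror(mirrors: Iterable[Dict],
                  measure_latency: Callable[[str], Optional[float]],
                  max_score: float = 2.5,
                  protocols: Sequence[str] = ("https",),
                  attempts: int = 3) -> Optional[str]:
    """Filter mirrors by static criteria, then pick the lowest latency."""
    # Step 1: drop mirrors that fail the (hypothetical) criteria.
    candidates = [
        m for m in mirrors
        if m.get("protocol") in protocols
        and m.get("score") is not None
        and m["score"] <= max_score
    ]

    # Step 2: run a few latency tests per remaining mirror.
    best_url, best_latency = None, float("inf")
    for mirror in candidates:
        samples = []
        for _ in range(attempts):
            latency = measure_latency(mirror["url"])  # None means "failed"
            if latency is not None:
                samples.append(latency)
        if samples and min(samples) < best_latency:
            best_url, best_latency = mirror["url"], min(samples)

    # None is returned if no mirror passed the filter or answered in time.
    return best_url
```

Passing the latency measurement in as a callback keeps the sketch independent of any particular HTTP client and makes it easy to try with made-up numbers.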
If cpcache happens to select a mirror that turns out to be slow or unreliable, you might want to check out the mirrors_predefined and mirrors_blacklist options in the configuration file. But this is just a workaround: if cpcache selects crappy mirrors, I consider this a bug and will appreciate it if you open an issue.