CRUD operations on nix package metadata.
Additional documentation may be found in the <pkgdb>/docs directory.
This includes JSON input/output schemas used by commands such as pkgdb search and pkgdb resolve.
Links to additional documentation may be found at the bottom of this file.
Evaluating nix expressions for an entire flake is expensive but necessary for features like package search. This tool provides a way to scrape the data from a flake once and store it in a database for later usage.
The current responsibility of the pkgdb tool extends only as far as scraping
a flake and generating a database.
The database should be queried using standard sqlite tools and libraries and all
decisions about how and when to generate and update the database are left up to
the consumer.
$ nix develop;
$ make -j;
See CONTRIBUTING.md for more information.
Build the database with the scrape subcommand:
$ pkgdb scrape github:NixOS/nixpkgs;
fetching flake 'github:NixOS/nixpkgs'...
/Users/me/.cache/flox/pkgdb-v0/93a89abd052c90a33e8787a7740f2459cdb496980848011ae708b0de1bbfac82.sqlite
By default, packages will be scraped from packages.[system arch] and stored in
~/.cache in a database named after the flake fingerprint.
These can be overridden as desired:
$ pkgdb scrape github:NixOS/nixpkgs --database flakedb.sqlite legacyPackages aarch64-darwin
If a database for the given flake already exists and pkgdb is asked to re-process
a package set it already contains, that package set is skipped.
Use --force to force an update/regeneration.
Once generated, the database can be opened and queried using sqlite3.
$ sqlite3 flakedb.sqlite '.mode json' 'SELECT name, version FROM Packages LIMIT 10';
[{"name":"AMB-plugins-0.8.1","version":"0.8.1"},
{"name":"ArchiSteamFarm-5.4.7.3","version":"5.4.7.3"},
{"name":"AusweisApp2-1.26.7","version":"1.26.7"},
{"name":"BeatSaberModManager-0.0.5","version":"0.0.5"},
{"name":"CHOWTapeModel-2.10.0","version":"2.10.0"},
{"name":"CertDump-unstable-2023-07-12","version":"unstable-2023-07-12"},
{"name":"ChowCentaur-1.4.0","version":"1.4.0"},
{"name":"ChowKick-1.1.1","version":"1.1.1"},
{"name":"ChowPhaser-1.1.1","version":"1.1.1"},
{"name":"CoinMP-1.8.4","version":"1.8.4"}]
This utility is expected to be run multiple times if a client wishes to "fully scrape all the things" in a flake. Because it is a plumbing command used by client applications, we aren't particularly concerned with the repetitive strain injury a user would suffer if they tried to scrape everything in a flake interactively; rather, we aim to do less in a single run and avoid scraping info the caller might not need for their use case.
A given client application that did want to scrape a flake completely would run something along the lines of:
$ lockedRef="github:NixOS/nixpkgs/e8039863435c68eb4f780f3e9bf3972a7399c4b1";
$ dbPath=;
$ for subtree in packages legacyPackages; do
    for system in x86_64-linux x86_64-darwin aarch64-darwin aarch64-linux; do
      # scrape prints the database path on stdout; keep it for the query below.
      dbPath="$( pkgdb scrape "$lockedRef" "$subtree" "$system"; )";
    done
  done
$ sqlite3 "$dbPath" 'SELECT COUNT( * ) FROM Packages';
In the example above the caller passes in a locked ref. This is technically optional, but strongly recommended. What's important is that invocations that intend to append to an existing database ABSOLUTELY SHOULD use locked flake references. If you want to use an unlocked reference on the first call, you can extract a locked flake reference from the resulting database for later runs, but the official recommendation is to lock flakes before looping.
If the caller really wants to they could pass an unlocked ref on the first
invocation, and yank the locked reference from the resulting database.
This is potentially useful for working with local flakes in the event that you
don't want to use a utility like nix flake prefetch or parser-util to lock
your references for you:
$ dbPath="$( pkgdb scrape "$PWD/my-flake" packages x86_64-linux; )";
$ lockedRef="$( sqlite3 "$dbPath" 'SELECT string FROM LockedFlake'; )";
$ pkgdb scrape "$lockedRef" packages aarch64-linux;
...<SNIP>...
The pkgdb get {db,done,flake,id,path} subcommands expose a handful of special
queries for package databases that may be useful for simple scripts.
These don't include queries for package metadata; sqlite3 is recommended for
those types of queries.
Subcommands:
pkgdb get db
Get absolute path to Package DB for a flake
pkgdb get done
Check to see if an attrset and its children have been scraped
pkgdb get flake
Get flake metadata from Package DB
pkgdb get id
Look up an attribute set or package rowid
pkgdb get path
Look up an (AttrSets|Packages).id attribute path
The pkgdb list subcommand lists all known databases and their associated flake information.
It accepts the options --cachedir PATH and --json.
See pkgdb list --help for more info.
The data is represented in a tree format matching the attrPath structure.
The two entities are AttrSets (branches) and Packages (leaves).
Packages and AttrSets each have a parentId, which always refers to a row
in AttrSets.
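Because every row points at its parent, a full attribute path can be rebuilt with a recursive query. The sketch below is illustrative only: the helper name attrset_path is hypothetical, and it assumes the parent column on AttrSets is named parent, as in the schema diagram in this file (adjust it if your database uses parentId). For real use, pkgdb get path is the supported lookup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the dotted attribute path of an AttrSets row.
# Assumes AttrSets( id, parent, attrName ); column names may differ.
attrset_path() {
  local db="${1:-}" id="${2:-}"
  # Only accept a plain integer so it is safe to splice into the SQL text.
  case "$id" in
    ''|*[!0-9]*) echo "attrset_path: id must be an integer" >&2; return 1 ;;
  esac
  # Walk from the requested row up to the root, prepending each ancestor name.
  # The walk stops when a parent value matches no row (for example NULL).
  sqlite3 "$db" "
    WITH RECURSIVE chain( parent, path, depth ) AS (
      SELECT parent, attrName, 0 FROM AttrSets WHERE id = $id
      UNION ALL
      SELECT a.parent, a.attrName || '.' || chain.path, chain.depth + 1
      FROM AttrSets a JOIN chain ON a.id = chain.parent
    )
    SELECT path FROM chain ORDER BY depth DESC LIMIT 1;
  "
}
```

Usage would look like attrset_path flakedb.sqlite 42, where 42 is an AttrSets.id value.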
AttrSets.done (boolean) indicates that an attribute set and all of its
children have been fully scraped and do not need to be reprocessed.
Descriptions are de-duplicated (for instance between two packages for separate
architectures) by a Descriptions table.
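Since descriptions live in their own table, reading them back requires a join. The following is a minimal sketch, assuming the column names shown in the schema diagram in this file (Packages.descriptionId and Descriptions.id); the helper name packages_with_descriptions is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list package names next to their descriptions.
# Prints one "name|description" line per package.
packages_with_descriptions() {
  local db="${1:-}" limit="${2:-10}"
  # Only accept a plain integer so it is safe to splice into the SQL text.
  case "$limit" in
    ''|*[!0-9]*) echo "packages_with_descriptions: limit must be an integer" >&2; return 1 ;;
  esac
  # LEFT JOIN keeps packages whose descriptionId matches no row;
  # their description column is printed empty.
  sqlite3 -list "$db" "
    SELECT p.name, d.description
    FROM Packages p
    LEFT JOIN Descriptions d ON d.id = p.descriptionId
    ORDER BY p.name
    LIMIT $limit;
  "
}
```

Two packages that share a description (for example the same package on two architectures) show the same text while the database stores it once.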
DbVersions and LockedFlake tables store metadata about the version of
pkgdb that generated the database and the flake which was scraped.
If they are defined explicitly, pname and version will be read from the
corresponding attributes.
Otherwise, they will be parsed from the name.
If version can be converted to a semver, it will be.
Note that the attrName for a package is the actual name in the tree.
If outputsToInstall is not defined, it will be the set of outputs
up to and including "out".
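The fallback rule for outputsToInstall can be illustrated with a small loop. This is a sketch of the rule as stated above, not pkgdb's implementation: the helper name default_outputs_to_install is hypothetical, and printing every output when "out" is absent is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a package's outputs in order, print each one
# up to and including "out", one per line.
default_outputs_to_install() {
  local output
  for output in "$@"; do
    printf '%s\n' "$output"
    # Stop once "out" has been printed; later outputs are not included.
    if [ "$output" = "out" ]; then
      break
    fi
  done
}
```

For example, default_outputs_to_install bin out dev man prints bin and out, and skips dev and man.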
erDiagram
  AttrSets ||--o{ Packages : "contains"
  AttrSets ||--o{ AttrSets : "contains nested"
  Packages ||--|| Descriptions : "described by"
  Packages {
    int id
    int parentId
    text attrName
    text name
    text pname
    text version
    text semver
    text license
    json outputs
    json outputsToInstall
    bool broken
    bool unfree
    int descriptionId
  }
  AttrSets {
    int id
    int parent
    text attrName
    bool done
  }
  DbVersions {
    text name
    text version
  }
  Descriptions {
    int id
    text description
  }
  LockedFlake {
    text fingerprint
    text string
    json attrs
  }
Each locked flake has its own database keyed using a unique fingerprint.
The separation between these databases simplifies change detection and handling
of overridden inputs to flakes.
These fingerprints are identical to those used by nix to create its own
eval caches.
Some commands allow database paths to be explicitly set with --database,
while those which act on multiple databases will place databases under
the directory named by the environment variable PKGDB_CACHEDIR if it is set;
otherwise the directory
${XDG_CACHE_HOME:-$HOME/.cache}/flox/pkgdb-v<SCHEMA-MAJOR> is used.
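A script that needs to find the same directory can mirror this lookup order. The sketch below is an assumption-laden illustration: the helper name pkgdb_cachedir is hypothetical, the schema major version must be supplied by the caller because <SCHEMA-MAJOR> depends on the pkgdb build in use, and an empty PKGDB_CACHEDIR is treated as unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory where pkgdb is expected to keep
# its databases. $1 is the schema major version (the <SCHEMA-MAJOR> part).
pkgdb_cachedir() {
  local schema_major="${1:-}"
  if [ -z "$schema_major" ]; then
    echo "usage: pkgdb_cachedir SCHEMA-MAJOR" >&2
    return 1
  fi
  if [ -n "${PKGDB_CACHEDIR:-}" ]; then
    # An explicit override wins over the XDG-style default.
    printf '%s\n' "$PKGDB_CACHEDIR"
  else
    printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/flox/pkgdb-v${schema_major}"
  fi
}
```

The scrape example earlier in this file wrote to a pkgdb-v0 directory, which corresponds to calling pkgdb_cachedir 0 with neither variable set.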
Because each unique locked flake has its own database, over time these databases will accumulate and require garbage collection.
At this time there is no automated garbage collection mechanism, but simply deleting your cache directory will suffice.
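Manual cleanup can be wrapped in a guarded helper so that an empty variable never reaches rm. This is only a sketch: the helper name pkgdb_clear_cache is hypothetical, and the caller is expected to pass the cache directory explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete a pkgdb cache directory and every database in it.
pkgdb_clear_cache() {
  local dir="${1:-}"
  # Refuse empty, root, or non-directory arguments before removing anything.
  if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
    echo "pkgdb_clear_cache: not a usable directory: '$dir'" >&2
    return 1
  fi
  rm -rf -- "$dir"
}
```

A call such as pkgdb_clear_cache "$PKGDB_CACHEDIR" removes all databases in that directory; they can be rebuilt by scraping the flakes again.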
Several commands such as pkgdb search and pkgdb manifest take an option
--ga-registry which changes the behavior of registry constructs to contain
only a single input which provides nixpkgs=github:NixOS/nixpkgs/release-23.05.
When --ga-registry is provided, it is an error for users to write env-base
or registry fields.
In the future this flag will be removed, allowing users to set custom registries with multiple inputs or multiple branches.
For the purposes of testing we have provided an environment variable
_PKGDB_GA_REGISTRY_REF_OR_REV where you can provide an alternative git ref
(tag or branch name) or a long revision hash.
This is used in our test suite.