Page MenuHomePhabricator

Migrate Kartotherian to node-mapnik v4.2.1 and unfork
Open, Needs TriagePublic

Description

The node-mapnik fork in the kartotherian namespace, that's currently used as basis for all other dependencies and running in production is actually a 1on1 fork from node-mapnik v4.2.1. This version is supposed to already adapt some changes for libmapnik 3.1.x, but de-facto seems to be very fine with libmapnik 3.0.x that's used in production.

With the current setup it seems to be fine to get rid of that fork. That will unblock us from getting rid of some other forks or at least upgrade our forks to more recent production versions. Eventually reducing the step to an libmapnik 3.1 upgrade. The only thing that changes when we de-fork is the version number of the node-mapnik dependency. The kartotherian fork currently sets it to v3.7.3.

The recent versions of almost all dependencies in our stack are fine with either node-mapnik v3.x.x or v4.x.x. Abaculus is the only service that, in it's recent version, is set to require at least node-mapnik v4.2.1. So we're also fine there.

If you dive a bit deeper though there is one inconvenience left. mapnik-reference. It's an essential dependency of tilelive-tmsource It's a dependency in carto that's used by tilelive-tmstyle. mapnik-reference is officially not compatible with node-mapnik versions above v3.x.x due to supposedly missing reference updates for libmapnik v3.1. There's an issue to fix that, but no progress since some years. [1]

De-facto, the code path that executes the Renderer.render() is never reached it seems. It's set in a callback that is never called [2].

[1] https://github.com/mapnik/mapnik-reference/issues/143
[2] https://github.com/mojodna/tilelive-tmstyle/blob/master/index.js#L208

Event Timeline

Change 884222 had a related patch set uploaded (by WMDE-Fisch; author: WMDE-Fisch):

[mediawiki/services/kartotherian@master] Remove tilelive-tmstyle

https://gerrit.wikimedia.org/r/884222

Updated due to the removal of tilelive-tmstyle.

Change 884222 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] Remove tilelive-tmsource

https://gerrit.wikimedia.org/r/884222

Change 880495 had a related patch set uploaded (by WMDE-Fisch; author: WMDE-Fisch):

[mediawiki/services/kartotherian@master] [DNM] Use upstream node-mapnik 4.2.1

https://gerrit.wikimedia.org/r/880495

I started working on top of the last patch and it looks like i could bring the service up with latest mapnik (not forked version). Still needs some testing but that's promising:
https://gerrit.wikimedia.org/r/c/mediawiki/services/kartotherian/ /1064398

Here is the kartodock setup for latest dependencies:
https://github.com/wikimedia/kartodock/pull/36

Next TODO is to check the dependencies resolved to double check if any of those introduces legacy mapnik or other forked npm dependencies.
At this point I could run the server and GET a tile without any errors

Yes, great. If I remember correctly, all the work in progress patches were just fine from smoke testing and could potentially be merged. We just had no time to further work on that and finalize it.

Update on the latest node-mapnik version:

  • The mapnik team just released the latest version: 4.6.1
  • They stopped uploading packages to npm but instead they upload binary packages to github package registry
  • In order to use their published we need to create github tokens because this github registry is not public
    • I don't know how feasible is this for a public project like ours
  • As a workaround we can try to build our own node-mapnik bindings
    • I have a testing docker setup but its far from production ready
  • The latest available libmapnik from debian doesn't work well with the node-mapnik build where manually building mapnik from source works.

In retrospective I think we have this custom @kartotherian/mapnik just to be able to publish our own builds

More problems:

  • The libraries that depend to mapnik are using the old 'mapnik' package where the latest version is using '@mapnik/mapnik'

There is a workaround to create an npm alias in our case:
mapnik -> @mapnik/mapnik

From my PoC setup this works:

With this setup I can load a small country from OSM and casually browse the map with tiles loading succesfully.

Also i did some research on the libmapnik package in debian and it looks like the reason node-mapnik is not building succesfully with libmapnik4.0 [1] instead of building from source is the lack of this [2] util that's not packaged on purpose (from what I understand on purpose because it makes cross platform builds difficult and they would like to encourage using pkg-config).

[1] https://packages.debian.org/trixie/libmapnik4.0
[2] https://github.com/mapnik/mapnik/tree/master/utils/mapnik-config

After getting in touch with the maintainers of the packages in debian-gis IRC here are some information:

From my PoC setup this works:

If possible I'd move to a multi-stage build image, so the final result will hopefully contain only nodejs deps copied over and not build-packages/etc.. as well (it reduces the attack surface). It is not a mandatory step, but worth to mention :)

@Jgiannelos the plan that you wrote seems good! Are you waiting on specific feedback or should we proceed further?

From my PoC setup this works:

If possible I'd move to a multi-stage build image, so the final result will hopefully contain only nodejs deps copied over and not build-packages/etc.. as well (it reduces the attack surface). It is not a mandatory step, but worth to mention :)

@Jgiannelos the plan that you wrote seems good! Are you waiting on specific feedback or should we proceed further?

Hey @elukey thanks for the review! Would you help me understand what are the next steps and who should own each one? We are close to finish essential work planning and I feel that I don't know enough about the scope to understand how much effort is left for each team in this task.

Thanks in advance.

@MSantos sure! High level, my understanding is:

  • Your team (plus any volunteer that wants to chime in) should be responsible to come up with a Docker image that works with a recent-enough version of nodejs, basically what @Jgiannelos IIUC did locally. Once we have it, we can add the blubber/CI config and get it published to the Docker registry.
  • SRE will then help to add all the k8s config to bootstrap kartotherian on Wikikube (as promised, I'll lead the work with your team's helping to sanity check code reviews). We'll need to figure out what to do with load balancing, since IIUC we already have something in place (backed by the maps bare metal nodes), but it shouldn't be hard. Once the service is up and running on k8s we could flip its usage/config where needed, pointing clients to the new endpoint (assuming that we can do it, I am still very ignorant about how maps in configured).

Medium to long term SRE will be responsible for managing/operating reliably the service on K8s, but your team will be responsible (with our help) for basic Docker image maintenance (like once in a while bumping the OS/nodejs dependencies etc..).

Does it make sense?

Hi @elukey thanks for the update. Regarding building the image, I can give it a try to use blubber but i suspect that its not gonna work because the build instructions are more complicated. Do we another way to define images and upload them to docker-registry?

Hi @elukey thanks for the update. Regarding building the image, I can give it a try to use blubber but i suspect that its not gonna work because the build instructions are more complicated. Do we another way to define images and upload them to docker-registry?

Hi! I can help with Blubber, if you find any difficulties post a WIP patch and I'll try to work on it. We can also involve Releng in case, sometimes there are tricks to use that not everybody is aware of :) Ideally, blubber should be the way to go for this service, but we could think about the production-images repository as well (in which we define our own Dockerfiles etc..). It is meant for core k8s images, not services, but we could use it as last resort if needed.

@Jgiannelos forgot to write that I can try to come up with a blubber config for P68775, not sure if other things needs to be taken in consideration or not (especially from the list you made in T327396#10133682).

For the blubber part, it should be as easy as:

  1. Create a bash file like the following in the mapnick's repo, we can call it prepare_mapnick_upstream_checkout.sh or similar.
#!/bin/bash

set -ex

git clone https://github.com/mapnik/mapnik.git /srv/mapnik
pushd /srv/mapnik
git checkout v4.0.2
git submodule update --init
python3 ./scons/scons.py configure
python3 ./scons/scons.py install -j8
  1. Then in the Blubber config:
variants:
  build:
    apt:
      packages:
[..]
    builders:
      - custom:
          command: ["python3", "-m", "nltk.downloader", "omw", "sentiwordnet", "stopwords", "wordnet", "omw-1.4"]
      - custom:
          command: [ ./prepare_mapnick_upstream_checkout.sh ]
          requirements:
            - from: local
              source: prepare_mapnick_upstream_checkout.sh
              destination: prepare_mapnick_upstream_checkout

The nodejs part is easy as well but it requires some thinking: we only have nodejs20 (no -devel variant) published on the Docker Registry, since we install npm from the base Debian distro and nodejs20 uses Bookworm (see https://packages.debian.org/bookworm/npm). We could use nodejs18 and nodejs18-devel in case (both Bookworm based), and migrate to nodejs20 when available (we currently have nodejs20 because we imported nodejs in a special apt component, no npm present though).

Lemme know if this works or not :)

Change #1084179 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/services/kartotherian@master] Blubber builds with upstream mapnik and bindings

https://gerrit.wikimedia.org/r/1084179

The patch above creates an image using:

  • Latest mapnik built as a step in blubber config
  • Node bindings build as part of blubber
  • Kartotherian uses latest mapnik

I managed to have locally a succesful image built to run against a local postgres/tegola setup with a small map and tiles were rendered as expected.
Still things TODO:

  • Eslint is failing after the npm upgrades
  • This causes tests to fail too

Change #1087229 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/services/kartotherian@master] [build] Safe lint fixes

https://gerrit.wikimedia.org/r/1087229

Change #1088554 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/services/kartotherian@master] Fix eslint/stylelint related issues after upgrade

https://gerrit.wikimedia.org/r/1088554

Change #1088555 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/services/kartotherian@master] Fix issues encountered when running unit tests

https://gerrit.wikimedia.org/r/1088555