Design the Example Node API
Closed, Resolved | Public | 8 Estimated Story Points

Description

Context

We're creating an example node API that others can use to reference. We want to document our process through doing so and make the appropriate resources and instructions available.

We want the design of this API to have at least the following

  • Example Node API should return 200 with "Hello World" via endpoint
  • Example Node API should return 4XX error
  • Example Node API should enforce authentication with the application making requests

Acceptance Criteria

  • List out the step by steps for designing an API on phabricator and then move to the appropriate wiki based on findings from https://phabricator.wikimedia.org/T288126
  • Ensure swagger docs are published and accessible on the internet
  • Write API description
  • Share and get approval feedback from the team on this phabricator task

Event Timeline

sdkim renamed this task from Design the API to Design the Example Node API. Aug 10 2021, 5:04 PM
sdkim updated the task description.
sdkim set the point value for this task to 8. Aug 10 2021, 5:08 PM
sdkim moved this task from Incoming to Must do now on the API Platform board.

We could have the Example API do a little more than just return "Hello World", in order to showcase additional aspects of our infrastructure. For example, it could say "Hello <username>", have an image of a person waving, and tell me how many contributions I've made to Wikipedia.

It would retrieve "username" by incorporating authentication, retrieve the image of the person waving by doing a MediaSearch query via the Action API, and retrieve the contribution count via the Core REST API. For bonus points, the string "Hello" could be retrieved from external storage.
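To make the composition idea concrete, here is a rough sketch of how the response could be assembled from those sources. The four dependency functions (getUsername, findWavingImage, getContributionCount, getGreeting) are hypothetical stand-ins for authentication, a MediaSearch query, the Core REST API, and storage. None of them is a real client, and the stub values are made up.

```javascript
// Sketch only: every dependency is injected, so nothing here calls a real API.
async function buildHelloResponse(deps) {
  // The username comes first because the contribution count depends on it.
  const username = await deps.getUsername();
  // The remaining lookups are independent of each other and can run in parallel.
  const [image, contributions, greeting] = await Promise.all([
    deps.findWavingImage(),
    deps.getContributionCount(username),
    deps.getGreeting(),
  ]);
  return { message: `${greeting} ${username}`, image, contributions };
}

// Hypothetical stubs standing in for the real services.
const stubs = {
  getUsername: async () => 'ExampleUser',
  findWavingImage: async () => 'https://example.invalid/waving.jpg',
  getContributionCount: async () => 42,
  getGreeting: async () => 'Hello',
};

buildHelloResponse(stubs).then((body) => console.log(body));
```

Injecting the dependencies also fits the iterative approach: each stage can swap one stub for a real call without touching the others.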

Continuing on the above thought, it probably makes sense to use an iterative approach. So for example, we might implement the Example API in the following stages:

  1. Example API returns hard-coded Hello World string
  2. Example API adds image from MediaSearch
  3. Example API incorporates authentication, allowing it to be exposed by the API Gateway with favorable rate limiting
  4. <other authentication steps>
  5. Example API incorporates storage, and retrieves part of the response from storage

Other orderings are possible, of course.
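As a rough illustration of stage 1 plus the 4XX and authentication requirements from the task description, a handler could look something like the sketch below. The /hello path, the response shape, and the fixed token set are all assumptions for illustration. A real service would validate a token issued through our OAuth setup rather than compare against a hard-coded value.

```javascript
// Hypothetical token store. A real implementation would verify an OAuth 2.0
// access token instead of checking membership in a fixed set.
const KNOWN_TOKENS = new Set(['example-token']);

// Takes a simplified request ({ path, headers }) and returns { status, body }.
function handleRequest({ path, headers = {} }) {
  if (path !== '/hello') {
    return { status: 404, body: { error: 'Not Found' } };
  }
  const auth = headers.authorization || '';
  const token = auth.startsWith('Bearer ') ? auth.slice('Bearer '.length) : null;
  if (!token || !KNOWN_TOKENS.has(token)) {
    return { status: 401, body: { error: 'Unauthorized' } };
  }
  return { status: 200, body: { message: 'Hello World' } };
}

console.log(handleRequest({ path: '/hello', headers: { authorization: 'Bearer example-token' } }));
console.log(handleRequest({ path: '/hello' }));
```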

Some misc notes and questions:

  • the Example API can interact with authentication in several ways, including
    • act as a Resource Provider to an external caller (ultimately using metawiki as an Authorization Provider). So people would make calls to the Example API that might behave differently depending on their authentication status.
    • act as a consumer, using an instance of MediaWiki to act as the Resource Provider. So the Example API would make calls to some wiki that might behave differently depending on the Example API's authentication status.
  • as we're making an API and not an app, the "have an image" idea in the context of an API really means "return an image path as part of the response". Would/could/should we have a page somewhere that calls the Example API as an example of usage for consumers? This could also give consumers an example of how authentication works from their perspective.
  • some level of internationalization/translation would be good to incorporate. How could we fit this in?

NOTE: I'm sure I'm forgetting things in the list below. Please point out things I've neglected to include!

Per the task description, the first thing to do on this task is " List out the step by steps for designing an API on phabricator". I'm interpreting that to mean "what do we need to decide on before we can get started implementing an API?". If something else was intended, please advise. Based on my interpretation, here's what comes to mind:

  1. State the purpose of the API. In other words, what benefit are we trying to provide? Stated more bluntly: why are we bothering to make an API when there are plenty of other things we could do with our time?
  2. Identify potential users of the API, and any notable/relevant characteristics they may have. For example, are they already familiar with a particular sort of API? What is their level of technical sophistication? Are they a homogeneous group (ex. Android app developers) or are they a wider mix? These questions might help identify what level of abstraction to use for various wiki concepts, whether the API should be "chatty" or "chunky", etc.
  3. Identify technical capabilities and preferences of the implementing team. Do they have extensive experience with a certain programming language, type of storage, etc.? What is their familiarity with MediaWiki Core, the WMF infrastructure, etc.?
  4. Identify, as much as is possible at this stage, the technical requirements of the API. For example, does it need to access data for a particular wiki that is stored in existing tables? Does it consolidate data that is available from existing APIs? Does it require its own data storage? If so, is the data relational or document-oriented in nature? What kind of caching might the API benefit from? What authz might be necessary?
  5. Consider what technical artifacts you can build on, such as frameworks, service templates, etc.
  6. Combine answers to all the above questions into an architecture for the API. At this point, you should have made decisions such as:
    1. is this a MediaWiki extension, an external service, or something else?
    2. what programming language are you using
    3. what libraries, templates, etc. you will be using
    4. what technical components will your API need to interact with (and what WMF teams might you need to work with to secure access to them)
    5. what sort of reviews from people outside your team might your API need (security, performance, etc.)

At this point, you're finally ready to define actual endpoints.

Caveat One: of course, you may have explored possibilities for endpoints as part of the above, or even earlier. You may even have done an endpoint-first design process where your first step was a detailed exploration of your endpoints. But without knowing the full picture of your implementation, it was at least possible some of your decisions would change. With everything decided, you can have confidence that you're ready.

Caveat Two: it is possible to read the above as very waterfall-ish. There's absolutely nothing wrong with iterative design, and in many circumstances it is the right choice. But no matter how iterative you want to be, or how flexible you consider your decisions, you still have to make them. As just one example, even if you allow for the possibility that you'll change programming languages mid-project, you can't start coding the first iteration until you know what language you're coding it in.

Answers to the above questions for the Example API:

Purpose of the API: provide an example of best practices for producers and consumers to create and use WMF APIs

Potential users, notable characteristics: WMF staff, contractors, volunteers, community members, with a wide range of experience and background.

Capabilities of implementing team: very capable. Well, mostly Nikki and Wendy, when you average me in we all pass. ;-) The team as a whole possesses high levels of familiarity with WMF infrastructure and has recently completed implementation of an API service in Node.js.

Technical requirements: see above comments regarding functionality. As this is an Example API, we want to touch quite a lot of the infrastructure just to show how it is done.

Technical artifacts: ServiceTemplateNode seems like a natural choice

Ways that we handle translation in a few notable WMF APIs:

Wikimedia REST API (aka RESTBase API): Accept-Language header
https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_html__title_

Feeds API: path parameter (for mapping to a specific wikipedia)
https://api.wikimedia.org/wiki/API_reference/Feed/Featured_content

Action API: query parameter
https://www.mediawiki.org/wiki/API:Main_page

Regarding storage, using storage in a non-production situation (for example, Toolforge or Cloud VPS) is, as of recently, a low barrier to entry (https://wikitech.wikimedia.org/wiki/Help:Adding_a_Database_to_a_Cloud_VPS_Project). However, adding storage in a production situation is more involved and requires coordination and planning with other teams.

This would seem to argue for deferring using storage in the Example API until a later iteration.

However, there's another possibility we could pursue: configurable storage with a fallback to an internal default. The idea is that the Example API would support a configuration option determining what storage to use. Or even better, an environment-aware configuration that specified what storage to use in various environments (dev vs. staging vs. prod). If the service were running in a situation where its configuration indicated that storage was not available, it would fall back to an internal default for the values it would otherwise have obtained from storage.

So for example, the service running on Toolforge might have configuration pointing to a MariaDB instance hosted in Trove, and might pull the string "Hello" from a "strings" table. But when running on prod, the service might see that no prod storage configuration exists and fall back to a hard-coded "Hello" string.

This would allow us to give an example of using storage even before we have prod storage available, and would also give us an opportunity to highlight how per-environment configuration should work.
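A minimal sketch of that fallback, assuming a per-environment config object and an injected lookup function. The config shape, the environment names, the host, and lookup itself are all hypothetical. Falling back when a configured store fails is an extra assumption on my part, beyond the "no storage configured" case described above.

```javascript
// Built-in defaults used whenever storage is not available.
const DEFAULT_STRINGS = { greeting: 'Hello' };

// Hypothetical per-environment configuration. `storage: null` means no
// storage has been provisioned for that environment.
const CONFIG_BY_ENV = {
  toolforge: { storage: { type: 'mariadb', host: 'db.example.invalid', table: 'strings' } },
  prod: { storage: null },
};

// `lookup(storage, key)` is an injected stand-in for a real database query.
async function getString(key, env, lookup) {
  const envConfig = CONFIG_BY_ENV[env];
  const storage = envConfig ? envConfig.storage : null;
  if (!storage) {
    return DEFAULT_STRINGS[key];
  }
  try {
    const value = await lookup(storage, key);
    return value == null ? DEFAULT_STRINGS[key] : value;
  } catch (err) {
    // Assumption: storage that is configured but unreachable also degrades
    // to the built-in default instead of failing the request.
    return DEFAULT_STRINGS[key];
  }
}

// On prod the lookup is never called, so the hard-coded default is used.
getString('greeting', 'prod', async () => 'Hello from the strings table')
  .then((value) => console.log(value));
```

Callers never need to know which environment they are in. They ask for a string and get either the stored value or the default.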

Here are some standard HTTP headers related to language, with a brief description of each:

Accept-Language: describes which language(s) the client is able to understand, and which locale variant is preferred. There can be multiple languages, each with an optional weight or 'quality' value. For example:

Accept-Language: da, en-GB;q=0.8, en;q=0.7

(The default weight is 1, so this is equivalent to da;q=1, en-GB;q=0.8, en;q=0.7).
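For illustration, here is a small sketch of parsing that header into weighted entries. parseAcceptLanguage is a hypothetical helper, not something from an existing library. It is deliberately simplified: it does not validate language tags, and it treats an unparseable weight as 0.

```javascript
// Parse an Accept-Language value into [{ tag, q }], highest weight first.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean)
    .map((part) => {
      const [tag, ...params] = part.split(';').map((s) => s.trim());
      const qParam = params.find((p) => p.toLowerCase().startsWith('q='));
      // A missing q parameter means the default weight of 1.
      const q = qParam === undefined ? 1 : Number.parseFloat(qParam.slice(2));
      return { tag, q: Number.isNaN(q) ? 0 : q };
    })
    .sort((a, b) => b.q - a.q);
}

console.log(parseAcceptLanguage('da, en-GB;q=0.8, en;q=0.7'));
```

Parsing 'da;q=1, en-GB;q=0.8, en;q=0.7' yields the same result, which is the equivalence noted above.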

Content-Language: describes the language(s) of the audience the content is intended for, so that users can differentiate according to their own preferred language. This can also contain multiple languages.

Vary: sent by the server to indicate which client headers the server considered when creating the response. This lets caches know that they need to create different entries for the same url, depending on the value of those headers.

Properly implementing all this may require client and/or server to parse the values and weights and respond appropriately. Here's a bit on Content Negotiation using headers: https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation

So, should we use headers or the URL for language selection? The answer, as usual, seems to be "it depends".

From a REST perspective, there are some spirited discussions where everyone claims their interpretation is the "official" one. The best general rule that I've found so far is that there should be only one url for a resource. Headers can be used to differentiate between representations of the same resource, but they should not be used to differentiate between different resources.

As an example applicable to us, are the English and Spanish versions of an article different resources, or are they different representations of the same resource? If they're the same resource, then we should probably use one url to retrieve all of them and use headers to negotiate language. If they're different resources, we should probably use different urls for them.

For the Wikipedia case, it seems pretty clear that these are different resources. They're on different websites which may have a radically different set of available resources. And even if two different language Wikipedias have pages for the same thing, the content of the page is likely to vary radically rather than just being a translation.

With this in mind, the Example API is one service that returns one resource, which can (if we choose to implement translations) be translated into multiple different languages. So it seems appropriate to use headers rather than a url parameter for the language.
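A sketch of what header-based selection could look like for a single translated resource. The supported languages, the translated strings, and the helper names are illustrative assumptions, and the matching is a simplified prefix fallback rather than a complete implementation of HTTP language negotiation.

```javascript
// Pick the best supported language for an Accept-Language value.
function pickLanguage(acceptLanguage, supported, fallback) {
  const ranges = (acceptLanguage || '')
    .split(',')
    .map((part) => {
      const [tag, q] = part.trim().split(/\s*;\s*q=/i);
      return { tag: tag.toLowerCase(), q: q === undefined ? 1 : Number(q) };
    })
    .filter((r) => r.tag && r.q > 0)
    .sort((a, b) => b.q - a.q);

  for (const { tag } of ranges) {
    if (tag === '*') return fallback;
    // Exact match first, then fall back from e.g. "es-mx" to "es".
    const match = supported.find(
      (s) => s.toLowerCase() === tag || s.toLowerCase() === tag.split('-')[0]
    );
    if (match) return match;
  }
  return fallback;
}

// Illustrative translations only.
const GREETINGS = { en: 'Hello World', es: 'Hola Mundo' };

function buildGreetingResponse(requestHeaders) {
  const lang = pickLanguage(requestHeaders['accept-language'], Object.keys(GREETINGS), 'en');
  return {
    status: 200,
    // Vary tells caches that the response depends on Accept-Language.
    headers: { 'Content-Language': lang, Vary: 'Accept-Language' },
    body: { message: GREETINGS[lang] },
  };
}

console.log(buildGreetingResponse({ 'accept-language': 'es-MX, en;q=0.5' }));
```

The URL stays the same for every language. Only the Accept-Language request header and the Content-Language response header change.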

In contrast to this, the Image Suggestions API is one service that supports multiple languages, but the suggestions in different languages are completely independent of each other. So it seems like we did the right thing by including the language in the URL there.

It is worth mentioning that the URL Structure as defined in the API Portal documentation includes the language in the URL:
https://api.wikimedia.org/wiki/API_reference#URL_structure

I'm unclear whether this is a hard requirement, or if it may not apply in some circumstances. It makes perfect sense that we'd include it for URLs that map to different wikipedias. After all, en.wikipedia.org and fr.wikipedia.org are different websites. It doesn't make as much sense to me that we'd need to include it if we exposed a service that either was language-agnostic, or which handled its own translations within the service (like the Example API).

@hnowlan, do you have any thoughts on the above, or on if language in the URL is a hard requirement from an API Gateway routing perspective?

@BPirkle When you say language-agnostic do you mean something that would aggregate multiple different wikipedias together? Or something like Commons?

For the translation portion, we could do an endpoint that has the desired route structure (with the language in it) and also add the ability to translate to a desired language as a query param, as well as including the Accept-Language header. When you suggested MediaSearch, were you thinking something like:

GET /en/wikipedia/example-image/bananas?lang=es where we query English Wikipedia for some information on bananas and then also add some metadata from MediaSearch results (like top 5 image metadata or something) on bananas that is returned in Spanish?
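To show how a route like that could be taken apart, here is a sketch using the standard URL class. The route shape is just the hypothetical one from the question, and parseExampleImageRequest and its return fields are made-up names.

```javascript
// Split the hypothetical route into source wiki, topic, and response language.
function parseExampleImageRequest(rawUrl) {
  // The base is only needed so that a path-only URL can be parsed.
  const url = new URL(rawUrl, 'https://example.invalid');
  const [sourceLang, project, endpoint, topic] = url.pathname.split('/').filter(Boolean);
  if (endpoint !== 'example-image' || !topic) {
    return null;
  }
  return {
    sourceWiki: `${sourceLang}.${project}.org`,
    topic: decodeURIComponent(topic),
    // Default the response language to the source wiki's language.
    responseLang: url.searchParams.get('lang') || sourceLang,
  };
}

console.log(parseExampleImageRequest('/en/wikipedia/example-image/bananas?lang=es'));
```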

@BPirkle When you say language-agnostic do you mean something that would aggregate multiple different wikipedias together? Or something like Commons?

I was mostly thinking of purely data-driven things for which language didn't really apply. Commons might be a good example - suppose the service just returned image metadata. Not really any need for translation or multiple language support. Or imagine you were doing some purely mathematical service that - I dunno - validated cryptographic signatures or something. Language might not apply.

When you suggested MediaSearch, were you thinking something like:

GET /en/wikipedia/example-image/bananas?lang=es where we query English Wikipedia for some information on bananas and then also add some metadata from MediaSearch results (like top 5 image metadata or something) on bananas that is returned in Spanish?

My thought with MediaSearch was that it was just an example of how to query another API within our stack, not anything to do with translations. This example API is a "Hello World" app, right? So I was thinking to query MediaSearch for a person waving, or an image of the world, or something like that.

Or imagine you were doing some purely mathematical service

Oh I gotcha, so something more like Mathoid in that case?

My thought with MediaSearch was that it was just an example of how to query another API within our stack, not anything to do with translations.

ok! so then the translation portion would be demo'ed on a different endpoint in the example api then?

Change 734443 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/services/example-node-api@master] Update the Example Node API documentation

https://gerrit.wikimedia.org/r/734443

I tried to do the "List out the step by steps for designing an API on phabricator" step of this task, and realized there were all sorts of levels of granularity and technical details that I could approach it from. I decided to take a first cut at listing these steps at a high level, with very little technical detail. These are therefore written in a way that might be helpful to both developers and product managers. If we'd prefer these to be at a different level of granularity, let me know and I'll rework them. Anyway, here's what I came up with, critique and criticism welcome:

  1. Think about what you're trying to accomplish. An API is probably the right choice, but consider other possibilities. Some applications use shared files, shared storage, etc.
  2. Think about who will be using your API. Think about their technical preferences and proficiencies, and what other APIs they may already be accessing. Consider things like API type (REST/RPC/GraphQL/etc), how authentication is handled (OAuth/tokens/etc), how sophisticated the client may be in terms of things like caching headers, translations, etc. This may involve conversations with multiple people.
  3. Think about who will be implementing the API (just you? other developers on your team? someone else?). Think about their technical preferences and proficiencies, and what other APIs they may already be familiar with creating. Apply the same considerations as in #2. This may involve conversations with multiple people.
  4. Think about technical constraints surrounding the API and its anticipated usage. Does it have characteristics more suited to one technology or another? REST is extremely popular, and is often the right answer. But suppose your API will be used in a very bandwidth-sensitive situation where overfetching is a major concern. Maybe GraphQL would be better.
  5. Combine #2, #3, and #4, and see how well they match. Hopefully they all lead to the same answer. But if not, step back and resolve any mismatches. This may involve even more conversations. At the end of this step, you haven't even defined any endpoints. But you have chosen appropriate technologies, and should have a sense of how both producers and consumers expect them to be used in the context of this API.
  6. Consider how to model whatever is being accessed in whatever technology you decided on. For example, if you chose REST, define the resources your API will expose, which can be treated as collections, how you'll identify resources (id vs name), how you'll handle relationships between resources, etc.
  7. Consider how your API will handle things that actively affect endpoint design. This includes things like filtering, sorting, and pagination.
  8. Consider how your API will handle things that don't actively affect endpoint design (or at least affect it tangentially). This includes things like rate limiting, authentication, and caching.
  9. Actually define your endpoints, preferably in a standard format that can be easily shared with others (ex. OpenAPI).
  10. Iterate and refine as necessary.
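As an illustration of defining endpoints in a standard format, here is a sketch of a minimal OpenAPI 3 description held as a JavaScript object, plus a small helper that lists the operations it defines. This is not the spec.yaml from the example-node-api repository. The /hello path, the Greeting schema, and the bearerAuth scheme name are assumptions made up for the example.

```javascript
// Illustrative OpenAPI 3 document for a single authenticated endpoint.
const exampleSpec = {
  openapi: '3.0.3',
  info: { title: 'Example Node API (illustrative)', version: '0.1.0' },
  paths: {
    '/hello': {
      get: {
        summary: 'Return a greeting',
        security: [{ bearerAuth: [] }],
        responses: {
          '200': {
            description: 'Greeting',
            content: {
              'application/json': { schema: { $ref: '#/components/schemas/Greeting' } },
            },
          },
          '401': { description: 'Missing or invalid credentials' },
        },
      },
    },
  },
  components: {
    securitySchemes: { bearerAuth: { type: 'http', scheme: 'bearer' } },
    schemas: {
      Greeting: {
        type: 'object',
        required: ['message'],
        properties: { message: { type: 'string' } },
      },
    },
  },
};

// List every "METHOD /path" pair the document defines.
function listOperations(spec) {
  const operations = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    for (const method of Object.keys(methods)) {
      operations.push(`${method.toUpperCase()} ${path}`);
    }
  }
  return operations;
}

console.log(listOperations(exampleSpec));
```

Serializing the object with JSON.stringify gives a document that can be shared with reviewers or loaded into Swagger tooling.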

Thanks Bill, I think this is very verbose and definitely helps push the thought process one should have when coming to this point. I'm wondering if we could publish and then get feedback from our API producers on what their thoughts are

Heh - "very verbose" is generally a pretty good description of my writing style. Usually takes me 3-4 drafts to trim out all the fluff. :)

I agree with publishing what we have and refining from there.

Change 734443 merged by jenkins-bot:

[mediawiki/services/example-node-api@master] Update the Example Node API documentation. Remove unnecessary Service Template Node code.

https://gerrit.wikimedia.org/r/734443

I think we're almost ready to close this task out. A few loose ends from the acceptance criteria:

  • where should we publish the "steps on designing an API" list? I'm thinking a new page on the API Portal, maybe https://api.wikimedia.org/wiki/Community/Designing_APIs (similar url to the API Guidelines). But I'm very open to suggestions.
  • is a codesearch link sufficient for ensuring the OpenAPI spec is published and accessible on the internet, at least for now? The Example Node API's entry in the API Catalog would be better, but we're not quite there yet
  • I'm still a little unclear on whether the OpenAPI spec meets the "API Description" acceptance criteria. If not, there's also the README

where should we publish the "steps on designing an API" list? I'm thinking a new page on the API Portal, maybe https://api.wikimedia.org/wiki/Community/Designing_APIs (similar url to the API Guidelines). But I'm very open to suggestions.

My thought would be to append to https://api.wikimedia.org/wiki/Community/API_guidelines#Design_Phase. We also have https://api.wikimedia.org/wiki/Community/API_guidelines#Design_Principles, so it may feel like a lot of separate sections mentioning "design". Overall I think we could have an "API Design" section and pull everything together? Open to thoughts.

is a codesearch link sufficient

I think so!

My thought for the API description was something similar to a Swagger doc that outlines the endpoints and their respective parameters, etc. I think the [[ https://gerrit.wikimedia.org/g/mediawiki/services/example-node-api/ /aedb3976b345d58079db7b66d00f954338e0087b/spec.yaml | spec.yaml ]] satisfies this AC.

I took a look, and couldn't figure out how to fit all that into the existing content without turning the page into gibberish. So for now, I posted it here: https://meta.wikimedia.org/wiki/User:BPirkle_(WMF)/Stuff/Designing_APIs and added a link to my new page from the page you mentioned.

That at least gets it on-wiki, while we figure out if it (a) has value in its current form and (b) where it should go.

Relevant thoughts from a Slack discussion regarding publishing the steps for designing an API:

@apaskulin : Looking at https://meta.wikimedia.org/wiki/User:BPirkle_(WMF)/Stuff/Designing_APIs, this seems more like a template than a doc. I’m happy to work on converting it to a template if that makes sense. I’d prefer a Phabricator task template, but a Google doc would also work. In terms of the design section of the guidelines, I like the phases as they are laid out, but it’s important that we aren’t sending readers off to several other docs

@sdkim : Thanks @apaskulin, so @BPirkle if you think the high-level design steps are captured in the API guidelines we could maybe take Alex’s advice in creating a phab task template for “Design API x”. Thoughts?

We could do something like Growth did here, by adding a button to the API Guidelines that links to the phab form:

https://www.mediawiki.org/w/index.php?title=Growth/Communities/Get_the_Growth_experiments_on_your_wiki&oldid=3850037#Communities_that_have_the_features_deployed

I'll put something together to run by Alex for feedback.

Here is something to try

This auto-tags "API Platform" and auto-subscribes Seve and myself. We can easily adjust that part as desired. And of course, whoever creates the task can adjust as they choose.
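For reference, a prefilled task link of the kind described could be built roughly as follows. This is a sketch under assumptions: the form path and the parameter names (title, description, projects, subscribers) follow my understanding of Phabricator's HTTP-parameter prefilling and should be checked against the form's own documentation, and the project slug and username are placeholders.

```javascript
// Build a Maniphest "create task" link with prefilled fields.
// Assumed, not confirmed: the form path and the parameter names.
function buildTaskFormUrl({ title, description, projects, subscribers }) {
  const url = new URL('https://phabricator.wikimedia.org/maniphest/task/edit/form/1/');
  url.searchParams.set('title', title);
  url.searchParams.set('description', description);
  url.searchParams.set('projects', projects.join(','));
  url.searchParams.set('subscribers', subscribers.join(','));
  return url.toString();
}

console.log(buildTaskFormUrl({
  title: 'Design API x',
  description: 'Purpose of the API:\nPotential users:\nTechnical requirements:',
  projects: ['api-platform'], // placeholder project slug
  subscribers: ['example-user'], // placeholder username
}));
```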

Passing URL parameters is one option, requesting creation of a dedicated form is another option.

Thanks for jumping in!

Sorry if I missed it in the docs you linked, but are there guidelines about when passing URL params vs a dedicated form is preferred?

I assumed that if we just wanted a custom description but otherwise an existing type would work, then URL params would be the better choice. However, that may be an incorrect assumption. I mostly just looked around for existing art and copied the closest thing I could find. :)