Page MenuHomePhabricator

JsonBodyValidator does not validate the parameter types
Closed, ResolvedPublic

Description

As a TODO comment notes:

// TODO: use a ParamValidator to check field value, etc!

Currently, JsonBodyValidator does not perform any type check, which means that handlers could break horribly if they don't do the validation themselves (which they shouldn't do!).

Solution outline:

  • add getParsedBody and setParsedBody to RequestInterface (T357025), following PSR-7.
  • add parseBodyData to Handler. It should throw for unsupported body types, and return null if the handler wants to process the body as a stream.
  • The router object should call parseBodyData on the handler, and pass the result to the request (T358557)
  • The router object should ensure that requests that should have a body do have one, and requests that shouldn't don't have a body.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

We (WB Product Platform Team) recently looked at JsonBodyValidator for the Wikibase REST API and ended up creating a subclass in Wikibase to at least do basic type validation for primitive types. It would of course make sense for something like this to be upstreamed, but we weren't sure about the exact response format (keys and message), and whether this limited version is useful there at all.

We saw the comment in the code regarding ParamValidator, but it doesn't look like it can easily be used there. ParamValidator and the TypeDef objects seem to assume their input values to be strings, which makes sense for path params and query params, but not for JSON.

If someone finds bits of this TypeValidatingJsonBodyValidator in Wikibase useful, feel free to copy those over, or poke someone from our team to create a patch for core! :)

Change 809322 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] Rest: ensure there are no extraneous params in JSON bodies

https://gerrit.wikimedia.org/r/809322

The patch above does something, although it's not enough for this task. This seemed easier than validating types, though, so I thought I'd fix it.

Change 809567 had a related patch set uploaded (by Ollie Shotton; author: Ollie Shotton):

[mediawiki/extensions/Wikibase@master] REST: Extraneous params not allowed in JSON bodies

https://gerrit.wikimedia.org/r/809567

One more thing to note: JsonBodyValidator also does not support/check PARAM_ISMULTI, which I assume is in scope for this task.

kostajh subscribed.

The patch above does something, although it's not enough for this task. This seemed easier than validating types, though, so I thought I'd fix it.

It looks like an improvement to me; I've 2'ed it but it's depending on a Wikibase patch which I 1'ed.

I was surprised that JsonBodyValidator is not actually validating; if we're not going to fix this issue anytime soon, we should probably audit all uses of getBodyValidator to make sure that they don't assume that the fields are validated in ways that are unsafe after calling $this->getValidatedBody();

Change 809322 merged by jenkins-bot:

[mediawiki/core@master] Rest: ensure there are no extraneous params in JSON bodies

https://gerrit.wikimedia.org/r/809322

I was surprised that JsonBodyValidator is not actually validating; if we're not going to fix this issue anytime soon, we should probably audit all uses of getBodyValidator to make sure that they don't assume that the fields are validated in ways that are unsafe after calling $this->getValidatedBody();

I think this should really be fixed, considering that REST APIs are being used in more and more places, and that JSON is pretty much the only format currently available. But then of course I'm not very familiar with the REST API framework, and I also don't know how this task fits into the API team's plans.

Krinkle added subscribers: BPirkle, Krinkle.

@BPirkle I'm boldly moving this back to the inbox for triage given that it was moved to the radar back in October before the current reorg. This seems outside the Platform team's scope, but let us know if you find something you need help with.

daniel added subscribers: tstarling, daniel.

The MW-Interfaces-Team is picking this up, but we are still exploring the best approach. This would be easy to fix in a hack way, but is unfortunately really hard to fix in a way follows best practices.

I had a conversation about this with @tstarling today, you can see a recording here: https://drive.google.com/file/d/1nTo7JPWKZpDWfe2hZIKpvy4ysL2Lt2Sv/view?usp=sharing

Quick summary:

  • The way BodyValidators are instantiated and passed around is awkward and could use improvement. However, the BodyValidator interface is marked @stable to extend, and JsonBodyValidator is instantiated directly be verious extensions. So it's not easy to change method signatures.
  • The information flow in ParamValidator and TypeDef seems backwards: The code that instantiates these needs to know about concrete parameter values. Code that needs to get validated parameter values needs to pass in the parameter definition array. It would be nice to flip that around, but that requires a pretty extensive refactoring (a first experiment touches more than 4000 lines or code)
  • The most promising approach seems to be to add a getParsedBody() method to RequestInterface. This way, ParamValidatorCallback can provide access to body fields exactly in the same way it provides access to path parameters and query parameters. If we follow this approach, we will probably not need JsonBodyValidator anyore. We may even end up deprecating the concept of BodyValidators in favor of a more direct approach to validation.
  • In the fuiture, we may want to use OpenAPI specifications to define REST entry points. Body validations would then be based on JSON Schema. However, some endpoints may want to avoid full recursive validation for performance reasons. To accommodate this, body validation may become something done explicitly in the handler's main execute method.
  • The way BodyValidators are instantiated and passed around is awkward and could use improvement. However, the BodyValidator interface is marked @stable to extend, and JsonBodyValidator is instantiated directly be verious extensions. So it's not easy to change method signatures.

JsonBodyValidator::__construct() takes a parameter definition array; the validateBody() call takes a request object. That's exactly what you describe in your next bullet point as ideal. What needs to be changed?

The way body validators are created is going to be a problem once you want to do other things than validating the request (e.g. generating an OpenAPI spec or an ApiSandbox-like user interface) since there is no way to instantiate a body validator without knowing the mime type. But that's not really relevant for this task.

  • The information flow in ParamValidator and TypeDef seems backwards: The code that instantiates these needs to know about concrete parameter values. Code that needs to get validated parameter values needs to pass in the parameter definition array. It would be nice to flip that around, but that requires a pretty extensive refactoring (a first experiment touches more than 4000 lines or code)

I think this reflects that in the action API there is always just one set of concrete parameter values (one request) but every module has its own set of parameter definitions, so only this way do you end up with validator objects that are reusable within the request. Also there are dynamically named parameters; so if you did it the other way around, every module would have to re-instantiate validator objects, and then the logic for handling templated parameters would have to re-instantiate the validators for each templated parameter again.

(This doesn't seem very relevant to the current task in any case.)

  • The most promising approach seems to be to add a getParsedBody() method to RequestInterface. This way, ParamValidatorCallback can provide access to body fields exactly in the same way it provides access to path parameters and query parameters. If we follow this approach, we will probably not need JsonBodyValidator anyore. We may even end up deprecating the concept of BodyValidators in favor of a more direct approach to validation.

I thought body validators reflect the idea that while there is more or less one reasonable way to validate query string / multipart/form-data / application/x-www-form-urlencoded parameters, the same isn't really true for other kinds of POST bodies, so the handler being able to instantiate the validator adds needed flexibility. I guess you could use something like getParamSettings() with ObjectFactory specs in it, but I am not sure it would be an improvement.

Also, it's worth separating three issues:

  • JsonBodyValidator takes a parameter definition array in its constructor, but doesn't actually use it. (I tried to fix that once, and gave up, but don't at all remember what was the difficulty.)
  • Probably most APIs will want to use nested JSON bodies, and ParamValidator cannot describe those. So we probably want a body validator that works with OpenAPI or JSONSchema definitions.
  • Rearchitecting validation for reasons that aren't directly related to improving validation functionality (but rather improving maintainability, DX etc).

I think trying the third for the action API would be very unwise. The REST API is a less established codebase, with no actual body validation support, so coordinating changes is probably easier there.

WIP...

JsonBodyValidator::__construct() takes a parameter definition array; the validateBody() call takes a request object. That's exactly what you describe in your next bullet point as ideal. What needs to be changed?

Right, the issues aren't really obvious... I see three, all related to injection:

  1. The only way for extensions to use JsonBodyValidator is to instantiate it directly. So the constructure signature needs to be stable. Every time we change it, extensions break. We'd need a BodyValidatorFactory to fix this - but that would mean the framework has to know the concrete validator classes. So endpoints can only configure validators, they can't subclass.
  2. We can't inject a ParamValidator into the validateBody() method either, or add an init() method for this purpose. The interface is fixed, and we would probably want to inject different things for different validators.
  3. If we can't inject a ParamValidator, we may want to create one directly. Apart from the issue of injecting the things needed to do that (like an ObjectFactory), the fact that ParamValidator's information flow is the inverse of the flow in BodyValidator (data in the constructor, definitions in the method) makes this awkward.

every module would have to re-instantiate validator objects, and then the logic for handling templated parameters would have to re-instantiate the validators for each templated parameter again.

Yes, they would be ParamDefs, not TypeDefs. I don't think re-usability across modules is a problem, though. An ActionAPI request typically only uses a couple of modules (between one and four, I think), and object instantiation is fast as long as we are not talking about hundreds of thousands.

I thought body validators reflect the idea that while there is more or less one reasonable way to validate query string / multipart/form-data / application/x-www-form-urlencoded parameters, the same isn't really true for other kinds of POST bodies, so the handler being able to instantiate the validator adds needed flexibility.

You are probably right that this was the idea, but it seems to me this idea is flawed: A an abstraction like BodyValidator is only useful if there are reusable implementations providing generic validation based on some kind of spec (param settings, JSON schema, OpenAPI, etc). Any validation that is more complex and specific to the logic of the handler can and hsould be done in the hander's execute mathod directly. There is no point in doing it "magically" up front, that just causes unnecessary compexity (which is what we are hitting right now).

I guess you could use something like getParamSettings() with ObjectFactory specs in it, but I am not sure it would be an improvement.

I don't see why we need an ObjectFactory. We can just directly instantiate TypeDefs (or ParamDefs) based on somthing JSON-Schema like (I am thinking of the kind of thing we ave in MainConfigSchema).

TypeDefs currently need a Callback object injected. I feel like it would be nice to refactor to change that. But perhaps all we need is a new ParamSpec class that can be instantiated directly, and can ontain a JSON schea or an object spec for a TypeDef.

Also, it's worth separating three issues:

  • JsonBodyValidator takes a parameter definition array in its constructor, but doesn't actually use it. (I tried to fix that once, and gave up, but don't at all remember what was the difficulty.)

Yes, and we should keep the scope of this task limited to that.

  • Probably most APIs will want to use nested JSON bodies, and ParamValidator cannot describe those. So we probably want a body validator that works with OpenAPI or JSONSchema definitions.

Yes, and we should keep that in mind when finding a fix for this ticket. But we don't need to solve it right away. I am leaning towards fulling going for OpenAPI for defining our APIs, though. One OpenAPI spec per extension.

  • Rearchitecting validation for reasons that aren't directly related to improving validation functionality (but rather improving maintainability, DX etc).

I think trying the third for the action API would be very unwise. The REST API is a less established codebase, with no actual body validation support, so coordinating changes is probably easier there.

I think the flow through ParamValidator and TypeDef could be improved, but I agree that it is risky, and quite a big refactoring. I will have another look at it, but you are probably right that we may want to avoid it, at least for now.

Any validation that is more complex and specific to the logic of the handler can and should be done in the handler's execute method directly.

Not if you want to reuse it to generate documentation/specification, a somewhat overlapping concern.

Any validation that is more complex and specific to the logic of the handler can and should be done in the handler's execute method directly.

Not if you want to reuse it to generate documentation/specification, a somewhat overlapping concern.

I think a generic mechanism for generating documentation can work up to the same point a generic mechanism for validation can work. In practice, to the level of detailed and expressiveness of an OpenAPI spec.

Any validation that would be more complex than that (e.g. "this value must be a talk namespace" or "the user referenced here must be in the admin group") will have to be described in natural language. That description can be part of the "generic" documentation.

The current JsonBodyValidator checks for extraneous parameters in the body. It would be good to not lose that ability.

I otherwise have no objection to removing the current body validation system and moving the ability to specify body fields into getParamSettings().

In a follow-up discussion, Gergo, Bill, Aaron and I talked about whether we shold keep the BodyValidator infrastructure at all:
https://drive.google.com/file/d/1rwM9TqjliJcJwj5XIg22gD8mfqVdHcdH/view

daniel triaged this task as Medium priority.
FJoseph-WMF changed the task status from Open to In Progress.Apr 18 2024, 3:28 PM

Change #1028444 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] REST: improve body validation testing

https://gerrit.wikimedia.org/r/1028444

daniel raised the priority of this task from Medium to High.May 16 2024, 3:08 PM

Change #1028444 merged by jenkins-bot:

[mediawiki/core@master] REST: improve body validation testing

https://gerrit.wikimedia.org/r/1028444

Change #1050016 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] REST: enforce native type of fields in json request

https://gerrit.wikimedia.org/r/1050016

Change #1050449 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] REST: Enable strict type validation in JSON request bodies

https://gerrit.wikimedia.org/r/1050449

Change #1050016 merged by jenkins-bot:

[mediawiki/core@master] REST: detect mismatching value types in json request

https://gerrit.wikimedia.org/r/1050016

One related bit of surprising behavior I've just encountered is that the API now responds with failureCode: 'missingparam' if a required string field is defined but empty (""). Not sure if this was intentional, but I'm pretty sure empty strings weren't rejected by the legacy JsonBodyValidator mechanism.

Change #1051076 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@wmf/1.43.0-wmf.11] REST: detect mismatching value types in json request

https://gerrit.wikimedia.org/r/1051076

One related bit of surprising behavior I've just encountered is that the API now responds with failureCode: 'missingparam' if a required string field is defined but empty (""). Not sure if this was intentional, but I'm pretty sure empty strings weren't rejected by the legacy JsonBodyValidator mechanism.

Cood catch, confirmed! I'll look into it.

Change #1051108 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] REST: accept empty strings in request body

https://gerrit.wikimedia.org/r/1051108

Change #1051076 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.11] REST: detect mismatching value types in json request

https://gerrit.wikimedia.org/r/1051076

Mentioned in SAL (#wikimedia-operations) [2024-07-01T11:39:12Z] <daniel@deploy1002> Started scap: Backport for [[gerrit:1051076|REST: detect mismatching value types in json request (T305973)]]

Mentioned in SAL (#wikimedia-operations) [2024-07-01T12:00:37Z] <daniel@deploy1002> daniel: Backport for [[gerrit:1051076|REST: detect mismatching value types in json request (T305973)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-07-01T12:12:00Z] <daniel@deploy1002> Finished scap: Backport for [[gerrit:1051076|REST: detect mismatching value types in json request (T305973)]] (duration: 32m 48s)

Change #1051128 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/extensions/FlaggedRevs@master] ReviewHandler: declare all fields to be strings.

https://gerrit.wikimedia.org/r/1051128

Change #1051128 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] ReviewHandler: declare all fields to be strings.

https://gerrit.wikimedia.org/r/1051128

Change #1051108 merged by jenkins-bot:

[mediawiki/core@master] REST: accept empty strings in request body

https://gerrit.wikimedia.org/r/1051108

Change #1050449 merged by jenkins-bot:

[mediawiki/core@master] REST: Enable strict type validation in JSON request bodies

https://gerrit.wikimedia.org/r/1050449