Page MenuHomePhabricator

Provide a mechanism for scoping Variables
Open, MediumPublic25 Estimated Story PointsFeature

Description

Currently, variables defined by Ext:Arrays, Ext:Variables, and similar extensions have global scope - that is, if a variable is defined at the top of a page, it can be accessed at any other point on that page, even in template calls (and if you use Ext:Variables' #var_final, the variable doesn't even have to be defined before this usage, so a usage could be at the top of a page while the definition is at the bottom). Arguably, this is the functionality that makes these extensions so useful in the first place; in particular, it is the entire point of Ext:Variables.

However, it is usually not necessary to have these variables in the scope of the entire page, and often enough is even counterproductive: it is all too easy to define a variable in a template with a given name, and then to have another template check for a variable by that name, where the intention is that the second template is setting the variable itself. There are a couple of workarounds to this, of course: careful variable naming, especially by prefixing the template name, will all but eliminate accidental variable leakage between templates, an making a point to reset all variables used by a template at the end of that template, or to always set variables used by the template at the beginning of that template, also prevents leakage, but the former often leads to awkward names, and the latter introduces code bloat and duplication that would be better avoided.

I think the better solution here would be to introduce a scoping mechanism to these extensions, and require either explicitly listing variables which should be global instead of local, listing global variables which should be "imported" into the current local scope, or both. The actual mechanism could be one or more new parser functions, a new mode(s) on the current parser functions, or something else; I don't claim to have much in the way of good ideas here.

Event Timeline

However, it is usually not necessary to have these variables in the scope of the entire page, and often enough is even counterproductive: it is all too easy to define a variable in a template with a given name, and then to have another template check for a variable by that name, where the intention is that the second template is setting the variable itself. There are a couple of workarounds to this, of course: careful variable naming, especially by prefixing the template name, will all but eliminate accidental variable leakage between templates, an making a point to reset all variables used by a template at the end of that template, or to always set variables used by the template at the beginning of that template, also prevents leakage, but the former often leads to awkward names, and the latter introduces code bloat and duplication that would be better avoided.

You are probably right, as this is a feature often suggested on the extension talk page over the last years. However, consider the following scenario:

  • Page A transcludes template B
  • Template B defines #tempvar local ans transcludes Template C

Should Template C have access to #tempvar now? I thought it should at first, but I'm not sure anymore.

  • Pro access:
    • It would follow normal behavior of variable scope.
    • It's kind of more expected, esspecially if we name it #localvar.
  • Contra access:
    • We are kind of at the start again for "template childern": They get a variable they didn't define, but only if they are transcluded into B, and then, at some point in another template, the variable vanishes again… I don't like this.
    • If you want access, just pass it as a parameter explicitly. That makes it more transparent and way easier to comprehend.

Fesature design wise, I now clearly prefer the Contra access variant. (However, what's easier to implement, I don't know yet.)

I think the better solution here would be to introduce a scoping mechanism to these extensions, and require either explicitly listing variables which should be global instead of local, listing global variables which should be "imported" into the current local scope, or both. The actual mechanism could be one or more new parser functions, a new mode(s) on the current parser functions, or something else; I don't claim to have much in the way of good ideas here.

I don't think this is easy to implement at all.

  • Parser, Preprocessor and friends are probably the most complex thing MediaWiki has to offer. You can follow method calls forever, until documentation becomes so spare, that you don't really know if you are right or not. There are currently 13500 lines that are kind of relevant to understand what's going on there. (I think classes like Parser should be forbidden. There are 6000 lines, and 116 public functions. And then it plays ping pong with two different Preprocessor implementations… I have no real idea what's going on there, I try to understand small fragments.)
  • After some code reading, I'm quite sure there are no hooks to do what you want to do. They would have to be added and to be accepted in MediaWiki Core.
  • If what we do involves Preprocessor heavily, chance is we actually would have to hook in there. I don't think anyone wants hooks in there. But honestly, I have no real idea yet.
  • T204945: Deprecate one of the Preprocessor implementations for 1.34 If what we do involves Preprocessor heavily, we must do it two times working exactly the same. Or we wait some time until one of them is gone.

You are probably right, as this is a feature often suggested on the extension talk page over the last years. However, consider the following scenario:
[...]

I'm with you on prefering the Contra variant; as I have used Variables over the years, I have found more and more that relying on them to pass state between templates is a recipe for sometimes-hard-to-diagnose bugs. If you take a step back and think about it in a more abstract manner, this is basically a situation of globals, and thus in addition to all the weirdness deriving from the fact that it's wikitext, it also carries all the long-documented problems associated with usage of globals in programming. Therefore increasingly I prefer explicitly passing data to/between templates as template parameters.

To think of it a different way, you could argue that templates are objects, their parameters are the public interfaces, and their variables are internal state. When you think of it like that, it becomes very obvious that objects should not rely on each others' internal state, and should instead use the exposed public interfaces.

I don't think this is easy to implement at all.

Keep in mind that I never said anything about ease of implementation; I suspected it wouldn't be easy/simple to implement, but I also don't have the knowledge required to accurately assess that, so I kept quiet on that front. ;) Because of that, I am perfectly willing to trust your own assessment here.

You are probably right, as this is a feature often suggested on the extension talk page over the last years. However, consider the following scenario:
[...]

I'm with you on prefering the Contra variant; as I have used Variables over the years, I have found more and more that relying on them to pass state between templates is a recipe for sometimes-hard-to-diagnose bugs. If you take a step back and think about it in a more abstract manner, this is basically a situation of globals, and thus in addition to all the weirdness deriving from the fact that it's wikitext, it also carries all the long-documented problems associated with usage of globals in programming. Therefore increasingly I prefer explicitly passing data to/between templates as template parameters.

To think of it a different way, you could argue that templates are objects, their parameters are the public interfaces, and their variables are internal state. When you think of it like that, it becomes very obvious that objects should not rely on each others' internal state, and should instead use the exposed public interfaces.

I think you're correct there.

I don't think this is easy to implement at all.

Keep in mind that I never said anything about ease of implementation; I suspected it wouldn't be easy/simple to implement, but I also don't have the knowledge required to accurately assess that, so I kept quiet on that front. ;) Because of that, I am perfectly willing to trust your own assessment here.

I just wanted to point this out, as for this reason this task will probably be stale quite some time before implementing this functionality. I'm definitely planning to do it at some point though, at least for Variables.

I just wanted to point this out, as for this reason this task will probably be stale quite some time before implementing this functionality. I'm definitely planning to do it at some point though, at least for Variables.

You can think of this task as an Epic/Wishlist item if you want; that's more or less what I intended it as when I created it. :)

I just wanted to point this out, as for this reason this task will probably be stale quite some time before implementing this functionality. I'm definitely planning to do it at some point though, at least for Variables.

You can think of this task as an Epic/Wishlist item if you want; that's more or less what I intended it as when I created it. :)

Yeah. The funny thing is, due to the nature of this task, maybe I get the right idea this month, and maybe this is still open in a year…

MGChecker changed the subtype of this task from "Task" to "Feature Request".Oct 23 2018, 11:08 PM
MGChecker changed the subtype of this task from "Feature Request" to "Task".Oct 23 2018, 11:14 PM

It turns out, that the PPFrame passed to parser function implementations represents the template (not the parser function or something like that) we're currently in. So we might be able to save this data in the corresponding PPFrame as member variable. This could work, but it isn't really clean.

If this works, it's kind of easier as expected, but I'd really would prefer a clean way to do this. I'm not done yet removing all of the Technical Debt we collected over the years, I don't have to add new one when I'm done. However, if I can convince the right people that this is useful, I might be able to introduce a clean way to do this before MW 1.33 is released.

This would be a change large enough to bump the version number to 3.0.

MGChecker triaged this task as Medium priority.Oct 24 2018, 11:55 PM
MGChecker renamed this task from [Arrays] [Variables] [etc.] Provide a mechanism for scoping variables to Provide a mechanism for scoping Variables.Oct 25 2018, 12:03 AM
MGChecker claimed this task.

Splitting this to create a better seperation of actual tasks.

MGChecker set the point value for this task to 25.Oct 25 2018, 12:10 AM

Change 469683 had a related patch set uploaded (by MGChecker; owner: MGChecker):
[mediawiki/extensions/Variables@master] Proof of concept: Template scoped variables

https://gerrit.wikimedia.org/r/469683

I know how to do it and it works as expected. However, I won't use this implementation, but use an object to save this information, and hopefully use an "official approved" method to store the data as well.

However, as I want a clean architecture for this extension, the subtasks should be resolved first. They are kind of required to implement this cleanly.

MGChecker changed the subtype of this task from "Task" to "Feature Request".Mar 1 2019, 11:35 PM
MGChecker unsubscribed.
MGChecker subscribed.

Should these variables be only available at the same scope, or also on the child scopes? I would have a strong preference for the first option.

As I already noticed four and a half years ago (yikes) in T221457, introducing a possibility to define local constants would then be worthwhile as well. However, this would raise the question how lconstdefine would be handled on its second call. It could silently do nothing, print an error, etc. Do you have any opinions on this? In any case, getting some input from the Parsoid crew would be necessary to figure out which strategies could be promising and which not.

Should these variables be only available at the same scope, or also on the child scopes? I would have a strong preference for the first option.

I agree. If you need to pass state to a child scope (a template being called in your current scope), pass it explicitly as a parameter. If there are scenarios that would be affected by this that aren't a child template, I'm not aware of them and would need to see an example.

I don't know anything about lconstdefine and therefore don't have an opinion about how to handle second/additional calls to it.

Just to maybe give some ideas, what we do in our homebrew extension is to have all variables local to the frame (template) they're created in, but then we also have a #inherit function to pull in a variable from the parent or higher frame (climbing up until it finds the variable or runs out of frames). This is tremendously useful to act as a pseudo-setting, like setting "showweight" to 1 on the page, then repeatedly calling a template that can optionally show a weight column. It really adds up when there are dozens or even hundreds of calls where you don't have to add |showweight=1 every time. While much less used, we also have a #return function that sets variables on the frame one higher than where the #return was called from, assuming it's not already a top-level frame.

That all being said, this all relies on a similar frame-like structure that has access to its parent always being in use. So far, that seems like it'll be the case for both the legacy parser and Parsoid, so we're probably good on that front. The inherent problem with Parsoid, though, is the parallelism, as already noted on the Variables page on MW.

Also, in terms of how we have our variables reliably at frame scope, we're using the built-in variable system. Thus, {{#define:animal|dog}} {{{animal}}} produces "dog" if nothing was passed, but "cat" if the template was called with |animal=cat. The code is here, if you want to have a look at it: https://github.com/RobinHood70/MetaTemplate/tree/main/MetaTemplate

I am unsure if this is the right space to bring it up, but I was recently reminded by @Pppery that dynamic properties are deprecated in PHP 8.2 and who knows when they will be disallowed.

This is a problem for the current version of Variables, which attaches a dynamic property (mExtVariables) to the Parser.

The best alternative I can think of is to rely on some combination of Parser::setExtensionData() and Parser::appendExtensionData(), which is a little convoluted but may be our best option. I took the liberty to update the documentation over on the MediaWiki website: https://www.mediawiki.org/wiki/Manual:Extension_data#Advanced

Previously, we were allowed to use Parser::setExtensionData() but the use of that method to modify/destroy/reset values is deprecated (although MW 1.42 does not complain yet).

@RobinHood70 I haven't had time to look at your work but that looks interesting.