-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add remaining spec text #39
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good overall. The biggest question I have is whether we need to specify caching behaviour or if it can be left as implementation detail.
This is a big review, I"ll do another pass tomorrow.
<td>Object</td> | ||
<td> | ||
Cache for local variables, which may be resolved during the formatting. | ||
The values in the cache are Objects, each of which will have either [[MessageValue]], [[ResolvedValue]], or [[UnresolvedDeclaration]] internal slots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this cache to be exposed to the user? This seems like this is something that could be left to implementations rather than specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I"ve included this because its presence or absence has an effect on the observable error handling. Because we don"t just throw on the first error, we need to account for what happens when formatting a message like:
.local $foo = {:missing-function}
{{First {$foo}, then {$foo} again}}
As currently specified, formatting that without a missing-function
will emit one error for the .local
declaration. If the cache is left out, then an implementation may emit the same error twice, as there are two {$foo}
placeholders.
My presumption is that it"s better to avoid such ambiguity, even if the cost of that is a slight increase in spec complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, it"s not actually user observable, right? Just more explicit in the spec but that"s not a big problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache is not directly user observable.
</h1> | ||
<dl class="header"> | ||
<dt>description</dt> | ||
<dd>It determines the functions available during message formatting.</dd> | ||
</dl> | ||
|
||
<emu-alg> | ||
1. ...TODO | ||
1. Let _numberSteps_ be the algorithm steps defined in <emu-xref href="#sec-messageformat-numberfunctions"></emu-xref>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be more readable if you added a CreateMessageFormatNumberFunction
abstract operation, instead of having the _numberSteps_
followed by CreateBuiltinFunction
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, I actually had them that way in an earlier draft. Happy to do that, esp. if @ryzokuken agrees.
The current function steps definition language is modelled after that used for e.g. Promise Resolve Functions and Number Format Functions. The language on this line follows what seemed like the most common way of calling CreateBuiltinFunction()
, though Intl.NumberFormat.prototype.format does something a bit different:
Let F be a new built-in function object as defined in Number Format Functions (15.5.2).
My main reason for this approach was that this way the :number
and :string
definitions can be left without the boilerplate code that"s now here. But my opinions on this are not at all strong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has me equally stumped. On the one hand I agree with @dminor that an AO would be much more readable and less arcane than this.
On the other, I understand the connection to the other similar parts of the spec. I still lean towards AOs over dynamically creating ES builtin functions because my understanding of the MF2 builtin registry functions is that they"d not be actually dynamic and therefore we"ll be fine by using an AO here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I"ve added #42 to continue this.
1. ...TODO | ||
1. Let _numberSteps_ be the algorithm steps defined in <emu-xref href="#sec-messageformat-numberfunctions"></emu-xref>. | ||
1. Let _number_ be CreateBuiltinFunction(_numberSteps_, *3*, *"number"*, « »). | ||
1. Let _stringSteps_ be the algorithm steps defined in <emu-xref href="#sec-messageformat-stringfunctions"></emu-xref>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same idea here, I"d prefer a CreateMessageFormatStringFunction
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me :)
1. Let _type_ be ? Get(_value_, *"type"*). | ||
1. If _type_ is *"literal"*, then | ||
1. Let _strValue_ be ? Get(_value_, *"value"*). | ||
1. Modify _strValue_ such that all *"\"* characters are replaced with *"\\"*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe pedantic, but is it really possible to modify strValue? I thought we"d have to make a copy with the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually very tempted to leave this out and add an <emu-note type="editor">
about this being a necessary step, as I"m not certain how to better express this in spec language:
strValue = strValue.replace(/[\\|]/g, "\\$&");
Co-authored-by: Daniel Minor <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for a big PR, the spec text is complex but in a way that accurately represents the complexity of this proposal. Some comments inline but nothing major.
1. Let _stringValue_ be _mv_. | ||
1. Else, | ||
1. For each element _el_ of _msg_, do | ||
1. If Type(_el_) is String, then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further down below in the description of the types like Number
, String
and Object
, they contain definitions to explicit type checks like "is not a number". From what I can observe, the verbose Type(x)
syntax is always used when there"s some dynamic path involved, either checking if an arbitrary type is part of a collection or if the types of two things are the same.
TL;DR this can be simplified but I"m just nitpicking at this point.
1. If Type(_el_) is String, then | |
1. If _el_ is a String, then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, I changed all of these from the "x is a String" format to "Type(x) is String" based on @dminor"s review.
So rather than toggling them back, I"m going to stick with the current form for now, and once we figure out how they really ought to look like, we can do a separate editorial pass as a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I"ve added #41 to continue this.
<li> | ||
[[RequestedLocales]] is a List of String values | ||
with the canonicalized language tags of the requested locales | ||
to use for message formatting. | ||
</li> | ||
<li>[[Functions]] is an Object ...TODO</li> | ||
<li>[[Functions]] is an Object with function object values.</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since [[Functions]]
is an internal slot, wouldn"t it be better to make it a Record containing function objects as values instead of a plain JS object?
<td>Object</td> | ||
<td> | ||
Cache for local variables, which may be resolved during the formatting. | ||
The values in the cache are Objects, each of which will have either [[MessageValue]], [[ResolvedValue]], or [[UnresolvedDeclaration]] internal slots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, it"s not actually user observable, right? Just more explicit in the spec but that"s not a big problem.
1. Else if an appropriate notification mechanism exists, then | ||
1. Issue a warning for _error_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to say if this is doable, but let"s see 🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, is there prior art for this? Maybe I"m just unaware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There"s some precedent:
- https://tc39.es/ecma262/#sec-directive-prologues-and-the-use-strict-directive
A Directive Prologue may contain more than one Use Strict Directive. However, an implementation may issue a warning if this occurs. [...] NOTE: [...] If an appropriate notification mechanism exists, an implementation should issue a warning if it encounters in a Directive Prologue an ExpressionStatement that is not a Use Strict Directive and which does not have a meaning defined by the implementation.
- https://tc39.es/ecma262/#sec-error-handling-and-language-extensions
An implementation shall not treat other kinds of errors as early errors even if the compiler can prove that a construct cannot execute without error under any circumstances. An implementation may issue an early warning in such a case, but it should not report the error until the relevant construct is actually executed.
- https://tc39.es/ecma262/#sec-block-level-function-declarations-web-legacy-compatibility-semantics
If an ECMAScript implementation has a mechanism for reporting diagnostic warning messages, a warning should be produced when code contains a FunctionDeclaration for which these compatibility semantics are applied and introduce observable differences from non-compatibility semantics. For example, if a var binding is not introduced because its introduction would create an early error, a warning message should not be produced.
1. If _kind_ is *"open"*, return the string-concatenation of *"#"* and _name_. | ||
1. If _kind_ is *"standalone"*, return the string-concatenation of *"#"*, _name_, and *"/"*. | ||
1. If _kind_ is *"close"*, return the string-concatenation of *"/"* and _name_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, wasn"t this + and - for open and close instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was, but the sigils were changed in unicode-org/message-format-wg#541.
This completes the spec draft for the proposal, so... it"s a lot. This PR also presupposed that #29 is accepted.
To make reading the text easier, I"ve made http://tc39.es/proposal-intl-messageformat/ currently serve content from this PR"s branch.
Rather than passing around multiple arguments, the mf, onError and values are now packaged together into a MessageFormatContext Record, along with a cache required for resolving declarations. That cache is required to only read input values once, and to not emit errors for the dependencies of unformatted patterns.
For example, with a message like:
If this is formatted with
{ x: 2 }
, we should not even check whether the:broken
function is available, as it"s never used for formatting "Otherwise".The cache gets a bit more complex as we need to allow for the right fallback behaviour when formatting functions fail. For example with:
If this is formatted with
{ foo: 42 }
and calling:broken
fails, the expected result is "Bare 42, annotated {$foo}", so we need to retain thefoo
name of the variable even after its resolution has succeeded.The
:number
and:string
implementations are included here. Figuring out how to define anonymous built-in options that themselves create abstract closures was... a learning experience. They need to be recreated separately for each MessageFormat instance as they are available viaMF.p.resolvedOptions().functions
, and assigning properties on them should not be reflected in other MessageFormat instances.The
:number
input handling also gets a little tricky, because we need to support e.g.where the literal values
1.234e6
,4
, andfalse
get passed in as strings.