Add remaining spec text #39

eemeli · 2024-01-12T14:24:28Z

This completes the spec draft for the proposal, so... it's a lot. This PR also presupposes that #29 is accepted.

To make reading the text easier, I've made http://tc39.es/proposal-intl-messageformat/ currently serve content from this PR's branch.

Rather than passing around multiple arguments, the mf, onError and values are now packaged together into a MessageFormatContext Record, along with a cache required for resolving declarations. That cache is required to only read input values once, and to not emit errors for the dependencies of unformatted patterns.

For example, with a message like:

.local $foo = {:broken}
.match {$x :number}
1 {{The one {$foo}}}
* {{Otherwise}}

If this is formatted with { x: 2 }, we should not even check whether the :broken function is available, as it's never used for formatting "Otherwise".
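A minimal sketch of the lazy, cached resolution described above. The names `createContext`, `resolve`, and `formatExample` are hypothetical and do not come from the spec text, and the `.match` selection is reduced to a plain comparison; the only point is that a declaration is resolved on first use and never otherwise.

```javascript
// Hypothetical sketch: none of these names come from the spec text.
function createContext(declarations, functions, onError) {
  const cache = new Map(); // declaration name -> resolved value
  return {
    resolve(name) {
      // A declaration is resolved at most once, and only when it is used.
      if (cache.has(name)) return cache.get(name);
      const fn = functions[declarations[name].functionName];
      let value;
      if (typeof fn === "function") {
        value = fn();
      } else {
        onError(new Error(`Unknown function :${declarations[name].functionName}`));
        value = `{$${name}}`; // illustrative fallback string only
      }
      cache.set(name, value);
      return value;
    },
  };
}

function formatExample(values, onError) {
  // Stand-in for `.local $foo = {:broken}` with no functions registered.
  const ctx = createContext({ foo: { functionName: "broken" } }, {}, onError);
  // Stand-in for `.match {$x :number}`: only the selected pattern is formatted.
  return values.x === 1 ? `The one ${ctx.resolve("foo")}` : "Otherwise";
}

const lazyErrors = [];
const lazyResult = formatExample({ x: 2 }, (e) => lazyErrors.push(e));
```

With { x: 2 } the `$foo` declaration is never touched, so `lazyErrors` stays empty; with { x: 1 } the same code reports one error.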

The cache gets a bit more complex as we need to allow for the right fallback behaviour when formatting functions fail. For example with:

Bare {$foo}, annotated {$foo :broken}

If this is formatted with { foo: 42 } and calling :broken fails, the expected result is "Bare 42, annotated {$foo}", so we need to retain the foo name of the variable even after its resolution has succeeded.
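The fallback requirement can be sketched as follows, assuming a hypothetical record that keeps the variable name next to its resolved value. `resolveVariable` and `formatPlaceholder` are illustrative names only, not spec operations.

```javascript
// Hypothetical sketch; the record shape and function names are assumptions.
function resolveVariable(name, values) {
  // Keep the variable name alongside the value so that a later
  // function failure can still fall back to "{$name}".
  return { name, value: values[name] };
}

function formatPlaceholder(resolved, fn, onError) {
  if (fn === undefined) return String(resolved.value); // bare {$foo}
  try {
    return fn(resolved.value); // annotated {$foo :fn}
  } catch (error) {
    onError(error);
    return `{$${resolved.name}}`; // fallback uses the retained name
  }
}

const brokenFn = () => {
  throw new Error(":broken failed");
};
const reported = [];
const fooValue = resolveVariable("foo", { foo: 42 });
const fallbackText =
  "Bare " + formatPlaceholder(fooValue, undefined, (e) => reported.push(e)) +
  ", annotated " + formatPlaceholder(fooValue, brokenFn, (e) => reported.push(e));
```

Here `fallbackText` is "Bare 42, annotated {$foo}" and one error is reported.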

The :number and :string implementations are included here. Figuring out how to define anonymous built-in options that themselves create abstract closures was... a learning experience. They need to be recreated separately for each MessageFormat instance as they are available via MF.p.resolvedOptions().functions, and assigning properties on them should not be reflected in other MessageFormat instances.
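As a rough illustration of why the functions are recreated per instance, the sketch below uses a hypothetical `createDefaultFunctions` factory (not a spec operation) that returns fresh function objects on every call. The function bodies are placeholders.

```javascript
// Hypothetical factory; in the spec text these are built-in function objects.
function createDefaultFunctions() {
  return {
    // Three parameters, mirroring the length of 3 given to CreateBuiltinFunction.
    number: function number(context, options, operand) {
      return Number(operand); // placeholder body
    },
    string: function string(context, options, operand) {
      return String(operand); // placeholder body
    },
  };
}

const functionsA = createDefaultFunctions();
const functionsB = createDefaultFunctions();

// A property assigned on one instance's function is not visible on the other.
functionsA.number.custom = "only on A";
```

Because each call creates new function objects, `functionsB.number.custom` remains undefined.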

The :number input handling also gets a little tricky, because we need to support e.g.

{1.234e6 :number minimumSignificantDigits=4 useGrouping=false}

where the literal values 1.234e6, 4, and false get passed in as strings.
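One way to picture the problem is an explicit coercion step before the values reach Intl.NumberFormat. The helper `coerceNumberOptions` below is hypothetical and is not how the spec text handles it; it only shows that the string literals need to be interpreted as a number, a number, and a boolean.

```javascript
// Hypothetical helper: turn string-valued literals into typed option values.
function coerceNumberOptions(rawOptions) {
  const options = {};
  for (const [key, raw] of Object.entries(rawOptions)) {
    if (raw === "true" || raw === "false") {
      options[key] = raw === "true";
    } else if (raw !== "" && Number.isFinite(Number(raw))) {
      options[key] = Number(raw);
    } else {
      options[key] = raw; // leave other strings untouched
    }
  }
  return options;
}

// {1.234e6 :number minimumSignificantDigits=4 useGrouping=false}
const operand = Number("1.234e6");
const numberOptions = coerceNumberOptions({
  minimumSignificantDigits: "4",
  useGrouping: "false",
});
const formattedNumber = new Intl.NumberFormat("en", numberOptions).format(operand);
```

With these inputs the operand is 1234000 and, with grouping disabled, the formatted string is "1234000".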

dminor

This is looking good overall. The biggest question I have is whether we need to specify caching behaviour or if it can be left as implementation detail.

This is a big review; I'll do another pass tomorrow.

dminor · 2024-01-15T21:03:41Z

spec.emu

+              <td>Object</td>
+              <td>
+                Cache for local variables, which may be resolved during the formatting.
+                The values in the cache are Objects, each of which will have either [[MessageValue]], [[ResolvedValue]], or [[UnresolvedDeclaration]] internal slots.


Do we want this cache to be exposed to the user? This seems like something that could be left to implementations rather than specified.

I've included this because its presence or absence has an effect on the observable error handling. Because we don't just throw on the first error, we need to account for what happens when formatting a message like:

.local $foo = {:missing-function} {{First {$foo}, then {$foo} again}}

As currently specified, formatting that when no missing-function function is available will emit one error for the .local declaration. If the cache is left out, then an implementation may emit the same error twice, as there are two {$foo} placeholders.

My presumption is that it's better to avoid such ambiguity, even if the cost of that is a slight increase in spec complexity.
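The observable difference can be sketched by counting errors with and without a cache. `formatTwice` is a hypothetical stand-in for formatting the message above, with the failing declaration hard-coded.

```javascript
// Hypothetical stand-in for formatting
// `.local $foo = {:missing-function} {{First {$foo}, then {$foo} again}}`.
function formatTwice(useCache) {
  const errors = [];
  let cached;
  const resolveFoo = () => {
    if (useCache && cached !== undefined) return cached;
    // Resolving the declaration fails because the function is missing.
    errors.push(new Error("Unknown function :missing-function"));
    cached = "{$foo}"; // illustrative fallback string only
    return cached;
  };
  const text = `First ${resolveFoo()}, then ${resolveFoo()} again`;
  return { text, errorCount: errors.length };
}

const withCache = formatTwice(true);
const withoutCache = formatTwice(false);
```

The text is the same in both cases, but `withCache.errorCount` is 1 while `withoutCache.errorCount` is 2, which is the ambiguity the specified cache removes.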

Besides, it's not actually user observable, right? Just more explicit in the spec but that's not a big problem.

The cache is not directly user observable.

spec.emu

dminor · 2024-01-15T21:12:15Z

spec.emu

      </h1>
      <dl class="header">
        <dt>description</dt>
        <dd>It determines the functions available during message formatting.</dd>
      </dl>

      <emu-alg>
-        1. ...TODO
+        1. Let _numberSteps_ be the algorithm steps defined in <emu-xref href="#sec-messageformat-numberfunctions"></emu-xref>.


I think this would be more readable if you added a CreateMessageFormatNumberFunction abstract operation, instead of having the _numberSteps_ followed by CreateBuiltinFunction.

Heh, I actually had them that way in an earlier draft. Happy to do that, esp. if @ryzokuken agrees.

The current function steps definition language is modelled after that used for e.g. Promise Resolve Functions and Number Format Functions. The language on this line follows what seemed like the most common way of calling CreateBuiltinFunction(), though Intl.NumberFormat.prototype.format does something a bit different:

Let F be a new built-in function object as defined in Number Format Functions (15.5.2).

My main reason for this approach was that this way the :number and :string definitions can be left without the boilerplate code that's now here. But my opinions on this are not at all strong.

This has me equally stumped. On the one hand I agree with @dminor that an AO would be much more readable and less arcane than this.

On the other, I understand the connection to the other similar parts of the spec. I still lean towards AOs over dynamically creating ES builtin functions because my understanding of the MF2 builtin registry functions is that they'd not actually be dynamic and therefore we'll be fine using an AO here.

I've added #42 to continue this.

dminor · 2024-01-15T21:12:46Z

spec.emu

-        1. ...TODO
+        1. Let _numberSteps_ be the algorithm steps defined in <emu-xref href="#sec-messageformat-numberfunctions"></emu-xref>.
+        1. Let _number_ be CreateBuiltinFunction(_numberSteps_, *3*, *"number"*, « »).
+        1. Let _stringSteps_ be the algorithm steps defined in <emu-xref href="#sec-messageformat-stringfunctions"></emu-xref>.


Same idea here, I'd prefer a CreateMessageFormatStringFunction.

dminor

This looks good to me :)

spec.emu

dminor · 2024-01-16T21:16:28Z

spec.emu

+        1. Let _type_ be ? Get(_value_, *"type"*).
+        1. If _type_ is *"literal"*, then
+           1. Let _strValue_ be ? Get(_value_, *"value"*).
+           1. Modify _strValue_ such that all *"\"* characters are replaced with *"\\"*.


Maybe pedantic, but is it really possible to modify strValue? I thought we'd have to make a copy with the changes.

I was actually very tempted to leave this out and add an <emu-note type="editor"> about this being a necessary step, as I'm not certain how to better express this in spec language:

strValue = strValue.replace(/[\\|]/g, "\\$&");
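For reference, a small runnable version of that one-liner. The wrapper name `escapeLiteral` is made up for this illustration. It shows that String.prototype.replace returns a new string, so the JS equivalent is a copy with the changes rather than an in-place modification.

```javascript
// Hypothetical wrapper around the one-liner above.
function escapeLiteral(strValue) {
  // Prefix each backslash and each pipe with a backslash.
  // replace() returns a new string; strValue itself is not modified.
  return strValue.replace(/[\\|]/g, "\\$&");
}

const originalLiteral = "a\\b|c"; // the four characters a \ b | c ... plus c
const escapedLiteral = escapeLiteral(originalLiteral);
```

After this runs, `originalLiteral` is unchanged, and in `escapedLiteral` the backslash has become two backslashes and the pipe is preceded by a backslash.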

spec.emu

Co-authored-by: Daniel Minor <[email protected]>

ryzokuken

Thanks for a big PR, the spec text is complex but in a way that accurately represents the complexity of this proposal. Some comments inline but nothing major.

ryzokuken · 2024-01-18T12:55:27Z

spec.emu

-              1. Let _stringValue_ be _mv_.
-           1. Else,
+        1. For each element _el_ of _msg_, do
+           1. If Type(_el_) is String, then


Further down, the descriptions of types like Number, String and Object contain explicit type checks such as "is not a number". From what I can observe, the verbose Type(x) syntax is always used when there's some dynamic path involved, either checking if an arbitrary type is part of a collection or if the types of two things are the same.

TL;DR this can be simplified but I'm just nitpicking at this point.

Suggested change

-           1. If Type(_el_) is String, then
+           1. If _el_ is a String, then

Heh, I changed all of these from the "x is a String" format to "Type(x) is String" based on @dminor's review.

So rather than toggling them back, I'm going to stick with the current form for now, and once we figure out how they really ought to look, we can do an editorial pass in a separate PR.

I've added #41 to continue this.

spec.emu

ryzokuken · 2024-01-18T13:02:31Z

spec.emu

      <li>
        [[RequestedLocales]] is a List of String values
        with the canonicalized language tags of the requested locales
        to use for message formatting.
      </li>
-      <li>[[Functions]] is an Object ...TODO</li>
+      <li>[[Functions]] is an Object with function object values.</li>


Since [[Functions]] is an internal slot, wouldn't it be better to make it a Record containing function objects as values instead of a plain JS object?

ryzokuken · 2024-01-18T15:09:11Z

spec.emu

+              <td>Object</td>
+              <td>
+                Cache for local variables, which may be resolved during the formatting.
+                The values in the cache are Objects, each of which will have either [[MessageValue]], [[ResolvedValue]], or [[UnresolvedDeclaration]] internal slots.


Besides, it's not actually user observable, right? Just more explicit in the spec but that's not a big problem.

spec.emu

ryzokuken · 2024-01-18T15:37:41Z

spec.emu

+        1. Else if an appropriate notification mechanism exists, then
+           1. Issue a warning for _error_.


Hard to say if this is doable, but let's see 🤞

Wait, is there prior art for this? Maybe I'm just unaware.

There's some precedent:

https://tc39.es/ecma262/#sec-directive-prologues-and-the-use-strict-directive

A Directive Prologue may contain more than one Use Strict Directive. However, an implementation may issue a warning if this occurs. [...] NOTE: [...] If an appropriate notification mechanism exists, an implementation should issue a warning if it encounters in a Directive Prologue an ExpressionStatement that is not a Use Strict Directive and which does not have a meaning defined by the implementation.

https://tc39.es/ecma262/#sec-error-handling-and-language-extensions

An implementation shall not treat other kinds of errors as early errors even if the compiler can prove that a construct cannot execute without error under any circumstances. An implementation may issue an early warning in such a case, but it should not report the error until the relevant construct is actually executed.

https://tc39.es/ecma262/#sec-block-level-function-declarations-web-legacy-compatibility-semantics

If an ECMAScript implementation has a mechanism for reporting diagnostic warning messages, a warning should be produced when code contains a FunctionDeclaration for which these compatibility semantics are applied and introduce observable differences from non-compatibility semantics. For example, if a var binding is not introduced because its introduction would create an early error, a warning message should not be produced.

ryzokuken · 2024-01-18T15:41:59Z

spec.emu

+           1. If _kind_ is *"open"*, return the string-concatenation of *"#"* and _name_.
+           1. If _kind_ is *"standalone"*, return the string-concatenation of *"#"*, _name_, and *"/"*.
+           1. If _kind_ is *"close"*, return the string-concatenation of *"/"* and _name_.


Wait, wasn't this + and - for open and close instead?

It was, but the sigils were changed in unicode-org/message-format-wg#541.
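The three steps quoted above map onto a small function. `markupSource` is a hypothetical name, and throwing for an unrecognised kind is an assumption added here, since the quoted steps do not cover that case.

```javascript
// Hypothetical JS rendering of the three quoted spec steps.
function markupSource(kind, name) {
  if (kind === "open") return "#" + name;
  if (kind === "standalone") return "#" + name + "/";
  if (kind === "close") return "/" + name;
  // Assumption: the quoted steps do not say what happens for other kinds.
  throw new RangeError("Unknown markup kind: " + kind);
}

const markupExamples = ["open", "standalone", "close"].map((kind) =>
  markupSource(kind, "b")
);
```

For a markup element named b, this yields "#b", "#b/", and "/b".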

Add remaining spec text

479ad90

eemeli requested review from dminor and ryzokuken January 12, 2024 14:24

Add missing if statement in GetSource()

7ce691a

Require functions to return something like a MessageValue

8ccf0bb

eemeli mentioned this pull request Jan 12, 2024

Ready for Stage 2? #40

Closed

dminor reviewed Jan 15, 2024

dminor approved these changes Jan 16, 2024

eemeli commented Jan 16, 2024
