-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bidirectional isolation #30
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The locale can only be a hint.
In my view, the message as a whole should have a base paragraph direction. And each placeholder should have a base paragraph direction property.
When formatting to parts, one ends up with a sequence of parts that each have a queryable direction. When formatting to HTML this direction can be used to set the dir
attribute on markup, which is how bidi isolation is done in HTML.
When formatting to a string, this algorithm is thus mostly correct in its behavior, since it places isolating controls around placeholders.
Some will note that a placeholder can, itself, contains "parts" that have direction. For example, a date format produces a sequence of "parts" that are date/time fields (such as year or hour or such) or literals. However, I note that these "interior parts" are the responsibility of the formatting function. The important part of bidi isolation in MF2 is that the placeholder "sticks together" (is spanned with isolates).
My concern with this proposal is that it does not start from parts or with having a directionality property. It tries to infer direction from locale (which is not always appropriate and thus shouldn"t form the foundation of directionality support)
@aphillips Do I understand right that you think we should include My presumption was that from the locale code we could get the script, from which we could determine the direction. But that"s not always valid then? |
@eemeli More like: every formatted part should have a Input variables: { "item": {
"id": "978-111887164-5",
"title": {
"value": "HTML و CSS: تصميم و إنشاء مواقع الويب!",
"lang": "ar",
"dir": "rtl"
}
} And message:
In this case, you want the title bidi isolated (it"s in Arabic) and you don"t want a direction to be LTR (implied by We have a long document discussing this: Strings on the Web: Language and Direction Metadata. It includes a discussion of why language tags (and their extant or implied script subtag) is only a fallback. Thus |
After discussion with @aphillips elsewhere, I"ve rebased and updated this PR to include This also adds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these changes. Some comments.
By default the message's directionality is determined from | ||
the script corresponding to the first locale, | ||
but this may be overridden by `dir`. | ||
Its `"auto"` value corresponds to messages with unknown directionality, | ||
for which the direction is determined by the first strongly directional character. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would unpack this so that the hierarchy is clear.
By default the message"s directionality is determined from | |
the script corresponding to the first locale, | |
but this may be overridden by `dir`. | |
Its `"auto"` value corresponds to messages with unknown directionality, | |
for which the direction is determined by the first strongly directional character. | |
The base paragraph direction of a message is set by the `dir` property, if it is present. | |
If there is no `dir` property, the direction is determined by the result of `getTextInfo()` for the `locale` of the message. | |
If the direction cannot be determined, the value of `dir` is `"auto"` and directionality of the | |
message will be determined at runtime using the directionality of characters in the message | |
(known as "first-strong" detection). |
Custom user-defined message formatting and selection functions may defined by the `functions` option. | ||
These allow for any data types to be handled by custom functions. | ||
Such functions may be referenced within messages, | ||
and then called with the resolved values of their arguments and options. | ||
|
||
```ts | ||
interface MessageFormatOptions { | ||
bidiIsolation?: 'compatibility' | 'none'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compatibility
sounds like it exists for legacy reasons rather than wanting to be the default options. Perhaps give it a less redolent name, like unicode
or controls
? Would formatToParts have a different option (for when the caller wants to use markup)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "compatibility" name is explicitly mentioned in the MF2 spec: https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#handling-bidirectional-text
If it should be called something else, it would be best to re-name it there so that the same name will end up in other implementations as well.
The bidi isolation strategy is set in the constructor, so each needs to work for all formatting targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should figure out the naming there.
1. Let `parts` be the result of calling `mv.toParts()`. | ||
1. If the call fails or `parts` is not an array: | ||
1. Set `parts` to be `[{ type: 'fallback', locale: 'und', source: mv.source }]`. | ||
1. Set `parts` to be `[{ type: "fallback", source: mv.source }]`. | ||
1. Set `dir` to be `"auto"`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
1. Let `bidi` be the `{ start: string, end: string }` result of calling | ||
`ApplyBidiIsolation(bidiIsolation, msgDir, dir)`. | ||
1. If `bidi.start` is not an empty string: | ||
1. Append `{ type: 'bidiIsolation', value: bidi.start }` to `res`. | ||
1. For each `part` or `parts`: | ||
1. Append `part` to `res`. | ||
1. If `bidi.end` is not an empty string: | ||
1. Append `{ type: 'bidiIsolation', value: bidi.end }` to `res`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the discussion we were having on the parts spec in MF2. This design makes it hard for developers who are using the parts to build HTML using HTML"s dir
attribute, because instead of each part just having a direction, the developer getting a bidiIsolation
part has to peek ahead to figure out what the text is that it applies to. They then need to decide whether to generate separate bidi markup (such as a span
element) or try to apply dir
to a known inline element (which is why they got parts in the first place--to be able to decorate the string with em
, strong
, bdi
😉, or some CSS class.
This algorithm works for compatibility
mode, since it just has the bidi control in it. Is your intention that none
conveys what I just described above? If so, then the name might be wrong?
Perhaps make the names:
compatibility
=> controls
none
=> inline
or maybe attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bidiIsolation
parts are in addition to the dir
that"s on the formatted parts. So a developer building HTML could just ignore the bidiIsolation
parts and build e.g. <span dir>
elements as necessary from the dir
attributes.
On the other hand, in this particular case the bidiIsolation
parts also act as start/end markers in case the value formats to more than one top-level part (Note: this is different from e.g. number parts containing sub-parts), so that a single multi-part value could be completely encompassed in an appropriate <span dir>
if necessary.
Regarding the "compatibility" name, see my comment here. Regarding "none", I still think it"s right to say that using it no bidi isolation is provided. Given that the dir
values are still included in the parts output, it is of course possible for a consumer to construct its own bidi isolation from the provided information, but Intl.MessageFormat isn"t doing it in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that works for me, although I still want to think about the naming.
README.md
Outdated
locale: 'und'; | ||
dir: 'auto' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both locale and dir can be known from the call to MF. Any reason to suppress that information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fallback, we can"t know without introspection what the source
direction is, and we should not presume that the contents match the locale. In other words, we do not know the locale or direction of a fallback part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, if source
is externally provided and isn"t the pattern string, I can agree about the direction. The locale, though, really is the locale of the formatter, I think. Otherwise the caller won"t know if some locale defaulting happened. If the source
is just an identifier, its actual language is probably "programmer-ese" and not any particular natural language. But the language of the words in the source don"t have to be in the locale"s language for that to be the correct locale for the message.
Co-authored-by: Addison Phillips <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good. My comments about naming remain, but they are minor and I could live with what"s here.
I"ve added an upstream issue about the "compatibility" name: unicode-org/message-format-wg#549 |
This fulfills the spec requirements for Handling Bidirectional Text.
The ApplyBidiIsolation AO will in the spec text need more formal descriptions of how the
"compatibility"
and"none"
strategies work, but I thought that would be a bit heavy for the README.We could also define something like a
"best fit"
option to match what"s done with locale negotiation, but I left that out as I wasn"t sure exactly what it would look like.CC @aphillips