
rework format strings (closes #2244 #5803) #6475

Closed · wants to merge 11 commits

Conversation

nadako
Member

@nadako nadako commented Jul 29, 2017

This is a complete rework of interpolated strings handling, fixing issues like #2244 #5803 and providing a macro API to work with interpolated strings (through the new EFormat AST node kind).

It works by creating a mini-AST for an interpolated string at the lexer level, which is then easily parsed into an EFormat(parts:Array<FormatSegment>) node.

For example, a 'a$b${c + 2}' string is emitted as a Fmt [ raw(a), name(b), code(c, +, 2) ] token, which is parsed into an EFormat [ FRaw(a), FIdent(b), FExpr(EBinop(Add, Const(c), Const(2))) ] node, which is then expanded into a "a" + b + (c + 2) expression on typing.

Note that single-quoted strings without interpolations (so no $a or ${a}) are emitted as normal string tokens (and thus EConst(CString) AST nodes).
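For reference, here is a minimal sketch of what the new data structures might look like, inferred purely from the constructor names used in this PR (EFormat, FormatSegment, FRaw/FIdent/FExpr); the exact definitions in the patch may differ:

```haxe
// Hypothetical sketch, not the actual patch: shapes inferred from the
// names used in this PR discussion (part.kind / part.pos usage below).
enum FormatSegmentKind {
	FRaw(s:String);      // literal text between interpolations
	FIdent(name:String); // $name-style interpolation
	FExpr(e:Expr);       // ${expr}-style interpolation
}

typedef FormatSegment = {
	var kind:FormatSegmentKind;
	var pos:Position;
}

// and a new ExprDef constructor:
//   EFormat(parts:Array<FormatSegment>);
```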

Other than fixing issues with reentrancy and macro reification (see tests included), I believe this opens new possibilities for macro-processing strings, like the infamous JSX. I think it's now much easier to parse things like jsx('<div>${names.map(name -> jsx('<p>$name</p>'))}</div>')

This is obviously a (somewhat) breaking change for macros that process strings, but it's Haxe 4 and it's for the greater good.

@nadako nadako requested review from Simn and ncannasse July 29, 2017 23:39
@nadako nadako added this to the 4.0 milestone Jul 29, 2017
@nadako
Member Author

nadako commented Jul 29, 2017

PS The cpp changes were needed because hxcpp uses the same template format as Haxe interpolation for its XML directives, but Haxe is not supposed to process those, so I made double-quoted strings required for @:buildXml and the other c++ metas.

@@ -128,6 +128,11 @@ class ExprTools {
 }
 if (edef != null && edef.expr != null)
 f(edef);
+case EFormat(parts):
+for (part in parts) switch part.kind {
+case FRaw(_) | FIdent(_):
Member Author

This was my concern about having separate FIdent variant instead of having it all in FExpr: when itering/mapping over an expression, special handling is needed for format strings, if one wants to e.g. change all identifiers. OTOH this of course preserves more info about the source, so it may be useful in some cases. I don't have a strong opinion here.

Member

It's usually a good idea to design your data in a way that it doesn't lose information. Without FIdent, the cases $ident and ${ident} would be indistinguishable and could not be restored from the data. This would certainly be a concern for hxparser. Maybe not so much for the compiler parser because it mostly "doesn't matter", but I still think this is the better approach.
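To illustrate the point with the segment names used in this PR (hypothetical values), the two source spellings would produce distinguishable data:

```haxe
// Sketch (hypothetical values): with a separate FIdent variant,
// the two spellings remain distinguishable after parsing:
//   'hello $name'   -> [FRaw("hello "), FIdent("name")]
//   'hello ${name}' -> [FRaw("hello "), FExpr(macro name)]
// With only FExpr, both would parse to identical segments, so a
// printer could not restore the user's original spelling.
```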

Member

Why not create an adhoc expression and deal with it accordingly?

iter:

case FIdent(name): f(macro @:pos(part.pos) $i{name})

map:

case FIdent(name): {
	pos: part.pos,
	kind: switch f(macro @:pos(part.pos) $i{name}) {
		case macro $i{ident}: FIdent(ident);
		case v: FExpr(v);
	}
}

Member Author

That's an option, nice idea!

@@ -130,7 +130,8 @@ class ExprTools {
 f(edef);
 case EFormat(parts):
 for (part in parts) switch part.kind {
-case FRaw(_) | FIdent(_):
+case FRaw(_):
+case FIdent(i): f({pos: part.pos, expr: EConst(CIdent(i))});
Member Author

One thing that's bugging me a bit is that the fake ident position here includes the dollar sign, which is inconsistent with the expansion, where we do the +1. But we can't really do the same in ExprTools, because we'd have to convert the position with getPosInfos/makePosition, which seems like too much. Or not?

Member Author

I think I'll just keep the original FIdent position, as back2dos originally proposed.

@back2dos
Member

Ok, I'm all for the part that fixes things. But I would request that it be EFormat(original:String, parts:Ref<Array<FormatFragment>>);.

From my experience, when you're parsing JSX-like syntax, the node syntax comes first and then the interpolated expressions. Just consider what you get from <foo bar=${123}>${321}</foo>. Even the opening tag is split apart. I'm sure it's well-intended, but in reality it's not that useful.
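Under the proposed EFormat representation, that example would presumably decompose into something like the following (hypothetical segment values):

```haxe
// '<foo bar=${123}>${321}</foo>' as format segments (hypothetical):
[
	FRaw("<foo bar="), // the opening tag is split around the interpolation
	FExpr(macro 123),
	FRaw(">"),
	FExpr(macro 321),
	FRaw("</foo>")
];
```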

Take a look at the haxe-react parser: you will see that it first uses Xml.parse to get the raw structure, and then picks up interpolation within that. Similarly, tink_hxx parses the tag structure primarily and identifies code blocks within it. In both cases the raw string is needed. Also, for compatibility with JSX, both libraries support <tag>{expr}</tag>, so having some parts pre-parsed and others not just adds to the complexity.

With these changes merged, we'd have to reconstitute the string. I assume the parser for the "mini-AST" is necessary for string interpolation to work properly, but pushing that information down onto macros is pretty wasteful if it's not needed. First the AST has to be allocated as an interpreter value. And if the original string is not available, you need to pass it to haxe.macro.Printer to get back the string, which then still requires unescaping, for which I guess one would use haxe.Json.parse to avoid doing it by hand. Only then do you have the String you wanted; you can actually start parsing it, and call back into Context.parseInlineString to parse again the expressions you've just converted to a String \o/

@nadako
Member Author

nadako commented Jul 30, 2017

I would request that it be EFormat(original:String, parts:Ref<Array<FormatFragment>>);.

That sounds fair. I'm not sure about Ref, because IIRC we don't really use it for untyped AST structures, but I see no particular reason for that. And yeah, having the original strings might be useful (it'll also help avoid the c++ metadata issue: we can just use the original string).


That said, I don't see much complexity in parsing <foo bar=${123}>${321}</foo>, if anything one can do something like:

var xmlParts = [];
var interpolatedExprs = new Map();
var nextId = 0;
for (part in parts) {
    switch part.kind {
        case FRaw(s):
            xmlParts.push(s);
        case FIdent(i):
            var id = nextId++;
            xmlParts.push('{$id}');
            interpolatedExprs[id] = macro $i{i};
        case FExpr(e):
            var id = nextId++;
            xmlParts.push('{$id}');
            interpolatedExprs[id] = e;
    }
}
var xml = xmlParts.join("");

// ...parse and process xml, taking exprs from interpolatedExprs map

I get that this might not be enough for haxe-react, because it wants to support the <tag>{expr}</tag> syntax, which Haxe has no idea about. But I can easily imagine valid JSX-like (and other) DSLs based on the actual Haxe string interpolation syntax, if we expose it in macros.

@nadako
Member Author

nadako commented Jul 30, 2017

Meh, I don't know: having both the original string and the format segments makes it easy to construct invalid AST nodes (ones where the original string and the segments don't correspond to each other), and having the segments as a Ref prevents modifying them through macros. Gotta think about it some more.

@back2dos
Member

back2dos commented Jul 31, 2017

The thing is, I want to have accurate position tracking down to the character, and tink_hxx accomplishes that by parsing the XML itself.

So then I could try to pad the placeholder with whitespace to the length of the expression the place is held for (which I assume I can get from the fragment's position). But all this only works if the original expression was not shorter than the placeholder, e.g. you can't replace $a by {42} and somehow get the offsets to remain fine.

That leaves you with some pretty pesky gymnastics to get the offsets right. Yes, it's doable, but I would very much prefer not to have to do it ;)

As for your concerns: I think the typer should just ignore the raw string.

A different approach would be to have CString(s:String, doubleQuote:Bool) and MacroStringTools.getStringFragments. Both map and iter could do:

case CString(s, false): f(MacroStringTools.formatString(s, e.pos));

If you can somehow avoid decomposing the string into fragments twice (or if the parsing cost is negligible compared to sending the result down to the interpreter), this would probably be the simplest and fastest approach.

@nadako
Member Author

nadako commented Jul 31, 2017

Alright, I'm closing this for now so it's not accidentally merged, will reopen when ready :)

@nadako nadako closed this Jul 31, 2017
@Simn
Member

Simn commented Sep 30, 2017

What's the agenda here? I would really like to resolve the string quoting mess...

@nadako
Member Author

nadako commented Sep 30, 2017

We gotta figure out what API we want for this, so I know what exactly to implement :) It seems that we want to keep strings at the untyped AST level and parse them later when typing, right? But I think it could be useful to somehow provide access to the mini-AST in macros, so people can easily use interpolated strings for their DSLs.

@Simn
Member

Simn commented Sep 30, 2017

Could we do this in two steps?

  1. Clean up our internal mess.
  2. Figure out how to expose it to macros.

@nadako
Member Author

nadako commented Sep 30, 2017

I have to look into this again...
