
rework format strings (closes #2244 #5803) #6475

Closed · wants to merge 11 commits

Conversation

nadako
Member

@nadako nadako commented Jul 29, 2017

This is a complete rework of interpolated strings handling, fixing issues like #2244 #5803 and providing a macro API to work with interpolated strings (through the new EFormat AST node kind).

It works by creating a mini-AST for an interpolated string at the lexer level, which is then easily parsed into an EFormat(parts:Array<FormatSegment>) node.

For example, a 'a$b${c + 2}' string is emitted as a Fmt [ raw(a), name(b), code(c, +, 2) ] token, which is parsed into an EFormat [ FRaw(a), FIdent(b), FExpr(EBinop(Add, Const(c), Const(2))) ] node, which is then expanded into a "a" + b + (c + 2) expression on typing.

Note that single-quoted strings without interpolations (so no $a or ${a}) are emitted as normal string tokens (and thus EConst(CString) AST nodes).
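For reference, here is a minimal sketch of what the new data structures might look like, inferred purely from the constructor names used in this PR (EFormat, FormatSegment, FRaw/FIdent/FExpr); the exact definitions in the patch may differ:

```haxe
// Hypothetical sketch, not the actual patch: shapes inferred from the
// names used in this PR discussion (part.kind / part.pos usage below).
enum FormatSegmentKind {
	FRaw(s:String);      // literal text between interpolations
	FIdent(name:String); // $name-style interpolation
	FExpr(e:Expr);       // ${expr}-style interpolation
}

typedef FormatSegment = {
	var kind:FormatSegmentKind;
	var pos:Position;
}

// and a new ExprDef constructor:
//   EFormat(parts:Array<FormatSegment>);
```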

Other than fixing issues with reentrancy and macro reification (see tests included), I believe this opens new possibilities for macro-processing strings, like the infamous JSX. I think it's now much easier to parse things like jsx('<div>${names.map(name -> jsx('<p>$name</p>'))}</div>')

This is obviously a (somewhat) breaking change for macros that process strings, but it's Haxe 4 and it's for the greater good.

@nadako nadako requested review from Simn and ncannasse July 29, 2017 23:39
@nadako nadako added this to the 4.0 milestone Jul 29, 2017
@nadako
Member Author

nadako commented Jul 29, 2017

PS The cpp changes were needed because hxcpp uses the same template format as Haxe interpolation for its XML directives, but Haxe is not supposed to process those, so I made double-quoted strings required for @:buildXml and the other c++ metas.

@@ -128,6 +128,11 @@ class ExprTools {
 }
 if (edef != null && edef.expr != null)
 f(edef);
+case EFormat(parts):
+for (part in parts) switch part.kind {
+case FRaw(_) | FIdent(_):
Member Author

This was my concern about having separate FIdent variant instead of having it all in FExpr: when itering/mapping over an expression, special handling is needed for format strings, if one wants to e.g. change all identifiers. OTOH this of course preserves more info about the source, so it may be useful in some cases. I don't have a strong opinion here.

Member

It's usually a good idea to design your data in a way that it doesn't lose information. Without FIdent, the cases $ident and ${ident} would be indistinguishable and could not be restored from the data. This would certainly be a concern for hxparser. Maybe not so much for the compiler parser because it mostly "doesn't matter", but I still think this is the better approach.
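To illustrate the point with the segment names used in this PR (hypothetical values), the two source spellings would produce distinguishable data:

```haxe
// Sketch (hypothetical values): with a separate FIdent variant,
// the two spellings remain distinguishable after parsing:
//   'hello $name'   -> [FRaw("hello "), FIdent("name")]
//   'hello ${name}' -> [FRaw("hello "), FExpr(macro name)]
// With only FExpr, both would parse to identical segments, so a
// printer could not restore the user's original spelling.
```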

Member

Why not create an adhoc expression and deal with it accordingly?

iter:

case FIdent(name): f(macro @:pos(part.pos) $i{name})

map:

case FIdent(name): {
	pos: part.pos,
	kind: switch f(macro @:pos(part.pos) $i{name}) {
		case macro $i{ident}: FIdent(ident);
		case v: FExpr(v);
	}
}

Member Author

That's an option, nice idea!

@@ -130,7 +130,8 @@ class ExprTools {
 f(edef);
 case EFormat(parts):
 for (part in parts) switch part.kind {
-case FRaw(_) | FIdent(_):
+case FRaw(_):
+case FIdent(i): f({pos: part.pos, expr: EConst(CIdent(i))});
Member Author

One thing that's bugging me a bit is that the fake ident position here includes the dollar sign, which is inconsistent with the expansion, where we do the +1. But we can't really do the same in ExprTools, because we'd have to convert the position with getPosInfos/makePosition, which seems like too much. Or not?

Member Author

I think I'll just keep the original FIdent position, as back2dos originally proposed.

@back2dos
Member

Ok, I'm all for the part that fixes things. But I would request that it be EFormat(original:String, parts:Ref<Array<FormatFragment>>);.

From my experience, when you're parsing JSX-like syntax, the node syntax comes first and then the interpolated expressions. Just consider what you get from <foo bar=${123}>${321}</foo>. Even the opening tag is split apart. I'm sure it's well-intended, but in reality it's not that useful.
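Under the proposed EFormat representation, that example would presumably decompose into something like the following (hypothetical segment values):

```haxe
// '<foo bar=${123}>${321}</foo>' as format segments (hypothetical):
[
	FRaw("<foo bar="), // the opening tag is split around the interpolation
	FExpr(macro 123),
	FRaw(">"),
	FExpr(macro 321),
	FRaw("</foo>")
];
```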

Take a look at the haxe-react parser: you will see that it first uses Xml.parse to get the raw structure, and then picks up interpolation within that. Similarly, tink_hxx parses the tag structure primarily and identifies code blocks within it. In both cases the raw string is needed. Also, for compatibility with JSX, both libraries support <tag>{expr}</tag>, so having some parts pre-parsed and others not just adds to the complexity.

With these changes merged, we'd have to reconstitute the string. I assume the parser for the "mini-AST" is necessary for string interpolation to work properly, but pushing that information down onto macros is pretty wasteful if it's not needed. First the AST has to be allocated as an interpreter value. And if the original string is not available, you need to pass it to haxe.macro.Printer to get back the string, which then still requires unescaping, for which I guess one would use haxe.Json.parse to avoid doing it by hand. Only then do you have the String you wanted; you can actually start parsing it, and call back into Context.parseInlineString to parse again the expressions you've just converted to a String \o/

@nadako
Member Author

nadako commented Jul 30, 2017

I would request that it be EFormat(original:String, parts:Ref<Array<FormatFragment>>);.

That sounds fair. I'm not sure about Ref, because IIRC we don't really use it for untyped AST structures, but I see no particular reason for that. And yeah, having the original strings might be useful (it'll also help avoid the c++ metadata issue: we can just use the original string).


That said, I don't see much complexity in parsing <foo bar=${123}>${321}</foo>, if anything one can do something like:

var xmlParts = [];
var interpolatedExprs = new Map();
var nextId = 0;
for (part in parts) {
    switch part.kind {
        case FRaw(s):
            xmlParts.push(s);
        case FIdent(i):
            var id = nextId++;
            xmlParts.push('{$id}');
            interpolatedExprs[id] = macro $i{i};
        case FExpr(e):
            var id = nextId++;
            xmlParts.push('{$id}');
            interpolatedExprs[id] = e;
    }
}
var xml = xmlParts.join("");

// ...parse and process xml, taking exprs from interpolatedExprs map

I get that this might not be enough for haxe-react, because it wants to support the <tag>{expr}</tag> syntax, which Haxe has no idea about. But I can easily imagine valid JSX-like (and other) DSLs based on the actual Haxe string interpolation syntax, if we expose it in macros.

@nadako
Member Author

nadako commented Jul 30, 2017

Meh, I don't know: having both the original string and the format segments makes it easy to construct invalid AST nodes (ones where the original string and the segments don't correspond to each other), and having the segments as a Ref prevents modifying them through macros. Gotta think about it some more.

@back2dos
Member

back2dos commented Jul 31, 2017

The thing is, I want to have accurate position tracking down to the character, and tink_hxx accomplishes that by parsing the XML itself.

So then I could try to pad the placeholder with whitespace to the length of the expression the place is held for (which I assume I can get from the fragment's position). But all this only works if the original expression was not shorter than the placeholder, e.g. you can't replace $a by {42} and somehow get the offsets to remain fine.

That leaves you with some pretty pesky gymnastics to get the offsets right. Yes, it's doable, but I would very much prefer not to have to do it ;)

As for your concerns: I think the typer should just ignore the raw string.

A different approach would be to have CString(s:String, doubleQuote:Bool) and MacroStringTools.getStringFragments. Both map and iter could do:

case CString(s, false): f(MacroStringTools.formatString(s, e.pos));

If you can somehow avoid decomposing the string into fragments twice (or if the parsing cost is negligible compared to sending the result down to the interpreter), this would probably be the simplest and fastest approach.

@nadako
Member Author

nadako commented Jul 31, 2017

Alright, I'm closing this for now so it's not accidentally merged, will reopen when ready :)

@nadako nadako closed this Jul 31, 2017
@Simn
Member

Simn commented Sep 30, 2017

What's the agenda here? I would really like to resolve the string quoting mess...

@nadako
Member Author

nadako commented Sep 30, 2017

We gotta figure out what API we want for this, so I know what exactly to implement :) It seems that we want to keep strings at the untyped AST level and parse them later when typing, right? But I think it could be useful to somehow provide access to the mini-AST in macros, so people can easily use interpolated strings for their DSLs.

@Simn
Member

Simn commented Sep 30, 2017

Could we do this in two steps?

  1. Clean up our internal mess.
  2. Figure out how to expose it to macros.

@nadako
Member Author

nadako commented Sep 30, 2017

I have to look into this again...
