-
-
Notifications
You must be signed in to change notification settings - Fork 650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rework format strings (closes #2244 #5803) #6475
Conversation
PS The cpp changes were needed because hxcpp uses the same template format as haxe interpolation for its xml directives, but Haxe is not supposed to process those, so I made the double-quoted strings required for |
std/haxe/macro/ExprTools.hx
Outdated
@@ -128,6 128,11 @@ class ExprTools { | |||
} | |||
if (edef != null && edef.expr != null) | |||
f(edef); | |||
case EFormat(parts): | |||
for (part in parts) switch part.kind { | |||
case FRaw(_) | FIdent(_): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was my concern about having separate FIdent
variant instead of having it all in FExpr
: when itering/mapping over an expression, special handling is needed for format strings, if one wants to e.g. change all identifiers. OTOH this of course preserves more info about the source, so it may be useful in some cases. I don't have a strong opinion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's usually a good idea to design your data in a way that it doesn't lose information. Without FIdent, the cases $ident
and ${ident}
would be indistinguishable and could not be restored from the data. This would certainly be a concern for hxparser. Maybe not so much for the compiler parser because it mostly "doesn't matter", but I still think this is the better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not create an adhoc expression and deal with it accordingly?
iter:
case FIdent(name): f(macro @:pos(part.pos) $i{name})
map:
case FIdent(name): {
pos: part.pos,
kind: switch f(macro @:pos(part.pos) $i{name}) {
case macro $i{ident}: FIdent(name);
case v: FExpr(v);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an option, nice idea!
std/haxe/macro/ExprTools.hx
Outdated
@@ -130,7 130,8 @@ class ExprTools { | |||
f(edef); | |||
case EFormat(parts): | |||
for (part in parts) switch part.kind { | |||
case FRaw(_) | FIdent(_): | |||
case FRaw(_): | |||
case FIdent(i): f({pos: part.pos, expr: EConst(CIdent(i))}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that's bugging me a bit is that the fake Ident
position here include the dollar sign, which is inconsistent with the expansion where we do the 1, but we can't really do the same in ExprTools
, because we'd have to convert position with getPosInfos
/makePosition
, which seems too much. Or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll just keep the original position FIdent position as back2dos originally proposed.
…isplay.ExprPreprocessing.process_expr doesn't seem to be required anymore when map_expr supports FIdents
…${}`, not inner expression
Ok, I'm all for the part that fixes things. But I would request at that it be From my experience, when you're parsing JSX-like syntax, the node syntax comes first and then the interpolated expressions. Just consider what you get from Take a look at the haxe-react parser, you will see that it first uses With these changes merged, we'd have to reconstitute the string. I assume that the parser for the "mini-AST" is necessary for string interpolation to work properly. But pushing down the information onto macros is pretty wasteful, if it's not needed. First the AST has to be allocated as interpreter value. And if the original string is not available, you will need to pass it to |
That sounds fair. I'm not sure about That said, I don't see much complexity in parsing var xmlParts = [];
var interpolatedExprs = new Map();
var nextId = 0;
for (part in parts) {
switch part.kind {
case FRaw(s):
xmlParts.push(s);
case FIdent(i):
var id = nextId ;
xmlParts.push('{$id}');
interpolatedExprs[id] = macro $i{i};
case FExpr(e):
var id = nextId ;
xmlParts.push('{$id}');
interpolatedExprs[id] = e;
}
}
var xml = xmlParts.join("");
// ...parse and process xml, taking exprs from interpolatedExprs map I get that this might not be enough for haxe-react because it wants to support that |
Meh, I don't know, having both original string and format segments makes it easy to construct invalid ast nodes (that is with original string and segments that don't correspond to each other), and having segments as a Ref prevents modifying it though macros. Gotta think about it some more. |
The thing is I want to have accurate position tracking down to the character and tink_hxx accomplishes that by parsing the XML itself. So then I could try to pad the placeholder with whitespace to the length of the expression the place is held for (which I assume I can get from the fragment's position). All this works only if the original expression was not shorter than the placeholder, e.g. can't replace That leaves you with some pretty pesky gymnastics to get the offsets right. Yes, it's doable, but I would very much prefer not to have to do it ;) As for your concerns: I think the typer should just ignore the raw string. A different approach would be to have case CString(s, false): f(MacroStringTools.formatString(s, e.pos)); If you can somehow avoid to decompose the string twice into fragments (or if parsing cost is negligible compared to sending the result down to the interpreter), this would probably be the simplest and fastest approach. |
Alright, I'm closing this for now so it's not accidentally merged, will reopen when ready :) |
What's the agenda here? I would really like to resolve the string quoting mess... |
We gotta figure out what API we want for this, so I know what exactly to implement :) It seems that we want to keep strings at the untyped AST level and parse it later when typing, right? But I think it could be useful to provide access to the mini-ast in macro somehow, so people can easily use interpolated strings for their DSLs. |
Could we do this in two steps?
|
I have to look into this again... |
This is a complete rework of interpolated strings handling, fixing issues like #2244 #5803 and providing a macro API to work with interpolated strings (through the new
EFormat
AST node kind).It works by creating a mini-AST for an interpolated string on the lexer level which is then easily parsed to a
EFormat(parts:Array<FormatSegment>)
node.For example a
'a$b${c 2}'
string is emitted as aFmt [ raw(a), name(b), code(c, , 2) ]
token which is parsed intoEFormat [ FRaw(a), FIdent(b), FExpr(EBinop(Add, Const(c), Const(2)) ]
which is then expanded into"a" b (c 2)
expression on typing.Note that single-quoted strings without interpolations (so no
$a
or${a}
) are emitted as normal string tokens (and thusEConst(CString)
AST nodes).Other than fixing issues with reentrancy and macro reification (see tests included), I believe this opens new possibilities for macro-processing strings, like the infamous JSX. I think it's now much easier to parse things like
jsx('<div>${names.map(name -> jsx('<p>$name</p>'))}</div>')
This is obviously a (somewhat) breaking change for macros that process strings, but it's Haxe 4 and it's for the greater good.