-
Notifications
You must be signed in to change notification settings - Fork 29.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Set --experimental-default-type
mode by detecting ESM syntax in entry point
#50043
Comments
In the TSC meeting today we agreed that at some point, Node needs to run ESM code by default. So currently the options are:
I’m fine with either. In nodejs/TSC#1445 there were people hesitant about the “require some people to make a one-minute change” first option, so I’m curious if they prefer this instead. I’m unaware of other solutions yet. |
This sounds like a great feature to me. And one that would have a lot of community support. As a user, it is very frustrating for node to say "I know exactly what is wrong, but too bad". There may be some technical challenges to facilitate, but I think we can accomplish it. The only downside I can think of is a performance cost paid only by those who would otherwise hit a brick wall (those who don't need this feature pay no cost); and that seems a better alternative. |
I agree it could well be time to explore this direction, so far as it can fit within the narrow space we've created for it. When previously discussed this was always in the context of being the primary determiner, but having it restricted to the top-level and as an option sounds very interesting to explore further experimentally. I would be weary unflagging something though until all the edge cases have been worked through including things like workspace symlink consistency and the potential publishing footgun. |
Detection isn't reliable, though, because the two parse goals are ambiguous. |
I think this is a good idea. It's kind of rage inducing that node doesn't do this already, tbh. However, the issue of ambiguous commonjs deps I think needs a subtle change (unless I'm misreading the proposal, and this is already the intent, in which case, just a clarification). If I set
Ie, it should act as if a file at With this, the ways to trigger ESM are:
The issue of parsing being fundamentally ambiguous is not really a big issue, imo, because it doesn't have to be 100% accurate in all cases; all it does is add another way to trigger ESM mode, by just using ESM, which is as it should be. |
I am fairly certain that it is possible to let V8 return this information after parsing & hit the compilation cache when we parse it for the second time (which is essentially no-op and we pay no extra cost) |
Yes like the other |
Actually for the use case alone I don't think even Acorn is needed, we can just compile the script via |
That’s clever, and it eliminates the “future syntax” problem. V8 would be able to parse any syntax that V8 can evaluate 😄 |
I really like this idea. Are you already working on an implementation? |
1 for this and 1 for letting V8 do the heavy lifting I think that was the reservation last time. |
To be clear - there are some caveats:
This means that if someone has code that works in loose mode but not strict mode like: with(someObject) {
// do stuff
} and they rely on it running in loose mode - running it as a module (because e.g no require) or doing the opposite (defaulting to CJS again, and someone writes code that only works in strict mode) is going to break their code. I went over the 20 first Google results for Node.js tutorial right now and none of them are affected. Personally I'm not concerned with this issue and think this is a great solution overall to 'how can we migrate to esm without breaking anything' so I'm still heavily 1 on the issue but I just wanted to raise this point. |
With this proposal, it would only run unambiguous module code as ESM, so |
1 on the approach. |
To be clear, there's plenty of sloppy mode code on npm, that doesn't use The only code that's safe to "detect" is code that uses with or top-level |
I have a loader that does this btw, so I know it's possible 🙂 |
The current plan is if the entry point is a |
This proposal is exactly about "detecting" uses of import/export statements. In other words, code that would throw a syntax error (because of import/export, not any syntax error) if we tried to run it as CommonJS (sloppy or strict). |
@targos awesome, that sounds great then! as long as the only code it makes a "guess" on is unambigously one or the other, there's zero concern imo :-) |
Actually maybe this can be as simple as just little tweaks to the existing entry point execution node/lib/internal/modules/run_main.js Lines 99 to 105 in 1d220b5
We can just inverse the two branches and go to the non-ESM branch first, if detection mode is used, catch any error from |
If this is done it likely should apply to everything not just the endpoint.
Wrapper scripts, script runners, etc. Won't be able to use these files
otherwise since they won't have access to this mode of operation.
Additionally this makes the interpretation of files argv dependent so that
needs to be easy to pick up on.
…On Thu, Oct 5, 2023, 5:11 AM Joyee Cheung ***@***.***> wrote:
Actually maybe this can be as simple as just little tweaks to the existing
entry point execution
https://github.com/nodejs/node/blob/1d220b55ac397f15758e83d1143789608e3fca4a/lib/internal/modules/run_main.js#L100-L105
If detection mode is used, go to the non-ESM branch first, catch any error
from Module._load, check if it's a SyntaxError caused by import/loose
statements and if it is, go to the SM branch, otherwise rethrow.
—
Reply to this email directly, view it on GitHub
<#50043 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABZJI4AC7XWEELMKPSOIH3X52BT7AVCNFSM6AAAAAA5TEF33GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONBYGU3DAOJYGI>
.
You are receiving this because you are on a team that was mentioned.Message
ID: ***@***.***>
|
Can you elaborate on what that "wrapper scripts, script runners" means? Do you mean workers or vm APIs or embedder APIs? If we just tweak node/lib/internal/modules/cjs/loader.js Lines 388 to 390 in 51f4ff2
Note that workers also use For VM APIs, there is already distinction between The embedder API would need an extension, likely for
The interpretation of the entry point has already been dependent on filenames ( |
I’ve started a branch at https://github.com/nodejs/node/compare/main…GeoffreyBooth:node:detect-module, but all it has is a docs update so far. I also have some old code from nodejs/ecmascript-modules#55. At least all the tests from that old branch should be reusable, though we should rewrite them into the modern style with the test runner and |
I mean code that refer to bin files via #!/usr/bin/env node
// bin/foo-debug
addDebugLogging()
// go into normal bin workflow
// just invoke the file normally, this note this file is both valid CJS & ESM
import(which('foo') || './foo') // this will load the bin not as an entrypoint This also can occur when you are stuck for some reason using an API from a bin (often coming from things using .main style invocation) import('bin-or-lib').diagnostics.push(recoverableException) // also not an entrypoint Setting "bin" in the package.json field will result potentially in bin-or-lib being loaded in different modes since invoking via npm wrapper will see entrypoint behavior but loading via import() will load it and potentially get a different result. Note: I'm not opposing the removal of ability to statically analyze what mode a file is. I think having the files act the same is important to avoid the WTFs form the mismatch and I doubt many people will be upset if non-entrypoints act the same as entrypoints. Having consistent behavior is in general a boon to avoid questions like:
and avoid necessary documentation:
which kind of goes against some of the point of having a |
I think this proposal would only affect entry points that error in current Node; by definition, this only affects entries with I guess there’s a hazard that an ESM entry point might work when run directly in this new mode, but then when you run it via a CommonJS or ambiguous wrapper it stops working; but at least that’s not something that works in current Node and then stops working if this new mode becomes the default. It might be annoying to debug, and we should try to guide people to a fix via an error message, but I don’t think this potential scenario should be disqualifying. |
Correct, this is the breakage that would not be able to be worked around. Files would change behavior depending on the location doing the loading and be stuck in a state unable to be altered.
It isn't that it would stop working in the new mode, it would potentially execute in CJS in one form and ESM in the other. This is doubly true if you swap the default parse to go for ESM then failing back to CJS. If under this mode we move from an Error to a valid execution of CJS we shouldn't for example move to swap it to ESM later. I think in the short term it could catch errors and prevent alteration of code which would be good but this wouldn't work for cases where code is wrapped. They could not catch the errors and alter the mode of execution.
I think having fewer errors and applying this on cases that aren't the entry point makes more sense than leading users to errors and having them fix. I'm unclear on why we would only want this to apply to the entry point when it causes a new form of breakage to be possible. We could simply not error and use the same mode detection for all files in these ambiguous situations. I'm fine to a -0 due to amount of complexity of out of band non-static analysis friendly operation here but think there are only negatives by only fixing the entrypoint for this. |
To be clear, are there direct objections to allowing this detection on ALL ambiguous cases but a desire to allow it in the entrypoint? If so, is there a clear reason for this if we can share the expensive code cache like above to avoid major per issues? |
The inverse is also true though. It could be expected to be CJS because it lacks Theorizing what someone may expect, intend, etc. is not a route of discussion I'd find comforting and would instead find a good reason to not ship a feature. I think trying to frame this as some big implicit configuration feature is missing the real gains to be had. Users want messages of the form ~
Occupying spaces that would not run and solving user errors seems a general good. It is exceedingly rare that an ESM module lacks export statements if it isn't a dependency and also exceedingly rare that an ESM entrypoint lacks an import statement. I think we could solve real problems without any potential breakage by only changing what throws right now by just constraining this to a double parse. If this suddenly changed other file interpretations as well it would be an out of band / implicit configuration mechanism which I'd be very resistant to. As it was framed originally, I thought this was intended to solve user errors and not intended provide new means of process configuration to take into account while writing code. |
I think there's a lot of value in simplicity/predictability. The case of
is, I would think, likely to be pretty rare. As such, even aside from @bmeck's concerns, I don't think it would be worth adding complexity just for that case - better to stick to the more easily explained "this only changes the behavior of files with an |
Should we add TLA to the list of unambiguous syntax? so console.log(this===undefined? 'ESM' : 'CJS'); // prints CJS and await console.log(this===undefined? 'ESM' : 'CJS'); // prints ESM |
In response to #50043 (comment): on the scale of node and the npm registry, rare is common. Edge cases matter. |
I still think that doing this only for the entry point is fine and the easier fix since Node is put into "esm" or "cjs" mode once rather than having to detect for every file. This enforcing |
--experimental-default-type=detect-module
--experimental-default-type
mode by detecting ESM syntax in entry point
@benjamingr dual parse is well understood, I don't know of any tool that does parse entrypoint and swap detection modes based upon that. I wouldn't find comfort or assume ease in inventing a new undiscovered set of edge cases. If such a tool exists we should likely figure out what it is like. Tools like TypeScript don't operate in this way or even have configuration to allow for it, but those tools generally do have a configuration for double parsing. |
I opened #50064 to separate the discussion of the “detect the entry point” approach (this proposal) and the “detect every file” approach (that one). |
I support this proposal, the cost of double parsing one file is minimal vs the benefit for the end user. |
Still gathering thoughts on how this would affect TypeScript. If the entrypoint to // @filename: a.js
import "./b.js" // @filename: b.js
export {}; $ node a.js # ✅
$ mv a.js a.mjs
$ node a.mjs # 💥
b.js:1
export {};
^^^^^^
SyntaxError: Unexpected token 'export' I started out with modules, I made one more explicitly a module, and now I don’t have modules anymore? |
No, per the proposal, detection only happens for ambiguous files:
Yes, confusion can occur if you change your entry point separately from the rest of your app. That’s one of the weaknesses of this approach. Fortunately I don’t think that doing so is all that common; someone using the |
I somewhat prefer the approach of the other proposal to always be trying both for this exact reason. We would always do the fallback so we don't break users in these edge cases. If we only detect at the entrypoint file then the behaviour could be unexpected. The inverse of that scenario is that it's also relatively common to vendor parts of modules by copy/pasting files directly into your app and I can imagine we would encounter users building ESM entrypoints and then expecting loose CJS files copy/pasted from elsewhere into that project to just work. |
It should auto detect for all files whose package.json resolves to the same package.json of the entry point module. Not for anything in other packages or deps, but for everything directly in the entry point's local package. |
I think any discussion of detection for all files should happen in #50064. This proposal is focused on using the entry point as a signal to set the default module system. |
I think what @isaacs is getting at is a hybrid of the two proposals where it only does the check once, but it's module-scoped so it doesn't potentially interfere with expectations of dependency modules. I think it makes sense to bring up here given that it's not fully checking all files. |
From an implementation point of view, detecting all files other than dependencies is much closer to #50064 than it is to this. It would mean first implementing detection for potentially any file, not just an entry point, and then adding “isn’t under |
I haven't seen this discussed yet, but one of the major gotchas I see with this proposal so far is the possibility for users to accidentally publish packages that are broken without them realizing. Take an example where I have: // package.json
{
"name": "my-cli",
"main": "main.js"
} // main.js
import { parseArgs } from "util:path";
console.log(parseArgs({ args: process.argv.slice(2), strict: false })); I can test this script locally and it'll work; it's not in But, if I publish this package, suddenly this code is appearing in So, I'm not personally convinced that a missing (EDIT: npm/cli#6855 (comment) maybe implies that this had been discussed somewhere? I have not found it.) In a similar vein, @weswigham also noted that if you ever "vendor" a package without putting it in So given those two ideas, I'm much more a fan of getting all of the package managers to generate different |
Indeed, format being entry point dependent is also probably incredibly bad for tooling, which thus far may either not need an entry point, or needs to acknowledge multiple entry points. In either case, a detection flag like this can still leave intended module formats completely ambiguous. Detection for every file might be OK, tools like TS and webpack have had modes for that for forever with known caveats, but only using the entry point sounds like it could lead to significant pain. |
Yep, which is why I think I agree with @isaacs about entrypoint plus a separate check for the first file seen from each module. |
FWIW I think that the potential cost of running detection on every file wouldn't be that bad; in addition to previous comments about V8 internals (no clue about that, I have no experience), it's pretty easy to tell textually whether or not a file may contain imports; e.g. if you don't match This is a common tactic in editors, e.g. "I want to run find-all-definition for the symbol (Obviously this idea can be refined; I'm just throwing an idea out for that if perf actually turns out to be a problem.)
I'm a little confused as to this idea, but maybe I'm missing it in the thread above; my reading of that statement is that the heurstic is somehow just one level deep, which I think is more confusing than "just the entrypoint" or "every ambiguous file gets detection". I much prefer the latter for reasons stated above. Every file should individually get to choose whether or not they're CJS or ESM; mixing them is fine. Entrypoint-dependent behavior gives me horrible flashbacks to Python, where import resolution depends on the directory of the entrypoint (metaphorically in the JS world, it'd be like if running a script in your package meant that some random other import could find files in your |
Sure. I think for now I’m treating this one as more or less on hold until we know whether #50064 is viable. I had assumed that running detection on every file would be prohibitively performance impactful, but it seems like my assumption might have been wrong on that, in which case #50064 offers many advantages. So I’m looking into trying to get an implementation of #50064 first before considering alternatives. |
@jakebailey and @weswigham have mentioned it above, but just for the sake of completeness, our recent design meeting notes point out some of the issues TypeScript would have with this proposal: microsoft/TypeScript#56103. |
Closing as alternative #50096 has landed. |
After landing #49869, I opened nodejs/TSC#1445 to discuss when to flip the default value of
--experimental-default-type
fromcommonjs
tomodule
, which would make Node itself default to ESM whenever the user didn’t make it explicit that they wanted to work in CommonJS by using"type": "commonjs"
inpackage.json
, or a.cjs
extension, etc. There were concerns raised on that thread around breaking the “loose” files case, where there is nopackage.json
present, and breaking tutorials such as https://nodejs.org/en/docs/guides/getting-started-guide that assume a CommonJS default.I propose we create
--experimental-default-type=detect-module
that would function as follows:--experimental-default-type=commonjs
and--experimental-default-type=module
, it would only apply to ambiguous cases: a file with no explicit.mjs
or.cjs
extension, either nopackage.json
or one that lacks atype
field, no--input-type
or--experimental-default-type=module
flags passed, not undernode_modules
.import
orexport
statements. (Notimport()
expressions that are allowed in CommonJS, not the CommonJS wrapper variables which could be set as globals by user code.) If the file cannot be parsed, error.import
orexport
statement is found, run the entry point (and the rest of the app) as if--experimental-default-type=module
had been passed. Else run as if--experimental-default-type=commonjs
had been passed.This would solve the “loose files” and tutorials cases: ambiguous files would continue to be run as CommonJS. Only files with
import
orexport
statements, which currently error if you try to run them, would start to run as ESM. This would solve the goal of flipping, which is to let people start using ESM syntax by default without first opting in somehow.I implemented big parts of this in 2019 in nodejs/ecmascript-modules#55. It’s very similar to a feature request from 2021, #39353. The main difference between then and now is the existence of
--experimental-default-type
. I can respond to some of the concerns raised on those earlier issues:detect-module
, notauto
, and I think “it runs as ESM ifimport
orexport
statements are found, or as CommonJS otherwise” is simple enough that users will get it.console.log(3)
). I can’t tell apart the “ambiguous syntax” files from either of the other two, but I don’t have to: those run as CommonJS today, and they can continue to run as CommonJS. That would be the status quo, avoiding a breaking change. I’m only changing how the “runs only as ESM” files are interpreted, which turns a guaranteed error (Unexpected token import
) into running code.--experimental-default-type
today helps us out, because now I have a framework to use that defines how all files in various scopes are interpreted. The detection triggers either--experimental-default-type=module
or leaves us in the default--experimental-default-type=commonjs
, and that’s it.type
field, file extension).@nodejs/loaders @nodejs/tsc
The text was updated successfully, but these errors were encountered: