[BUG]: s3 error at startup loses all compilers #6689

Open
mattgodbolt opened this issue Jul 5, 2024 · 0 comments

Describe the bug

A node hit an exception at startup and then returned 404 for every compiler. Papertrail log: https://my.papertrailapp.com/systems/ip-172-30-0-244/events?focus=1745854003808639953&selected=1745854003808639953

Jul 05 12:07:14Z ip-172-30-0-244 amazon info: Compilers created: 3463 
Jul 05 12:07:14Z ip-172-30-0-244 amazon info: Fetching possible arguments from storage 
Jul 05 12:07:18Z ip-172-30-0-244 compiler-explorer @smithy/node-http-handler:WARN socket usage at capacity=50 and 3413 additional requests are enqueued. See https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/node-configuring-maxsockets.html or increase socketAcquisitionWarningTimeout=(millis) in the NodeHttpHandler config.
Jul 05 12:07:20Z ip-172-30-0-244 all.err 2024/07/05 12:07:20 [error] 505#505: *81 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.2.100, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:20Z ip-172-30-0-244 all.err 2024/07/05 12:07:20 [error] 505#505: *83 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.5.104, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:20Z ip-172-30-0-244 all.err 2024/07/05 12:07:20 [error] 505#505: *85 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.1.13, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:20Z ip-172-30-0-244 all.err 2024/07/05 12:07:20 [error] 505#505: *87 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.4.113, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:20Z ip-172-30-0-244 all.err 2024/07/05 12:07:20 [error] 505#505: *89 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.0.97, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:30Z ip-172-30-0-244 all.err 2024/07/05 12:07:30 [error] 505#505: *91 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.2.100, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:30Z ip-172-30-0-244 all.err 2024/07/05 12:07:30 [error] 505#505: *93 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.5.104, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:30Z ip-172-30-0-244 all.err 2024/07/05 12:07:30 [error] 505#505: *95 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.4.113, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:30Z ip-172-30-0-244 all.err 2024/07/05 12:07:30 [error] 505#505: *97 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.1.13, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:30Z ip-172-30-0-244 all.err 2024/07/05 12:07:30 [error] 505#505: *99 connect() failed (111: Unknown error) while connecting to upstream, client: 172.30.0.97, server: , request: "GET /healthcheck HTTP/1.1", upstream: "http://127.0.0.1:10240/healthcheck", host: "172.30.0.244"
Jul 05 12:07:33Z ip-172-30-0-244 amazon  error: We encountered an internal error. Please try again. {"$fault":"client","$metadata":{"attempts":3,"extendedRequestId":"1N3Wvs3Ah6/ pO0a0gzSTppdbWiEfKqsLkkP ZVKXFJ4 XP/cF6rP1MQle2nuU6yDQnp7uVMBM8=","httpStatusCode":500,"requestId":"3KPPHE8PZTJM3TQP","totalRetryDelay":14},"Code":"InternalError","HostId":"1N3Wvs3Ah6/ pO0a0gzSTppdbWiEfKqsLkkP ZVKXFJ4 XP/cF6rP1MQle2nuU6yDQnp7uVMBM8=","RequestId":"3KPPHE8PZTJM3TQP","name":"InternalError","stack":"InternalError: We encountered an internal error. Please try again.\n    at throwDefaultError (/infra/.deploy/node_modules/@smithy/smithy-client/dist-cjs/index.js:838:20)\n    at /infra/.deploy/node_modules/@smithy/smithy-client/dist-cjs/index.js:847:5\n    at de_CommandError (/infra/.deploy/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:4756:14)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async /infra/.deploy/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20\n    at async /infra/.deploy/node_modules/@aws-sdk/middleware-signing/dist-cjs/index.js:226:18\n    at async /infra/.deploy/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38\n    at async /infra/.deploy/node_modules/@aws-sdk/middleware-flexible-checksums/dist-cjs/index.js:174:18\n    at async /infra/.deploy/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:98:20\n    at async /infra/.deploy/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:121:14"} 
Jul 05 12:07:35Z ip-172-30-0-244 amazon info: Compiler scan count: 3463 
Jul 05 12:07:35Z ip-172-30-0-244 amazon info: Fetching remote libraries from https://godbolt.org:443/api/libraries/c++ 
Jul 05 12:07:35Z ip-172-30-0-244 amazon info: Fetching remote libraries from https://godbolt.org:443/api/libraries/c 
Jul 05 12:07:35Z ip-172-30-0-244 amazon info: Fetching remote libraries from https://godbolt.org:443/api/libraries/cuda 
Jul 05 12:07:35Z ip-172-30-0-244 amazon info: OPTIONS HASH: d50d868cc62733ff500235ba2bf0b0461145144d919675249cd84a81aa8aa42e 
Jul 05 12:07:36Z ip-172-30-0-244 amazon info: Running metrics server on port 10241 
Jul 05 12:07:36Z ip-172-30-0-244 amazon info:   using static files from 'https://static.ce-cdn.net/' 
Jul 05 12:07:36Z ip-172-30-0-244 amazon info:   Listening on http://localhost:10240/ 
Jul 05 12:07:36Z ip-172-30-0-244 amazon info:   Startup duration: 37838ms 
Jul 05 12:07:36Z ip-172-30-0-244 amazon info: ======================================= 
Jul 05 12:07:53Z ip-172-30-0-244 amazon  warn: Unable to find compiler with lang c++ for JSON request {"allowStoreCodeDebug":true,"bypassCache":0,"compiler":"vcpp_v19_30_VS17_0_x64","files":[],"lang":"c++","options":{"compilerOptions":{"overrides":[],"produceCfg":false,"produceDevice":false,"produceGccDump":{},"produceIr":null,"produceOptInfo":false,"produceOptPipeline":null,"producePp":null},"executeParameters":{"args":"","stdin":""},"filters":{"binary":false,"binaryObject":false,"commentOnly":true,"debugCalls":false,"demangle":true,"directives":true,"execute":false,"intel":true,"labels":true,"libraryCode":true,"trim":false},"libraries":[],"tools":[],"userArguments":""},"source":"<removed>"} 
Jul 05 12:07:53Z ip-172-30-0-244 amazon warn: 111.220.24.0 "POST /api/compiler/vcpp_v19_30_VS17_0_x64/compile" 404 
Jul 05 12:07:53Z ip-172-30-0-244 amazon info: Fetching AWS credentials for us-east-1... 

...and every compiler request returned 404 after that.

A possible cause: the failure to fetch possible arguments from storage left something unset (for example, the compilers were never updated properly because the exception aborted that step), yet the system stayed up anyway.
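
If that hypothesis is right, one way to avoid it is to keep a storage error for one compiler from aborting the whole step. The sketch below is only an illustration: `ArgFetcher`, `ArgLoadResult` and `loadPossibleArguments` are hypothetical names, not Compiler Explorer's actual code, and it assumes arguments are fetched once per compiler.

```typescript
// Hypothetical: fetches the possible arguments for one compiler from storage.
type ArgFetcher = (compilerId: string) => Promise<string[]>;

// Hypothetical result shape: what loaded, and which compilers failed.
interface ArgLoadResult {
    loaded: Map<string, string[]>;
    failed: string[];
}

// Fetch arguments for every compiler, recording failures instead of throwing,
// so a single storage error cannot abort the rest of startup.
async function loadPossibleArguments(compilerIds: string[], fetchArgs: ArgFetcher): Promise<ArgLoadResult> {
    const outcomes = await Promise.all(
        compilerIds.map(async id => {
            try {
                return {id, args: await fetchArgs(id)};
            } catch {
                // Swallow the error here; the caller decides what to do with `failed`.
                return {id, args: null};
            }
        }),
    );
    const loaded = new Map<string, string[]>();
    const failed: string[] = [];
    for (const {id, args} of outcomes) {
        if (args === null) {
            failed.push(id);
        } else {
            loaded.set(id, args);
        }
    }
    return {loaded, failed};
}
```

The caller could then log `failed` and carry on with the compilers that did load, or decide that any failure is fatal and exit.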

The 500 from S3 itself is a separate issue.
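
The log also shows the HTTP handler warning that all 50 sockets were in use with 3413 requests queued. Whether that is related to the 500 is unknown. If the queueing turns out to matter, one option is to bound how many storage requests are in flight at once. A minimal sketch, where `mapWithLimit` is a hypothetical helper and not an existing function in the codebase or the AWS SDK:

```typescript
// Run `task` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithLimit<T, R>(items: T[], limit: number, task: (item: T) => Promise<R>): Promise<R[]> {
    const results = new Array<R>(items.length);
    let next = 0;

    // Each worker repeatedly claims the next unprocessed index until none remain.
    async function worker(): Promise<void> {
        while (next < items.length) {
            const index = next++;
            results[index] = await task(items[index]);
        }
    }

    const workerCount = Math.max(1, Math.min(limit, items.length));
    const workers = Array.from({length: workerCount}, () => worker());
    await Promise.all(workers);
    return results;
}
```

Note that if any `task` call rejects, the returned promise rejects too, so this would need to be combined with per-item error handling to avoid aborting the whole batch.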

Steps to reproduce

Uncertain.

Expected behavior

The node should do one of the following:

  • die at startup in this situation
  • not error at all (i.e. survive the failed fetch without losing its compilers)
  • be detected as a broken machine and recycled
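
The first and third options could be sketched roughly as below. Everything here is an assumption for illustration: `StartupState`, `assertStartupHealthy` and `healthcheckStatus` are hypothetical names, and returning 503 from /healthcheck is just one way to let the load balancer notice a node that is up but has no usable compilers.

```typescript
// Hypothetical summary of what a node knows about itself after startup.
interface StartupState {
    compilersDiscovered: number; // what the scan found, e.g. 3463 in the log above
    compilersServable: number; // compilers actually registered to handle requests
}

// "Die at startup": throw if the scan found compilers but none can be served.
function assertStartupHealthy(state: StartupState): void {
    if (state.compilersDiscovered > 0 && state.compilersServable === 0) {
        throw new Error(
            `startup incomplete: ${state.compilersDiscovered} compilers discovered, none servable`,
        );
    }
}

// "Be detected as broken": status code a /healthcheck handler could return.
function healthcheckStatus(state: StartupState): number {
    return state.compilersServable > 0 ? 200 : 503;
}
```

With the state from this incident (compilers discovered, none servable), `assertStartupHealthy` would throw before the node starts listening, and `healthcheckStatus` would return 503 so the instance could be recycled.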

Reproduction link

Not applicable

Screenshots

Not applicable

Operating System

No response

Browser version

No response
