-
Notifications
You must be signed in to change notification settings - Fork 513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New deployer #1284
New deployer #1284
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not done yet. Just dropping some early feedback.
By the way. It works!!!
I don't know what isn't working yet because all the files and redirects are in the bucket.
deployer/src/deployer/main.py
Outdated
"--name", | ||
default=DEFAULT_NAME, | ||
help=f"Name of the site (default {DEFAULT_NAME_PATTERN!r})", | ||
help='Name of the S3 bucket or one of "dev", "stage", or "prod"', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense to me. AWS S3 bucket names are supposed to be globally unique. These suggestions there don't make sense then because you'd never get away with calling an S3 bucket "dev".
As a matter of fact, I see no reason to have a default here. (I believe the default is DEFAULT_BUCKET_NAME = config("DEPLOYER_BUCKET_NAME", default="dev")
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dev
, stage
, and prod
options are bucket nicknames or shortcuts for mdn-content-dev
, mdn-content-stage
, and mdn-content-prod
(they're converted to the real bucket name in upload.py
). Just a convenience since they're the names we'll be using 99% of the time. I'll change the code to indicate that these are special nicknames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this and agreed that it's a bit too magical to use nick names. Instead, let's keep it dumb and spell out the whole bucket name in full.
deployer/src/deployer/main.py
Outdated
dirpath = Path(directory) | ||
|
||
if not dirpath.exists(): | ||
raise click.ClickException(f"{directory} does not exist") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of comments, and because I wanted to back up what I was about to say, I put this together: https://github.com/escattone/yari/pull/43
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cool validation tips! I incorporated all of your suggestions except for:
for fp in content_root.glob("**/_redirects.txt"):
I had originally used that, but it was significantly slower than
for fp in iterdir(content_root, max_depth=1):
if fp.name != "_redirects.txt":
continue
Since the redirect files are always under the locale, I could use (note the one star):
for fp in content_root.glob("*/_redirects.txt"):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slower, but can you even notice the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I didn't know for fp in content_root.glob("*/_redirects.txt"):
was a thing even. But that actually makes sense.
But this is all nits in a sense since your existing ad-hoc solution works perfectly fine.
@property | ||
def etag(self): | ||
""" | ||
Calculates and returns a value equivalent to the file's AWS ETag value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does make me a bit nervous. Is the ETag creation a public thing in S3?
If our implementation ever diverges from what S3's public API exposes, we'll potentially be uploading what hasn't actually changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear you, but in practice they probably will never (or at least very rarely) change it, and it seems better to take the risk and not waste the time calculating/storing/fetching our own for each file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nobody has pushed almost any changes to Yari's source code in a 24h span, 99% of the Is-it-in-S3-already-or-shall-I-bother-to-upload it will be 99% times benefitting from comparing the ETag since the filename and filesize (both from list_objects_v2
) will be identical. AND in the cases in the cases where the only thing changed in the 24h span is a line of SCSS or .tsx or a JS package, then again this code will be a godsend since we'll get the same filename and same filesize but a different ETag value.
So, having it, despite its "risks" is a good idea.
deployer/src/deployer/upload.py
Outdated
for line_num, line in enumerate(fp.read_text().split("\n"), start=1): | ||
line = line.strip() | ||
if line and not line.startswith("#"): | ||
parts = line.lower().split("\t") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parts = line.lower().split("\t") | |
parts = line.split("\t") |
raise Exception( | ||
f"Unable to parse {fp}:{line_num} into a from/to URL pair" | ||
) | ||
from_url, to_url = parts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from_url, to_url = parts | |
from_url, to_url = parts | |
from_url = from_url.lower() |
Feedback on upload aws stuff
"node": ">=12.x" | ||
}, | ||
"aws": { | ||
"name": "peterbe-yari-content-origin-request", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"name": "peterbe-yari-content-origin-request", | |
"name": "mdn-content-origin-request", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
One nit about the default
aws.name
✅ -
Make sure the redirect doesn't lowercase the
to_key
✅ -
Rename the
aws_lambda.py
file toupdate_lambda_functions.py
to match the function name inmain.py
✅ -
It should freak out if no
_redirects.txt
are found in the(translated)_content_root
. ✅ -
As discussed, always use a prefix and have it default to
main
in the origin request business logic. ✅
@peterbe Thanks for the really helpful feedback not only in your comments but in our meetings earlier today as well. |
Part of #1106
Adds new
update-lambda-functions
sub-command and significantly refactors theupload
sub-command.I still need to update the deployer README.✅Does a full upload of the current production build of https://github.com/mdn/content as well as its redirects in about 4.5 minutes:
Example document:
https://test.content.dev.mdn.mozit.cloud/en-us/docs/web/css
Example redirect:
https://test.content.dev.mdn.mozit.cloud/en-us/docs/-moz-locale-dir(ltr)