Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New deployer #1284

Merged
merged 7 commits into from
Sep 24, 2020
Merged

New deployer #1284

merged 7 commits into from
Sep 24, 2020

Conversation

escattone
Copy link
Contributor

@escattone escattone commented Sep 19, 2020

Part of #1106

Adds new update-lambda-functions sub-command and significantly refactors the upload sub-command.

(deployer-0wsXDrGt-py3.8) bash-3.2$(maws_profile)$ deployer --help
Usage: deployer [OPTIONS] COMMAND [ARGS]...

Options:
  --debug / --no-debug
  --dry-run             Show what would be done, but don't actually do it.
                        [default: False]

  --version             Show the version and exit.
  --help                Show this message and exit.

Commands:
  update-lambda-functions
  upload

I still need to update the deployer README.

Does a full upload of the current production build of https://github.com/mdn/content as well as its redirects in about 4.5 minutes:

(deployer-0wsXDrGt-py3.8) bash-3.2$(maws_profile)$ deployer upload ../client/build/ --folder test
Warning: No content-translated-root has been specified.

Deployer (0.3.0)
Upload files from: ../client/build
Upload redirects from: /Users/rjohnson/repos/content/files
Upload into: test/ of mdn-content-dev
Total pending redirect uploads: 15,981 (18.9ms)
Total pending file uploads: 24,705 (554.4ms)
Total existing S3 objects: 0 (216.4ms)
  [▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋]  40686/40686  100%
Total uploaded files: 24,705 (1066.0MB)
Total uploaded redirects: 15,981
Total skipped files: 0 matched existing S3 objects
Total upload/skip time: 4m23s
Done in 4m24s.

Example document:
https://test.content.dev.mdn.mozit.cloud/en-us/docs/web/css
Example redirect:
https://test.content.dev.mdn.mozit.cloud/en-us/docs/-moz-locale-dir(ltr)

Copy link
Contributor

@peterbe peterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done yet. Just dropping some early feedback.

By the way. It works!!!
I don't know what isn't working yet because all the files and redirects are in the bucket.

"--name",
default=DEFAULT_NAME,
help=f"Name of the site (default {DEFAULT_NAME_PATTERN!r})",
help='Name of the S3 bucket or one of "dev", "stage", or "prod"',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. AWS S3 bucket names are supposed to be globally unique. These suggestions there don't make sense then because you'd never get away with calling an S3 bucket "dev".

As a matter of fact, I see no reason to have a default here. (I believe the default is DEFAULT_BUCKET_NAME = config("DEPLOYER_BUCKET_NAME", default="dev"))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dev, stage, and prod options are bucket nicknames or shortcuts for mdn-content-dev, mdn-content-stage, and mdn-content-prod (they're converted to the real bucket name in upload.py). Just a convenience since they're the names we'll be using 99% of the time. I'll change the code to indicate that these are special nicknames.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this and agreed that it's a bit too magical to use nick names. Instead, let's keep it dumb and spell out the whole bucket name in full.

deployer/src/deployer/main.py Outdated Show resolved Hide resolved
deployer/src/deployer/main.py Outdated Show resolved Hide resolved
dirpath = Path(directory)

if not dirpath.exists():
raise click.ClickException(f"{directory} does not exist")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of comments, and because I wanted to back up what I was about to say, I put this together: https://github.com/escattone/yari/pull/43

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the cool validation tips! I incorporated all of your suggestions except for:

for fp in content_root.glob("**/_redirects.txt"):

I had originally used that, but it was significantly slower than

            for fp in iterdir(content_root, max_depth=1):
                if fp.name != "_redirects.txt":
                    continue

Since the redirect files are always under the locale, I could use (note the one star):

for fp in content_root.glob("*/_redirects.txt"):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slower, but can you even notice the difference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. I didn't know for fp in content_root.glob("*/_redirects.txt"): was a thing even. But that actually makes sense.
But this is all nits in a sense since your existing ad-hoc solution works perfectly fine.

deployer/src/deployer/upload.py Show resolved Hide resolved
deployer/src/deployer/upload.py Outdated Show resolved Hide resolved
@property
def etag(self):
"""
Calculates and returns a value equivalent to the file's AWS ETag value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make me a bit nervous. Is the ETag creation a public thing in S3?
If our implementation ever diverges from what S3's public API exposes, we'll potentially be uploading what hasn't actually changed.

Copy link
Contributor Author

@escattone escattone Sep 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you, but in practice they probably will never (or at least very rarely) change it, and it seems better to take the risk and not waste the time calculating/storing/fetching our own for each file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nobody has pushed almost any changes to Yari's source code in a 24h span, 99% of the Is-it-in-S3-already-or-shall-I-bother-to-upload it will be 99% times benefitting from comparing the ETag since the filename and filesize (both from list_objects_v2) will be identical. AND in the cases in the cases where the only thing changed in the 24h span is a line of SCSS or .tsx or a JS package, then again this code will be a godsend since we'll get the same filename and same filesize but a different ETag value.
So, having it, despite its "risks" is a good idea.

deployer/src/deployer/upload.py Outdated Show resolved Hide resolved
@escattone escattone marked this pull request as ready for review September 22, 2020 21:33
for line_num, line in enumerate(fp.read_text().split("\n"), start=1):
line = line.strip()
if line and not line.startswith("#"):
parts = line.lower().split("\t")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parts = line.lower().split("\t")
parts = line.split("\t")

raise Exception(
f"Unable to parse {fp}:{line_num} into a from/to URL pair"
)
from_url, to_url = parts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from_url, to_url = parts
from_url, to_url = parts
from_url = from_url.lower()

"node": ">=12.x"
},
"aws": {
"name": "peterbe-yari-content-origin-request",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"name": "peterbe-yari-content-origin-request",
"name": "mdn-content-origin-request",

Copy link
Contributor

@peterbe peterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • One nit about the default aws.name

  • Make sure the redirect doesn't lowercase the to_key

  • Rename the aws_lambda.py file to update_lambda_functions.py to match the function name in main.py

  • It should freak out if no _redirects.txt are found in the (translated)_content_root. ✅

  • As discussed, always use a prefix and have it default to main in the origin request business logic. ✅

@escattone
Copy link
Contributor Author

@peterbe Thanks for the really helpful feedback not only in your comments but in our meetings earlier today as well.

@escattone escattone merged commit 34c58de into mdn:master Sep 24, 2020
@escattone escattone deleted the new-deployer branch September 24, 2020 00:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants