🐕 Batch: Refactoring Test workflows in models #1484
Comments
Hi @Abellegese or @DhanshreeA, can you clarify whether only the test command needs to be modified (according to point 1), or both the test command and the playground?

Hey @GemmaTuron, our plan is to update both pipelines for this functionality. I am creating one issue for both.

A few more details about the features are given in #1488.
Summary
This issue will encompass efforts to reconcile, clean up, and enhance our test (and build) pipelines for individual models.
We currently have a test module and CLI command (`ersilia test ...`) that can check a given model for functionality, completeness, and correctness. In addition, we have a Testing Playground: a test utility that performs the same checks and can also simulate running one or more models on a user's system.

The existing testing in our model pipelines is largely redundant given these tools: it is naive by comparison, only checking for nullity in model predictions, and it is not robust to how a model might serialize its outputs. Moreover, the Docker build pipelines are bloated with code that can be removed in favor of a single workflow that tests the built images. We also need to handle testing for ARM and AMD builds more smartly: currently we only test the AMD images, but recently some models have built successfully for the ARM platform and then not actually worked.

Furthermore, we need to revisit H5 serialization within Ersilia, and include tests for this functionality when testing models.
Each of the objectives below should be considered individual tasks, and should be addressed in separate PRs referencing this issue.
Objective(s)
- In the `test-model.yml` workflow, remove the current testing logic (L128-L144) in favor of only using the `ersilia test` command. Upload the logs generated by this command, as well as its results, as artifacts with a retention period of 14 days.
- In the `test-model-pr.yml` workflow, likewise rely only on the `ersilia test` command. The same conditions apply for handling and uploading the logs and results as artifacts with 14-day retention.
- Refactor the `upload-ersilia-pack.yml` and `upload-bentoml.yml` workflows to only build and publish model images (both for ARM and AMD), i.e. remove the testing logic from these workflows. These images should be tagged `dev`.
- Rework the `Upload model to DockerHub` workflow to use the Testing Playground utility from Ersilia to test the built model image (however it gets built, i.e. using Ersilia Pack or legacy approaches). This workflow should run on a matrix of `ubuntu-latest` and `macos-latest`, to ensure that we are also testing the ARM images. Based on the results of this workflow, we can tag the images `latest` and identify which architectures they successfully work on.
- The `Post model upload` workflow should run at the very last and update the necessary metadata stores (Airtable, S3 JSON) and the README. We can remove the step that creates testing issues for community members from this workflow.

Documentation
- `ModelTester` class used in the test CLI command: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/model-tester
- `Testing Playground` utility: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/testing-playground