🐕 Batch: Refactoring Test workflows in models #1484
Comments
Hi @Abellegese or @DhanshreeA, can you clarify whether only the test command needs to be modified (according to point 1), or both the test command and the playground?

Hey @GemmaTuron, our plan is to update both pipelines for this functionality. I am creating one issue for both.

A few more details about the features are given in #1488.
Summary
This issue will encompass efforts to reconcile, clean up, and enhance our test (and build) pipelines for individual models.
We currently have a test module and CLI command (`ersilia test ...`) that can check a given model for functionality, completeness, and correctness. In addition, we have a Testing Playground: a test utility that performs the same checks and can also simulate running one or more models on a user's system.

The existing testing in our model pipelines is largely redundant given these tools: it is naive by comparison, only checking for nullity in model predictions, and it is not robust to how a model might serialize its outputs. Moreover, the Docker build pipelines are bloated with code that can be removed in favor of a single workflow that tests the built images. We also need to handle testing for ARM and AMD builds more smartly: currently we only test the AMD images, but recently some models have built successfully for the ARM platform and then not actually worked.

Furthermore, we need to revisit H5 serialization within Ersilia, and include tests for this functionality when testing models.
Each of the objectives below should be considered individual tasks, and should be addressed in separate PRs referencing this issue.
Objective(s)
- In the `test-model.yml` workflow, remove the current testing logic (L128-L144) in favor of only using the `ersilia test` command. Upload the logs generated by this command, as well as its results, as artifacts with a retention period of 14 days.
- In the `test-model-pr.yml` workflow, likewise rely only on the `ersilia test` command. The same conditions apply for handling and uploading the logs and results as artifacts with 14-day retention.
- Refactor the `upload-ersilia-pack.yml` and `upload-bentoml.yml` workflows to only build and publish model images (both for ARM and AMD), i.e. remove the testing logic from these workflows. These images should be tagged `dev`.
- Rework the `Upload model to DockerHub` workflow to use the Testing Playground utility from Ersilia to test the built model image (however it gets built, i.e. using Ersilia Pack or legacy approaches). This workflow should run on a matrix of `ubuntu-latest` and `macos-latest`, to ensure that we are also testing the ARM images. Based on the results of this workflow, we can tag the images `latest` and identify which architectures they successfully work on.
- The `Post model upload` workflow should run at the very last and update the necessary metadata stores (Airtable, S3 JSON) and the README. We can remove the step that creates testing issues for community members from this workflow.

Documentation
- `ModelTester` class used in the test CLI command: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/model-tester
- `Testing Playground` utility: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/developer-docs/testing-playground