[BUG] Error when creating new experiment through UI: Experiment already exists. #13321

Closed
3 of 23 tasks
YashasviMantha opened this issue Oct 4, 2024 · 3 comments
Labels
area/tracking Tracking service, tracking client APIs, autologging area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server bug Something isn't working


@YashasviMantha

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Other

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

  • Client: N/A
  • Tracking server: 2.1.1

System information

Self Hosted MLflow instance: 2.1.1

Describe the problem

We recently migrated our existing MLflow instance to a new environment. All the old experiments are present in the instance, but we noticed that we cannot add new experiments through the UI. Every time we try to add a new experiment, we get the following error in the UI:

RESOURCE_ALREADY_EXISTS: Experiment(name=test_test) already exists. Error: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "experiment_pk" DETAIL: Key (experiment_id)=(10) already exists. [SQL: INSERT INTO experiments (name, artifact_location, lifecycle_stage, creation_time, last_update_time) VALUES (%(name)s, %(artifact_location)s, %(lifecycle_stage)s, %(creation_time)s, %(last_update_time)s) RETURNING experiments.experiment_id] [parameters: {'name': 'test_test', 'artifact_location': '', 'lifecycle_stage': 'active', 'creation_time': 1727758206130, 'last_update_time': 1727758206130}] (Background on this error at: https://sqlalche.me/e/14/gkpj)

Looking through the Postgres backend of the instance, we do see some experiments (old ones from before the migration) in the experiments table.

As the error above shows, experiment_id=(10) already exists. On every retry, the experiment_id reported in the error increments. I assume there is some flag that we forgot to update or change. Any help with this would be appreciated.

Also, one dirty workaround (which might not even work) is to keep creating experiments until the ID moves past the existing ones (we have around 100 of them as of now). But that would be a last resort.

Code to reproduce issue

RESOURCE_ALREADY_EXISTS: Experiment(name=test_test) already exists. Error: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "experiment_pk" DETAIL: Key (experiment_id)=(10) already exists. [SQL: INSERT INTO experiments (name, artifact_location, lifecycle_stage, creation_time, last_update_time) VALUES (%(name)s, %(artifact_location)s, %(lifecycle_stage)s, %(creation_time)s, %(last_update_time)s) RETURNING experiments.experiment_id] [parameters: {'name': 'test_test', 'artifact_location': '', 'lifecycle_stage': 'active', 'creation_time': 1727758206130, 'last_update_time': 1727758206130}] (Background on this error at: https://sqlalche.me/e/14/gkpj)

Other info / logs

N/A

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@YashasviMantha YashasviMantha added the bug Something isn't working label Oct 4, 2024
@github-actions github-actions bot added area/tracking Tracking service, tracking client APIs, autologging area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server labels Oct 4, 2024
@harupy
Member

harupy commented Oct 7, 2024

I think this is because the experiment is still present but marked as deleted.

@YashasviMantha
Author

@harupy you are right!
I have 101 experiments in the table. Experiments 1-17 are all deleted, and among 18-100 a couple are deleted but most are active. Any thoughts on how this can be solved?

Also, I am not sure why MLflow is not creating a new entry for a new experiment.

@YashasviMantha
Author

YashasviMantha commented Oct 7, 2024

Did some digging and found this sequence on the database side: experiments_experiment_id_seq. It has 3 fields: (last_value: 20, log_cnt: 27, is_called: true). Updating it fixed the issue:

For everyone landing here:

SELECT setval('experiments_experiment_id_seq', 100, true); 

Where 100 should be the highest experiment_id already present in the experiments table. Because the third argument is true, the next experiment created gets that value plus one.
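
Rather than hard-coding the value, the sequence can be compared against the table and reset from it. This is a minimal sketch, not an official MLflow procedure. It assumes a PostgreSQL backend, the sequence name experiments_experiment_id_seq reported above, and an experiments table that holds at least one row with an experiment_id of 1 or more:

```sql
-- Highest experiment_id currently stored in the table.
SELECT MAX(experiment_id) FROM experiments;

-- Current state of the sequence. If last_value is lower than the maximum
-- above, new inserts will collide with existing rows.
SELECT last_value, is_called FROM experiments_experiment_id_seq;

-- Move the sequence to the highest existing ID. With is_called = true,
-- the next nextval() call returns that value plus one.
SELECT setval(
    'experiments_experiment_id_seq',
    (SELECT MAX(experiment_id) FROM experiments),
    true
);
```

This would also explain why the ID in the error went up on each retry: PostgreSQL does not roll back sequence values when an insert fails, so every failed attempt still consumed one value. Taking a backup of the database before changing the sequence is a sensible precaution.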
