Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade_catalog_perms implementation is problematic for tables with a lot of data #29801

Closed
3 tasks done
michael-s-molina opened this issue Jul 31, 2024 · 2 comments · Fixed by #29860
Closed
3 tasks done
Assignees
Labels
#bug:performance Performance bugs change:backend Requires changing the backend

Comments

@michael-s-molina
Copy link
Member

michael-s-molina commented Jul 31, 2024

Bug description

#28394 introduced the upgrade_catalog_perms which makes a full scan of the query table which can contains millions of rows in some organizations. There are two main problems:
1 - We query for the id and database_id for all query rows to later set the catalog for each row instead of just submitting an UPDATE statement directly.
2 - This is function is invoked from different migrations which executes the full scan of the query table multiple times. Ideally, we would try to merge these migrations given how critical the query table is.

How to reproduce the bug

Use superset db upgrade to run any of the following migrations which use upgrade_catalog_perms:

2024-05-01_10-52_58d051681a3b_add_catalog_perm_to_tables.py

2024-05-08_19-33_4081be5b6b74_enable_catalog_in_databricks.py

2024-05-09_18-44_87ffc36f9842_enable_catalog_in_bigquery_presto_trino_.py

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@dosubot dosubot bot added #bug:performance Performance bugs change:backend Requires changing the backend labels Jul 31, 2024
@michael-s-molina
Copy link
Member Author

@mistercrunch @betodealmeida @sadpandajoe @eschutho Do you think it would be possible to merge the migrations that touch the query table?

Any other ideas on how to improve this migration? At Airbnb we have millions of rows which requires significant downtime.

@michael-s-molina
Copy link
Member Author

I'm working on an updated version of upgrade_catalog_perms to support millions of rows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#bug:performance Performance bugs change:backend Requires changing the backend
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants