[ci] reduce pyspark test time #8324

Merged: 1 commit merged from pyspark_ci into dmlc:master on Nov 21, 2022

Conversation

@wbo4958 (Contributor) commented Oct 8, 2022

Without this PR, it takes about 70s to finish test_classifier_with_cross_validator:

70.03s call python/test_spark/test_spark_local.py::XgboostLocalTest::test_classifier_with_cross_validator

With this PR, it only takes about 17s:

17.43s call python/test_spark/test_spark_local.py::XgboostLocalTest::test_classifier_with_cross_validator

I also tried running the tests in parallel with pytest-xdist; it worked well for the pyspark tests and reduced the total time from 4m49.335s to 2m19.574s.

I'm not sure whether we need to enable pytest-xdist for the Python tests. What do you think?

BTW, I also noticed that some tests overlap, e.g. test_classifier_distributed_weight_eval and test_regressor_distributed_weight_eval; it seems we don't need to cover every kind of estimator. @WeichenXu123, please correct me if I'm wrong.
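
For reference, a minimal sketch (not part of this PR) of how pytest-xdist could be invoked just for the pyspark tests through pytest's Python entry point; the worker count and script name are illustrative assumptions:

```python
# run_spark_tests.py -- illustrative sketch only, not part of this PR.
# Requires pytest-xdist to be installed (pip install pytest-xdist).
import sys

import pytest

if __name__ == "__main__":
    # "-n 4" spawns four xdist worker processes; adjust to the CI machine.
    # The path matches the pyspark test module referenced above.
    sys.exit(pytest.main(["-n", "4", "python/test_spark/test_spark_local.py"]))
```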

@trivialfis (Member) left a comment

I'm not sure about the effect of xdist on other tests, since xgboost already uses all the cores for some tests, which may lead to contention.

@RAMitchell (Member) commented:

Thank you for working on this! It definitely looked like there was significant overlap in the tests and many could be combined to reduce the overhead.

It may be possible to use xdist just for the spark tests; I suspect it will cause problems for other tests, but it might be worth experimenting. Maybe we could set OMP_NUM_THREADS before running the tests. The main goal here is to reduce the time of the CI CPU Python tests step, which currently stands at around 25 minutes.
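
A minimal sketch of what setting OMP_NUM_THREADS could look like (illustrative only; the thread count and the use of a conftest.py are assumptions, not something decided in this PR):

```python
# conftest.py -- illustrative sketch, not part of this PR.
import os

# Pin the OpenMP thread count before xgboost is imported by any test module,
# so parallel xdist workers do not oversubscribe the CPU cores.
os.environ.setdefault("OMP_NUM_THREADS", "2")
```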

@wbo4958 marked this pull request as ready for review on November 18, 2022 at 02:20
@wbo4958 (Contributor, Author) commented Nov 18, 2022

Yeah, let's fix them one by one. For now, this PR addresses the most time-consuming test.

@wbo4958 (Contributor, Author) commented Nov 21, 2022

@trivialfis, please help trigger the CI for this build. Thanks!

@hcho3 (Collaborator) commented Nov 21, 2022

I started the CI

@trivialfis merged commit 2dde65f into dmlc:master on Nov 21, 2022
@NorthSummer commented:

> [quotes the pull request description above]

Hi, could you please tell me how to measure the running time of functions in the C codebase? Thanks!

@wbo4958 deleted the pyspark_ci branch on March 22, 2023 at 23:44