Parallelize CI calculation #380
Thank you very much for the suggestion! There are different places where parallel code is already used in OpenMx. Of particular note, raw data analyses essentially divide the rows into as many chunks as there are processors, calculate the likelihoods separately for each chunk, and then gather them at the end. This level of parallelism isn't as embarrassingly parallel as the CIs, but it does speed up both the initial fitting and the subsequent model fits to find the CIs. It is also placed where calculations are likely to be most demanding - much more so than, e.g., fitting to covariance matrices and means. I think parallel CIs would have to lose this feature so as not to parallelize things that are already parallel deeper in the calculations. That might, however, still be faster depending on the number of CIs and the number of processors, and perhaps we could arrange for the lower-level parallelism to switch off when higher-level parallelism is applied. Thanks again - we are interested in improving both the flexibility of the code and its performance.
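To make the row-chunking idea concrete, here is a minimal R sketch of that scheme (the real implementation lives in the C++ backend; `rowLogLik` and the chunking below are illustrative assumptions, not OpenMx internals):

```r
library(parallel)

# Hypothetical per-chunk log-likelihood; OpenMx's actual backend code differs.
rowLogLik <- function(rows) sum(dnorm(rows, mean = 0, sd = 1, log = TRUE))

x <- rnorm(1000)                          # stand-in raw data
nCores <- detectCores()
chunks <- split(x, cut(seq_along(x), nCores, labels = FALSE))

# Evaluate each chunk on its own core, then gather the pieces at the end.
# mclapply relies on forking, so it is POSIX-only (use mc.cores = 1 or
# parLapply on Windows).
total <- sum(unlist(mclapply(chunks, rowLogLik, mc.cores = nCores)))
```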
That makes a lot of sense. In my use case, we are analyzing covariance matrices, and there was no indication that more than one core was used in the computation. Parallelization of the CI calculation would be really useful because we could move the calculation to a server with 128 cores. Depending on how parallelization is implemented in OpenMx, you might not need to change the existing code much, because the parallel computing framework might handle the potential problems of calling parallelized code from code that is already parallelized.
That's frontend R code. The code actually relevant to parallel computing is going to be backend C code. You are correct, though, that parallelizing confidence limits (or confidence intervals, when using the Wu-Neale adjustment) would be a better use of multithreading in most cases, due to the coarser level of granularity.
Two questions... First, were you running OpenMx under Windows, or a CRAN build of OpenMx under macOS? Both of those cases lack multithreading support. Second, which optimizer were you using? SLSQP is supposed to know how to divide its computation of the gradient elements among multiple threads.
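(As a quick way to check this, the "Number of Threads" mxOption reports what a given build will use; on the single-threaded Windows and CRAN macOS builds it is effectively stuck at 1. A small sketch, assuming a standard OpenMx install:)

```r
library(OpenMx)

# Query the thread count the current build is configured to use.
mxOption(NULL, "Number of Threads")

# On a multithreaded (e.g. Linux) build, raise it for the whole session:
# mxOption(NULL, "Number of Threads", parallel::detectCores())
```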
A bit of background: I have not really used OpenMx for years myself, but this was a question from a student. He is running OpenMx indirectly through the metaSEM package. When we had a meeting today, he asked why the CI calculation was so slow, and we took a look at the code to figure it out. Now answering the questions:
1. The student used Windows, and I use the CRAN version on macOS. The 128-core server runs Linux.
2. We are using whatever the default is; I believe that to be SLSQP.
Note that OpenMx is carrying out (at least) two numerical optimizations for every confidence interval requested. So, suppose you request a confidence interval for one parameter. OpenMx would then, at minimum, do three numerical searches at runtime: one to find the maximum-likelihood estimate, one to find the lower limit of the confidence interval, and one to find the upper limit of the confidence interval.
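As a concrete illustration, here is a minimal runnable example (the model is a trivial stand-in built on the `demoOneFactor` data that ships with OpenMx) where one requested CI means three numerical searches in total:

```r
library(OpenMx)
data(demoOneFactor)  # example data included with OpenMx

model <- mxModel(
  "varOnly", type = "RAM", manifestVars = "x1",
  mxPath(from = "x1", arrows = 2, free = TRUE, values = 1, labels = "varX1"),
  mxPath(from = "one", to = "x1", free = TRUE, values = 0, labels = "meanX1"),
  mxData(demoOneFactor["x1"], type = "raw"),
  mxCI("varX1")  # one CI => one search each for the lower and upper limits
)

# One search for the ML estimates, plus two more for the CI limits.
fit <- mxRun(model, intervals = TRUE)
summary(fit)
```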
OK, then neither you nor the student had a build of OpenMx compiled for parallel computing. However, the Linux-powered server probably is running a multithreaded OpenMx build.
SLSQP is the on-load default, and it is able to parallelize its computation of the objective function's gradient elements. Again, I agree that parallelizing over confidence intervals or confidence limits, rather than over gradient elements (or over subsets of the dataset, in the raw-data case), is a better use of parallel computing. We would like to implement it sometime in the future, but it is not a high priority at present.
The code for calculating CIs runs very slowly. This is noted in the documentation too. There is potential for a dramatic speedup if the CI calculation were parallelized.
The relevant code is here
Instead of using a nested loop, the function should use lapply, which could be parallelized with mclapply.
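A hedged sketch of what that refactor might look like (`ciRequests` and `computeOneLimit` are hypothetical stand-ins for the real internals; each element is one independent profile-likelihood search, so the work is embarrassingly parallel):

```r
library(parallel)

# One row per confidence limit to be found: each (parameter, side) pair
# is an independent optimization problem.
ciRequests <- expand.grid(param = c("a", "b"),
                          side  = c("lower", "upper"),
                          stringsAsFactors = FALSE)

computeOneLimit <- function(i) {
  req <- ciRequests[i, ]
  # ... run one profile-likelihood search for req$param / req$side ...
  list(param = req$param, side = req$side, value = NA_real_)  # placeholder
}

# Serial version, equivalent to the current nested loop:
# results <- lapply(seq_len(nrow(ciRequests)), computeOneLimit)

# Parallel version (fork-based, so POSIX-only; Windows needs parLapply):
results <- mclapply(seq_len(nrow(ciRequests)), computeOneLimit,
                    mc.cores = detectCores())
```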