Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't do inverse metric decomposition every draw #2894

Open
wants to merge 26 commits into
base: develop
Choose a base branch
from

Conversation

bbbales2
Copy link
Member

@bbbales2 bbbales2 commented Mar 6, 2020

Fix for Issue #2881 .

This involved switching to setter/getters for interfacing with dense_e_point so I made the change for diag_e_point as well.

I also changed all the set_metric verbage to set_inv_metric.

I didn't know what tests needed added for this. Lemme know.

Submission Checklist

  • Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

… once each time the inverse metric is set (instead of every sample, Issue #2881).

This involved switching to setter/getters for interfacing with dense_e_point so I made the change for diag_e_point as well.

Also changed set_metric verbage to set_inv_metric.
Copy link
Collaborator

@SteveBronder SteveBronder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple quick suggestions and one Q!

src/stan/mcmc/hmc/hamiltonians/dense_e_point.hpp Outdated Show resolved Hide resolved
src/stan/mcmc/hmc/hamiltonians/dense_e_point.hpp Outdated Show resolved Hide resolved
src/stan/mcmc/hmc/hamiltonians/dense_e_point.hpp Outdated Show resolved Hide resolved
src/stan/mcmc/hmc/hamiltonians/diag_e_metric.hpp Outdated Show resolved Hide resolved
src/stan/mcmc/hmc/hamiltonians/diag_e_point.hpp Outdated Show resolved Hide resolved
src/stan/mcmc/hmc/nuts/adapt_dense_e_nuts.hpp Outdated Show resolved Hide resolved
@betanalpha
Copy link
Contributor

The current design is intentional.

The Cholesky is needed only once per transition which is a relatively small cost compared to the many gradient evaluations needed within each transition. Saving the Cholesky decomposition introduces an additional $\mathcal{O}(N^{2})$ memory burden which is much more than the $mathcal{O}(N)$ burden everywhere else, and becomes problematic for sufficiently large models. Without any explicit profiling demonstrating that the Cholesky is a substantial const for typical models I don't see any strong motivation for the change. I will add this point to the original issue.

At the same time the members of ps_point and its derivations are intentionally left public to avoid any code overhead for accessors and mutators. Because these classes are not exposed through the algorithm API and everything is public we decided that there isn't any need for encapsulation. If encapsulation were added then the accessors and mutators would need to be unit tested as is done elsewhere in the library.

Copy link
Collaborator

@SteveBronder SteveBronder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Though I'd like to wait to merge till @betanalpha approves the test example is satisfying

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.19 3.21 0.99 -0.58% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 -0.23% slower
eight_schools/eight_schools.stan 0.12 0.09 1.27 21.05% faster
gp_regr/gp_regr.stan 0.17 0.18 0.99 -1.31% slower
irt_2pl/irt_2pl.stan 5.69 5.8 0.98 -1.89% slower
performance.compilation 91.35 88.51 1.03 3.11% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.46 9.02 0.94 -6.69% slower
pkpd/one_comp_mm_elim_abs.stan 29.44 29.83 0.99 -1.33% slower
sir/sir.stan 133.71 134.61 0.99 -0.67% slower
gp_regr/gen_gp_data.stan 0.04 0.05 0.98 -2.21% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.96 3.08 0.96 -3.97% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.4 0.41 0.97 -3.18% slower
arK/arK.stan 1.81 2.67 0.68 -47.53% slower
arma/arma.stan 0.63 0.86 0.74 -36.05% slower
garch/garch.stan 0.7 0.65 1.07 6.43% faster
Mean result: 0.971182373984

Jenkins Console Log
Blue Ocean
Commit hash: a67e9e5


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G :
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c /4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.12 3.17 0.98 -1.65% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 0.31% faster
eight_schools/eight_schools.stan 0.12 0.12 0.99 -1.13% slower
gp_regr/gp_regr.stan 0.17 0.17 1.01 0.54% faster
irt_2pl/irt_2pl.stan 5.7 5.7 1.0 -0.0% slower
performance.compilation 90.54 88.68 1.02 2.06% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.45 8.42 1.0 0.27% faster
pkpd/one_comp_mm_elim_abs.stan 30.54 28.12 1.09 7.91% faster
sir/sir.stan 128.64 131.17 0.98 -1.97% slower
gp_regr/gen_gp_data.stan 0.05 0.04 1.07 6.27% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.96 2.95 1.0 0.43% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.4 0.37 1.08 7.34% faster
arK/arK.stan 1.8 1.78 1.01 1.07% faster
arma/arma.stan 0.61 0.74 0.83 -20.48% slower
garch/garch.stan 0.67 0.55 1.22 17.77% faster
Mean result: 1.01859530072

Jenkins Console Log
Blue Ocean
Commit hash: a8e7500


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G :
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c /4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Copy link
Collaborator

@SteveBronder SteveBronder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple little comments

src/stan/mcmc/hmc/hamiltonians/dense_e_point.hpp Outdated Show resolved Hide resolved
src/stan/mcmc/hmc/hamiltonians/diag_e_point.hpp Outdated Show resolved Hide resolved
src/stan/mcmc/hmc/hamiltonians/diag_e_metric.hpp Outdated Show resolved Hide resolved
@@ -31,10 31,15 @@ class adapt_diag_e_static_uniform
this->stepsize_adaptation_.learn_stepsize(this->nom_epsilon_,
s.accept_stat());

bool update = this->var_adaptation_.learn_variance(this->z_.inv_e_metric_,
this->z_.q);
Eigen::VectorXd inv_metric;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you know the size of this when you make it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but it gets resized in learn_variance. Should I still set the size when declaring it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you declare the sizes there then the resize is a no-op since it's already the correct size. But idt it's that big of a deal either way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just left it off. It's kindof a weird thing happening here anyway (like learn_variance is effectively returning two things).

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.54 3.62 0.98 -2.16% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.05 4.57% faster
eight_schools/eight_schools.stan 0.12 0.12 1.0 0.18% faster
gp_regr/gp_regr.stan 0.17 0.17 0.98 -1.9% slower
irt_2pl/irt_2pl.stan 5.8 5.74 1.01 0.99% faster
performance.compilation 86.89 86.1 1.01 0.91% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.55 8.44 1.01 1.28% faster
pkpd/one_comp_mm_elim_abs.stan 30.82 29.5 1.04 4.27% faster
sir/sir.stan 129.2 135.34 0.95 -4.75% slower
gp_regr/gen_gp_data.stan 0.05 0.04 1.03 3.27% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.0 2.93 1.02 2.41% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.37 0.4 0.91 -9.98% slower
arK/arK.stan 2.47 1.8 1.37 27.04% faster
arma/arma.stan 0.6 0.6 1.0 -0.41% slower
garch/garch.stan 0.76 0.66 1.16 13.63% faster
Mean result: 1.03555431436

Jenkins Console Log
Blue Ocean
Commit hash: c4f30f6


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G :
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c /4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.46 3.41 1.01 1.43% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.99 -1.26% slower
eight_schools/eight_schools.stan 0.11 0.11 1.0 -0.5% slower
gp_regr/gp_regr.stan 0.15 0.15 1.01 1.42% faster
irt_2pl/irt_2pl.stan 5.24 5.29 0.99 -1.09% slower
performance.compilation 89.84 89.14 1.01 0.79% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.81 7.78 1.0 0.36% faster
pkpd/one_comp_mm_elim_abs.stan 29.7 28.98 1.02 2.43% faster
sir/sir.stan 129.8 131.11 0.99 -1.01% slower
gp_regr/gen_gp_data.stan 0.04 0.04 0.98 -2.28% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.01 3.0 1.0 0.27% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.39 0.39 1.0 0.15% faster
arK/arK.stan 2.48 1.8 1.38 27.42% faster
arma/arma.stan 0.72 0.61 1.17 14.77% faster
garch/garch.stan 0.65 0.59 1.09 8.49% faster
Mean result: 1.04354280135

Jenkins Console Log
Blue Ocean
Commit hash: c4f30f6


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G :
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c /4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder
Copy link
Collaborator

The Cholesky is needed only once per transition which is a relatively small cost compared to the many gradient evaluations needed within each transition. Saving the Cholesky decomposition introduces an additional $\mathcal{O}(N^{2})$ memory burden which is much more than the $mathcal{O}(N)$ burden everywhere else, and becomes problematic for sufficiently large models. Without any explicit profiling demonstrating that the Cholesky is a substantial const for typical models I don't see any strong motivation for the change. I will add this point to the original issue.

Thinking about this again, don't you pay the O(N^2) memory burden still computing the cholesky but it's just temporary data? If a program failed because of N is too large it would fail either way wouldn't it?

@bbbales2
Copy link
Member Author

bbbales2 commented Mar 9, 2021

don't you pay the O(N^2) memory burden still

Oh yeah, the temporary would still be O(N^2) since you compute the Cholesky at one point or another. Good point. I'll add the forwarding real quick

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.44 3.44 1.0 -0.11% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.97 -2.88% slower
eight_schools/eight_schools.stan 0.11 0.11 1.01 0.6% faster
gp_regr/gp_regr.stan 0.16 0.16 0.98 -2.41% slower
irt_2pl/irt_2pl.stan 5.23 5.34 0.98 -2.11% slower
performance.compilation 90.25 88.92 1.01 1.47% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.92 8.86 1.01 0.64% faster
pkpd/one_comp_mm_elim_abs.stan 29.33 30.56 0.96 -4.18% slower
sir/sir.stan 130.71 130.01 1.01 0.54% faster
gp_regr/gen_gp_data.stan 0.03 0.04 0.91 -10.4% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.12 3.12 1.0 -0.03% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.37 0.39 0.96 -4.28% slower
arK/arK.stan 1.9 1.88 1.01 0.88% faster
arma/arma.stan 0.63 0.95 0.67 -49.87% slower
garch/garch.stan 0.53 0.64 0.83 -21.14% slower
Mean result: 0.952369574099

Jenkins Console Log
Blue Ocean
Commit hash: 7072224


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G :
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c /4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.36 3.38 1.0 -0.49% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.99 -1.25% slower
eight_schools/eight_schools.stan 0.12 0.12 1.01 1.32% faster
gp_regr/gp_regr.stan 0.16 0.16 1.01 0.54% faster
irt_2pl/irt_2pl.stan 6.03 5.95 1.01 1.22% faster
performance.compilation 91.54 89.45 1.02 2.28% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 9.53 8.6 1.11 9.67% faster
pkpd/one_comp_mm_elim_abs.stan 28.75 30.51 0.94 -6.13% slower
sir/sir.stan 122.72 125.39 0.98 -2.18% slower
gp_regr/gen_gp_data.stan 0.03 0.03 1.0 -0.44% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.03 3.05 1.0 -0.48% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.4 0.38 1.03 2.62% faster
arK/arK.stan 1.87 2.52 0.74 -34.96% slower
arma/arma.stan 0.87 0.78 1.11 9.59% faster
garch/garch.stan 0.62 0.73 0.85 -17.14% slower
Mean result: 0.985575872844

Jenkins Console Log
Blue Ocean
Commit hash: 9093b0f


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G :
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c /4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants