BUG: ADVI - arguments `start` and `start_sigma` have inconsistent keys for transformed random variables #7534
Comments
Thanks for this. Yeah, the start values should really be associated with
cc @ferrine
It's also problematic because some approximations might not have a very clear correspondence to variables, e.g. Normalizing Flows or other low-rank approximations; the initialization parameters of ADVI just happen to coincide with model parameters. For every approximation I think there is a special way to initialize parameters. If we are going to rethink the API, and not just the error messages, this should be taken into account.
@ferrine thanks for chiming in. Would it make sense to ask the user to directly specify the parameters of the variational model? I'd like to move forward with a less invasive solution that at least ensures that start and start_sigma are consistent in terms of the keys that are used. Also, if possible, I'd want an API that lets me get the final values of the variational parameters from a fitted approximation, together with a clear recipe for turning them back into an initial guess for a new fit.
Describe the issue:
There are a couple of issues with the current design:
i) The keys of `start` and `start_sigma` are inconsistent, as can be seen in the example below.
ii) I can specify arbitrary (str-valued) keys for `start_sigma`; if they are not in the model, the default zero initialisation is used for the `rho` variable in the variational approximation.
iii) The `np.log(np.expm1(np.abs(sigma)))` transformation from sigma to rho in method `create_shared_params` of class `MeanFieldGroup` is also somewhat surprising.
iv) It is also unclear how the values generated from `advi.approx.mean.eval` and `advi.approx.std.eval` relate to the variational parameters `mu` and `rho` returned by method `create_shared_params` and/or to the free random variables `beta` and `sigma` of the model. For example, I think that tracker["mean"][-1][-1] (cf. example below) is the variational parameter corresponding to the variable `sigma_log__`.
v) In `pm.fit`, the arguments `start` and `start_sigma` are ignored if an instance of ADVI is passed as method.
Reproducible code example:
This is from class `MeanFieldGroup`:
I think this simple example is already hitting the problematic edge case mentioned in the NOTE. Also, the line `sigma = start_sigma.get(name)` is problematic, as passing `start_sigma` with wrong keys will never raise an error. This is in contrast to the parameter `start`, where a wrong key will raise, e.g. using "beta_foo" instead of "beta".
Also, this block in `pm.fit` is problematic, since it will ignore `start` and `start_sigma` if an instance of e.g. ADVI is passed as the `method` argument instead of the string 'advi'.
Initially this made debugging of the actual issue even more contrived. Also, it is not clear if I can use any instance of ADVI to get tracking for `mean` and `std`, or if it has to be the same instance that is passed to `pm.fit`.
I'd be happy to make a PR with a fix, but I am lacking familiarity with the APIs/concepts mentioned in the NOTE, i.e.
I'd be super glad to receive guidance so that I can work on this. I think that a fix could significantly improve the API and usability of the initialisation for ADVI.
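To make the silent `start_sigma.get(name)` lookup described above concrete, here is a minimal sketch of the kind of key check that would make a wrong key fail loudly. It is not PyMC code: `validate_start_keys` and `known_names` are hypothetical, and which names should count as valid (model names or transformed names such as `sigma_log__`) is exactly the open question of this issue.

```python
def validate_start_keys(start_sigma, known_names):
    """Return a copy of start_sigma, raising if any key matches no known name.

    Hypothetical helper for illustration only; not part of PyMC.
    """
    if start_sigma is None:
        return {}
    unknown = sorted(set(start_sigma) - set(known_names))
    if unknown:
        raise KeyError(
            f"Unknown keys in start_sigma: {unknown}; "
            f"expected a subset of {sorted(known_names)}"
        )
    return dict(start_sigma)


# Assumed names for a model with one transformed scale parameter.
known = ["beta", "sigma_log__"]

# A valid key passes through unchanged.
checked = validate_start_keys({"beta": 0.5}, known)

# A misspelled key now raises instead of silently falling back to the default.
try:
    validate_start_keys({"beta_foo": 0.5}, known)
except KeyError as err:
    message = str(err)
```

With a check like this, `start_sigma` would behave like `start`, where a wrong key already raises.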
Context for the issue:
cf. https://discourse.pymc.io/t/variational-fit-advi-initialisation/18630/4 for more context. Also thanks a lot @jessegrabowski for already very helpful suggestions.
A working example that demonstrates the quirks of the current state is below. The example demonstrates how to use the "final state" (variational parameters mu, rho???) to initialise a fit for a model with a slightly enlarged data set, but I wouldn't know how to implement this for a complicated (hierarchical) model with many (transformed) random variables, as the transformations need to be applied in order to use the "final state" from the first fit as the initialisation of the second fit.
This is the trace for the initial ADVI fit:
And here is the trace for the model with updated data and initialised parameters:
As we can see, we can save quite a lot of computation by using the final parameters from the old fit as the initial guess for the new fit.
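The warm start described above depends on converting between a standard deviation and the `rho` parameter. The following NumPy sketch shows that the quoted expression `np.log(np.expm1(np.abs(sigma)))` is the inverse of softplus, so a round trip recovers sigma. It assumes the approximation obtains its standard deviation as softplus(rho); the function names `sigma_to_rho` and `rho_to_sigma` are made up for this illustration.

```python
import numpy as np


def sigma_to_rho(sigma):
    """Inverse softplus: the expression quoted in item iii) of the issue."""
    return np.log(np.expm1(np.abs(sigma)))


def rho_to_sigma(rho):
    """Softplus, log(1 + exp(rho)), computed in a numerically stable way."""
    return np.logaddexp(0.0, rho)


# Example standard deviations, e.g. taken from a previous fit.
sigma = np.array([0.1, 1.0, 2.5])

# Map to the unconstrained parameter and back again.
rho = sigma_to_rho(sigma)
recovered = rho_to_sigma(rho)

# Under the softplus assumption, rho = 0 corresponds to sigma = log(2).
sigma_at_zero_rho = rho_to_sigma(0.0)
```

Under this assumption, the default zero initialisation of `rho` mentioned in item ii) corresponds to a standard deviation of log(2), roughly 0.69, rather than zero.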
Error message:
No response
PyMC version information: