Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature/car #322

Open
wants to merge 69 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
69 commits
Select commit Hold shift click to select a range
33ea96b
adding stan_CARbym; print/summary methods work
imadmali Jul 10, 2017
6feee4f
get important post-estimation methods/functions working
imadmali Jul 11, 2017
6ad17ba
adding stan_bym (inla version)
imadmali Jul 13, 2017
c2e2212
adding stan_besag (icar)
imadmali Jul 13, 2017
5b3fda8
fixing conditional in stanreg for car models
imadmali Jul 13, 2017
55fb6c6
expanding prior options for CAR models
imadmali Jul 21, 2017
6247da9
changes to spatial.fit file
imadmali Jul 25, 2017
f6f853d
Merge branch 'master' into feature/car
imadmali Aug 4, 2017
e3ea475
adding linkinv chunks to spatial.stan
imadmali Aug 4, 2017
4c0f42d
using chunk functions for meanPPD
imadmali Aug 4, 2017
2070bb4
demeaning, QR decom, and posterior methods work
imadmali Aug 6, 2017
fcc8d71
get prior summary working for CAR models
imadmali Aug 7, 2017
34f03e6
adding documentation for CAR models
imadmali Aug 7, 2017
1869c9c
adding stan_besag tests; CAR models return 'terms'
imadmali Aug 7, 2017
44f7e4a
adding vb methods for CAR models; optimizing currently fails
imadmali Aug 8, 2017
8746893
adding tests for stan_bym
imadmali Aug 8, 2017
c329a6a
use sparse matrix ops for 'scaling_factor'
imadmali Aug 8, 2017
0ebd45c
adding light documentation for stan_spatial.fit
imadmali Aug 8, 2017
9859ddb
omit optimizing from supported algorithms in CAR models
imadmali Aug 8, 2017
7c36fef
use chunks stan code; use prior_aux
imadmali Aug 9, 2017
fbdeeeb
adding more likelihoods to CAR models
imadmali Aug 9, 2017
11c132e
refactor stan chunks for spatial.stan
imadmali Aug 10, 2017
aeb898b
draft of CAR vignette; fixing binomial lpmf bug in spatial.stan
imadmali Aug 11, 2017
81cf7f6
updating documentation/vignette for CAR models
imadmali Aug 13, 2017
b65253c
adding 2nd order spatial random walks
imadmali Aug 13, 2017
a039e2a
Q should be squared if 'order == 2'
imadmali Aug 13, 2017
40b248b
update stan_besag tests with inla comparisons
imadmali Aug 13, 2017
b846069
changing stan_bym to stan_bym2 (INLA convention)
imadmali Aug 13, 2017
9aafe0b
adding BYM model; changing spatial parameter names
imadmali Aug 14, 2017
3582ca6
change parameterization in spatial.stan from prec to sd
imadmali Aug 14, 2017
9cc2a37
convert spatial weight matrix to sparse if necessary
imadmali Aug 14, 2017
87ec6a9
prior must be on theta_raw when model_type = 2
imadmali Aug 15, 2017
4ea4674
remove INLA dependency for bym2 model
imadmali Aug 30, 2017
41555ff
updating CAR spatial tests
imadmali Aug 30, 2017
31e8d1c
update CAR vignette
imadmali Aug 30, 2017
c3a55d0
changing names for spatial priors
imadmali Aug 31, 2017
2a9001a
updates to besag unit tests
imadmali Jan 12, 2018
557179b
fixing conflicts in feature/car
imadmali Jan 12, 2018
ee55e48
getting spatial models to compile
imadmali Jan 21, 2018
2da6c77
get posterior predict to work with CAR models (hacky)
imadmali Jan 22, 2018
7ab7a9a
updating spatial docs
imadmali Jan 30, 2018
a641852
return error when using loo on CAR models (tentative)
imadmali Sep 5, 2018
5a1173e
drop loo tests
imadmali Sep 5, 2018
a12bb9a
resolving merge conflicts in feature/car
imadmali Sep 6, 2018
4824fd3
modified namespace
imadmali Sep 7, 2018
02aa765
Merge branch 'master' into feature/car
imadmali Sep 7, 2018
3f52c0c
spatial.Rmd: fix encoding
bgoodri Sep 7, 2018
2876ac3
update from master
imadmali Sep 7, 2018
a129012
Merge branch 'master' into feature/car
imadmali Sep 7, 2018
b498fc0
making CAR family func consistent with other stan models
imadmali Sep 9, 2018
19a809d
fixing awkward family/link pairs; neg_bin(link=identity) still broken
imadmali Sep 10, 2018
a564548
fix count family/link issues in spatial.stan
imadmali Sep 12, 2018
8fe3ffd
cleanup loo
imadmali Oct 21, 2018
1556945
revise CAR tests
imadmali Oct 22, 2018
4ab4d1f
use sparseMatrix when dealing with CAR adjacency matrix
imadmali Oct 22, 2018
9b5bedc
simplify spatial tests
imadmali Oct 22, 2018
e65b2c7
fixing unit tests
imadmali Oct 23, 2018
04fc47a
Merge branch 'master' into feature/car
imadmali Oct 23, 2018
abd0a72
adding is_car instead of is(object, 'car')
imadmali Oct 23, 2018
557958c
revising spatial model vignette
imadmali Oct 28, 2018
05eed19
delete src/include
bgoodri Oct 28, 2018
6efcfe4
rm src/Modules*
bgoodri Oct 28, 2018
9eddcc2
dropping INLA requirement
imadmali Oct 28, 2018
e0dfbcc
fix family/link bounds in spatial.stan
imadmali Oct 28, 2018
ba88f97
Merge branch 'feature/car' of https://github.com/stan-dev/rstanarm in…
bgoodri Oct 28, 2018
5b08936
refactoring in-line declarations and likelihood contributions
imadmali Oct 28, 2018
8eb3c7f
NAMESPACE: update
bgoodri Oct 28, 2018
b3d968b
Merge branch 'feature/car' of https://github.com/stan-dev/rstanarm in…
bgoodri Oct 28, 2018
c809d14
Merge branch 'master' into feature/car
bgoodri Jul 31, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
revising spatial model vignette
  • Loading branch information
imadmali committed Oct 28, 2018
commit 557958c44c799d17f15f84ad440906b10744adb5
4 changes: 2 additions & 2 deletions R/log_lik.R
Original file line number Diff line number Diff line change
Expand Up @@ -271,7 271,7 @@ ll_args.stanreg <- function(object, newdata = NULL, offset = NULL, m = NULL,
trials <- 1
if (is.factor(y))
y <- fac2bin(y)
if (!is(object, "car"))
if (!is_car(object))
stopifnot(all(y %in% c(0, 1)))
else
trials <- object$trials
Expand Down Expand Up @@ -359,7 359,7 @@ ll_args.stanreg <- function(object, newdata = NULL, offset = NULL, m = NULL,
data <- cbind(data, as.matrix(z)[1:NROW(x),, drop = FALSE])
draws$beta <- cbind(draws$beta, b)
}
if (is(object, "car")) {
if (is_car(object)) {
psi_indx <- grep("^psi\\[[[:digit:]] \\]", colnames(stanmat))
psi <- stanmat[, psi_indx, drop = FALSE]
data$psi <- t(psi)
Expand Down
2 changes: 1 addition & 1 deletion R/stan_besag.R
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 151,7 @@ stan_besag <- function(formula,
stan_function = stan_function)

if (family$family == "binomial") {
fit$family <- binomial(link = "logit")
# fit$family <- binomial(link = "logit")
fit$trials <- trials
}
out <- stanreg(fit)
Expand Down
6 changes: 3 additions & 3 deletions R/stanreg.R
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 31,7 @@ stanreg <- function(object) {
nobs <- NROW(y)
ynames <- if (is.matrix(y)) rownames(y) else names(y)

# is_car <- object$stan_function %in% c("stan_besag", "stan_bym", "stan_bym2")
is_car <- object$stan_function %in% c("stan_besag", "stan_bym", "stan_bym2")
is_betareg <- is.beta(family$family)
if (is_betareg) {
family_phi <- object$family_phi # pull out phi family/link
Expand Down Expand Up @@ -92,7 92,7 @@ stanreg <- function(object) {

# linear predictor, fitted values
eta <- linear_predictor(coefs, x, object$offset)
if (is_car(object)) {
if (is_car) {
psi_indx <- grep("psi", colnames(as.matrix(object$stanfit)))
psi <- as.matrix(object$stanfit)[,psi_indx]
psi <- unname(colMeans(psi))
Expand Down Expand Up @@ -156,7 156,7 @@ stanreg <- function(object) {
out$eta_z <- eta_z
out$phi <- phi
}
if (is_car(object)) {
if (is_car) {
out$psi <- psi
out$trials <- object$trials
}
Expand Down
30 changes: 23 additions & 7 deletions vignettes/spatial.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -41,27 41,31 @@ The linear predictor takes the following form,
$$
\boldsymbol{\eta} = \alpha \mathbf{X}\boldsymbol{\beta} \boldsymbol{\psi}
$$
where $\alpha$ is the intercept, $\mathbf{X}$ is an $N$-by-$K$ matrix of predictors, $\boldsymbol{\beta}$ is a $K$-dimensional vector of regression coefficients, and $\boldsymbol{\psi}$ is a $N$-dimensional vector representing the spatial effect (and $N$ denotes the number of spatial units). The construction of $\boldsymbol{\psi}$ depends on the model, which is discussed in the relevant sections below.
where $\alpha$ is the intercept, $\mathbf{X}$ is an $N$-by-$K$ matrix of predictors ($N$ being the number of observations and $K$ being the number of predictors), $\boldsymbol{\beta}$ is a $K$-dimensional vector of regression coefficients, and $\boldsymbol{\psi}$ is a $N$-dimensional vector representing the spatial effect. The construction of $\boldsymbol{\psi}$ depends on the model, which is discussed in the relevant sections below.

Depending on the choice of likelihood there may or may not be an additional auxiliary parameter $\gamma$ in the model (e.g. in a Gaussian likelihood this would be the variation of the data). With all these components, for some probability density/mass function $f$, we can state the general form of the likelihood as,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is $\gamma$ actually $\sigma$ in the output?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's aux in the stan model but I think this maps to different things depending on the likelihood (e.g. it's sigma in the gaussian case).


Depending on the choice of likelihood there may or may not be an additional auxiliary parameter $\gamma$ in the model (e.g. in a Gaussian likelihood this would be the variation of the data). Thus, for some probability density/mass function $f$ we can state the general form of the likelihood as,
$$
\mathcal{L}(\alpha, \boldsymbol{\beta}, \gamma | \mathbf{y}) = \prod_{i=1}^N f(y_i | \alpha, \boldsymbol{\beta}, \gamma )
$$

## GMRF Hierarchical Component

CAR models require that you define the spatial component as a Gaussian Markov Random Field (GMRF). The random vector $\boldsymbol{\phi}$ is a GMRF with respect to the graph $\mathcal{G} = (\mathcal{V} = \{1,\ldots,n\},\mathcal{E})$ with mean vector $\boldsymbol{\mu}$ and precision matrix $\mathbf{W}$ if its probability density function takes the precision form of the multivariate normal distribution,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include an image for a small graph like that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like you want to include a plot of the lattice? I have that in there if you build the vignette.


$$
f(\boldsymbol{\phi} | \boldsymbol{\mu}) = (2\pi)^{-n/2} |\mathbf{W}|^{1/2}
\exp\bigg( -\frac{1}{2}(\boldsymbol{\phi-\mu})^{\top}\mathbf{W}(\boldsymbol{\phi-\mu}) \bigg)
$$

and $W_{i,j} \neq 0$ when $\{i,j\}\in\mathcal{E}$ for all $i \neq j$. Here, $\mathcal{V}$ refers to the verticies on the graph (i.e. the spatial units) and $\mathcal{E}$ refers to the edges on the graph (i.e. the spatial units that are neighbors). In other words, $\mathbf{W}$ is an $N$-by-$N$ matrix (with zeros on the diagonal) which describes spatial adjacency.

Unfortunately there is no guarantee that $\mathbf{W}$ is positive definite so $\mathbf{Q} = \mbox{diag}(\mathbf{W1}) - W$ (which is guaranteed to be positive semi-definite) is used as the precision matrix.

## Besag (ICAR) Spatial Prior

The `stan_besag` modeling function fits the data to an ICAR model. This means that the spatial effect enters the linear predictor as

$$
\begin{align}
\boldsymbol{\psi} = \boldsymbol{\phi} \\
Expand All @@ -74,7 78,8 @@ where $\tau$ is a scalar that controls the overall spatial variation and has an

## BYM

The ICAR model is limited in that it only accounts for spatial variation among the spatial units. Thus, the random variation is picked up by the spatial variation which results in misleading parameter estimates and invalid inferences. The `stan_bym` model explains spatial variation as the sum of a structured (spatial) component $\boldsymbol{\phi}$ and an unstructured (random) component $\boldsymbol{\theta}$. Therefore, the spatial effect takes the following form,
The ICAR model is limited in that it only accounts for spatial variation among the spatial units. Thus, the random variation is picked up by the spatial variation which may result in misleading parameter estimates and invalid inferences. The `stan_bym` model explains spatial variation as the sum of a structured (spatial) component $\boldsymbol{\phi}$ and an unstructured (random) component $\boldsymbol{\theta}$. Therefore, the spatial effect takes the following form,

$$
\begin{align}
\boldsymbol{\psi} &= \rho\boldsymbol{\phi} \tau\boldsymbol{\theta} \\
Expand All @@ -83,11 88,13 @@ f(\boldsymbol{\phi} | \boldsymbol{\mu}) &= (2\pi)^{-n/2} |\mathbf{Q}|^{1/2}
f(\theta_i) &= (2\pi)^{-1/2}\exp{\bigg( -\frac{\theta_i^2}{2} \bigg)}
\end{align}
$$

Note that the unstructured effect $\boldsymbol{\theta}$ is distributed standard normal.

## BYM2 (Variant of the BYM Spatial Prior)

The `stan_bym2` modeling function fits the data to a variant of the BYM model where the spatial effect enters as a convolution of the structured (spatial) effect and the unstructured (random) effect,

$$
\begin{align}
\boldsymbol{\psi} &= \tau(\boldsymbol{\theta}\sqrt{1-\rho} \boldsymbol{\phi}\sqrt{\rho}) \\
Expand All @@ -97,9 104,9 @@ f(\theta_i) &= (2\pi)^{-1/2}\exp{\bigg( -\frac{\theta_i^2}{2} \bigg)}
\end{align}
$$

As in the BYM model $\boldsymbol{\theta}$ is distributed standard normal. However, the parameter $\rho$ is on the unit interval and is interpreted as the proportion of spatial variation that is contributed to overall variation, and $\tau$ explains the overall (convolved) variation.
As in the BYM model $\boldsymbol{\theta}$ is distributed standard normal. However, now the parameter $\rho$ is on the unit interval and is interpreted as the proportion of spatial variation that is contributed to overall variation, and $\tau$ explains the overall (convolved) variation.

Priors on $\rho$ should be chosen wisely as $\rho=0$ reduces to a model that not account for spatial variation and $\rho=1$ reduces to the ICAR model, which does not account for random variation among the spatial units.
Priors on $\rho$ should be chosen carefully as $\rho=0$ reduces to a model that not account for spatial variation and $\rho=1$ reduces to the ICAR model, which does not account for random variation among the spatial units.

## Posterior

Expand Down Expand Up @@ -193,19 200,28 @@ spplot(grid_sim, "y_pred", at = var_range_y_pred, main = expression(y[pred]),
```

Alternatively we can look at the conventional one-dimensional posterior predictive check with the `pp_check` function.

```{r ppcheck-1d, fig.align='center', fig.height=8}
pp_check(fit_besag)
```

In order to compare the predictive performance between the models we need to use the [loo](http://mc-stan.org/loo) package. Currently, however, this is not supported for CAR spatial models.
In order to compare the predictive performance between the models we need to use the [loo](http://mc-stan.org/loo) package. Looking at the results below we can confirm that the correct model outperforms the incorrect model in terms of predictive accuracy.

```{r loo}
library(loo)
loo_correct <- loo(fit_besag)
loo_incorrect <- loo(fit_besag_bad)
compare(loo_correct, loo_incorrect)
```

## Smoothing the Spatial Random Walk

In some cases modeling the GMRF spatial component with the precision matrix $\mathbf{Q}$ leads to rough spatial varation. This often occurs when dealing with spatial units on a fine lattice. Using the square of the precision matrix $\mathbf{Q}\mathbf{Q}$ allows us to smooth out the spatial variation across the spatial units using a spatial random walk of order 2. Below we use the `order = 2` argument to fit the model with smoothing of order 2.

```{r smoothing, results="hide"}
fit_besag_smooth <- stan_besag(y ~ 1 x z, data = spatial_data, W, order = 2,
prior_intercept = normal(0,1), prior = normal(0,1), prior_rho = normal(0,1),
prior_intercept = normal(0,1), prior = normal(0,1),
prior_rho = normal(0,1),
family = binomial(link="logit"), trials = spatial_data$trials,
chains = CHAINS, cores = CORES, seed = SEED, iter = ITER)
```
Expand Down