Skip to content

R package for change-point analysis and latent block structure recovery from network time series data

Notifications You must be signed in to change notification settings

jongheepark/NetworkChange

Repository files navigation

NetworkChange

NetworkChange is an R package that detects structural changes in longitudinal network data using the latent space approach. Based on the Bayesian multi-array representation of longitudinal networks.

NetworkChange performs Bayesian hidden Markov analysis (Chib 1998) to discover changes in structural network features across temporal layers. NetworkChange can handle various forms of network changes such as block-splitting, block-merging, and core-periphery changes. NetworkChange also provides functions for model diagnostics using WAIC, average loss, and log marginal likelihoods as well as visualization tools for dynamic analysis results of longitudinal networks.

install

## from CRAN
## install.packages("NetworkChange")

## most recent version from github
## you need to install devtools package first
## devtools::install_github("jongheepark/NetworkChange")

library(NetworkChange)
#> Loading required package: MCMCpack
#> Loading required package: coda
#> Loading required package: MASS
#> ##
#> ## Markov Chain Monte Carlo Package (MCMCpack)
#> ## Copyright (C) 2003-2022 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
#> ##
#> ## Support provided by the U.S. National Science Foundation
#> ## (Grants SES-0350646 and SES-0350613)
#> ##
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>    .gg   ggplot2
\label{fig:list}Summary of selected features and functions of the package.

Summary of selected features and functions of the package.

Input network data and synthetic data generation

Input data for NetworkChange takes an array form. Hence, the dimension of input data is (N \times N \times T). Currently, NetworkChange allows longitudinal data of symmetric ((i.e.) undirected) networks.

One quick way to generate a synthetic longitudinal network data set with a changepoint is to call MakeBlockNetworkChange(). It has three clusters by default and users can choose the size of data by selecting the number of nodes in each cluster ((n)) and the length of time ((T)). For example, if one chooses (n=10) and (T=20), an array of (30 \times 30 \times 20) is generated. base.prob is the inter-cluster link probability, and block.prob base.prob is the intra-cluster link probability. When one sets block.prob>0, a clustered network data set is generated. If block.prob=0, we have a random network data set.

set.seed(11173)
n <- 10 ## number of nodes in each cluster
Y <- MakeBlockNetworkChange(n=n, break.point = .5,
                            base.prob=.05, block.prob=.7,
                            T=20, type ="split")
dim(Y)
#> [1] 30 30 20

The above code generates a longitudinal network data set with a changepoint in the middle (break.point=0.5). We specify the type of network change as a block-splitting change (`type =“split”}) where the initial two cluster network splits into three clusters.

Currently, MakeBlockNetworkChange() provides five different options of network changes (type): “constant”, “merge”, “split”, “merge-split”, and “split-merge.” If “constant” is chosen, the number of breaks is zero. If “merge” or “split” is chosen, the number of breaks is one. If either “merge-split” or “split-merge” is chosen, the number of breaks is two.

Users can use plotnetarray() to visualize the longitudinal network data. The resulting plot shows a selected number of equi-distant time-specific networks. By default, n.graph is 4. Users can change plot settings by changing options in ggnet.

plotnetarray(Y)

A block-merged data set can be generated by settting type ="merge" of MakeBlockNetworkChange(). We draw 4 snapshots using plotnetarray().

set.seed(11173)
Ymerge <- MakeBlockNetworkChange(n=n, break.point = .5,
                               base.prob=.05, block.prob=.7,
                               T=20, type ="merge")
plotnetarray(Ymerge)

Latent space discovery

In this section, We estimate the chagepoints of the block-splitting data set Y generated above. We fit a HNC using the principal eigen-matrix for degree correction.

G <- 100
Yout <-  NetworkChange(Y, R=2, m=1, mcmc=G, burnin=G, verbose=0)
#> Initializing using the Alternating Least Squares Method: # of iteration =  10

We then use drawPostAnalysis() to draw latent node positions obtained from HNC. We also highlight the latent cluster structure using the (k)-means clustering method over the estimated latent node positions of each regime. drawPostAnalysis() provides an option to include k-means clustering results of the latent node positions for each regime. n.cluster sets the number of clusters in each regime. The clustering results are depicted by assigning unique colors over the clusters. These two functions, plotting the latent node coordinates and k-means clustering, can also be applied separately by using plotU and kmeansU respectively.

Ydraw <- drawPostAnalysis(Yout, Y, n.cluster=c(2,3))
#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA

#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA

#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA

#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA
multiplot(plotlist=Ydraw, cols=2)
#> Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

As shown by the difference between the distributions of the latent node traits, the clusters in regime 1 can be clearly distinguished by their coordinates on the first dimension whereas both dimensional coordinates are required to distinguish the three clusters in regime 2.

Such transition of the group structure can be easily depicted by plotting the change of the network generation rule parameters ((\v_t)) over time. plotV allows one to draw such patterns. While the first dimensional generation rule parameter (v_1) exhibits constaly high values over the entire period, the second dimensional generation rule parameter rises dramatically near (t=10), the ground truth changepoint .

plotV(Yout, cex=2)

Break number diagnostics

Although the above analysis shows suggestive evidence in favor of a single network changepoint, it does not provide a statistical judgement criterion for determining the break number parameter of HNC. Model diagnostics is required to coose an appropriate number of breaks and derive the final modeling result.

BreakDiagnostic() allows one to check a statistically supported number of breaks by fitting multiple HNC models with a varying number of breaks. break.upper setting allows one to choose the upper limit of break numbers to be examined.

set.seed(1223)
G <- 100
detect <- BreakDiagnostic(Y, R=2, mcmc=G, burnin=G, verbose=0, break.upper=3)

Users can draw plots of diagnostic results by calling the first element of the test object.

detect[[1]]
#> Warning: Removed 1 row(s) containing missing values (geom_path).
#> Warning: Removed 1 rows containing missing values (geom_point).

Otherwise, users can print out the numerical results which is stored in the second element of the test object.

print(detect[[2]])
#> $LogMarginal
#> [1] -3075.330 -2651.176 -2659.268 -2696.535
#> 
#> $Loglike
#> [1] -3049.776 -2863.429 -2589.197 -2610.019
#> 
#> $WAIC
#> [1] 6223.589 5402.413 5439.349 5513.796
#> 
#> $`Average Loss`
#> [1] 0.00 0.00 0.04

All diagnostic measures clearly indicate that the number of breaks in the synthetic data is one, agreeing with that of the synthetic data generating process.

Empirical data analysis example

In this section, we analyze changes in the international military alliance network among major powers. The data set is originally from (Gibler 2008) and users can call this data set by data(MajorAlly).

Our goal in this section is to detect structural changes in the longitudinal alliance network among major powers using HNC. We follow the COW dataset’s coding of “major powers” (the United Kingdom, Germany, Austria-Hungary, France, Italy, Russia, the United States, Japan, and China) in the analysis. We aggregated every 2 year network from the original annual binary networks to increase the density of each layer.

data(MajorAlly)
Y <- MajorAlly
time <- dim(Y)[3]
drop.state <- c(which(colnames(Y) == "USA"), which(colnames(Y) == "CHN"))
newY <- Y[-drop.state, -drop.state, 1:62]

First, we fit a pilot model to elicit reasonable inverse gamma prior values for (\v_t) ((v_0) and (v_1)).

G <- 100
set.seed(1990)
test.run <- NetworkStatic(newY, R=2, mcmc=G, burnin=G, verbose=0,
                          v0=10, v1=time*2)
V <- attr(test.run, "V")
sigma.mu = abs(mean(apply(V, 2, mean)))
sigma.var = 10*mean(apply(V, 2, var))
v0 <- 4   2 * (sigma.mu^2/sigma.var)
v1 <- 2 * sigma.mu * (v0/2 - 1)

Then, we diagnose the break number by comparing model-fits of several models with a varying number of breaks.

set.seed(11223);
detect2 <- BreakDiagnostic(newY, R=2, break.upper=2,
                           mcmc=G, burnin=G, verbose=0,
                           v0=v0, v1=v1)
detect2[[1]]
#> Warning: Removed 1 row(s) containing missing values (geom_path).
#> Warning: Removed 1 rows containing missing values (geom_point).

The test results from WAIC, log marginal likelihood, and average loss indicate that HNC with two breaks is most reasonable.

Based on the test result, we fit the HNC with two breaks to the major power alliance network and save the result in R object fit.

G <- 100
K <- dim(newY)
m <- 2
initial.s <- sort(rep(1:(m 1), length=K[[3]]))
set.seed(11223);
fit <- NetworkChange(newY, R=2, m=m, mcmc=G, initial.s = initial.s,
                     burnin=G, verbose=0, v0=v0, v1=v1)

First, we can examine transitions of hidden regimes by looking at posterior state probabilities ((p(\S | \mathcal{Y}, \Theta))) over time. plotState() in MCMCpack pacakge provides a function to draw the posterior state probabilities from changepoint analysis results. Since our input data is an array, we need to change the input data as a vector.

attr(fit, "y") <- 1:K[[3]]
plotState(fit, start=1)

Next, we draw regime-specific latent node positions of major powers using drawPostAnalysis. Users can choose the number of clusters in each regime by `n.cluster}.

p.list <- drawPostAnalysis(fit, newY, n.cluster=c(4, 4, 3))
#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA

#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA

#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA

#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA

#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA

#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA
multiplot(plotlist = p.list, cols=3)

Then, using drawRegimeRaw(), we can visualize original network connections for each regime by collapsing network data within each regime.

drawRegimeRaw(fit, newY)

The results are highly interesting to scholars of international relations and history. We discuss the results in details in (Park and Sohn 2020). One of the most interesting findings is the centrality of Austria-Hungary (AUH) during the first two regimes. Austria-Hungary played a role of broker that connected other major powers between 1816 and 1890. This period roughly overlaps with “the age of Metternich” (1815-1848) (Rothenberg 1968). Throughout the first two regimes, the network position of Austria-Hungary remained highly critical in the sense that the removal of Austria-Hungary would have made the major power alliance network almost completely disconnected. In the language of social network analysis, Austria-Hungary filled a “structural hole” in the major power alliance network at the time, playing the role of broker (Burt 2009).

Identifying hidden regimes of the military alliance network makes it clear the central role of Austria-Hungary during the first two regimes in the military alliance network among major powers.

We believe that NetworkChange will increase the capacity for longitudinal analysis of network data in various fields, not limited to social sciences.

Burt, Ronald S. 2009. Structural Holes: The Social Structure of Competition. Harvard University Press.

Chib, Siddhartha. 1998. “Estimation and Comparison of Multiple Change-Point Models.” Journal of Econometrics 86 (2): 221–41.

Gibler, Douglas. 2008. International Military Alliances, 1648-2008. CQ Press.

Park, Jong Hee, and Yunkyu Sohn. 2020. “Detecting Structural Changes in Longitudinal Network Data.” Bayesian Analysis 15 (1): 133–57.

Rothenberg, Gunther E. 1968. “The Austrian Army in the Age of Metternich.” Journal of Modern History 40 (2): 156–65.

About

R package for change-point analysis and latent block structure recovery from network time series data

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published