Estimating a Covariate-Varying Network (CVN)

Estimates a covariate-varying network model (CVN), i.e., \(m\) Gaussian graphical models that change with (multiple) external covariate(s). The smoothing between the graphs is specified by the \((m \times m)\)-dimensional weight matrix \(W\). The function returns the estimated precision matrices for each graph.

Usage

CVN(
  data,
  W,
  lambda1 = 1:2,
  lambda2 = 1:2,
  gamma1 = NULL,
  gamma2 = NULL,
  rho = 1,
  eps = 1e-04,
  maxiter = 100,
  truncate = 1e-05,
  rho_genlasso = 1,
  eps_genlasso = 1e-10,
  maxiter_genlasso = 100,
  truncate_genlasso = 1e-04,
  n_cores = min(length(lambda1) * length(lambda2), detectCores() - 1),
  normalized = FALSE,
  warmstart = TRUE,
  minimal = FALSE,
  gamma_ebic = 0.5,
  verbose = TRUE
)
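
The default for n_cores depends on how many penalty pairs are fitted. The sketch below reproduces that default with base R and the parallel package. It assumes that one model is fitted for every combination of lambda1 and lambda2, which is consistent with n_lambda_values in the returned object but is not stated explicitly here.

```r
library(parallel)

# Default penalty vectors from the Usage section
lambda1 <- 1:2
lambda2 <- 1:2

# Assumption: one model is fitted per (lambda1, lambda2) combination
grid_pairs <- expand.grid(lambda1 = lambda1, lambda2 = lambda2)
n_pairs <- nrow(grid_pairs)

# Same expression as the default value of n_cores
n_cores <- min(length(lambda1) * length(lambda2), detectCores() - 1)

print(grid_pairs)
cat("pairs:", n_pairs, " cores:", n_cores, "\n")
```

On a single-core machine the default expression evaluates to 0, so passing n_cores explicitly, as the Examples section does, may be the safer choice.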

Arguments

data: A list of matrices, one per graph. All matrices must have the same number of columns; the number of observations (rows) may differ.
W: The \((m \times m)\)-dimensional symmetric weight matrix \(W\)
lambda1: A vector of LASSO penalty terms \(\lambda_1\) (Default: 1:2)
lambda2: A vector of global smoothing parameter values \(\lambda_2\) (Default: 1:2)
gamma1: A vector of LASSO penalty terms \(\gamma_1\), where \(\gamma_1 = \frac{2 \lambda_1}{m p (p - 1)}\). If gamma1 is set, the value of lambda1 is ignored. (Default: NULL)
gamma2: A vector of global smoothing parameters \(\gamma_2\), where \(\gamma_2 = \frac{4 \lambda_2}{m(m-1)p(p-1)}\). If gamma2 is set, the value of lambda2 is ignored. (Default: NULL)
rho: The \(\rho\) penalty parameter for the global ADMM algorithm (Default: 1)
eps: If the relative difference between two update steps is smaller than \(\epsilon\), the algorithm stops. (Default: 1e-4)
maxiter: Maximum number of iterations (Default: 100)
truncate: All values of the final \(\hat{\Theta}_i\)'s below truncate will be set to 0. (Default: 1e-5)
rho_genlasso: The \(\rho\) penalty parameter for the ADMM algorithm used to solve the generalized LASSO (Default: 1)
eps_genlasso: If the relative difference between two update steps of the generalized LASSO ADMM is smaller than \(\epsilon\), that algorithm stops. (Default: 1e-10)
maxiter_genlasso: Maximum number of iterations for solving the generalized LASSO problem (Default: 100)
truncate_genlasso: All values of the final \(\hat{\beta}\) below truncate_genlasso will be set to 0. (Default: 1e-4)
n_cores: Number of cores used (Default: the number of available cores minus 1, or the total number of penalty term pairs if that is smaller)
normalized: Data is normalized if TRUE. Otherwise the data is only centered (Default: FALSE)
warmstart: If TRUE, use the glasso package for estimating the individual graphs first (Default: TRUE)
minimal: If TRUE, the returned cvn is minimal in terms of memory, i.e., Theta, data and Sigma are not returned (Default: FALSE)
gamma_ebic: Gamma value for the eBIC (Default: 0.5)
verbose: If TRUE, progress information is printed during estimation (Default: TRUE)
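
The relation between the lambda and gamma parameterizations can be checked by hand. The sketch below uses m = 9 and p = 10 from the Examples section and assumes the denominator m p (p - 1) for gamma1. With lambda1 = lambda2 = 1 the results agree with the gamma1 and gamma2 values 0.002469136 and 0.000617284 printed in the example output.

```r
# Values taken from the Examples section: 9 graphs, 10 variables
m <- 9
p <- 10
lambda1 <- 1
lambda2 <- 1

# gamma1 = 2 * lambda1 / (m * p * (p - 1))
gamma1 <- 2 * lambda1 / (m * p * (p - 1))

# gamma2 = 4 * lambda2 / (m * (m - 1) * p * (p - 1))
gamma2 <- 4 * lambda2 / (m * (m - 1) * p * (p - 1))

# Inverse direction: recover the lambdas from the gammas
lambda1_back <- gamma1 * m * p * (p - 1) / 2
lambda2_back <- gamma2 * m * (m - 1) * p * (p - 1) / 4

print(c(gamma1 = gamma1, gamma2 = gamma2))
```

Because the gammas are scaled by the number of graphs and variables, they may be easier to compare across data sets of different size than the raw lambdas.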

Value

A CVN object containing the estimates for all the graphs for each different value of \((\lambda_1, \lambda_2)\). General results for the different values of \((\lambda_1, \lambda_2)\) can be found in the data frame results. It consists of multiple columns, namely:

id: Model identifier. It corresponds to the index of the model in the lists of estimates
lambda1: \(\lambda_1\) value
lambda2: \(\lambda_2\) value
gamma1: \(\gamma_1\) value
gamma2: \(\gamma_2\) value
converged: whether algorithm converged or not
value: value of the negative log-likelihood function
n_iterations: number of iterations of the ADMM
aic: Akaike information criterion
bic: Bayesian information criterion
ebic: Extended Bayesian information criterion
edges_median: Median number of edges across the m estimated graphs
edges_iqr: Interquartile range of edges across the m estimated graphs
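
A common use of the results data frame is to pick the penalty pair with the smallest information criterion. The sketch below works on a toy stand-in for results with made-up numbers, so it runs without a fitted model. Treating id as the index into the per-model lists is an assumption based on the description of the id column.

```r
# Toy stand-in for the results data frame; the numbers are made up
results <- data.frame(
  id      = 1:4,
  lambda1 = c(1, 2, 1, 2),
  lambda2 = c(1, 1, 2, 2),
  bic     = c(105.2, 101.7, 103.9, 108.4)
)

# Row with the smallest BIC
best <- results[which.min(results$bic), ]
print(best)

# Assumption: id indexes the per-model lists such as Theta and adj_matrices
best_id <- best$id
```

The same pattern applies to the aic and ebic columns.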

The estimates of the precision matrices and the corresponding adjacency matrices for the different values of \((\lambda_1, \lambda_2)\) can be found in:

Theta: A list with the estimated precision matrices \(\{ \hat{\Theta}_i(\lambda_1, \lambda_2) \}_{i = 1}^m\) (only if minimal = FALSE)
adj_matrices: A list with the estimated adjacency matrices corresponding to the estimated precision matrices in Theta. The entries are 1 if there is an edge, 0 otherwise. The matrices are sparse using package Matrix
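
The link between a precision matrix and its adjacency matrix can be illustrated in base R. This is a sketch of the idea, not the package's internal code: it assumes that truncation compares absolute values and that the diagonal carries no edges. The matrix values are made up.

```r
# Toy 3 x 3 precision matrix (made-up values)
Theta <- matrix(c(2.0, 0.3,   0.0,
                  0.3, 1.5,   2e-06,
                  0.0, 2e-06, 1.8),
                nrow = 3, byrow = TRUE)

truncate <- 1e-05  # default from the Arguments section

# Assumption: entries are compared by absolute value
Theta_trunc <- Theta
Theta_trunc[abs(Theta_trunc) < truncate] <- 0

# Adjacency: 1 for a nonzero off-diagonal entry, 0 otherwise
adj <- (Theta_trunc != 0) * 1
diag(adj) <- 0  # assumption: no self-loops

# Each undirected edge is counted once via the upper triangle
n_edges <- sum(adj[upper.tri(adj)])
print(adj)
```

The package stores these matrices in sparse form via the Matrix package; a dense matrix is used here only to keep the sketch dependency-free.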

In addition, the input given to the CVN function is stored in the object as well:

Sigma: Empirical covariance matrices \(\{\hat{\Sigma}_i\}_{i = 1}^m\), (only if minimal = FALSE)
m: Number of graphs
p: Number of variables
n_obs: Vector of length \(m\) with number of observations for each graph
data: The input data after normalizing or centering (only if minimal = FALSE)
W: The \((m \times m)\)-dimensional weight matrix \(W\)
maxiter: Maximum number of iterations for the ADMM
rho: The ADMM penalty parameter \(\rho\)
eps: The stopping criterion \(\epsilon\)
truncate: Truncation value for \(\{ \hat{\Theta}_i \}_{i = 1}^m\)
maxiter_genlasso: Maximum number of iterations for the generalized LASSO
rho_genlasso: The \(\rho\) generalized LASSO penalty parameter
eps_genlasso: The stopping criterion \(\epsilon\) for the generalized LASSO
truncate_genlasso: Truncation value for \(\beta\) of the generalized LASSO
n_lambda_values: Total number of \((\lambda_1, \lambda_2)\) value combinations
normalized: If TRUE, data was normalized. Otherwise data was only centered
warmstart: If TRUE, warmstart was used
minimal: If TRUE, data, Theta and Sigma are not added
hits_border_aic: If TRUE, the optimal model based on the AIC lies on the border of the \((\lambda_1, \lambda_2)\) grid
hits_border_bic: If TRUE, the optimal model based on the BIC lies on the border of the \((\lambda_1, \lambda_2)\) grid
gamma_ebic: Gamma value used to calculate eBIC
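
The border flags indicate that a wider range of penalties might be worth trying. The sketch below shows one way such a check could be written. It assumes that "border" means the smallest or largest value of either lambda in the grid; whether CVN uses exactly this rule is not confirmed here. The grid and BIC values are made up.

```r
# Hypothetical 3 x 3 grid with made-up BIC values
results <- data.frame(
  lambda1 = rep(c(0.5, 1, 2), times = 3),
  lambda2 = rep(c(0.5, 1, 2), each = 3),
  bic     = c(120, 115, 118, 117, 110, 114, 116, 112, 119)
)

best <- results[which.min(results$bic), ]

# TRUE if a value is the smallest or largest value in its grid
on_border <- function(value, grid) value == min(grid) || value == max(grid)

hits_border <- on_border(best$lambda1, results$lambda1) ||
  on_border(best$lambda2, results$lambda2)

print(best)
print(hits_border)
```

In this toy grid the optimum sits in the interior. Note that with a single value for lambda1 or lambda2 every model lies on the border under this rule.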

Reusing Estimates

When estimating the graphs for a given pair \((\lambda_1, \lambda_2)\), the estimate for the closest pair that has already been fitted (if one is available) is reused.
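
The lookup of the closest available estimate could look like the sketch below. It assumes that closeness is measured by Euclidean distance in the \((\lambda_1, \lambda_2)\) plane, which is an illustrative choice and not necessarily the rule used inside CVN. The pairs are hypothetical.

```r
# Pairs that already have an estimate (hypothetical)
fitted <- data.frame(lambda1 = c(1, 1, 2), lambda2 = c(1, 2, 1))

# New pair to estimate
target <- c(lambda1 = 2, lambda2 = 2)

# Assumption: closeness is Euclidean distance in the (lambda1, lambda2) plane
d <- sqrt((fitted$lambda1 - target[["lambda1"]])^2 +
          (fitted$lambda2 - target[["lambda2"]])^2)

# which.min returns the first minimum, so ties go to the earlier pair
nearest <- fitted[which.min(d), ]
print(nearest)
```

Here two fitted pairs are equally close to the target, and the tie is resolved in favor of the one listed first.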

Examples

data(grid)

# Choice of the weight matrix W. Each of the 2 covariates has 3 categories
# (uniform random)
W <- create_weight_matrix("uniform-random", k = 3, l = 3)

# lambdas:
lambda1 <- 1  # can also be lambda1 <- 1:2
lambda2 <- 1

(fit <- CVN(data = grid, 
            W = W, 
            lambda1 = lambda1, lambda2 = lambda2, 
            n_cores = 1,
            eps = 1e-2, maxiter = 200, # fast but imprecise
            verbose = TRUE))
#> Estimating a CVN with 9 graphs...
#> 
#> Number of cores: 1
#> Uses a warmstart...
#> 
#> -------------------------
#> iteration 1  |  2.180956
#> iteration 2  |  0.115992
#> iteration 3  |  0.085703
#> iteration 4  |  0.030387
#> iteration 5  |  0.022670
#> iteration 6  |  0.017581
#> iteration 7  |  0.016135
#> iteration 8  |  0.014122
#> iteration 9  |  0.011050
#> iteration 10  |  0.010618
#> -------------------------
#> iteration 11  |  0.009648
#> Covariate-varying Network (CVN)
#> 
#> ✓ all converged
#> 
#> Number of graphs (m)    : 9
#> Number of variables (p) : 10
#> Number of lambda pairs  : 1
#> 
#> Weight matrix (W):
#> 9 x 9 sparse Matrix of class "dsCMatrix"
#>                                                                            
#>  [1,] .         0.6012593 0.3094466 0.3387542 0.2975096 0.2986163 0.3823848
#>  [2,] 0.6012593 .         0.5036383 0.5257210 0.4768016 0.5083998 0.7264707
#>  [3,] 0.3094466 0.5036383 .         0.2970355 0.2157332 0.2794692 0.3662958
#>  [4,] 0.3387542 0.5257210 0.2970355 .         0.2893602 0.2810684 0.3744014
#>  [5,] 0.2975096 0.4768016 0.2157332 0.2893602 .         0.2500744 0.3623793
#>  [6,] 0.2986163 0.5083998 0.2794692 0.2810684 0.2500744 .         0.3741879
#>  [7,] 0.3823848 0.7264707 0.3662958 0.3744014 0.3623793 0.3741879 .        
#>  [8,] 0.3184479 0.5514783 0.2465658 0.2723133 0.3065248 0.2787839 0.4334903
#>  [9,] 0.2250348 0.4363431 0.3054915 0.2642960 0.1564069 0.2694153 0.3516440
#>                          
#>  [1,] 0.3184479 0.2250348
#>  [2,] 0.5514783 0.4363431
#>  [3,] 0.2465658 0.3054915
#>  [4,] 0.2723133 0.2642960
#>  [5,] 0.3065248 0.1564069
#>  [6,] 0.2787839 0.2694153
#>  [7,] 0.4334903 0.3516440
#>  [8,] .         0.2091313
#>  [9,] 0.2091313 .        
#> 
#>   id lambda1 lambda2      gamma1      gamma2 converged       value n_iterations
#> 1  1       1       1 0.002469136 0.000617284      TRUE 0.009647579           12
#>        aic      bic     ebic edges_median edges_iqr
#> 1 14315.63 15748.48 18281.32           31         1