Generalized Additive Models for Location, Scale and Shape using DataSHIELD
Source: R/ds.gamlss.R
ds.gamlss.Rd
Fits a Generalized Additive Model for Location, Scale and Shape (GAMLSS) using DataSHIELD on data from a single source or from multiple sources on the server side.
Usage
ds.gamlss(
  formula = NULL,
  sigma.formula = ~1,
  nu.formula = ~1,
  tau.formula = ~1,
  family = "NO()",
  data = NULL,
  min.values = NULL,
  max.values = NULL,
  min.max.names = NULL,
  checks = FALSE,
  mu.fix = FALSE,
  sigma.fix = FALSE,
  nu.fix = FALSE,
  tau.fix = FALSE,
  control = c(0.001, 20, 1, 1, 1, 1, Inf),
  i.control = c(0.001, 50, 30, 0.001),
  autostep = TRUE,
  datasources = NULL
)
Arguments
- formula
A formula object specifying the model for the mu distribution parameter. The response is on the left of the ~ operator, and the terms, separated by + operators, are on the right. Currently, only penalized beta splines, indicated by pb(), are supported for nonparametric smoothing, e.g. y~pb(x1)+x2+x2*x3.
- sigma.formula
A formula object specifying the model for the sigma distribution parameter, as in formula. The only difference is that the response variable does not need to be specified, e.g. sigma.formula=~pb(x).
- nu.formula
A formula object specifying the model for the nu distribution parameter, as in formula. The only difference is that the response variable does not need to be specified, e.g. nu.formula=~pb(x).
- tau.formula
A formula object specifying the model for the tau distribution parameter, as in formula. The only difference is that the response variable does not need to be specified, e.g. tau.formula=~pb(x).
- family
A string specifying the distribution of the response variable and the link functions of the distribution parameters. Currently, the following families are supported: 'NO()', 'NO2()', 'BCCG()' and 'BCPE()'. Details on the distributions can be found in gamlss.family. Default family='NO()'.
- data
A string specifying the name of an (optional) data frame on the server side containing the variables occurring in the formulas. If this is missing, the variables should be in the parent environment on the server side or referenced explicitly as dataname$varname.
- min.values
A numeric vector specifying minimum values for the covariates, which are used to determine the knots for pb(). If min.values=NULL, an anonymized (noisy) minimum is used instead to determine the knots on all servers. Default min.values=NULL.
- max.values
A numeric vector specifying maximum values for the covariates, which are used to determine the knots for pb(). If max.values=NULL, an anonymized (noisy) maximum is used instead to determine the knots on all servers. Default max.values=NULL.
- min.max.names
A string vector specifying the names for the minimum (min.values) and maximum (max.values) values. Only required if min.values or max.values is given. Default min.max.names=NULL.
- checks
Logical. If checks=TRUE, ds.gamlss checks whether the required variables for the model exist on each server and are not completely missing. Default checks=FALSE.
- mu.fix
Logical, indicating whether the mu distribution parameter should be kept fixed during the fitting process. Default mu.fix=FALSE.
- sigma.fix
Logical, indicating whether the sigma distribution parameter should be kept fixed during the fitting process. Default sigma.fix=FALSE.
- nu.fix
Logical, indicating whether the nu distribution parameter should be kept fixed during the fitting process. Default nu.fix=FALSE.
- tau.fix
Logical, indicating whether the tau distribution parameter should be kept fixed during the fitting process. Default tau.fix=FALSE.
- control
Numeric vector with seven elements that sets the control parameters of the outer iterations algorithm using the gamlss.control function: (i) c.crit (the convergence criterion for the algorithm), (ii) n.cyc (the number of cycles of the algorithm), (iii) mu.step (the step length for the distribution parameter mu), (iv) sigma.step (the step length for the distribution parameter sigma), (v) nu.step (the step length for the distribution parameter nu), (vi) tau.step (the step length for the distribution parameter tau), (vii) gd.tol (the global deviance tolerance level). Default control=c(0.001, 20, 1, 1, 1, 1, Inf).
- i.control
Numeric vector with four elements that sets the control parameters of the inner iterations of the RS algorithm using the glim.control function: (i) cc (the convergence criterion for the algorithm), (ii) cyc (the number of cycles of the algorithm), (iii) bf.cyc (the number of cycles of the backfitting algorithm), (iv) bf.tol (the convergence criterion (tolerance level) for the backfitting algorithm). Default i.control=c(0.001, 50, 30, 0.001).
- autostep
Logical, indicating whether the steps should be halved automatically if the new global deviance is greater than the old one. Default autostep=TRUE.
- datasources
A list of DSConnection-class objects obtained after login. If the datasources argument is not specified, the default set of connections will be used: see datashield.connections_default.
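Because control and i.control are positional numeric vectors, it can help to build them from named elements first. The following is a minimal sketch in base R; the names simply mirror the labels given in the argument descriptions, and stripping them with unname() before the call is an assumption about how ds.gamlss reads the vectors, not documented behaviour.

```r
# Outer-iteration settings, named after the elements listed under `control`.
control <- c(
  c.crit     = 0.001, # convergence criterion
  n.cyc      = 20,    # number of cycles
  mu.step    = 1,     # step length for mu
  sigma.step = 1,     # step length for sigma
  nu.step    = 1,     # step length for nu
  tau.step   = 1,     # step length for tau
  gd.tol     = Inf    # global deviance tolerance level
)

# Inner-iteration settings, named after the elements listed under `i.control`.
i.control <- c(cc = 0.001, cyc = 50, bf.cyc = 30, bf.tol = 0.001)

# Change individual settings by name instead of by position.
control["n.cyc"] <- 40
i.control["bf.cyc"] <- 50

# Assumption: ds.gamlss reads the vectors by position, so the names are dropped.
control.arg <- unname(control)
i.control.arg <- unname(i.control)
print(control.arg)
print(i.control.arg)
```

The resulting vectors could then be supplied as control = control.arg and i.control = i.control.arg.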
Value
A ds.gamlss object with all components as in the gamlss function. Individual-level information, such as the components y (the response) and residuals (the normalised quantile residuals of the model), is not disclosed to the client side.
Details
Fits a Generalized Additive Model for Location, Scale and Shape (GAMLSS) using DataSHIELD on data from a single source or from multiple sources on the server side. In the latter case, ds.gamlss co-analyses the data using an approach that is mathematically equivalent to placing all individual-level data from all sources in one central warehouse and analysing those data with the conventional gamlss function in R. For additional details, please see the documentation of the gamlss function.
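The idea of a pooled-equivalent analysis can be illustrated with a deliberately simpler model. The sketch below is an analogy using ordinary least squares, not the algorithm used by ds.gamlss: each hypothetical source returns only aggregate cross-products, and the combined estimate matches a fit on the pooled data. The helper summarise_source and the two-way split of mtcars are assumptions made for illustration.

```r
data(mtcars)

# Two hypothetical sources, split as in the Examples section.
sources <- list(
  study1 = mtcars[1:15, ],
  study2 = mtcars[16:nrow(mtcars), ]
)

# Each source returns aggregate cross-products only, never individual rows.
summarise_source <- function(d) {
  X <- cbind(Intercept = 1, wt = d$wt)
  list(XtX = crossprod(X), Xty = crossprod(X, d$mpg))
}
parts <- lapply(sources, summarise_source)

# The client adds up the aggregates and solves the normal equations.
XtX <- Reduce(`+`, lapply(parts, `[[`, "XtX"))
Xty <- Reduce(`+`, lapply(parts, `[[`, "Xty"))
beta_combined <- drop(solve(XtX, Xty))

# Reference fit on the pooled individual-level data.
beta_pooled <- coef(lm(mpg ~ wt, data = mtcars))

print(all.equal(unname(beta_combined), unname(beta_pooled)))
```

Only sums of cross-products leave each source here, which is the sense in which a distributed fit can reproduce a pooled one.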
Server functions called: gamlssDS1, gamlssDS2, gamlssDS3, gamlssDS4, gamlssDS5, gamlssDS6
Examples
library(DSLite)
#> Loading required package: DSI
#> Loading required package: progress
#> Loading required package: R6
#> Loading required package: rly
data(mtcars)
## Set up DSLite server
dslite.server1 <- newDSLiteServer(
  tables = list(data = mtcars[c(1:15), ]),
  config = defaultDSConfiguration(include = c("dsBase", "dsGamlss"))
)
dslite.server2 <- newDSLiteServer(
  tables = list(data = mtcars[c(16:nrow(mtcars)), ]),
  config = defaultDSConfiguration(include = c("dsBase", "dsGamlss"))
)
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1", url = "dslite.server1", table = "data", driver = "DSLiteDriver")
builder$append(server = "study2", url = "dslite.server2", table = "data", driver = "DSLiteDriver")
logindata.dslite <- builder$build()
# Login to the virtualized server
conns <- DSI::datashield.login(logindata.dslite, assign = TRUE)
#>
#> Logging into the collaborating servers
DSI::datashield.assign.table(conns = conns, symbol = "D", table = c("data", "data"))
## Examples
# Example 1: parametric model
model1 <- ds.gamlss(formula = mpg ~ wt, sigma.formula = ~disp, data = "D", family = "NO()")
# Example 2: penalized beta splines
model2 <- ds.gamlss(formula = mpg ~ pb(wt), sigma.formula = ~disp, data = "D", family = "NO()")
# Example 3: penalized beta splines with known minimum and maximum
model3 <- ds.gamlss(
  formula = mpg ~ pb(wt), sigma.formula = ~disp,
  min.values = min(mtcars$wt),
  max.values = max(mtcars$wt),
  min.max.names = "wt",
  data = "D", family = "NO()"
)
## Logout
DSI::datashield.logout(conns)
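Example 3 supplies a known minimum and maximum for wt so that the knots for pb() are determined from the same bounds on all servers. The sketch below shows why common bounds matter, using a generic equally spaced knot layout. The helper pspline_knots and its defaults (20 intervals, degree 3) are hypothetical and are not taken from dsGamlss.

```r
# Hypothetical helper: equally spaced knots over [lower, upper], extended by
# `degree` intervals on each side (a common layout for penalized splines).
pspline_knots <- function(lower, upper, n_intervals = 20, degree = 3) {
  dx <- (upper - lower) / n_intervals
  lower + dx * (-degree:(n_intervals + degree))
}

data(mtcars)
wt1 <- mtcars$wt[1:15]              # covariate values held by study1
wt2 <- mtcars$wt[16:nrow(mtcars)]   # covariate values held by study2

# Knots built from each server's own range differ between servers.
local1 <- pspline_knots(min(wt1), max(wt1))
local2 <- pspline_knots(min(wt2), max(wt2))
print(isTRUE(all.equal(local1, local2)))

# Knots built from common bounds are the same wherever they are computed.
shared <- pspline_knots(min(mtcars$wt), max(mtcars$wt))
print(length(shared))
print(range(shared))
```

With common bounds, every server evaluates its spline basis on the same knot grid, so basis columns from different servers refer to the same functions.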