A longitudinal causal graph analysis investigating modifiable risk factors and obesity in a European cohort of children and adolescents

Cohort Causal Graph

The following cohort causal graphs (CCGs) are based on 2- to 16-year-old European children and adolescents from the IDEFICS/I.family cohort. The data set contains N = 5,112 children born between 1997 and 2006 who participated in all three waves of the study.

We used the temporal order of the variables as prior knowledge for the analysis by distributing the variables into different tiers. The analysis data set consisted of 51 variables that were distributed over five tiers:

1. Context variables (social and cultural background)
2. Early life factors
3. Baseline variables (B, the first cohort survey)
4. First follow-up (FU1, two years after baseline)
5. Second follow-up (FU2, six years after baseline)

We assume that variables in a lower-numbered tier can affect variables in a higher-numbered tier, but not vice versa. In addition, we forbid edges between some variable pairs, such as edges pointing into age, or edges from individual child variables (e.g. physical activity) into ISCED or parental income.
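As a rough illustration of how this prior knowledge might be encoded, the following base-R sketch builds a tier vector and a forbidden-edge matrix for a toy set of six variables. The variable names are hypothetical and far fewer than the real 51 variables, and the convention that entry [i, j] = TRUE forbids the edge i -> j is an assumption of this sketch that should be checked against the tpc documentation.

```r
# Toy variable list (hypothetical names; the real analysis has 51 variables)
vars  <- c("sex", "birthweight", "age_B", "isced_B", "pa_B", "zbmi_FU1")
# Tier of each variable: 1 = context, 2 = early life, 3 = baseline, 4 = FU1
tiers <- c(1, 2, 3, 3, 3, 4)

# Forbidden-edge matrix; assumed convention: fg[i, j] = TRUE forbids i -> j
fg <- matrix(FALSE, nrow = length(vars), ncol = length(vars),
             dimnames = list(vars, vars))

# No other variable may point into age
fg[setdiff(vars, "age_B"), "age_B"] <- TRUE
# An individual child variable (here physical activity) may not point into ISCED
fg["pa_B", "isced_B"] <- TRUE

# Directed edges ruled out by the tier ordering alone (later tier -> earlier tier).
# Shown only for illustration: the tiers argument of tpc handles the ordering.
backward <- outer(tiers, tiers, ">")
dimnames(backward) <- list(vars, vars)

n_forbidden <- sum(fg)        # edges forbidden by the extra rules
n_backward  <- sum(backward)  # edge directions excluded by the tiers
```

Keeping the tier ordering and the extra forbidden edges in separate objects mirrors the two separate arguments (tiers and forbEdges) used in the calls in this document.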

All CCGs of childhood obesity were estimated with the temporal PC algorithm (tPC) on multiply imputed data sets using the R packages tpc and micd. The tpc package makes it possible to use prior knowledge about the temporal order of the cohort data, and the micd package makes it possible to run the PC algorithm on multiply imputed data sets with mixed variable scales. The PC and tPC algorithms are both constraint-based structure learning algorithms. Both tpc and micd rely on the PC algorithm as implemented in pcalg.
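To make "constraint-based" concrete: such algorithms remove the edge between two variables whenever a conditional independence test finds them independent given some set of other variables. The base-R sketch below uses simulated Gaussian data and a partial-correlation test with Fisher's z-transform. It is a simplified stand-in for illustration only, not the flexMItest used in this analysis, and all names in it are hypothetical.

```r
set.seed(1)
n <- 2000
# Simulated chain x -> y -> z: x and z are marginally dependent,
# but independent given y
x <- rnorm(n)
y <- 0.8 * x + rnorm(n)
z <- 0.8 * y + rnorm(n)

# Partial-correlation test of "a independent of b given S" (Fisher's z)
ci_test <- function(a, b, S = NULL) {
  if (is.null(S)) {
    r <- cor(a, b)
    k <- 0
  } else {
    S <- as.matrix(S)
    # Correlate what is left of a and b after regressing out S
    r <- cor(resid(lm(a ~ S)), resid(lm(b ~ S)))
    k <- ncol(S)
  }
  stat <- sqrt(length(a) - k - 3) * atanh(r)
  list(estimate = r, p.value = 2 * pnorm(-abs(stat)))
}

marginal    <- ci_test(x, z)         # x vs z, no conditioning
conditional <- ci_test(x, z, S = y)  # x vs z given y
```

A constraint-based algorithm would keep the edge between x and z after the marginal test, and drop it if the conditional test does not reject independence at the chosen level alpha.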

Tiers	Variable/Node	Unit	Comments
Context	Sex	female/male	Sex of child
Context	Region	North/Central/South	Place of residence
Context	Migrant	no/yes	Children were assumed to have a migrant background if they usually speak with their parents in a language other than the national language of the corresponding country
Early life	Mother's age at birth	years
Early life	Total breastfeeding	months	Incl. breastfeeding combinations before the child's diet was fully integrated into the usual household diet
Early life	Birthweight	grams
Early life	Weeks of pregnancy	weeks
Early life	Formula milk	no/yes	Type of feeding before the child's diet was fully integrated into the usual household diet
Early life	HH diet	months	Month when the child was introduced into the household's diet
Early life	Smoking during pregnancy	no/yes	Mother consumed tobacco during pregnancy
Context: B, FU1, FU2	Age	months
Context: B, FU1, FU2	School	kindergarten/school/neither
Context: B, FU1, FU2	Income	low/middle/high	Country-specific parental income
Context: B, FU1, FU2	ISCED	low/middle/high	International Standard Classification of Education, highest parental education
B, FU1, FU2	AVM	h/day	Audio-visual media consumption
B, FU1, FU2	zBMI	z-score	Body mass index
B, FU1, FU2	Mother's BMI	kg/m^2	Body mass index of the child's mother
B, FU1, FU2	Daily family meals	no/yes
B, FU1, FU2	PA	h/day	Physical activity measured by questionnaire
B, FU1, FU2	Sleep	h/day	Total sleep
B, FU1, FU2	Well-being	%	Sum score based on the KINDL-R quality of life questionnaire
B, FU1, FU2	YHEI	%	Youth healthy eating score
B, FU1, FU2	HOMA	z-score	HOmeostatic Model Assessment
FU2	Alcohol	no/yes	Ever alcohol drinking in teen's life-time
FU2	Puberty	pre- or early pubertal/pubertal	Pubertal status
FU2	Smoking	no/yes	Ever smoking tobacco in teen's life-time

library(tpc)
library(micd)
library(parallel) # provides detectCores()

## sufficient statistic from the multiply imputed data
suff.all <- getSuff(my.mids.data, test = "flexMItest")

## CCG
graph <- tpc(suffStat = suff.all,
            indepTest = flexMItest,
          skel.method = "stable.parallel",
               labels = V.pa,
                alpha = 0.05,
                tiers = c(rep(1, 3), rep(2, 7), rep(3, 13), rep(4, 13), rep(5, 15)),
            forbEdges = fg, # a matrix of size
                            # ncol(my.mids.data$data) x ncol(my.mids.data$data)
             numCores = detectCores() - 1)

Fig.1 Missing values were imputed ten times by multiple imputation with chained equations using the R package mice, with random forests as the imputation method. Graph discovery used a nominal level of 0.05.

Note: nodes are coloured according to when they occur in the life course. Edges without arrowheads could not be oriented by the algorithm.

Graph characteristics	Main graph
Number of selected edges	104
Number of undirected edges	12
Avg. number of outgoing edges	2.4

Sensitivity analysis

CCG based on MI with α = 0.1

g.alpha <- tpc(suffStat = suff.all,
              indepTest = flexMItest,
            skel.method = "stable.parallel",
                 labels = V.pa,
                  alpha = 0.1,
                  tiers = c(rep(1, 3), rep(2, 7), rep(3, 13), rep(4, 13), rep(5, 15)),
              forbEdges = fg, 
               numCores = detectCores()-1)

Fig.2 CCG based on the same multiply imputed data as in Fig. 1, but with a nominal level of 0.1 for graph discovery.

Graph characteristics	Main graph	MI, α = 0.1
Number of selected edges	104	113
Number of undirected edges	12	13
Avg. number of outgoing edges	2.4	2.5
Hamming distance	-	19
Structural Hamming distance	-	34
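The two distance rows compare each graph with the main graph. The text does not spell out the definitions, so the base-R sketch below assumes the common ones: the Hamming distance counts node pairs that are adjacent in one graph but not in the other, and the structural Hamming distance additionally counts pairs whose edge differs in direction or type. The toy graphs and the adjacency coding are hypothetical.

```r
# Assumed coding: A[i, j] = 1 means i -> j;
# an undirected edge i - j is coded as A[i, j] = A[j, i] = 1
make_graph <- function(nodes, directed = NULL, undirected = NULL) {
  A <- matrix(0L, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  for (e in directed) A[e[1], e[2]] <- 1L
  for (e in undirected) {
    A[e[1], e[2]] <- 1L
    A[e[2], e[1]] <- 1L
  }
  A
}

# Number of node pairs whose adjacency (ignoring direction) differs
hamming_distance <- function(A, B) {
  SA <- (A + t(A)) > 0
  SB <- (B + t(B)) > 0
  sum(SA[upper.tri(SA)] != SB[upper.tri(SB)])
}

# Number of node pairs whose edge differs in presence, direction or type
structural_hamming_distance <- function(A, B) {
  differs <- (A != B) | (t(A) != t(B))
  sum(differs[upper.tri(differs)])
}

nodes <- c("a", "b", "c", "d")
g1 <- make_graph(nodes,
                 directed   = list(c("a", "b"), c("b", "c")),
                 undirected = list(c("c", "d")))
g2 <- make_graph(nodes,
                 directed = list(c("a", "b"), c("c", "b"),
                                 c("c", "d"), c("a", "d")))

hd  <- hamming_distance(g1, g2)
shd <- structural_hamming_distance(g1, g2)
```

In this toy pair only the edge between a and d changes the skeleton, while the edges b/c and c/d differ in orientation, so the structural distance is larger than the Hamming distance, as in the table.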

CCG based on test-wise deletion

g.twd <- tpc(suffStat = data.with.missing.values,
            indepTest = flexCItwd,
          skel.method = "stable.parallel",
                alpha = 0.05,
            forbEdges = fg,
               labels = colnames(fg),
                tiers = c(rep(1, 3), rep(2, 7), rep(3, 13), rep(4, 13), rep(5, 15)),
             numCores = detectCores()-1)

Fig.2 CCG based on test-wise deletion. Each conditional independence test between two variables given a conditioning set was computed using all observations that were complete on the variables involved in that test.
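The difference between test-wise deletion and discarding every incomplete row can be seen in a small base-R sketch. The toy data and the helper name rows_for_test are hypothetical and only illustrate which rows would enter a given test.

```r
# Toy data with missing values in different columns (hypothetical variables)
dat <- data.frame(
  x = c(1.2, NA, 0.7, 1.9, 2.4, 0.3, 1.1, NA),
  y = c(0.5, 1.4, NA, 2.2, 1.8, 0.9, 1.3, 0.6),
  z = c(NA, 0.2, 1.0, NA, 0.8, 1.5, NA, 0.4)
)

# Rows that are complete on the variables involved in one particular test
rows_for_test <- function(data, vars) {
  complete.cases(data[, vars, drop = FALSE])
}

# List-wise deletion: only rows complete on all variables
n_listwise <- sum(complete.cases(dat))
# Test-wise deletion: rows available for a test of x vs y
n_xy <- sum(rows_for_test(dat, c("x", "y")))
# Test-wise deletion: rows available for a test of x vs y given z
n_xy_z <- sum(rows_for_test(dat, c("x", "y", "z")))
```

The sample size therefore varies from test to test, which is one reason why the resulting graph can differ from the graph based on multiple imputation.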

Graph characteristics	Main graph	MI, α = 0.1	TWD
Number of selected edges	104	113	138
Number of undirected edges	12	13	5
Avg. number of outgoing edges	2.4	2.5	2.8
Hamming distance	-	19	96
Structural Hamming distance	-	34	110

Structural EM algorithm

library(bnlearn)
sem <- structural.em(data.with.missing.values,
                     maximize = "hc",
                     maximize.args = list(blacklist = bl))
                    # bl is a two-column matrix (from, to) with one row
                    # per forbidden directed edge

Fig.5 The DAG was estimated using the structural EM algorithm with the score-based hill-climbing algorithm in the maximization step.

Graph characteristics	Main graph	MI, α = 0.1	TWD	SEM
Number of selected edges	104	113	138	157
Number of undirected edges	12	13	5	0
Avg. number of outgoing edges	2.4	2.5	2.8	3.1
Hamming distance	-	19	96	117
Structural Hamming distance	-	34	110	131

Bootstrap CCGs

For each bootstrap sample, the data were imputed once using mice with random forest imputation. The following CCGs are based on 100 bootstrap replications.

Fig.3 Bootstrapped CCG with edge frequencies of at least 44 %

Fig.4 Bootstrapped CCG with edge frequencies of at least 75 %
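How a frequency threshold turns a set of bootstrap graphs into a single graph can be sketched in base R. The four toy graphs below are hypothetical, and counting the relative frequency of each directed edge is an assumption of this sketch; the text does not state exactly how edge frequencies were counted.

```r
nodes <- c("a", "b", "c")

# Helper: adjacency matrix with A[i, j] = 1 meaning i -> j (assumed coding)
adj <- function(edges) {
  A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  for (e in edges) A[e[1], e[2]] <- 1
  A
}

# Four hypothetical graphs, one per bootstrap replication
boot_graphs <- list(
  adj(list(c("a", "b"), c("b", "c"))),
  adj(list(c("a", "b"), c("b", "c"))),
  adj(list(c("a", "b"), c("a", "c"))),
  adj(list(c("a", "b")))
)

# Relative frequency of each directed edge across the replications
edge_freq <- Reduce(`+`, boot_graphs) / length(boot_graphs)

# Keep only edges whose frequency reaches the threshold
threshold_graph <- function(freq, threshold) (freq >= threshold) * 1

g44 <- threshold_graph(edge_freq, 0.44)
g75 <- threshold_graph(edge_freq, 0.75)
```

Raising the threshold can only remove edges, which matches the drop in the number of selected edges between the 44 % and the 75 % graph.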

Graph characteristics	Main graph	MI, α = 0.1	TWD	SEM	MI, BG44	MI, BG75
Number of selected edges	104	113	138	157	104	46
Number of undirected edges	12	13	5	0	3	0
Avg. number of outgoing edges	2.4	2.5	2.8	3.1	2.1	0.9
Hamming distance	-	19	96	117	56	70
Structural Hamming distance	-	34	110	131	73	86