Compute the expectation of some query variable(s), optionally conditioned on some event(s).
Arguments
- params
Circuit parameters learned via forde.
- query
Optional character vector of variable names. Estimates will be computed for each. If NULL, all variables other than those in evidence will be estimated. If evidence contains NAs, those variables will be estimated and a full dataset is returned.
- evidence
Optional set of conditioning events. This can take one of three forms: (1) a partial sample, i.e. a single row of data with some but not all columns; (2) a data frame of conditioning events, which allows for inequalities and intervals; or (3) a posterior distribution over leaves; see Details and Examples.
- evidence_row_mode
Interpretation of rows in multi-row evidence. If 'separate', each row in evidence is a separate conditioning event for which n_synth synthetic samples are generated. If 'or', the rows are combined with a logical or; see Examples.
- round
Round continuous variables to their respective maximum precision in the real data set?
- nomatch
What to do if no leaf matches a condition in evidence? Options are to force sampling from a random leaf, either with a warning ("force_warning") or without a warning ("force"), or to return NA, also with a warning ("na_warning") or without a warning ("na"). The default is "force_warning".
- stepsize
Number of evidence rows handled in each step. Defaults to nrow(evidence)/num_registered_workers for parallel == TRUE.
- parallel
Compute in parallel? Must register backend beforehand, e.g. via doParallel or doFuture; see Examples.
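The following is a minimal sketch of how evidence_row_mode and nomatch might be combined with multi-row evidence. The argument names come from the list above; the verbose and parallel arguments passed to adversarial_rf and forde, and the object names arf_fit, evi_rows, est_sep, est_or and est_na, are assumptions made for illustration. The shape of the returned objects is not asserted here.

```r
# Multi-row evidence: each row is one conditioning event (values taken from iris)
evi_rows <- data.frame(Species = c("setosa", "virginica"))

# Guarded so the sketch is skipped when the arf package is not installed
if (requireNamespace("arf", quietly = TRUE)) {
  # ASSUMPTION: verbose/parallel are accepted here to keep the run quiet and serial
  arf_fit <- arf::adversarial_rf(iris, verbose = FALSE, parallel = FALSE)
  psi <- arf::forde(arf_fit, iris, parallel = FALSE)

  # 'separate': every evidence row is treated as its own conditioning event
  est_sep <- arf::expct(psi, query = "Sepal.Length", evidence = evi_rows,
                        evidence_row_mode = "separate", parallel = FALSE)

  # 'or': the rows are combined into one event (setosa OR virginica)
  est_or <- arf::expct(psi, query = "Sepal.Length", evidence = evi_rows,
                       evidence_row_mode = "or", parallel = FALSE)

  # nomatch = "na": return NA without a warning if no leaf matches a condition
  est_na <- arf::expct(psi, query = "Sepal.Length", evidence = evi_rows,
                       nomatch = "na", parallel = FALSE)
}
```

Setting parallel = FALSE avoids the need to register a backend for a small example like this one.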
Details
This function computes expected values for any subset of features, optionally conditioned on some event(s).
There are three methods for (optionally) encoding conditioning events via the evidence argument. The first is to provide a partial sample, where some columns from the training data are missing or set to NA. The second is to provide a data frame with condition events. This supports inequalities and intervals. Alternatively, users may directly input a pre-calculated posterior distribution over leaves, with columns f_idx and wt. This may be preferable for complex constraints. See Examples.
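The three encodings described above could look roughly as follows. This is a sketch under stated assumptions, not a definitive recipe: the ">6" string syntax for inequalities is not shown on this page and is assumed here, as is the presence of an f_idx column in psi$forest. The object names (arf_fit, evi_partial, evi_cond, evi_leaves) are illustrative only. The query is a different variable from the one carrying the inequality, in line with the note on interval conditions.

```r
# (1) Partial sample: a single row with some but not all columns
evi_partial <- data.frame(Species = "setosa")

# (2) Data frame of conditioning events with an inequality
# ASSUMPTION: conditions are written as strings such as ">6"
evi_cond <- data.frame(Sepal.Length = ">6")

# Guarded so the sketch is skipped when the arf package is not installed
if (requireNamespace("arf", quietly = TRUE)) {
  arf_fit <- arf::adversarial_rf(iris, verbose = FALSE, parallel = FALSE)
  psi <- arf::forde(arf_fit, iris, parallel = FALSE)

  # (3) Posterior distribution over leaves, with columns f_idx and wt
  # ASSUMPTION: psi$forest has one row per leaf and an f_idx column
  n_leaves <- nrow(psi$forest)
  evi_leaves <- data.frame(f_idx = psi$forest$f_idx,
                           wt = rep(1 / n_leaves, n_leaves))

  # The same query under each of the three kinds of evidence
  est_partial <- arf::expct(psi, query = "Petal.Length",
                            evidence = evi_partial, parallel = FALSE)
  est_cond <- arf::expct(psi, query = "Petal.Length",
                         evidence = evi_cond, parallel = FALSE)
  est_leaves <- arf::expct(psi, query = "Petal.Length",
                           evidence = evi_leaves, parallel = FALSE)
}
```

The uniform weights in (3) are arbitrary; in practice wt would hold whatever posterior over leaves the user has computed for the constraint of interest.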
Please note that results for continuous features which are both included in query and in evidence with an interval condition are currently inconsistent.
References
Watson, D., Blesch, K., Kapar, J., & Wright, M. (2023). Adversarial random forests for density estimation and generative modeling. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, pp. 5357-5375.
See also
arf, adversarial_rf, forde, forge, lik
Examples
library(arf)
# Train ARF and estimate leaf parameters
arf <- adversarial_rf(iris)
#> Iteration: 0, Accuracy: 77.03%
#> Iteration: 1, Accuracy: 40%
psi <- forde(arf, iris)
# What is the expected value of Sepal.Length?
expct(psi, query = "Sepal.Length")
#> Sepal.Length
#> 1 5.843333
# What if we condition on Species = "setosa"?
evi <- data.frame(Species = "setosa")
expct(psi, query = "Sepal.Length", evidence = evi)
#> Sepal.Length
#> 1 5.026998
# Compute expectations for all features other than Species
expct(psi, evidence = evi)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 5.026998 3.414044 1.542446 0.2771403
# Condition on first two data rows with some missing values
evi <- iris[1:2,]
evi[1, 1] <- NA_real_
evi[1, 5] <- NA_character_
evi[2, 2] <- NA_real_
x_synth <- expct(psi, evidence = evi)
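if (FALSE) { # \dontrun{
# Parallelization with doParallel
doParallel::registerDoParallel(cores = 4)
# ... or with doFuture
doFuture::registerDoFuture()
future::plan("multisession", workers = 4)
# With a backend registered, compute the expectations in parallel
x_synth <- expct(psi, evidence = evi, parallel = TRUE)
} # }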
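As a rough sanity check, the model-based estimates in the examples can be compared with plain sample means from the training data. The sketch below uses base R only; the names mean_all, mean_setosa and means_setosa are illustrative. The model-based values come from the fitted density rather than the raw data, so they should be close to these numbers but need not match them exactly.

```r
# Sample mean of Sepal.Length over the full data set
mean_all <- mean(iris$Sepal.Length)

# Sample mean of Sepal.Length within Species == "setosa"
mean_setosa <- mean(iris$Sepal.Length[iris$Species == "setosa"])

# Sample means of all four numeric features within setosa
means_setosa <- colMeans(iris[iris$Species == "setosa", 1:4])

print(round(c(all = mean_all, setosa = mean_setosa), 3))
print(round(means_setosa, 3))
```

In the examples, the unconditional estimate (5.843333) agrees with the sample mean, while the estimate conditioned on setosa (5.026998) differs slightly from the within-group sample mean of 5.006.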