Uses a pre-trained FORDE model to simulate synthetic data.

Usage

forge(params, n_synth, evidence = NULL)

Arguments

params

Circuit parameters learned via forde.

n_synth

Number of synthetic samples to generate.

evidence

Optional set of conditioning events. This can take one of three forms: (1) a partial sample, i.e. a single row of data with some but not all columns; (2) a data frame of conditioning events, which allows for inequalities; or (3) a posterior distribution over leaves. See Details.

Value

A dataset of n_synth synthetic samples.

Details

forge simulates a synthetic dataset of n_synth samples in two steps. First, leaves are sampled in proportion to either their coverage (if evidence = NULL) or their posterior probability given the evidence. Then, each feature is sampled independently within each leaf according to the probability mass or density function learned by forde. The resulting data are realistic so long as the adversarial RF used in the previous step satisfies the local independence criterion, i.e. features are mutually independent within each leaf. See Watson et al. (2023).
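
The two-step procedure described above can be sketched in base R with a hand-built toy model. Everything here (the leaves table, its columns cvg, mu, sigma and p_a, and the helper toy_forge) is hypothetical and only stands in for the parameters that forde would learn. It is a simplified illustration, not the package's internal implementation.

```r
# Toy stand-in for learned leaf parameters (hypothetical values).
# Each row is one leaf: its coverage, a Gaussian for the continuous
# feature x, and a Bernoulli probability for the categorical feature y.
leaves <- data.frame(
  leaf  = 1:3,
  cvg   = c(0.5, 0.3, 0.2),   # share of training data in each leaf
  mu    = c(-2, 0, 3),        # mean of x within the leaf
  sigma = c(0.5, 1, 0.7),     # sd of x within the leaf
  p_a   = c(0.9, 0.5, 0.1)    # P(y = "a") within the leaf
)

toy_forge <- function(leaves, n_synth, wt = leaves$cvg) {
  # Step 1: sample leaves in proportion to their weights
  idx <- sample(nrow(leaves), size = n_synth, replace = TRUE, prob = wt)
  # Step 2: sample each feature independently given the leaf
  x <- rnorm(n_synth, mean = leaves$mu[idx], sd = leaves$sigma[idx])
  y <- ifelse(runif(n_synth) < leaves$p_a[idx], "a", "b")
  data.frame(leaf = leaves$leaf[idx], x = x, y = y)
}

set.seed(1)
toy_synth <- toy_forge(leaves, n_synth = 1000)
# Empirical leaf frequencies should be close to the coverage column
round(table(toy_synth$leaf) / nrow(toy_synth), 2)
```

Passing a different wt vector to toy_forge plays the role of conditioning: only the leaf-sampling step changes, while the within-leaf sampling stays the same.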

There are three methods for (optionally) encoding conditioning events via the evidence argument. The first is to provide a partial sample, where some but not all columns from the training data are present. The second is to provide a data frame with three columns: variable, relation, and value. This supports inequalities via relation. Alternatively, users may directly input a pre-calculated posterior distribution over leaves, with columns f_idx and wt. This may be preferable for complex constraints. See Examples.
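
As a rough illustration of the third form, leaf weights for a simple event can be computed by Bayes' rule: each leaf's coverage is multiplied by the probability that the leaf assigns to the event, and the products are rescaled to sum to one. The toy leaf table below is hypothetical, and the column names f_idx and wt simply mirror the ones named above. How forge computes posteriors internally is not shown here.

```r
# Hypothetical leaf parameters: coverage plus a Gaussian for one feature x
leaves <- data.frame(
  f_idx = 1:3,
  cvg   = c(0.5, 0.3, 0.2),
  mu    = c(-2, 0, 3),
  sigma = c(0.5, 1, 0.7)
)

# Event of interest: x > 1
# Probability of the event within each leaf under its Gaussian
lik <- pnorm(1, mean = leaves$mu, sd = leaves$sigma, lower.tail = FALSE)

# Posterior over leaves: prior (coverage) times likelihood, renormalized
wt <- leaves$cvg * lik
evi <- data.frame(f_idx = leaves$f_idx, wt = wt / sum(wt))
evi
```

In this toy setup nearly all posterior mass moves to the leaves whose Gaussians sit above the threshold, which is the effect a conditioning event is meant to have on leaf sampling.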

References

Watson, D., Blesch, K., Kapar, J., & Wright, M. (2023). Adversarial random forests for density estimation and generative modeling. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, pp. 5357-5375.

See also

Examples

arf <- adversarial_rf(iris)
#> Iteration: 0, Accuracy: 83.16%
#> Iteration: 1, Accuracy: 42.62%
psi <- forde(arf, iris)
x_synth <- forge(psi, n_synth = 100)

# Condition on Species = "setosa"
evi <- data.frame(Species = "setosa")
x_synth <- forge(psi, n_synth = 100, evidence = evi)

# Condition on Species = "setosa" and Sepal.Length > 6
evi <- data.frame(variable = c("Species", "Sepal.Length"),
                  relation = c("==", ">"),
                  value = c("setosa", 6))
x_synth <- forge(psi, n_synth = 100, evidence = evi)

# Or just input some distribution on leaves
# (Weights that do not sum to unity are automatically scaled)
n_leaves <- nrow(psi$forest)
evi <- data.frame(f_idx = psi$forest$f_idx, wt = rexp(n_leaves))
x_synth <- forge(psi, n_synth = 100, evidence = evi)
#> Warning: Posterior weights have been normalized to sum to unity.