Twomoons data example

This is an example that illustrates how to synthesize data with arfpy.

[1]:
from sklearn.datasets import make_moons
from arfpy import arf
import pandas as pd
from matplotlib import pyplot as plt
import warnings
warnings.filterwarnings('ignore')
from numpy import random
random.seed(seed=2022)

First, we load the data:

[2]:
n_train = 2000
n_test = 1000
moons_X, moons_y = make_moons(n_samples = n_train+n_test,  noise=0.1)
df = pd.DataFrame({"dim_1" : moons_X[:,0], "dim_2" : moons_X[:,1], "target" : moons_y})
df['target'] = df['target'].astype('category')

Now we can define the ARF object:

[3]:
my_arf = arf.arf(x = df[:n_train], min_node_size=2, num_trees=20, max_features = 2)

Initial accuracy is 0.8125
Iteration number 1 reached accuracy of 0.3215.

Next, we estimate the density:

[4]:
FORDE = my_arf.forde()

Finally, we can generate some new data:

[5]:
df_syn = my_arf.forge(n = n_test)

Let’s invesitgate whether our synthesized data looks realistic!

[21]:
plt.subplot(2, 2, 1)
df_test = df[:n_train].sample(n=n_test)
plt.scatter(df_test['dim_1'], df_test['dim_2'], c = df_test['target'], alpha = 0.5)
plt.title('Original Data', fontsize = 30)

plt.subplot(2, 2, 2)
plt.scatter(df_syn['dim_1'], df_syn['dim_2'], c = df_syn['target'], alpha = 0.5)
plt.title('Synthesized Data', fontsize = 30)

plt.rcParams['figure.figsize'] = [30, 20]
plt.show()
../_images/examples_twomoons_11_0.png