openasce.inference.tree package¶

class openasce.inference.tree.DifferenceInDifferencesRegressionTree(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]¶

Bases: Boosting

Gradient-boosted debiased causal tree for regression. This class implements the method of "Difference-in-Differences Meets Tree-Based Methods" [1], which estimates heterogeneous treatment effects under unmeasured confounding.

Reference: 1. Tang, C., Wang, H., Li, X., Qing, C., Li, L., & Zhou, J. (2023). Difference-in-Differences Meets Tree-Based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Proceedings of the 40th International Conference on Machine Learning.

Parameters

n_estimators – int, default=100. The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.
learning_rate – float, default=0.1. Shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
subsample – float, default=1.0. The fraction of samples used to fit each base learner. Values smaller than 1.0 result in stochastic gradient boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 reduces variance and increases bias.
subfeature – float, default=1.0. The fraction of features used to fit each base learner, as in random forests. Note that the signature spells this keyword subfeaure.
train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply their own by inheriting from the base class Loss.
min_samples_split – int, default=10. The minimum number of samples required to split an internal node.
max_depth – int, default=3. The maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5. The L2 regularization parameter, as in XGBoost.
coef – float, default=1. The regularization parameter for selection bias, as in GBCT. Note that the signature spells this keyword coeff.
parallel_l2 – float, default=0.
n_period (int, optional) – The number of time steps. Must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Must be provided by the user.
init – The initial prediction. Defaults to 0.
random_state – int or RandomState, default=None. Controls the random seed given to each tree estimator at each boosting iteration. It also controls the random permutation of the features at each split.
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10 (per the signature). Decides whether early stopping is used to terminate training when the validation score is not improving. Set it to None to disable early stopping.
tol – float, default=1e-4. Tolerance for early stopping. Training stops when the loss fails to improve by at least tol for n_iter_no_change iterations (if set to a number).
nthreads – The number of threads to use for parallelization. Defaults to 32.
conf (ConfigTree, optional) – Defaults to None.
bin_mapper (BinMapper, optional) – Defaults to None.
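
To make the roles of n_period and treat_dt concrete, the sketch below computes a textbook difference-in-differences estimate on a tiny panel. It is not the tree algorithm implemented by this class; the function name did_estimate and the convention that treat_dt is the zero-based index of the first treated period are assumptions made for illustration.

```python
from statistics import mean


def did_estimate(Y, D, treat_dt):
    """Textbook difference-in-differences on panel outcomes.

    Y: list of rows, one per instance, each holding n_period outcomes.
    D: list of 0/1 treatment indicators, one per instance.
    treat_dt: assumed here to be the zero-based index of the first
        treated period, so 0 < treat_dt < n_period.
    """
    n_period = len(Y[0])
    if not 0 < treat_dt < n_period:
        raise ValueError("treat_dt must satisfy 0 < treat_dt < n_period")

    def change(group):
        # Average post-period outcome minus average pre-period outcome.
        rows = [y for y, d in zip(Y, D) if d == group]
        pre = mean(v for y in rows for v in y[:treat_dt])
        post = mean(v for y in rows for v in y[treat_dt:])
        return post - pre

    # The control group's change estimates the common trend.
    return change(1) - change(0)


Y = [
    [1.0, 1.0, 4.0, 4.0],  # treated
    [2.0, 2.0, 5.0, 5.0],  # treated
    [0.0, 0.0, 1.0, 1.0],  # control
    [1.0, 1.0, 2.0, 2.0],  # control
]
D = [1, 1, 0, 0]
effect = did_estimate(Y, D, treat_dt=2)
print(effect)  # 2.0
```

The treated group changes by 3 and the control group by 1, so the estimate is 2.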

__init__(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]¶

__module__ = 'openasce.inference.tree.didtree'¶

effect(X: ndarray = None, *, data: Dataset = None)[source]¶

Predict the treatment effect.

Parameters

X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.

Returns

np.ndarray, [n_instances,]. treatment effect.

fit(X: ndarray = None, Y: ndarray = None, D: ndarray = None, *, data: Dataset = None, data_test: Dataset = None)[source]¶

Train the DifferenceInDifferencesRegressionTree.

Parameters

X – np.ndarray, [n_instances, n_features]. Features.
Y – np.ndarray, [n_instances, n_period]. Outcomes.
D – np.ndarray, [n_instances,]. Treatment.
data (Dataset, optional) – Defaults to None.
data_test (Dataset, optional) – Defaults to None.
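
The shapes expected by fit can be illustrated with a small synthetic panel. The sketch below uses only the standard library and nested lists in place of np.ndarray; make_panel and check_shapes are hypothetical helpers, not part of OpenASCE.

```python
import random


def make_panel(n_instances, n_features, n_period, treat_dt, effect=2.0, seed=0):
    """Generate toy X [n_instances, n_features], Y [n_instances, n_period], D [n_instances]."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_instances)]
    D = [rng.randint(0, 1) for _ in range(n_instances)]
    Y = []
    for x, d in zip(X, D):
        # Outcome follows the first feature; treated rows shift from treat_dt on.
        row = [
            x[0] + rng.gauss(0, 0.1) + (effect * d if t >= treat_dt else 0.0)
            for t in range(n_period)
        ]
        Y.append(row)
    return X, Y, D


def check_shapes(X, Y, D, n_period):
    """Validate the documented shapes and return (n_instances, n_features)."""
    n = len(X)
    if len(Y) != n or len(D) != n:
        raise ValueError("X, Y and D must have the same number of instances")
    if any(len(row) != len(X[0]) for row in X):
        raise ValueError("every row of X must have n_features entries")
    if any(len(row) != n_period for row in Y):
        raise ValueError("every row of Y must have n_period entries")
    return n, len(X[0])


X, Y, D = make_panel(n_instances=50, n_features=3, n_period=4, treat_dt=2)
shape = check_shapes(X, Y, D, n_period=4)
```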

predict(X: ndarray = None, key: str = 'effect', *, data: Dataset = None)[source]¶

Predict the treatment effect, leaf id, or outcome, as selected by the parameter key.

Parameters

X (np.ndarray) – Features.
key (str) – One of ‘effect’, ‘leaf_id’ or ‘outcome’.
data (Dataset, optional) – Dataset. Defaults to None.

Returns

The prediction selected by key.

class openasce.inference.tree.GradientBoostingCausalRegressionTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶

Bases: Boosting

Gradient-boosted debiased causal tree for regression. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.

Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.

Parameters

learning_rate – float, default=0.1. Shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators – int, default=100. The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.
subsample – float, default=1.0. The fraction of samples used to fit each base learner. Values smaller than 1.0 result in stochastic gradient boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 reduces variance and increases bias.
subfeature – float, default=1.0. The fraction of features used to fit each base learner, as in random forests. Note that the signature spells this keyword subfeaure.
train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply their own by inheriting from the base class Loss.
min_samples_split – int, default=10. The minimum number of samples required to split an internal node.
max_depth – int, default=3. The maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5. The L2 regularization parameter, as in XGBoost.
coef – float, default=1. The regularization parameter for selection bias, as in GBCT. Note that the signature spells this keyword coeff.
n_period (int, optional) – The number of time steps. Must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Must be provided by the user.
init – The initial prediction. Defaults to 0.
random_state – int or RandomState, default=None. Controls the random seed given to each tree estimator at each boosting iteration. It also controls the random permutation of the features at each split.
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10. Decides whether early stopping is used to terminate training when the validation score is not improving. Set it to None to disable early stopping.
tol – float, default=1e-4. Tolerance for early stopping. Training stops when the loss fails to improve by at least tol for n_iter_no_change iterations (if set to a number).
nthreads – The number of threads to use for parallelization. Defaults to 32.
conf (ConfigTree or dict, optional) – Defaults to None.
bin_mapper (BinMapper, optional) – Defaults to None.
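
The trade-off between learning_rate and n_estimators can be seen in a generic gradient boosting loop. The sketch below boosts one-feature decision stumps under squared error; it illustrates ordinary gradient boosting only, not the debiased causal tree used by this class, and fit_stump, boost and mse are hypothetical names.

```python
def fit_stump(x, r):
    """Find the single-threshold split of 1-D feature x that best fits residuals r."""
    best = None
    for thr in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda xi: lv if xi <= thr else rv


def boost(x, y, n_estimators=100, learning_rate=0.1, init=0.0):
    """Add n_estimators stumps, each shrunk by learning_rate."""
    pred = [init] * len(x)
    for _ in range(n_estimators):
        # For squared error the negative gradient is the residual.
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return pred


def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.0, 1.0, 1.0, 3.0, 3.0]
few = mse(y, boost(x, y, n_estimators=5))
many = mse(y, boost(x, y, n_estimators=100))
```

With a small learning_rate each stage removes only part of the residual, so more stages are needed to reach the same training error.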

__init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶

__module__ = 'openasce.inference.tree.gbct'¶

effect(X: ndarray, *, data: Dataset = None)[source]¶

Predict the treatment effect.

Parameters

X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.

Returns

np.ndarray, [n_instances,]. treatment effect.

fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]¶

Train the GradientBoostingCausalRegressionTree.

Parameters

X – np.ndarray, [n_instances, n_features]. Features.
Y – np.ndarray, [n_instances, n_period]. Outcomes.
D – np.ndarray, [n_instances,]. Treatment.
data (Dataset, optional) – Defaults to None.

predict(X: ndarray, key: str, *, data: Dataset = None)[source]¶

Predict the treatment effect or leaf id, as selected by the parameter key.

Parameters

X (np.ndarray) – Features.
key (str) – One of ‘effect’, ‘leaf_id’ or ‘effect-ND’.
data (Dataset, optional) – Dataset. Defaults to None.

Returns

The prediction selected by key.

class openasce.inference.tree.GradientBoostingUpliftTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶

Bases: Boosting

Gradient-boosted debiased causal tree for classification. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.

Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.

Parameters

learning_rate – float, default=0.1. Shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators – int, default=100. The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.
subsample – float, default=1.0. The fraction of samples used to fit each base learner. Values smaller than 1.0 result in stochastic gradient boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 reduces variance and increases bias.
subfeature – float, default=1.0. The fraction of features used to fit each base learner, as in random forests. Note that the signature spells this keyword subfeaure.
train_loss – string, default=openasce.inference.tree.losses.BinaryCrossEntropy. The splitting criterion used by GBCT. Users can supply their own by inheriting from the base class Loss.
min_samples_split – int, default=10. The minimum number of samples required to split an internal node.
max_depth – int, default=3. The maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5. The L2 regularization parameter, as in XGBoost.
coef – float, default=1. The regularization parameter for selection bias, as in GBCT. Note that the signature spells this keyword coeff.
n_period (int, optional) – The number of time steps. Must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Must be provided by the user.
init – The initial prediction. Defaults to 0.
random_state – int or RandomState, default=None. Controls the random seed given to each tree estimator at each boosting iteration. It also controls the random permutation of the features at each split.
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10. Decides whether early stopping is used to terminate training when the validation score is not improving. Set it to None to disable early stopping.
tol – float, default=1e-4. Tolerance for early stopping. Training stops when the loss fails to improve by at least tol for n_iter_no_change iterations (if set to a number).
nthreads – The number of threads to use for parallelization. Defaults to 32.
conf (ConfigTree or dict, optional) – Defaults to None.
bin_mapper (BinMapper, optional) – Defaults to None.
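
The lambd parameter is described as an XGBoost-style L2 penalty. In XGBoost, the optimal value of a leaf is -G / (H + lambda), where G and H are the sums of first and second derivatives of the loss over the samples in the leaf. The sketch below applies that formula to binary cross-entropy; whether OpenASCE computes leaf values in exactly this way is an assumption, and leaf_weight is a hypothetical helper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def leaf_weight(y, raw_pred, lambd=5.0):
    """Newton step for one leaf under binary cross-entropy with an L2 penalty.

    y: 0/1 labels of the samples in the leaf.
    raw_pred: current raw scores (log-odds) of those samples.
    """
    p = [sigmoid(z) for z in raw_pred]
    g = sum(pi - yi for pi, yi in zip(p, y))  # sum of gradients
    h = sum(pi * (1.0 - pi) for pi in p)      # sum of Hessians
    return -g / (h + lambd)


y = [1, 1, 1, 0]
raw = [0.0, 0.0, 0.0, 0.0]
w_unregularized = leaf_weight(y, raw, lambd=0.0)  # 1.0
w_regularized = leaf_weight(y, raw, lambd=5.0)    # 1/6
```

A larger lambd shrinks the leaf value toward zero, which matters most for leaves that contain few samples.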

__init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶

__module__ = 'openasce.inference.tree.gbct'¶

effect(X: ndarray, *, data: Dataset = None)[source]¶

Predict the treatment effect.

Parameters

X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.

Returns

np.ndarray, [n_instances,]. treatment effect.

fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]¶

Train the GradientBoostingUpliftTree.

Parameters

X – np.ndarray, [n_instances, n_features]. Features.
Y – np.ndarray, [n_instances, n_period]. Outcomes.
D – np.ndarray, [n_instances,]. Treatment.
data (Dataset, optional) – Defaults to None.

predict(X: ndarray, key: str, *, data: Dataset = None)[source]¶

Predict the treatment effect or leaf id, as selected by the parameter key.

Parameters

X (np.ndarray) – Features.
key (str) – One of ‘effect’, ‘leaf_id’ or ‘effect-ND’.
data (Dataset, optional) – Dataset. Defaults to None.

Returns

The prediction selected by key.

Submodules¶

openasce.inference.tree.bin module¶

class openasce.inference.tree.bin.BinMapper(conf: ConfigTree)[source]¶

Bases: KBinsDiscretizer

A class for binning numerical features.

__init__(conf: ConfigTree)[source]¶

__module__ = 'openasce.inference.tree.bin'¶

description()[source]¶: Print the description of the bin mapper.

fit(X, y=None)[source]¶

Fit the bin mapper on the input features.

Parameters

X – Input features.
y – The target variable (not used).

Returns

The fitted bin mapper object.

fit_dataset(data)[source]¶

Fit the bin mapper on the dataset.

Parameters: data – Dataset object containing the input features.

fit_transform(X, y=None, **fit_params)[source]¶

Fit the bin mapper on the input features and transform them.

Parameters

X – Input features.
y – The target variable (not used).
fit_params – Additional parameters for fitting.

Returns

The transformed features.

inverse_transform(Xt, index: int = None)[source]¶

Inverse transform the transformed features to the original values.

Parameters

Xt – Transformed features.
index – Index of the feature to inverse transform.

Returns

The inverse transformed features.

property is_fit¶

Check if the bin mapper is fit.

Returns: True if the bin mapper is fit, False otherwise.

transform(X)[source]¶

Transform the input features using the bin mapper.

Parameters: X – Input features.
Returns: The transformed features.

property upper_bounds¶

Get the upper bounds of the bins.

Returns: The upper bounds of the bins.
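
The relationship between upper bounds and transformed bin indices can be sketched with the standard library alone. This is a generic quantile-binning illustration, not BinMapper's implementation (which derives from KBinsDiscretizer); fit_upper_bounds and transform below are hypothetical free functions.

```python
from bisect import bisect_left


def fit_upper_bounds(values, n_bins):
    """Quantile-style upper bounds; the last bound is the maximum value."""
    s = sorted(values)
    n = len(s)
    bounds = []
    for k in range(1, n_bins + 1):
        idx = max(0, min(n - 1, (k * n) // n_bins - 1))
        b = s[idx]
        # Skip duplicate bounds so the result is strictly increasing.
        if not bounds or b > bounds[-1]:
            bounds.append(b)
    return bounds


def transform(values, bounds):
    """Map each value to the first bin whose upper bound is >= the value."""
    last = len(bounds) - 1
    # Values above the largest bound are clipped into the last bin.
    return [min(bisect_left(bounds, v), last) for v in values]


values = [5, 1, 9, 3, 7, 2, 8, 4]
bounds = fit_upper_bounds(values, n_bins=4)
codes = transform(values, bounds)
print(bounds)  # [2, 4, 7, 9]
print(codes)   # [2, 0, 3, 1, 2, 0, 3, 1]
```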

openasce.inference.tree.bin_test module¶

openasce.inference.tree.boosting module¶

class openasce.inference.tree.boosting.Boosting(tree_cls, conf: ConfigTree, bin_mapper: BinMapper = None)[source]¶

Bases: object

__init__(tree_cls, conf: ConfigTree, bin_mapper: BinMapper = None)[source]¶

__module__ = 'openasce.inference.tree.boosting'¶

__weakref__¶: list of weak references to the object (if defined)

_update_paramers(*args, **kwargs)[source]¶

_validation(target, prediction, cprediction=None)[source]¶

check_data(data: Dataset)[source]¶

early_stopping() → bool[source]¶

Check if early stopping criteria is met based on the validation losses.

Returns: True if early stopping criteria is met, False otherwise.
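
One common way to implement such a check, using the n_iter_no_change and tol parameters described for the estimators, is sketched below. The function should_stop and its exact rule are assumptions for illustration; the source does not spell out the criterion that Boosting.early_stopping applies.

```python
def should_stop(val_losses, n_iter_no_change=10, tol=1e-4):
    """Return True when the most recent n_iter_no_change validation losses
    all failed to beat the best earlier loss by at least tol."""
    if n_iter_no_change is None or len(val_losses) <= n_iter_no_change:
        # Early stopping disabled, or not enough history yet.
        return False
    best_before = min(val_losses[:-n_iter_no_change])
    recent_best = min(val_losses[-n_iter_no_change:])
    return recent_best > best_before - tol


improving = [1.0, 0.9, 0.8, 0.7]
plateau = [1.0, 0.5, 0.5, 0.5]
stop_improving = should_stop(improving, n_iter_no_change=2)
stop_plateau = should_stop(plateau, n_iter_no_change=2)
```

Here stop_improving is False because the loss keeps falling, while stop_plateau is True because the last two losses do not improve on 0.5 by at least tol.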

effect(X=None, *, data: Dataset = None)[source]¶: Predict the treatment effect on the input data.

fit(data: Dataset)[source]¶

Fit the boosted causal tree model on the provided dataset.

Parameters: data – Dataset object containing the input features, targets, and treatment.

postprocess()[source]¶

predict(X, key: str, *, data: Dataset = None)[source]¶

Predict the output using the trained model on the input data.

Parameters

X – Feature matrix of the input data.
key – Type of prediction, can be ‘leaf_id’, ‘effect’, or ‘effect-ND’.
data – Dataset object containing the feature data.

Returns

Prediction result based on the specified key.

Raises

RuntimeError – If the specified key is unknown and not supported.
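
The key-based dispatch, including the RuntimeError for unsupported keys, might look roughly like the sketch below. It handles only ‘leaf_id’ and ‘effect’; ‘effect-ND’ is left out because the source does not define it. The function predict_by_key and the leaf_effects lookup table are hypothetical.

```python
def predict_by_key(leaf_ids, leaf_effects, key):
    """Return a per-instance prediction chosen by key.

    leaf_ids: the leaf each instance falls into.
    leaf_effects: mapping from leaf id to that leaf's treatment effect.
    """
    if key == "leaf_id":
        return list(leaf_ids)
    if key == "effect":
        return [leaf_effects[i] for i in leaf_ids]
    raise RuntimeError(f"Unknown key: {key!r}")


leaf_ids = [0, 1, 0]
leaf_effects = {0: 0.5, 1: -1.0}
effects = predict_by_key(leaf_ids, leaf_effects, "effect")
```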

preprocess(data: Dataset)[source]¶

Perform preprocessing steps on the provided dataset.

Parameters: data – Dataset object containing the input features, targets, and treatment.

split_counts(trees=None, feature_names=None)[source]¶

Count the number of splits made on each feature in the gradient boosting causal trees.

Parameters

trees – List of decision trees. If None, uses the trained trees.
feature_names – List of feature names. If None, uses the feature columns from conf.

Returns

Dictionary with feature names as keys and the corresponding split counts as values.
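
As a rough illustration of the returned dictionary, the sketch below counts splits over trees represented as nested dicts. That tree representation is an assumption made for this example; the actual node structure used by Boosting is not shown in this reference.

```python
from collections import Counter


def count_splits(trees, feature_names):
    """Count how often each feature is used as a split across all trees."""
    counts = Counter({name: 0 for name in feature_names})

    def visit(node):
        # Leaves (or missing children) carry no "feature" entry.
        if node is None or "feature" not in node:
            return
        counts[feature_names[node["feature"]]] += 1
        visit(node.get("left"))
        visit(node.get("right"))

    for tree in trees:
        visit(tree)
    return dict(counts)


trees = [
    {
        "feature": 0,
        "left": {"feature": 1, "left": {"value": 0.1}, "right": {"value": 0.4}},
        "right": {"value": -0.2},
    },
    {"feature": 0, "left": {"value": 0.0}, "right": {"value": 0.3}},
]
split_counts = count_splits(trees, ["age", "income", "tenure"])
print(split_counts)  # {'age': 2, 'income': 1, 'tenure': 0}
```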

tr_val(data: Dataset, subsample=None, **kwargs)[source]¶

Split the dataset into training and validation sets.

Parameters

data – Dataset object containing the input features, targets, and treatment.
subsample – Ratio of instances to include in the training set. If None, uses the instance_ratio from conf.

Returns

Tuple containing the training dataset, validation dataset, indices of training instances, and indices of validation instances.
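
The index side of such a split can be sketched as follows. The helper train_val_indices is hypothetical and returns only the two index lists; how tr_val builds the corresponding datasets is not shown here.

```python
import random


def train_val_indices(n_instances, subsample=0.8, seed=None):
    """Randomly assign a fraction `subsample` of instances to training."""
    if not 0.0 < subsample <= 1.0:
        raise ValueError("subsample must be in (0, 1]")
    idx = list(range(n_instances))
    random.Random(seed).shuffle(idx)
    n_train = int(round(subsample * n_instances))
    # Sort so the indices can be used for ordered row selection.
    return sorted(idx[:n_train]), sorted(idx[n_train:])


train_idx, val_idx = train_val_indices(10, subsample=0.8, seed=0)
```

The two index lists are disjoint and together cover every instance, so they can be passed to a method such as sub_dataset to build the two datasets.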

openasce.inference.tree.cppnode module¶

openasce.inference.tree.cppnode.create_didnode_from_dict(info)[source]¶

Create a CppDebiasNode from a dictionary.

Parameters: info (Dict) – The node information.
Returns: The CppDebiasNode instance.
Return type: CppDebiasNode

openasce.inference.tree.cppnode.predict(nodes: List[openasce.inference.tree.gbct_utils.common.CppDiDNode], x, out, key, threads=20)[source]¶

Predict using the tree nodes.

Parameters

nodes (List) – The list of tree nodes.
x (ndarray) – The input data.
out (ndarray) – The output array.
key (ndarray) – The prediction key.
threads (int) – The number of threads.

Returns

The predicted values.

Return type

ndarray

Raises

RuntimeError – If the number of nodes is less than or equal to 0.
ValueError – If the node type is not supported.

openasce.inference.tree.cppnode_test module¶

openasce.inference.tree.csv_dataset module¶

class openasce.inference.tree.csv_dataset.CsvDataset(conf=None, **kwargs)[source]¶

Bases: Dataset

A Dataset interface for loading csv data

__init__(conf=None, **kwargs)[source]¶

__module__ = 'openasce.inference.tree.csv_dataset'¶

property features¶

static new_instance(conf)[source]¶

read(filename=None)[source]¶

sub_dataset(index=None, cols=None, cols_y=[]) → Dataset[source]¶

Abstract interface for sub-sampling.

Parameters: index (optional) – Defaults to None.
Raises: NotImplementedError

property targets¶

property treatment¶

property weight¶

openasce.inference.tree.dataset module¶

class openasce.inference.tree.dataset.Dataset[source]¶

Bases: object

Abstract interface of class dataset

__init__()[source]¶

__len__()[source]¶

__module__ = 'openasce.inference.tree.dataset'¶

__weakref__¶: list of weak references to the object (if defined)

description(detail: bool = False) → None[source]¶

Describe the dataset.

Parameters: detail (bool, optional) – Whether to include details. Defaults to False.

property feature_columns¶

property features¶

static new_instance(conf)[source]¶

read(filename)[source]¶

sub_dataset(index=None)[source]¶

Abstract interface for sub-sampling.

Parameters: index (optional) – Indices of the samples to include. Defaults to None.
Raises: NotImplementedError – If a subclass does not override this method.

property targets¶

property treatment¶

class openasce.inference.tree.dataset.PsudoDataset(features: ndarray = None, outcome: ndarray = None, treatment: ndarray = None, conf=None)[source]¶

Bases: Dataset

A pseudo dataset that wraps NumPy-formatted data.

Parameters

features (np.ndarray, optional) – The features. Defaults to None.
outcome (np.ndarray, optional) – The outcome. Defaults to None.
treatment (np.ndarray, optional) – The treatment. Defaults to None.
conf (optional) – The configuration. Defaults to None.

__init__(features: ndarray = None, outcome: ndarray = None, treatment: ndarray = None, conf=None)[source]¶

__module__ = 'openasce.inference.tree.dataset'¶

property feature_columns¶

property features¶

sub_dataset(index=None, cols=None) → Dataset[source]¶

Create a sub-dataset.

Parameters

index – Indices of the samples to include in the sub-dataset.
cols – Columns to include in the sub-dataset.

Returns

The sub-dataset.

property targets¶

property treatment¶

property weight¶
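As an illustration of the sub_dataset(index, cols) contract described above, here is a small sketch using plain Python lists. The class ArrayDataset is hypothetical and is not the PsudoDataset implementation; it only shows rows being selected by index and feature columns by cols.

```python
from typing import Optional, Sequence


class ArrayDataset:
    """Hypothetical in-memory dataset; not the PsudoDataset implementation."""

    def __init__(self, features, outcome, treatment):
        self.features = [list(row) for row in features]
        self.targets = [list(row) for row in outcome]
        self.treatment = list(treatment)

    def __len__(self) -> int:
        return len(self.features)

    def sub_dataset(
        self,
        index: Optional[Sequence[int]] = None,
        cols: Optional[Sequence[int]] = None,
    ) -> "ArrayDataset":
        """Return a new dataset restricted to the given rows and feature columns."""
        rows = range(len(self)) if index is None else index
        keep = range(len(self.features[0])) if cols is None else cols
        return ArrayDataset(
            features=[[self.features[i][j] for j in keep] for i in rows],
            outcome=[self.targets[i] for i in rows],
            treatment=[self.treatment[i] for i in rows],
        )


full = ArrayDataset(
    features=[[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],
    outcome=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    treatment=[0, 1, 1],
)
sub = full.sub_dataset(index=[0, 2], cols=[1])
```

Here sub keeps rows 0 and 2 and only the second feature column, while the outcome and treatment follow the selected rows.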

openasce.inference.tree.didtree module¶

class openasce.inference.tree.didtree.DifferenceInDifferencesRegressionTree(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]¶

Bases: Boosting

Gradient boosting debiased causal tree for regression. This class implements the method of [1], "Difference-in-Differences Meets Tree-Based Methods", which estimates heterogeneous treatment effects with unmeasured confounding.

Reference: 1. Tang, C., Wang, H., Li, X., Qing, C., Li, L., & Zhou, J. (2023). Difference-in-Differences Meets Tree-Based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Proceedings of the 40th International Conference on Machine Learning.

Parameters

n_estimators – int, default=100. The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.
learning_rate – float, default=0.1. Shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
subsample – float, default=1.0. The fraction of samples used to fit each base learner. Values below 1.0 result in stochastic gradient boosting. subsample interacts with n_estimators. Choosing subsample < 1.0 reduces variance and increases bias.
subfeaure – float, default=1.0. The fraction of features used to fit each base learner, as in random forests. The docstring names this parameter subfeature; the constructor signature spells it subfeaure.
train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply their own by subclassing the base class Loss.
min_samples_split – int, default=10. The minimum number of samples required to split an internal node.
max_depth – int, default=3. The maximum depth of the individual regression estimators, which limits the number of nodes in each tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5. The L2 regularization parameter, as in XGBoost.
coeff – float, default=1. The regularization parameter for selection bias, as in GBCT. The docstring names this parameter coef; the constructor signature spells it coeff.
parallel_l2 – float, default=0.
n_period (int, optional) – The number of time steps. Must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Must be provided by the user.
init – The initial prediction. Defaults to 0.
random_state – int or RandomState, default=None. Controls the random seed given to each tree estimator at each boosting iteration, and the random permutation of the features at each split.
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10. The number of iterations without improvement in the validation score after which training stops early. Set to None to disable early stopping.
tol – float, default=1e-4. Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.
nthreads – The number of threads to use for parallelization. Defaults to 32.
conf (ConfigTree, optional) – _description_. Defaults to None.
bin_mapper (BinMapper, optional) – _description_. Defaults to None.
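The interplay of n_iter_no_change and tol can be sketched as follows. This is an assumed reading of the parameter descriptions (stop once the loss has failed to beat the best value so far by at least tol for n_iter_no_change consecutive iterations); the function name stopping_iteration is hypothetical and the library's exact criterion may differ.

```python
from typing import List, Optional


def stopping_iteration(
    losses: List[float],
    n_iter_no_change: Optional[int] = 10,
    tol: float = 1e-4,
) -> Optional[int]:
    """Return the iteration index at which training would stop, or None.

    Training stops once the loss has failed to improve on the best value seen
    so far by at least ``tol`` for ``n_iter_no_change`` consecutive iterations.
    """
    if n_iter_no_change is None:
        return None  # early stopping disabled
    best = float("inf")
    stale = 0
    for i, loss in enumerate(losses):
        if loss < best - tol:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= n_iter_no_change:
                return i
    return None


history = [1.0, 0.5, 0.4, 0.4, 0.39995, 0.4, 0.3]
stop_at = stopping_iteration(history, n_iter_no_change=3, tol=1e-4)
```

In this history the loss stalls around 0.4 for three iterations in a row, so the sketch stops before the later drop to 0.3; a larger n_iter_no_change would have let training continue.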

__init__(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]¶

__module__ = 'openasce.inference.tree.didtree'¶

effect(X: ndarray = None, *, data: Dataset = None)[source]¶

Predict the treatment effect.

Parameters

X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.

Returns

np.ndarray, [n_instances,]. treatment effect.
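For intuition about the quantity being estimated, the classic two-group difference-in-differences estimate can be computed by hand on an outcome panel shaped like the Y accepted by fit ([n_instances, n_period]). This is a sketch of the textbook estimator, not of the tree model; the function name did_effect and the convention that columns from treat_dt onward are post-treatment are assumptions.

```python
from statistics import mean
from typing import List, Sequence


def did_effect(
    Y: Sequence[Sequence[float]],
    D: Sequence[int],
    treat_dt: int,
) -> float:
    """Classic two-group difference-in-differences on an outcome panel.

    Y has one row per instance and one column per period; columns before
    ``treat_dt`` are treated as pre-treatment, the rest as post-treatment.
    """
    def pre_post_change(rows: List[Sequence[float]]) -> float:
        pre = mean(mean(r[:treat_dt]) for r in rows)
        post = mean(mean(r[treat_dt:]) for r in rows)
        return post - pre

    treated = [y for y, d in zip(Y, D) if d == 1]
    control = [y for y, d in zip(Y, D) if d == 0]
    return pre_post_change(treated) - pre_post_change(control)


# Four instances, n_period=4, treatment assigned at step 2.
Y = [
    [1.0, 1.0, 4.0, 4.0],  # treated
    [3.0, 3.0, 6.0, 6.0],  # treated
    [0.0, 0.0, 1.0, 1.0],  # control
    [2.0, 2.0, 3.0, 3.0],  # control
]
D = [1, 1, 0, 0]
effect = did_effect(Y, D, treat_dt=2)
```

The treated group rises by 3 and the control group by 1, so the estimate is their difference; subtracting the control trend removes any shift common to both groups.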

fit(X: ndarray = None, Y: ndarray = None, D: ndarray = None, *, data: Dataset = None, data_test: Dataset = None)[source]¶

Train the DifferenceInDifferencesRegressionTree.

Parameters

X – np.ndarray, [n_instances, n_features]. The features.
Y – np.ndarray, [n_instances, n_period]. The outcome.
D – np.ndarray, [n_instances,]. The treatment.
data (Dataset, optional) – The training dataset. Defaults to None.
data_test (Dataset, optional) – The test dataset. Defaults to None.

predict(X: ndarray = None, key: str = 'effect', *, data: Dataset = None)[source]¶

Predict the treatment effect, leaf id or outcome, as selected by the parameter key.

Parameters

X (np.ndarray) – The features.
key (str) – One of ‘effect’, ‘leaf_id’ or ‘outcome’.
data (Dataset, optional) – The dataset. Defaults to None.

Returns

The prediction selected by key.

openasce.inference.tree.gbct module¶

class openasce.inference.tree.gbct.GradientBoostingCausalRegressionTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶

Bases: Boosting

Gradient boosting debiased causal tree for regression. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.

Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.

Parameters

learning_rate – float, default=0.1. Shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators – int, default=100. The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.
subsample – float, default=1.0. The fraction of samples used to fit each base learner. Values below 1.0 result in stochastic gradient boosting. subsample interacts with n_estimators. Choosing subsample < 1.0 reduces variance and increases bias.
subfeaure – float, default=1.0. The fraction of features used to fit each base learner, as in random forests. The docstring names this parameter subfeature; the constructor signature spells it subfeaure.
train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply their own by subclassing the base class Loss.
min_samples_split – int, default=10. The minimum number of samples required to split an internal node.
max_depth – int, default=3. The maximum depth of the individual regression estimators, which limits the number of nodes in each tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5. The L2 regularization parameter, as in XGBoost.
coeff – float, default=1. The regularization parameter for selection bias, as in GBCT. The docstring names this parameter coef; the constructor signature spells it coeff.
n_period (int, optional) – The number of time steps. Must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Must be provided by the user.
init – The initial prediction. Defaults to 0.
random_state – int or RandomState, default=None. Controls the random seed given to each tree estimator at each boosting iteration, and the random permutation of the features at each split.
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10. The number of iterations without improvement in the validation score after which training stops early. Set to None to disable early stopping.
tol – float, default=1e-4. Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.
nthreads – The number of threads to use for parallelization. Defaults to 32.
conf (ConfigTree or dict, optional) – _description_. Defaults to None.
bin_mapper (BinMapper, optional) – _description_. Defaults to None.
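The trade-off between learning_rate and n_estimators can be seen in a deliberately tiny setting. The sketch below boosts toward a single target value with idealised "trees" that predict the current residual exactly; boost_constant is a hypothetical helper, not part of the library, and real trees only approximate the residual.

```python
def boost_constant(
    target: float,
    learning_rate: float,
    n_estimators: int,
    init: float = 0.0,
) -> float:
    """Boost toward one target using 'trees' that predict the residual exactly."""
    prediction = init
    for _ in range(n_estimators):
        # The residual is the negative gradient of the half squared error.
        residual = target - prediction
        # Each stage contributes only a shrunken fraction of its prediction.
        prediction += learning_rate * residual
    return prediction


slow = boost_constant(target=1.0, learning_rate=0.1, n_estimators=10)
fast = boost_constant(target=1.0, learning_rate=0.5, n_estimators=10)
many = boost_constant(target=1.0, learning_rate=0.1, n_estimators=100)
```

After n stages the remaining error is (1 - learning_rate) ** n times the initial error, so a smaller learning rate needs more stages to reach the same fit.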

__init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶

__module__ = 'openasce.inference.tree.gbct'¶

effect(X: ndarray, *, data: Dataset = None)[source]¶

Predict the treatment effect.

Parameters

X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.

Returns

np.ndarray, [n_instances,]. treatment effect.

fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]¶

Train the GradientBoostingCausalRegressionTree.

Parameters

X – np.ndarray, [n_instances, n_features]. The features.
Y – np.ndarray, [n_instances, n_period]. The outcome.
D – np.ndarray, [n_instances,]. The treatment.
data (Dataset, optional) – The training dataset. Defaults to None.

predict(X: ndarray, key: str, *, data: Dataset = None)[source]¶

Predict the treatment effect or leaf id, as selected by the parameter key.

Parameters

X (np.ndarray) – The features.
key (str) – One of ‘effect’, ‘leaf_id’ or ‘effect-ND’.
data (Dataset, optional) – The dataset. Defaults to None.

Returns

The prediction selected by key.

class openasce.inference.tree.gbct.GradientBoostingUpliftTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶

Bases: Boosting

Gradient boosting debiased causal tree for classification. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.

Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.

Parameters

learning_rate – float, default=0.1. Shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators – int, default=100. The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.
subsample – float, default=1.0. The fraction of samples used to fit each base learner. Values below 1.0 result in stochastic gradient boosting. subsample interacts with n_estimators. Choosing subsample < 1.0 reduces variance and increases bias.
subfeaure – float, default=1.0. The fraction of features used to fit each base learner, as in random forests. The docstring names this parameter subfeature; the constructor signature spells it subfeaure.
train_loss – string, default=openasce.inference.tree.losses.BinaryCrossEntropy. The splitting criterion used by GBCT. Users can supply their own by subclassing the base class Loss.
min_samples_split – int, default=10. The minimum number of samples required to split an internal node.
max_depth – int, default=3. The maximum depth of the individual regression estimators, which limits the number of nodes in each tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5. The L2 regularization parameter, as in XGBoost.
coeff – float, default=1. The regularization parameter for selection bias, as in GBCT. The docstring names this parameter coef; the constructor signature spells it coeff.
n_period (int, optional) – The number of time steps. Must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Must be provided by the user.
init – The initial prediction. Defaults to 0.
random_state – int or RandomState, default=None. Controls the random seed given to each tree estimator at each boosting iteration, and the random permutation of the features at each split.
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10. The number of iterations without improvement in the validation score after which training stops early. Set to None to disable early stopping.
tol – float, default=1e-4. Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.
nthreads – The number of threads to use for parallelization. Defaults to 32.
conf (ConfigTree or dict, optional) – _description_. Defaults to None.
bin_mapper (BinMapper, optional) – _description_. Defaults to None.
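Since this class defaults to a binary cross-entropy loss, it may help to see the first and second derivatives that a gradient-boosting loss of this kind typically supplies. The functions below are a standalone sketch of the standard formulas with respect to the raw score (logit); they are not the interface of the library's Loss base class, which is not shown here.

```python
import math
from typing import Tuple


def sigmoid(z: float) -> float:
    """Map a raw score (logit) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(y: float, logit: float) -> float:
    """Binary cross-entropy of label y for the given logit."""
    p = sigmoid(logit)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bce_grad_hess(y: float, logit: float) -> Tuple[float, float]:
    """Gradient and Hessian of binary cross-entropy with respect to the logit."""
    p = sigmoid(logit)
    return p - y, p * (1.0 - p)


grad, hess = bce_grad_hess(y=1.0, logit=0.0)
```

At a logit of 0 the predicted probability is 0.5, so for a positive label the gradient is -0.5 and the Hessian is 0.25.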

__init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶

__module__ = 'openasce.inference.tree.gbct'¶

effect(X: ndarray, *, data: Dataset = None)[source]¶

Predict the treatment effect.

Parameters

X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.

Returns

np.ndarray, [n_instances,]. treatment effect.

fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]¶

Train the GradientBoostingUpliftTree.

Parameters

X – np.ndarray, [n_instances, n_features]. The features.
Y – np.ndarray, [n_instances, n_period]. The outcome.
D – np.ndarray, [n_instances,]. The treatment.
data (Dataset, optional) – The training dataset. Defaults to None.

predict(X: ndarray, key: str, *, data: Dataset = None)[source]¶

Predict the treatment effect or leaf id, as selected by the parameter key.

Parameters

X (np.ndarray) – The features.
key (str) – One of ‘effect’, ‘leaf_id’ or ‘effect-ND’.
data (Dataset, optional) – The dataset. Defaults to None.

Returns

The prediction selected by key.

openasce.inference.tree.gradient_causal_tree module¶

class openasce.inference.tree.gradient_causal_tree.GradientDebiasedCausalTree(conf: ConfigTree = None, bin_mapper: BinMapper = None, **kwargs)[source]¶

Bases: object

GradientDebiasedCausalTree is a class that represents a gradient-based debiased causal tree model.

Parameters

conf (ConfigTree) – The configuration tree.
bin_mapper (BinMapper) – The BinMapper instance.
kwargs – Additional keyword arguments.

__init__(conf: ConfigTree = None, bin_mapper: BinMapper = None, **kwargs)[source]¶

__module__ = 'openasce.inference.tree.gradient_causal_tree'¶

__weakref__¶: list of weak references to the object (if defined)

_predict(nodes, x, key='effect', out=None)[source]¶

Internal method to predict using the tree nodes.

Parameters

nodes – The tree nodes.
x – The input data.
key – The prediction key.
out – The output array.

Returns

The predicted values.

Return type

ndarray

Raises

NotImplementedError – If the prediction key is not implemented.

_split_cpp(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]¶

Split the tree nodes using C++ implementation.

Parameters

leaves (List[GradientCausalTreeNode]) – The list of tree nodes.
hist (Histogram) – The histogram.

Returns

The split conditions.

Return type

Dict

export()[source]¶

Export the tree model.

Returns: The exported C++ nodes and Python nodes.
Return type: Tuple[List[DidNode], List[Dict]]

fit(gradients, cgradients, data: Dataset, eta=None)[source]¶

Fit the GradientDebiasedCausalTree model.

Parameters

gradients – The gradients.
cgradients – The counterfactual gradients.
data (Dataset) – The training dataset.
eta – The eta values.

Returns

None

gradients(target, prediction, **kwargs)[source]¶

Compute the gradients of the loss function.

Parameters

target – The target values.
prediction – The predicted values.
kwargs – Additional keyword arguments.

Returns

The gradients.

Return type

ndarray

loss(grad, hess, y_hat=None, **kwargs)[source]¶

Compute the loss function.

Parameters

grad – The gradients.
hess – The hessians.
y_hat – The predicted values.
kwargs – Additional keyword arguments.

Returns

The loss values.

Return type

ndarray
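A loss computed from gradients and Hessians usually follows the second-order form popularised by XGBoost, which the lambd parameter description refers to. The sketch below shows that textbook form for a single leaf; leaf_weight_and_loss is a hypothetical helper, and the debiasing terms specific to GBCT are deliberately left out.

```python
from typing import Sequence, Tuple


def leaf_weight_and_loss(
    grad: Sequence[float],
    hess: Sequence[float],
    lambd: float = 5.0,
) -> Tuple[float, float]:
    """Optimal leaf value and resulting loss under a second-order approximation.

    Minimises  G*w + 0.5*(H + lambd)*w**2  over w, where G and H are the
    summed gradients and Hessians of the samples in the leaf.
    """
    G, H = sum(grad), sum(hess)
    weight = -G / (H + lambd)
    loss = -0.5 * G * G / (H + lambd)
    return weight, loss


# Squared-error case: the Hessian is 1 for every sample.
leaf_value, leaf_loss = leaf_weight_and_loss(
    grad=[-2.0, -4.0, -3.0], hess=[1.0, 1.0, 1.0], lambd=5.0
)
```

A larger lambd shrinks the leaf value toward zero, which is the regularising effect of the L2 penalty.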

postprocess()[source]¶

Perform post-processing steps after fitting the tree.

Returns: None

predict(x, w=None, key='effect', out=None)[source]¶

Predict the treatment effect or other values.

Parameters

x – The input data.
w – The treatment weights.
key – The prediction key.
out – The output array.

Returns

The predicted values.

Return type

ndarray

preprocess(gradients, cgradients, tr_data: Dataset, eta=None, subsample=1, subfeature=1)[source]¶

Preprocess the data before fitting the tree.

Parameters

gradients – The gradients.
cgradients – The counterfactual gradients.
tr_data (Dataset) – The training dataset.
eta – The eta values.
subsample – The subsample ratio.
subfeature – The subfeature ratio.

Returns

The histogram and index.

Return type

Tuple[Histogram, ndarray]
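Histogram-based tree learners typically accumulate gradient statistics per feature bin before searching for splits. The sketch below shows that accumulation for one pre-binned feature; gradient_histogram and its [gradient sum, sample count] layout are assumptions for illustration and do not describe the library's Histogram class.

```python
from typing import List, Sequence


def gradient_histogram(
    binned_feature: Sequence[int],
    gradients: Sequence[float],
    n_bins: int,
) -> List[List[float]]:
    """Per-bin [gradient sum, sample count] for one pre-binned feature."""
    hist = [[0.0, 0.0] for _ in range(n_bins)]
    for b, g in zip(binned_feature, gradients):
        hist[b][0] += g    # accumulate the gradient of this sample
        hist[b][1] += 1.0  # count the sample
    return hist


bins = [0, 2, 1, 2, 0]
grads = [0.5, -1.0, 0.25, -0.5, 1.5]
hist = gradient_histogram(bins, grads, n_bins=3)
```

Once the statistics are binned, evaluating a candidate split costs one pass over the bins rather than one pass over the samples.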

split(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]¶

Split the tree nodes.

Parameters

leaves (List[GradientCausalTreeNode]) – The list of tree nodes.
hist (Histogram) – The histogram.

Returns

The split conditions.

Return type

Dict

updater(split_conds: Dict, gradients, cgradients, tr_data, hist: Histogram, idx_map, leaves: List[GradientCausalTreeNode], leaves_range, eta=None)[source]¶

Update the tree nodes.

Parameters

split_conds (Dict) – The split conditions.
gradients – The gradients.
cgradients – The counterfactual gradients.
tr_data (Dataset) – The training dataset.
hist (Histogram) – The histogram.
idx_map – The index map.
leaves (List[GradientCausalTreeNode]) – The list of tree nodes.
leaves_range – The range of each leaf.
eta – The eta values.

Returns

The updated tree nodes and updated leaf ranges.

Return type

Tuple[List[GradientCausalTreeNode], ndarray]

class openasce.inference.tree.gradient_causal_tree.GradientDiDCausalTree(conf: ConfigTree = None, bin_mapper: BinMapper = None, **kwargs)[source]¶

Bases: GradientDebiasedCausalTree

GradientDiDCausalTree is a class that represents a gradient-based debiased causal tree model with difference in differences. It inherits from the GradientDebiasedCausalTree class.

Parameters

conf (ConfigTree) – The configuration tree.
bin_mapper (BinMapper) – The BinMapper instance.
kwargs – Additional keyword arguments.

__annotations__ = {}¶

__module__ = 'openasce.inference.tree.gradient_causal_tree'¶

_predict(nodes, x, key='effect', out=None)[source]¶

Predict for the given data using the exported nodes.

Parameters

nodes – The exported nodes.
x – The input data.
key – The key specifying the prediction type (default: ‘effect’).
out – The output array (default: None).

Returns

The predictions.

_split_cpp(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]¶

Split the leaf nodes at the current level using C++ implementation.

Parameters

leaves – The list of leaf nodes.
hist – The histogram object.

Returns

split_conds: The split conditions.

Return type

Dict

export()[source]¶

Export the GradientDiDCausalTree.

Returns: slim_cppnodes: The exported nodes as C++ objects. slim_nodes: The exported nodes as Python objects.
Return type: Tuple

predict(x, w=None, key='effect', out=None)[source]¶

Predict the treatment effect or other outcomes for given data.

Parameters

x – The input data.
w – The treatment assignments (default: None).
key – The key specifying the prediction type (default: ‘effect’).
out – The output array (default: None).

Returns

pred: The predicted values. cpred: The counterfactual predicted values. eta: The optimal parallel interval between the treated and control groups.

Return type

Tuple

preprocess(gradients, cgradients, tr_data: Dataset, eta=None, subsample=1, subfeature=1)[source]¶

Preprocesses the data for the GradientDiDCausalTree model.

Parameters

gradients – The gradients.
cgradients – The counterfactual gradients.
tr_data (Dataset) – The training dataset.
eta – The parallel interval between the treated and control group.
subsample (float) – The subsampling ratio for instances (default: 1).
subfeature (float) – The subsampling ratio for features (default: 1).

Returns

hist (Histogram): The constructed histogram. index (ndarray): The permutation index.

Return type

Tuple[Histogram, ndarray]

split(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]¶

Split the leaf nodes.

Parameters

leaves – The list of leaf nodes.
hist – The histogram object.

Returns

split_conds: The split conditions.

Return type

Dict

updater(split_conds: Dict, gradients, cgradients, tr_data, hist: Histogram, idx_map, leaves: List[GradientCausalTreeNode], leaves_range, eta)[source]¶

Update the GradientCausalTree by performing splitting and updating histograms.

Parameters

split_conds (Dict) – The split conditions.
gradients – The gradients.
cgradients – The counterfactual gradients.
tr_data – The training dataset.
hist (Histogram) – The histogram object.
idx_map – The index mapping.
leaves (List[GradientCausalTreeNode]) – The list of leaves.
leaves_range – The range of leaves.
eta – The parallel interval between the treated and control group.

Returns

leaves_new (List[GradientCausalTreeNode]): The new leaves. leaves_range_new: The new range of leaves.

Return type

Tuple

openasce.inference.tree.gradient_causal_tree._filter(leaves, leaves_range)[source]¶

openasce.inference.tree.histogram module¶

class openasce.inference.tree.histogram.Histogram(conf: ConfigTree)[source]¶

Bases: object

__doc__ = None¶

__getattr__(_Histogram__name: str)[source]¶

Get the attribute value.

Parameters: __name (str) – The name of the attribute.
Returns: The attribute value.
Return type: ndarray
Raises: AttributeError – If the attribute is not found.

__init__(conf: ConfigTree)[source]¶

__module__ = 'openasce.inference.tree.histogram'¶

__weakref__¶: list of weak references to the object (if defined)

classmethod new_instance(dataset: Dataset, conf: ConfigTree = None, **kwargs)[source]¶

Create a new instance of the histogram.

Parameters

dataset (Dataset) – The dataset.
conf (ConfigTree) – The configuration tree.
kwargs – Additional keyword arguments.

Returns

The new instance of the histogram.

Return type

Histogram

update_hists(target, index, leaves_range, treatment, bin_features, is_gradient, is_splitting, threads)[source]¶

Update the histograms for all nodes at the same level of a tree.

Parameters

target – The target array.
index – The index array.
leaves_range – The data range of each leaf.
treatment – The treatment array.
bin_features – The binned feature array.
is_gradient (bool)
is_splitting (bool)
threads – The number of threads to use.

Raises

ValueError

openasce.inference.tree.histogram_test module¶

openasce.inference.tree.information module¶

class openasce.inference.tree.information.CausalDataInfo(conf, **kwargs)[source]¶

Bases: object

__doc__ = None¶

__init__(conf, **kwargs)[source]¶

__module__ = 'openasce.inference.tree.information'¶

__weakref__¶: list of weak references to the object (if defined)

openasce.inference.tree.losses module¶

class openasce.inference.tree.losses.BinaryCrossEntropy(**kwargs)[source]¶

Bases: GradLoss

__abstractmethods__ = frozenset({})¶

__doc__ = None¶

__init__(**kwargs)[source]¶

__module__ = 'openasce.inference.tree.losses'¶

_abc_impl = <_abc._abc_data object>¶

property const_hess¶

Check if the hessian is constant.

Returns: True if the hessian is constant, False otherwise.

gradient(target, prediction)[source]¶

Compute the gradient: gradient = prediction - target, where prediction is the positive probability.

Parameters

target – The target values.
prediction – The predicted probabilities.

Returns

The computed gradients.

Return type

ndarray

gradients(target, logit, treatment)[source]¶

Calculate the gradient and hessian.

Parameters

target (DataFrame) – The target values.
logit (DataFrame) – The predicted logits.
treatment (DataFrame) – The treatment assignments.

Returns

The gradients and hessians.

Return type

Union[Tuple, None]

hessian(target, prediction)[source]¶

Compute the hessian: hessian = prediction * (1 - prediction).

Parameters

target – The target values.
prediction – The predicted probabilities.

Returns

The computed hessians.

Return type

ndarray
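
The two formulas documented above (gradient = prediction - target, hessian = prediction * (1 - prediction)) can be checked numerically. The following is a minimal NumPy sketch, not the library's code: the function names `sigmoid_sketch`, `bce_gradient` and `bce_hessian` are hypothetical, and it assumes that predictions are obtained by applying a sigmoid to raw logits.

```python
import numpy as np


def sigmoid_sketch(x):
    """Logistic function, written via tanh so large |x| does not overflow."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def bce_gradient(target, prediction):
    # gradient = prediction - target, where prediction is the positive probability
    return prediction - target


def bce_hessian(target, prediction):
    # hessian = prediction * (1 - prediction); target is unused but kept so the
    # signature mirrors the documented hessian(target, prediction)
    return prediction * (1.0 - prediction)


logits = np.array([-2.0, 0.0, 3.0])   # illustrative raw scores
target = np.array([0.0, 1.0, 1.0])    # illustrative binary labels
prob = sigmoid_sketch(logits)
grad = bce_gradient(target, prob)
hess = bce_hessian(target, prob)
```

Because the hessian depends on the prediction, it changes from one boosting round to the next, which is presumably what the const_hess property distinguishes from losses with a fixed hessian.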

loss(target, prediction, logit=True)[source]¶

Calculate the cross entropy.

Parameters

target – The ground-truth labels.
prediction – The predicted logits.
logit (bool, optional) – Defaults to True.

class openasce.inference.tree.losses.GradLoss(**kwargs)[source]¶

Bases: Loss

Abstract base class for gradient-based loss functions.

__abstractmethods__ = frozenset({'gradient', 'gradients', 'hessian', 'loss'})¶

__annotations__ = {}¶

__module__ = 'openasce.inference.tree.losses'¶

_abc_impl = <_abc._abc_data object>¶

property const_hess¶

Check if the hessian is constant.

Returns: True if the hessian is constant, False otherwise.

abstract gradient(target, prediction)[source]¶

Calculate the gradient of the loss.

Parameters

target – Target values.
prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

The gradient.

abstract gradients(target, prediction) → Tuple[source]¶

Calculate the gradients and hessians.

Parameters

target – Target values.
prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

Tuple containing the gradients and hessians.

abstract hessian(target, prediction)[source]¶

Calculate the hessian of the loss.

Parameters

target – Target values.
prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

The hessian.

class openasce.inference.tree.losses.Loss(**kwargs)[source]¶

Bases: object

Abstract base class for loss functions.

__abstractmethods__ = frozenset({'loss'})¶

__annotations__ = {}¶

__init__(**kwargs)[source]¶

__module__ = 'openasce.inference.tree.losses'¶

__weakref__¶: list of weak references to the object (if defined)

_abc_impl = <_abc._abc_data object>¶

abstract loss(target, prediction, *args)[source]¶

Calculate the loss.

Parameters

target – Target values.
prediction – Predicted values.
args – Additional arguments.

Raises

NotImplementedError – If the method is not implemented.

Returns

The loss value.

static new_instance(conf)[source]¶

Create a new instance of the loss function.

Parameters: conf – Configuration.
Returns: An instance of the loss function.

class openasce.inference.tree.losses.MeanSquaredError(**kwargs)[source]¶

Bases: GradLoss

__abstractmethods__ = frozenset({})¶

__annotations__ = {}¶

__doc__ = None¶

__init__(**kwargs)[source]¶

__module__ = 'openasce.inference.tree.losses'¶

_abc_impl = <_abc._abc_data object>¶

property const_hess¶

Check if the hessian is constant.

Returns: True if the hessian is constant, False otherwise.

gradient(target, prediction, **kwargs)[source]¶

Calculate the gradient of the loss.

Parameters

target – Target values.
prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

The gradient.

gradients(target, prediction, **kwargs) → Tuple[source]¶

Calculate the gradients and hessians.

Parameters

target – Target values.
prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

Tuple containing the gradients and hessians.

hessian(target, prediction, **kwargs)[source]¶

Calculate the hessian of the loss.

Parameters

target – Target values.
prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

The hessian.

loss(target, prediction, *args, **kwargs)[source]¶

The mean squared loss.

Parameters

target – The target values, shape [n_instance, n_outcome].
prediction – The predicted values, shape [n_instance, n_outcome] or [n_outcome].

Raises

ValueError

Returns

The loss value.

openasce.inference.tree.losses.sigmoid(x)[source]¶

openasce.inference.tree.reflect_utils module¶

openasce.inference.tree.reflect_utils.get_class(module_class)[source]¶

Get the class using import_module

Parameters: module_class – module class name, full path a.b.class_name

openasce.inference.tree.reflect_utils.get_class_defined_in_module(module_name, clazz)[source]¶

openasce.inference.tree.reflect_utils.get_object_defined_in_module(module_name, clazz, name=None)[source]¶

openasce.inference.tree.reflect_utils.new_instance(module_class, *args, **kwargs)[source]¶

Create a new instance using import_module

Parameters

module_class – module class name, full path a.b.class_name
args – passed to the constructor of class
kwargs – passed to the constructor of class
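
Resolving a full dotted path such as a.b.class_name with import_module is a standard-library pattern. The sketch below illustrates the general technique only; the helper names `get_class_sketch` and `new_instance_sketch` are hypothetical, and the library's own functions may handle errors or path formats differently.

```python
from importlib import import_module


def get_class_sketch(module_class):
    """Resolve 'a.b.ClassName' to the class object it names."""
    # Split on the last dot: everything before it is the module path.
    module_name, _, class_name = module_class.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {module_class!r}")
    return getattr(import_module(module_name), class_name)


def new_instance_sketch(module_class, *args, **kwargs):
    """Instantiate the named class, forwarding args and kwargs to its constructor."""
    return get_class_sketch(module_class)(*args, **kwargs)


# Illustration with a standard-library class rather than an openasce one.
counter = new_instance_sketch("collections.Counter", "abca")
```

This is the same mechanism that lets a parameter such as train_loss be given as a string path to a Loss subclass.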

openasce.inference.tree.splitting_losses module¶

openasce.inference.tree.splitting_losses.causal_tree_splitting_losses(configs, bin_outcome_hist, bin_counts, parameters: dict)[source]¶

Calculate the splitting losses for the ordinary causal tree.

Parameters

configs – Configuration.
bin_outcome_hist – Histogram of outcome values.
bin_counts – Histogram counts.
parameters – Additional parameters.

Returns

The splitting losses.

openasce.inference.tree.splitting_losses.causal_tree_splitting_losses2(configs, bin_grad_hist, bin_hess_hist, bin_counts, parameters: dict)[source]¶

Calculate the splitting losses for the ordinary causal tree from gradient and Hessian histograms.

Parameters

configs – Configuration.
bin_grad_hist – Histogram of gradients.
bin_hess_hist – Histogram of Hessians.
bin_counts – Histogram counts.
parameters – Additional parameters.

Returns

The splitting losses.

openasce.inference.tree.splitting_losses.didtree_splitting_losses(configs, bin_grad_hist, bin_hess_hist, bin_cgrad_hist, bin_chess_hist, bin_eta_hist, bin_counts, parameters: dict)[source]¶

Calculate the splitting losses for the DiD-Tree model.

Parameters

configs – Configuration.
bin_grad_hist – Histogram of gradients.
bin_hess_hist – Histogram of Hessians.
bin_cgrad_hist – Histogram of counterfactual gradients.
bin_chess_hist – Histogram of counterfactual Hessians.
bin_eta_hist – Histogram of etas.
bin_counts – Histogram counts.
parameters – Additional parameters.

Returns

The splitting losses.

openasce.inference.tree.splitting_losses.gbct_splitting_losses(configs, bin_grad_hist, bin_hess_hist, bin_cgrad_hist, bin_chess_hist, bin_counts, parameters: dict)[source]¶

Calculate the splitting losses for the GBCT model.

Parameters

configs – Configuration.
bin_grad_hist – Histogram of gradients.
bin_hess_hist – Histogram of Hessians.
bin_cgrad_hist – Histogram of counterfactual gradients.
bin_chess_hist – Histogram of counterfactual Hessians.
bin_counts – Histogram counts.
parameters – Additional parameters.

Returns

The splitting losses.

openasce.inference.tree.tree_node module¶

class openasce.inference.tree.tree_node.CausalTreeNode(conf: ConfigTree = None, **kwargs)[source]¶

Bases: object

A class for a node in a Causal Tree maximizing heterogeneous treatment effect.

__init__(conf: ConfigTree = None, **kwargs)[source]¶

__module__ = 'openasce.inference.tree.tree_node'¶

__weakref__¶: list of weak references to the object (if defined)

property children¶

effect(w)[source]¶

Compute the treatment effect.

Parameters: w – The treatment weights.
Returns: The computed treatment effect.
Return type: ndarray

estimate(outcome: ndarray, treatment: ndarray, weight: ndarray = None)[source]¶

Estimate the treatment effect given the outcome, treatment, and weight.

Parameters

outcome – The outcome values.
treatment – The treatment values.
weight – The weight values.

Returns

The estimated treatment effect.

Return type

ndarray
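
As an illustration of what estimating a treatment effect from outcome, treatment and weight arrays can look like, here is a minimal sketch of a weighted difference in means between treated and control instances. Whether CausalTreeNode.estimate computes exactly this quantity is an assumption; the function name `estimate_effect_sketch` is hypothetical.

```python
import numpy as np


def estimate_effect_sketch(outcome, treatment, weight=None):
    """Weighted mean outcome of treated instances minus that of controls."""
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treatment).astype(bool)
    if weight is None:
        weight = np.ones(len(outcome))  # unweighted case
    weight = np.asarray(weight, dtype=float)
    treated_mean = np.average(outcome[treated], weights=weight[treated], axis=0)
    control_mean = np.average(outcome[~treated], weights=weight[~treated], axis=0)
    return treated_mean - control_mean


effect = estimate_effect_sketch(
    outcome=[1.0, 2.0, 4.0, 6.0],
    treatment=[0, 0, 1, 1],
)
weighted_effect = estimate_effect_sketch(
    outcome=[1.0, 2.0, 4.0, 6.0],
    treatment=[0, 0, 1, 1],
    weight=[1.0, 1.0, 3.0, 1.0],
)
```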

estimate_by_hist(outcome: ndarray, treatment: ndarray, count: ndarray)[source]¶

Estimate the treatment effect using histogram-based method.

Parameters

outcome – The outcome values.
treatment – The treatment values.
count – The count values.

Returns

The estimated treatment effect.

Return type

ndarray

class openasce.inference.tree.tree_node.GradientCausalTreeNode(conf: ConfigTree = None, **kwargs)[source]¶

Bases: CausalTreeNode

A class for a node in a Gradient Boosting Causal Tree.

__annotations__ = {}¶

__init__(conf: ConfigTree = None, **kwargs)[source]¶

__module__ = 'openasce.inference.tree.tree_node'¶

estimate(G, H, **kwargs)[source]¶

Estimate the treatment effect given the gradients and hessians.

Parameters

G – The gradients.
H – The hessians.
**kwargs – Additional keyword arguments.

Returns

The estimated treatment effect.

Return type

ndarray
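
In second-order gradient boosting, a leaf value is commonly obtained from summed gradients G and hessians H by a regularized Newton step, -G / (H + lambd). The sketch below applies that standard formula per treatment arm and takes the difference as the effect. It is an assumption that GradientCausalTreeNode.estimate follows this form; the function name `newton_leaf_values` and the arm ordering (index 0 control, index 1 treated) are hypothetical.

```python
import numpy as np


def newton_leaf_values(G, H, lambd=0.0):
    """Regularized Newton step per treatment arm: -G / (H + lambd)."""
    G = np.asarray(G, dtype=float)
    H = np.asarray(H, dtype=float)
    return -G / (H + lambd)


# Summed gradients and hessians in one leaf (illustrative numbers).
G = np.array([-4.0, -9.0])  # index 0: control, index 1: treated
H = np.array([2.0, 3.0])
values = newton_leaf_values(G, H, lambd=1.0)
effect = values[1] - values[0]  # treated value minus control value
```

A larger lambd shrinks both leaf values toward zero, and with them the estimated effect.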

openasce.inference.tree.utils module¶

openasce.inference.tree.utils.DEBUG(msg, *args, **kwargs)[source]¶

openasce.inference.tree.utils.ERROR(msg, *args, **kwargs)[source]¶

openasce.inference.tree.utils.FATAL(msg, *args, **kwargs)[source]¶

openasce.inference.tree.utils.INFO(msg, *args, **kwargs)[source]¶

openasce.inference.tree.utils.TRACE(msg, *args, **kwargs)[source]¶

openasce.inference.tree.utils.WARN(msg, *args, **kwargs)[source]¶

openasce.inference.tree.utils._check_c_style_array(*args)[source]¶

openasce.inference.tree.utils._check_match(*args, axis=0)[source]¶

openasce.inference.tree.utils.find_bin_parallel(data, max_bin=64, min_data_in_bin=100, min_split_data=100, pre_filter=False, bin_type=0, use_missing=True, zero_as_missing=False, forced_upper_bounds=[])[source]¶

Find bins in parallel for the given data.

Parameters

data – The input data.
max_bin – The maximum number of bins. (default: 64)
min_data_in_bin – The minimum number of data points in a bin. (default: 100)
min_split_data – The minimum number of data points to split a bin. (default: 100)
pre_filter – Whether to pre-filter the data. (default: False)
bin_type – The type of binning. (default: 0)
use_missing – Whether to use missing values. (default: True)
zero_as_missing – Whether to treat zero as a missing value. (default: False)
forced_upper_bounds – The forced upper bounds for the bins. (default: [])

Returns

The bins found.

Raises

ValueError – If the data type is not supported.

openasce.inference.tree.utils.groupby(data: ndarray, by: ndarray, aggregator: str = 'mean', dropna=True)[source]¶

openasce.inference.tree.utils.indexbyarray(arr, idx, fact_outcome, counterfact_outcome=None, n_threads=-1)[source]¶

Index an outcome array (arr, with shape [n, 2]) by a binary treatment array (idx).

Parameters

arr (ndarray) – The input array.
idx (ndarray) – The index array.
fact_outcome (ndarray) – The outcome array to update.
counterfact_outcome (ndarray) – The counterfactual outcome array to update.
n_threads (int, optional) – The number of threads to use. Defaults to -1.

Returns

The updated outcome arrays.

Return type

ndarray
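
The indexing described above can be expressed in plain NumPy. This is a single-threaded sketch under stated assumptions, not the library's implementation: the name `indexbyarray_sketch` is hypothetical, it returns new arrays instead of writing into fact_outcome and counterfact_outcome, and it assumes the counterfactual outcome is the column not selected by idx.

```python
import numpy as np


def indexbyarray_sketch(arr, idx):
    """Pick, per row, the column given by idx (factual) and the other column."""
    arr = np.asarray(arr)
    idx = np.asarray(idx, dtype=np.intp)
    rows = np.arange(arr.shape[0])
    fact = arr[rows, idx]             # outcome under the observed treatment
    counterfact = arr[rows, 1 - idx]  # outcome under the opposite treatment
    return fact, counterfact


arr = np.array([[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]])
idx = np.array([0, 1, 0])
fact, counterfact = indexbyarray_sketch(arr, idx)
```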

openasce.inference.tree.utils.init_logger()[source]¶

openasce.inference.tree.utils.list_to_array(data: list, out=None, st_idx: int = 0, miss_value=0, threads=-1)[source]¶

openasce.inference.tree.utils.set_log_level_cpp(level)[source]¶

openasce.inference.tree.utils.t_or_f(arg)[source]¶

openasce.inference.tree.utils.to_row_major(x, dtype=None)[source]¶

openasce.inference.tree.utils.update_histogram(target, x_binned, index, leaves_range, treatment, out, leaves=[], n_treatment=2, n_bins=64, threads=-1)[source]¶

Update the histogram of each leaf.

Parameters

target (ndarray) – Target array. Shape [n, n_outcome].
x_binned (ndarray) – Binned feature array. Shape [n, n_feature].
index (ndarray) – Index array. Shape [n]. The end position in leaves_range must not exceed n.
leaves_range (ndarray) – List of each leaf’s data range. Shape [n_leaf, 2]. Each term looks like [st_pos, end_pos).
treatment (ndarray) – Treatment array. Shape [n].
out (ndarray) – Output histogram. Shape [n_leaf, n_features, n_bins, n_treatment, n_outcome].
leaves (list, optional) – List of leaf indices. Defaults to [].
n_treatment (int, optional) – The number of treatments. Defaults to 2.
n_bins (int, optional) – The number of bins. Defaults to 64.
threads (int, optional) – The number of threads to use. Defaults to -1.

Returns

The updated histogram array.

Return type

ndarray
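
The shapes documented above are enough to sketch the accumulation in plain NumPy. The following is a single-threaded illustration, not the library's implementation: the name `update_histogram_sketch` is hypothetical, and the leaves, n_treatment, n_bins and threads arguments are omitted because the output array already fixes those sizes.

```python
import numpy as np


def update_histogram_sketch(target, x_binned, index, leaves_range, treatment, out):
    """Accumulate target sums into out[leaf, feature, bin, treatment, outcome]."""
    n_features = x_binned.shape[1]
    for leaf, (start, end) in enumerate(leaves_range):
        rows = index[start:end]  # instances in this leaf, range is [start, end)
        for f in range(n_features):
            # np.add.at accumulates correctly when (bin, treatment) pairs repeat;
            # plain fancy-index += would keep only one contribution per pair.
            np.add.at(out[leaf, f], (x_binned[rows, f], treatment[rows]), target[rows])
    return out


target = np.array([[1.0], [2.0], [3.0], [4.0]])   # [n, n_outcome]
x_binned = np.array([[0], [1], [0], [1]])         # [n, n_feature]
index = np.array([0, 2, 1, 3])                    # [n]
leaves_range = np.array([[0, 2], [2, 4]])         # [n_leaf, 2]
treatment = np.array([0, 1, 1, 1])                # [n]
out = np.zeros((2, 1, 2, 2, 1))                   # [n_leaf, n_features, n_bins, n_treatment, n_outcome]
update_histogram_sketch(target, x_binned, index, leaves_range, treatment, out)
```

In this example, instances 1 and 3 land in the same leaf, bin and treatment arm, so their targets are summed into a single histogram cell.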

openasce.inference.tree.utils.update_histograms(targets, x_binned, index, leaves_range, treatment, outs, leaves=[], n_treatment=2, n_bins=64, threads=-1)[source]¶

Update the histograms of each leaf for a list of targets.

Parameters

targets (list) – List of target arrays, each of shape [n, n_outcome].
x_binned (ndarray) – Binned feature array. Shape [n, n_feature].
index (ndarray) – Index array. Shape [n]. The end position in leaves_range must not exceed n.
leaves_range (ndarray) – List of each leaf’s data range. Shape [n_leaf, 2]. Each term looks like [st_pos, end_pos).
treatment (ndarray) – Treatment array. Shape [n].
outs (list) – List of output histogram arrays, each of shape [n_leaf, n_features, n_bins, n_treatment, n_outcome].
leaves (list, optional) – List of leaf indices. Defaults to [].
n_treatment (int, optional) – The number of treatments. Defaults to 2.
n_bins (int, optional) – The number of bins. Defaults to 64.
threads (int, optional) – The number of threads to use. Defaults to -1.

Returns

The updated histogram arrays.

Return type

ndarray

openasce.inference.tree.utils.update_x_map(x_binned, ins2leaf, split_infos, leaves_range, out, nthread=-1)[source]¶

Update the index of instances.

Parameters

x_binned (ndarray) – The binned feature array.
ins2leaf (ndarray) – The mapping array.
split_infos (ndarray) – The split information array.
leaves_range (ndarray) – The range of each leaf.
out (ndarray) – The output array to store the updated mapping.
nthread (int, optional) – The number of threads to use. Defaults to -1.

Returns

None

openasce.inference.tree.utils.value_bin_parallel(data, bin_mappers: List[openasce.inference.tree.gbct_utils.bin.BinMaper], out=None, threads=-1)[source]¶

Transform the input data to bin values in parallel.

Parameters

data – The input data.
bin_mappers – The bin mappers.
out – The output array to store the bin values. (default: None)
threads – The number of threads to use (-1 for maximum). (default: -1)

Returns

The transformed bin values.

Raises

ValueError – If the output dtype is not supported.
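
Mapping raw feature values to bin indices is often done with a sorted list of bin upper bounds per feature. The sketch below shows that idea with np.searchsorted. It is an assumption that the bin mappers store upper bounds in this way (the forced_upper_bounds parameter of find_bin_parallel hints at it); the name `value_bin_sketch` is hypothetical, and missing-value handling is left out.

```python
import numpy as np


def value_bin_sketch(data, upper_bounds_per_feature):
    """Map each value to the index of the first bin whose upper bound covers it."""
    data = np.asarray(data, dtype=float)
    out = np.empty(data.shape, dtype=np.int32)
    for f, bounds in enumerate(upper_bounds_per_feature):
        # side="left": a value equal to an upper bound stays in that bin.
        out[:, f] = np.searchsorted(bounds, data[:, f], side="left")
    return out


data = np.array([[0.1, 5.0], [0.5, 1.0], [0.9, 3.0]])
# One sorted array of upper bounds per feature; the last bound is +inf so
# every finite value falls into some bin.
bounds = [np.array([0.3, 0.6, np.inf]), np.array([2.0, np.inf])]
binned = value_bin_sketch(data, bounds)
```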
