openasce.inference.tree package

class openasce.inference.tree.DifferenceInDifferencesRegressionTree(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]

Bases: Boosting

Gradient-boosted debiased causal tree for regression, implementing the method of "Difference-in-Differences Meets Tree-Based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding" [1].

Reference: 1. Tang, C., Wang, H., Li, X., Qing, C., Li, L., & Zhou, J. (2023). Difference-in-Differences Meets Tree-Based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Proceedings of the 40th International Conference on Machine Learning.

Parameters
  • n_estimators – int, default=100. The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.

  • learning_rate – float, default=0.1. Shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators: smaller values usually require more boosting stages.

  • subsample – float, default=1.0. The fraction of samples used to fit each base learner. A value smaller than 1.0 results in stochastic gradient boosting. subsample interacts with n_estimators; choosing subsample < 1.0 reduces variance at the cost of increased bias.

  • subfeaure – float, default=1.0. The fraction of features used to fit each base learner, as in random forests.

  • train_loss – str, default='openasce.inference.tree.losses.MeanSquaredError'. The splitting criterion used by GBCT. A custom criterion can be implemented by inheriting from the base class Loss.

  • min_samples_split – int, default=10. The minimum number of samples required to split an internal node.

  • max_depth – int, default=3. The maximum depth of each individual regression tree, which limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

  • lambd – float, default=5. The L2 regularization parameter (l2_penalty), analogous to the one in XGBoost.

  • coeff – float, default=1. The regularization parameter for the selection-bias term, as in GBCT.

  • parallel_l2 – float, default=0.

  • n_period (int, optional) – The number of time steps. Although it defaults to None, it must be provided by the user.

  • treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Although it defaults to None, it must be provided by the user.

  • init – default=0. The initial prediction.

  • random_state – int or RandomState, default=None. Controls the random seed given to each tree estimator at each boosting iteration. It also controls the random permutation of the features at each split.

  • verbose – bool, default=False. Enables verbose output.

  • n_iter_no_change – int, default=10. Training stops early when the validation score has not improved for n_iter_no_change iterations. Set it to None to disable early stopping.

  • tol – float, default=1e-4. Tolerance for early stopping: when the loss does not improve by at least tol for n_iter_no_change iterations, training stops.

  • nthreads – int, default=32. The number of threads used for parallelization.

  • conf (ConfigTree, optional) – Configuration object (for example, the feature columns and instance_ratio read by Boosting). Defaults to None.

  • bin_mapper (BinMapper, optional) – A BinMapper used to discretize numerical features. Defaults to None.

__module__ = 'openasce.inference.tree.didtree'
effect(X: ndarray = None, *, data: Dataset = None)[source]

Predict the treatment effect.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • data (Dataset, optional) – Dataset containing the features. Defaults to None.

Returns

np.ndarray of shape [n_instances]. The estimated treatment effect for each instance.

fit(X: ndarray = None, Y: ndarray = None, D: ndarray = None, *, data: Dataset = None, data_test: Dataset = None)[source]

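Train the DifferenceInDifferencesRegressionTree.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • Y – np.ndarray of shape [n_instances, n_period]. The outcome at each time step.

  • D – np.ndarray of shape [n_instances]. The treatment assignment.

  • data (Dataset, optional) – Dataset object containing the input features, targets, and treatment. Defaults to None.

  • data_test (Dataset, optional) – An optional test dataset. Defaults to None.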
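Because fit() expects Y with one column per time step, shape mistakes are easy to make. The hypothetical check_did_inputs helper below validates the documented shapes before training; requiring treat_dt > 0 (at least one pre-treatment period) is an assumption of this sketch, not a documented rule. The commented lines show how the documented constructor and fit()/effect() signatures would then be used; they are not executed here.

```python
import numpy as np


def check_did_inputs(X, Y, D, n_period, treat_dt):
    """Hypothetical pre-flight check mirroring the documented shapes of fit().

    X: [n_instances, n_features], Y: [n_instances, n_period], D: [n_instances].
    """
    X, Y, D = np.asarray(X), np.asarray(Y), np.asarray(D)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D, got shape {X.shape}")
    n = X.shape[0]
    if Y.shape != (n, n_period):
        raise ValueError(f"Y must have shape ({n}, {n_period}), got {Y.shape}")
    if D.shape != (n,):
        raise ValueError(f"D must have shape ({n},), got {D.shape}")
    # Assumption: at least one pre-treatment period is needed.
    if not 0 < treat_dt < n_period:
        raise ValueError("treat_dt must satisfy 0 < treat_dt < n_period")
    return n


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = rng.normal(size=(100, 4))  # n_period = 4 time steps
D = rng.integers(0, 2, size=100)
print(check_did_inputs(X, Y, D, n_period=4, treat_dt=2))  # 100

# With the inputs validated, the documented API would be used like this:
# from openasce.inference.tree import DifferenceInDifferencesRegressionTree
# model = DifferenceInDifferencesRegressionTree(n_period=4, treat_dt=2)
# model.fit(X, Y, D)
# tau = model.effect(X)  # shape [n_instances]
```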

predict(X: ndarray = None, key: str = 'effect', *, data: Dataset = None)[source]

Predict the treatment effect, the leaf id, or the outcome, as selected by key.

Parameters
  • X (np.ndarray) – The features, of shape [n_instances, n_features].

  • key (str, default=‘effect’) – One of ‘effect’, ‘leaf_id’, or ‘outcome’.

  • data (Dataset, optional) – Dataset containing the features. Defaults to None.

Returns

The predictions selected by key.

Return type

np.ndarray

class openasce.inference.tree.GradientBoostingCausalRegressionTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]

Bases: Boosting

Gradient-boosted debiased causal tree for regression. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.

Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 35.

Parameters
  • learning_rate – float, default=0.1. Shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators: smaller values usually require more boosting stages.

  • n_estimators – int, default=100. The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.

  • subsample – float, default=1.0. The fraction of samples used to fit each base learner. A value smaller than 1.0 results in stochastic gradient boosting. subsample interacts with n_estimators; choosing subsample < 1.0 reduces variance at the cost of increased bias.

  • subfeaure – float, default=1.0. The fraction of features used to fit each base learner, as in random forests.

  • train_loss – str, default='openasce.inference.tree.losses.MeanSquaredError'. The splitting criterion used by GBCT. A custom criterion can be implemented by inheriting from the base class Loss.

  • min_samples_split – int, default=10. The minimum number of samples required to split an internal node.

  • max_depth – int, default=3. The maximum depth of each individual regression tree, which limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

  • lambd – float, default=5. The L2 regularization parameter (l2_penalty), analogous to the one in XGBoost.

  • coeff – float, default=1. The regularization parameter for the selection-bias term in GBCT.

  • n_period (int, optional) – The number of time steps. Although it defaults to None, it must be provided by the user.

  • treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Although it defaults to None, it must be provided by the user.

  • init – default=0. The initial prediction.

  • random_state – int or RandomState, default=None. Controls the random seed given to each tree estimator at each boosting iteration. It also controls the random permutation of the features at each split.

  • verbose – bool, default=False. Enables verbose output.

  • n_iter_no_change – int, default=10. Training stops early when the validation score has not improved for n_iter_no_change iterations. Set it to None to disable early stopping.

  • tol – float, default=1e-4. Tolerance for early stopping: when the loss does not improve by at least tol for n_iter_no_change iterations, training stops.

  • nthreads – int, default=32. The number of threads used for parallelization.

  • conf (ConfigTree or dict, optional) – Configuration object (for example, the feature columns and instance_ratio read by Boosting). Defaults to None.

  • bin_mapper (BinMapper, optional) – A BinMapper used to discretize numerical features. Defaults to None.
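The trade-off between learning_rate and n_estimators can be seen in a toy setting. The sketch below runs squared-error gradient boosting on a single constant target, where each hypothetical "tree" is one leaf predicting the current residual; with init=0 the prediction after m stages is target * (1 - (1 - learning_rate)**m). It illustrates generic gradient boosting only, not GBCT's debiased split criterion.

```python
def boost_constant_target(target, learning_rate, n_estimators, init=0.0):
    """Toy squared-error gradient boosting on one constant target value.

    Each hypothetical single-leaf 'tree' predicts the current residual,
    which is the negative gradient of 0.5 * (target - pred) ** 2.
    """
    pred = init
    for _ in range(n_estimators):
        residual = target - pred          # negative gradient
        pred += learning_rate * residual  # shrink this stage's contribution
    return pred


for lr, m in [(0.1, 10), (0.1, 100), (1.0, 1)]:
    print(lr, m, round(boost_constant_target(1.0, lr, m), 4))
```

With learning_rate=1.0 a single stage fits the target exactly, while learning_rate=0.1 needs far more stages; on noisy data, that slower fit is what usually makes shrinkage generalize better.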

__module__ = 'openasce.inference.tree.gbct'
effect(X: ndarray, *, data: Dataset = None)[source]

Predict the treatment effect.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • data (Dataset, optional) – Dataset containing the features. Defaults to None.

Returns

np.ndarray of shape [n_instances]. The estimated treatment effect for each instance.

fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]

Train the GradientBoostingCausalRegressionTree.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • Y – np.ndarray of shape [n_instances, n_period]. The outcome at each time step.

  • D – np.ndarray of shape [n_instances]. The treatment assignment.

  • data (Dataset, optional) – Dataset object containing the input features, targets, and treatment. Defaults to None.

predict(X: ndarray, key: str, *, data: Dataset = None)[source]

Predict the treatment effect or the leaf id, as selected by key.

Parameters
  • X (np.ndarray) – The features, of shape [n_instances, n_features].

  • key (str) – One of ‘effect’, ‘leaf_id’, or ‘effect-ND’.

  • data (Dataset, optional) – Dataset containing the features. Defaults to None.

Returns

The predictions selected by key.

Return type

np.ndarray

class openasce.inference.tree.GradientBoostingUpliftTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]

Bases: Boosting

Gradient-boosted debiased causal tree for classification. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.

Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 35.

Parameters
  • learning_rate – float, default=0.1. Shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators: smaller values usually require more boosting stages.

  • n_estimators – int, default=100. The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.

  • subsample – float, default=1.0. The fraction of samples used to fit each base learner. A value smaller than 1.0 results in stochastic gradient boosting. subsample interacts with n_estimators; choosing subsample < 1.0 reduces variance at the cost of increased bias.

  • subfeaure – float, default=1.0. The fraction of features used to fit each base learner, as in random forests.

  • train_loss – str, default='openasce.inference.tree.losses.BinaryCrossEntropy'. The splitting criterion used by GBCT. A custom criterion can be implemented by inheriting from the base class Loss.

  • min_samples_split – int, default=10. The minimum number of samples required to split an internal node.

  • max_depth – int, default=3. The maximum depth of each individual tree, which limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

  • lambd – float, default=5. The L2 regularization parameter (l2_penalty), analogous to the one in XGBoost.

  • coeff – float, default=1. The regularization parameter for the selection-bias term in GBCT.

  • n_period (int, optional) – The number of time steps. Although it defaults to None, it must be provided by the user.

  • treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Although it defaults to None, it must be provided by the user.

  • init – default=0. The initial prediction.

  • random_state – int or RandomState, default=None. Controls the random seed given to each tree estimator at each boosting iteration. It also controls the random permutation of the features at each split.

  • verbose – bool, default=False. Enables verbose output.

  • n_iter_no_change – int, default=10. Training stops early when the validation score has not improved for n_iter_no_change iterations. Set it to None to disable early stopping.

  • tol – float, default=1e-4. Tolerance for early stopping: when the loss does not improve by at least tol for n_iter_no_change iterations, training stops.

  • nthreads – int, default=32. The number of threads used for parallelization.

  • conf (ConfigTree or dict, optional) – Configuration object (for example, the feature columns and instance_ratio read by Boosting). Defaults to None.

  • bin_mapper (BinMapper, optional) – A BinMapper used to discretize numerical features. Defaults to None.
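GradientBoostingUpliftTree defaults to openasce.inference.tree.losses.BinaryCrossEntropy. For reference, the sketch below shows the textbook binary cross-entropy on raw scores (logits), its gradient and Hessian, and an XGBoost-style leaf value in which lambd acts as the L2 penalty. This is a standard formulation under stated assumptions; the library's Loss subclass may be organized differently, and all function names here are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(y, z):
    """Binary cross-entropy for label y in {0, 1} and raw score (logit) z."""
    p = sigmoid(z)
    eps = 1e-15  # guard against log(0)
    return -(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))


def bce_grad_hess(y, z):
    """First and second derivatives of bce_loss with respect to z."""
    p = sigmoid(z)
    return p - y, p * (1 - p)


def leaf_weight(grads, hessians, lambd=5.0):
    """XGBoost-style optimal leaf value: -sum(g) / (sum(h) + lambd)."""
    return -sum(grads) / (sum(hessians) + lambd)


labels = [1, 0, 1, 1]
scores = [0.0] * len(labels)  # init=0 gives p = 0.5 for every sample
g, h = zip(*(bce_grad_hess(y, z) for y, z in zip(labels, scores)))
print(leaf_weight(g, h, lambd=5.0))  # positive: most labels are 1
```

A larger lambd shrinks the leaf value toward zero, which is the regularizing effect the parameter description refers to.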

__module__ = 'openasce.inference.tree.gbct'
effect(X: ndarray, *, data: Dataset = None)[source]

Predict the treatment effect.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • data (Dataset, optional) – Dataset containing the features. Defaults to None.

Returns

np.ndarray of shape [n_instances]. The estimated treatment effect for each instance.

fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]

Train the GradientBoostingUpliftTree.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • Y – np.ndarray of shape [n_instances, n_period]. The outcome at each time step.

  • D – np.ndarray of shape [n_instances]. The treatment assignment.

  • data (Dataset, optional) – Dataset object containing the input features, targets, and treatment. Defaults to None.

predict(X: ndarray, key: str, *, data: Dataset = None)[source]

Predict the treatment effect or the leaf id, as selected by key.

Parameters
  • X (np.ndarray) – The features, of shape [n_instances, n_features].

  • key (str) – One of ‘effect’, ‘leaf_id’, or ‘effect-ND’.

  • data (Dataset, optional) – Dataset containing the features. Defaults to None.

Returns

The predictions selected by key.

Return type

np.ndarray

Submodules

openasce.inference.tree.bin module

class openasce.inference.tree.bin.BinMapper(conf: ConfigTree)[source]

Bases: KBinsDiscretizer

A class for binning numerical features.

description()[source]

Print the description of the bin mapper.

fit(X, y=None)[source]

Fit the bin mapper on the input features.

Parameters
  • X – Input features.

  • y – The target variable (not used).

Returns

The fitted bin mapper object.

fit_dataset(data)[source]

Fit the bin mapper on the dataset.

Parameters

data – Dataset object containing the input features.

fit_transform(X, y=None, **fit_params)[source]

Fit the bin mapper on the input features and transform them.

Parameters
  • X – Input features.

  • y – The target variable (not used).

  • fit_params – Additional parameters for fitting.

Returns

The transformed features.

inverse_transform(Xt, index: int = None)[source]

Map transformed (binned) features back to values on the original feature scale.

Parameters
  • Xt – Transformed features.

  • index – Index of the feature to inverse transform.

Returns

The inverse transformed features.

property is_fit

Whether the bin mapper has been fitted.

Returns

True if the bin mapper has been fitted, False otherwise.

transform(X)[source]

Transform the input features using the bin mapper.

Parameters

X – Input features.

Returns

The transformed features.

property upper_bounds

Get the upper bounds of the bins.

Returns

The upper bounds of the bins.
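BinMapper derives from scikit-learn's KBinsDiscretizer and exposes upper_bounds. As an illustration of the idea, not of BinMapper's actual internals, the stdlib sketch below fits quantile-based upper bounds for one feature, maps values to bin indices, and maps bin indices back to their upper bounds. The quantile strategy, the open-ended last bin, and the use of upper bounds as inverse values are all assumptions; the helper names are hypothetical.

```python
import bisect
import math
import statistics


def fit_upper_bounds(values, n_bins):
    """Quantile-based bin upper bounds; the last bin is open-ended."""
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    return sorted(set(cuts)) + [math.inf]  # drop duplicate cuts caused by ties


def to_bins(values, upper_bounds):
    """Bin index = first bin whose upper bound is >= the value."""
    return [bisect.bisect_left(upper_bounds, v) for v in values]


def from_bins(bins, upper_bounds):
    """Map bin indices back to a representative value (here, the upper bound)."""
    return [upper_bounds[b] for b in bins]


bounds = fit_upper_bounds([1, 2, 3, 4, 5, 6, 7, 8], n_bins=4)
print(bounds)                           # [2.75, 4.5, 6.25, inf]
print(to_bins([1, 3, 4.5, 8], bounds))  # [0, 1, 1, 3]
```

Tree learners then search split points only among bin boundaries, which is the usual reason for binning features before boosting.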

openasce.inference.tree.bin_test module

openasce.inference.tree.boosting module

class openasce.inference.tree.boosting.Boosting(tree_cls, conf: ConfigTree, bin_mapper: BinMapper = None)[source]

Bases: object


_update_paramers(*args, **kwargs)[source]
_validation(target, prediction, cprediction=None)[source]
check_data(data: Dataset)[source]
early_stopping() bool[source]

Check whether the early-stopping criterion is met, based on the validation losses.

Returns

True if the early-stopping criterion is met, False otherwise.
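Early stopping is governed by n_iter_no_change and tol. A minimal patience-based rule in that spirit is sketched below; should_stop is a hypothetical helper, and the exact comparison used by the library (for example, whether tol is absolute or relative) is an assumption.

```python
def should_stop(val_losses, n_iter_no_change=10, tol=1e-4):
    """Return True when the last n_iter_no_change validation losses did not
    improve on the best earlier loss by at least tol (absolute).

    n_iter_no_change=None disables early stopping.
    """
    if n_iter_no_change is None or len(val_losses) <= n_iter_no_change:
        return False
    best_before = min(val_losses[:-n_iter_no_change])
    best_recent = min(val_losses[-n_iter_no_change:])
    return best_recent > best_before - tol  # improvement smaller than tol


history = [1.00, 0.80, 0.70, 0.69995, 0.69996, 0.69999]
print(should_stop(history, n_iter_no_change=3, tol=1e-4))  # True
```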

effect(X=None, *, data: Dataset = None)[source]

Predict the treatment effect on the input data.

fit(data: Dataset)[source]

Fit the boosted causal tree model on the provided dataset.

Parameters

data – Dataset object containing the input features, targets, and treatment.

postprocess()[source]
predict(X, key: str, *, data: Dataset = None)[source]

Predict the output using the trained model on the input data.

Parameters
  • X – Feature matrix of the input data.

  • key – Type of prediction, can be ‘leaf_id’, ‘effect’, or ‘effect-ND’.

  • data – Dataset object containing the feature data.

Returns

Prediction result based on the specified key.

Raises

RuntimeError – If the specified key is unknown and not supported.
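The key handling described above can be sketched as a dispatch table that raises RuntimeError for unsupported keys. The handlers below are hypothetical stand-ins for a fitted model's outputs, not the library's internals.

```python
from typing import Callable, Dict, List


def predict_by_key(x: List[float], key: str,
                   handlers: Dict[str, Callable[[List[float]], object]]):
    """Dispatch a prediction request on key, rejecting unknown keys."""
    try:
        handler = handlers[key]
    except KeyError:
        raise RuntimeError(
            f"unknown key {key!r}; expected one of {sorted(handlers)}"
        ) from None
    return handler(x)


# Hypothetical stand-ins for the three documented prediction types.
handlers = {
    "effect": lambda x: 0.5 * x[0],
    "leaf_id": lambda x: int(x[0] > 0),
    "effect-ND": lambda x: [0.5 * x[0], 0.25 * x[0]],
}
print(predict_by_key([2.0, 1.0], "leaf_id", handlers))  # 1
```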

preprocess(data: Dataset)[source]

Perform preprocessing steps on the provided dataset.

Parameters

data – Dataset object containing the input features, targets, and treatment.

split_counts(trees=None, feature_names=None)[source]

Count the number of splits made on each feature across the gradient-boosted causal trees.

Parameters
  • trees – List of decision trees. If None, uses the trained trees.

  • feature_names – List of feature names. If None, uses the feature columns from conf.

Returns

Dictionary with feature names as keys and the corresponding split counts as values.
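The counting itself can be sketched with collections.Counter. The tree representation below (a list of node dicts whose "feature" key is a feature index, or None for leaves) and the count_splits helper are hypothetical; the real trees are stored differently.

```python
from collections import Counter


def count_splits(trees, feature_names):
    """Count splits per feature across an ensemble of hypothetical trees."""
    counts = Counter(
        node["feature"]
        for tree in trees
        for node in tree
        if node["feature"] is not None  # leaves do not split
    )
    # Report every feature, including those never used for a split.
    return {name: counts.get(i, 0) for i, name in enumerate(feature_names)}


trees = [
    [{"feature": 0}, {"feature": 2},
     {"feature": None}, {"feature": None}, {"feature": None}],
    [{"feature": 0}, {"feature": None}, {"feature": None}],
]
print(count_splits(trees, ["age", "income", "tenure"]))
# {'age': 2, 'income': 0, 'tenure': 1}
```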

tr_val(data: Dataset, subsample=None, **kwargs)[source]

Split the dataset into training and validation sets.

Parameters
  • data – Dataset object containing the input features, targets, and treatment.

  • subsample – Ratio of instances to include in the training set. If None, uses the instance_ratio from conf.

Returns

Tuple containing the training dataset, validation dataset, indices of training instances, and indices of validation instances.
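The index part of this split can be sketched with a seeded shuffle, where subsample is the fraction of instances assigned to training. train_val_split is a hypothetical helper; building the training and validation Dataset objects from these indices (for example with Dataset.sub_dataset) is an assumption not shown here.

```python
import random


def train_val_split(n_instances, subsample, seed=None):
    """Randomly split instance indices into training and validation sets."""
    if not 0.0 < subsample <= 1.0:
        raise ValueError("subsample must be in (0, 1]")
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # reproducible for a fixed seed
    n_train = int(round(subsample * n_instances))
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, val_idx = train_val_split(10, 0.8, seed=42)
print(len(train_idx), len(val_idx))  # 8 2
```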

openasce.inference.tree.cppnode module

openasce.inference.tree.cppnode.create_didnode_from_dict(info)[source]

Create a CppDebiasNode from a dictionary.

Parameters

info (Dict) – The node information.

Returns

The CppDebiasNode instance.

Return type

CppDebiasNode

openasce.inference.tree.cppnode.predict(nodes: List[openasce.inference.tree.gbct_utils.common.CppDiDNode], x, out, key, threads=20)[source]

Predict using the tree nodes.

Parameters
  • nodes (List) – The list of tree nodes.

  • x (ndarray) – The input data.

  • out (ndarray) – The output array.

  • key (ndarray) – The prediction key.

  • threads (int) – The number of threads.

Returns

The predicted values.

Return type

ndarray

Raises
  • RuntimeError – If the number of nodes is less than or equal to 0.

  • ValueError – If the node type is not supported.
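Prediction with tree nodes amounts to routing each row from the root to a leaf and reading out the value that key asks for. The sketch below assumes a hypothetical flat layout of parallel arrays and is not the node format used by CppDiDNode. It raises RuntimeError for an empty tree, as documented above. Raising ValueError for an unknown key is this sketch's own choice.

```python
import numpy as np


def predict_flat_tree(feature, threshold, left, right, value, x, key="effect"):
    """Route each row of x to a leaf of a tree stored as parallel arrays.

    Hypothetical layout: node i tests x[:, feature[i]] <= threshold[i] and
    goes to left[i] if true, right[i] otherwise; left[i] == -1 marks a leaf.
    """
    if len(feature) <= 0:
        raise RuntimeError("the tree has no nodes")
    x = np.asarray(x, dtype=float)
    out = np.empty(x.shape[0], dtype=float)
    for i, row in enumerate(x):
        node = 0  # start at the root
        while left[node] != -1:
            node = left[node] if row[feature[node]] <= threshold[node] else right[node]
        if key == "leaf_id":
            out[i] = node
        elif key == "effect":
            out[i] = value[node]
        else:
            raise ValueError(f"unsupported key: {key!r}")
    return out
```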

openasce.inference.tree.cppnode_test module

openasce.inference.tree.csv_dataset module

class openasce.inference.tree.csv_dataset.CsvDataset(conf=None, **kwargs)[source]

Bases: Dataset

A Dataset implementation for loading CSV data.

__init__(conf=None, **kwargs)[source]
__module__ = 'openasce.inference.tree.csv_dataset'
property features
static new_instance(conf)[source]
read(filename=None)[source]
sub_dataset(index=None, cols=None, cols_y=[]) Dataset[source]

Sub-sampling interface: return a sub-dataset.

Parameters

index (optional) – Indices of the samples to include in the sub-dataset. Defaults to None.

Raises

NotImplementedError – If sub-sampling is not implemented.

property targets
property treatment
property weight

openasce.inference.tree.dataset module

class openasce.inference.tree.dataset.Dataset[source]

Bases: object

Abstract interface for dataset classes.

__annotations__ = {}
__init__()[source]
__len__()[source]
__module__ = 'openasce.inference.tree.dataset'
__weakref__

list of weak references to the object (if defined)

description(detail: bool = False) None[source]

Describe the dataset.

Parameters

detail (bool, optional) – If True, give a more detailed description. Defaults to False.

property feature_columns
property features
static new_instance(conf)[source]
read(filename)[source]
sub_dataset(index=None)[source]

Abstract sub-sampling interface; subclasses return a sub-dataset.

Parameters

index (optional) – Indices of the samples to include in the sub-dataset. Defaults to None.

Raises

NotImplementedError – If not overridden by a subclass.

property targets
property treatment
class openasce.inference.tree.dataset.PsudoDataset(features: ndarray = None, outcome: ndarray = None, treatment: ndarray = None, conf=None)[source]

Bases: Dataset

A pseudo dataset that wraps data held in NumPy arrays.

Parameters
  • features (np.ndarray, optional) – features. Defaults to None.

  • outcome (np.ndarray, optional) – outcome. Defaults to None.

  • treatment (np.ndarray, optional) – treatment. Defaults to None.

  • conf (optional) – Configuration. Defaults to None.

__annotations__ = {}
__init__(features: ndarray = None, outcome: ndarray = None, treatment: ndarray = None, conf=None)[source]
__module__ = 'openasce.inference.tree.dataset'
property feature_columns
property features
sub_dataset(index=None, cols=None) Dataset[source]

Create a sub-dataset.

Parameters
  • index – Indices of the samples to include in the sub-dataset.

  • cols – Columns to include in the sub-dataset.

Returns

The sub-dataset.

property targets
property treatment
property weight
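A NumPy-backed dataset with row and column sub-sampling can be sketched as below. ArrayDataset is a hypothetical class that mirrors the features/targets/treatment properties and the sub_dataset(index, cols) signature listed above. It is not PsudoDataset itself and omits conf and weight.

```python
import numpy as np


class ArrayDataset:
    """Hypothetical NumPy-backed dataset mirroring the PsudoDataset interface."""

    def __init__(self, features, outcome, treatment):
        self._features = np.asarray(features)
        self._outcome = np.asarray(outcome)
        self._treatment = np.asarray(treatment)
        if not len(self._features) == len(self._outcome) == len(self._treatment):
            raise ValueError("features, outcome and treatment need the same number of rows")

    def __len__(self):
        return len(self._features)

    @property
    def features(self):
        return self._features

    @property
    def targets(self):
        return self._outcome

    @property
    def treatment(self):
        return self._treatment

    def sub_dataset(self, index=None, cols=None):
        """Select rows by index and, optionally, feature columns by cols."""
        idx = np.arange(len(self)) if index is None else np.asarray(index)
        feats = self._features[idx]
        if cols is not None:
            feats = feats[:, cols]
        return ArrayDataset(feats, self._outcome[idx], self._treatment[idx])
```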

openasce.inference.tree.didtree module

class openasce.inference.tree.didtree.DifferenceInDifferencesRegressionTree(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]

Bases: Boosting

Gradient-boosted debiased causal tree for regression, implementing the method of "Difference-in-Differences Meets Tree-Based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding" [1].

Reference: 1. Tang, C., Wang, H., Li, X., Qing, C., Li, L., & Zhou, J. (2023). Difference-in-Differences Meets Tree-Based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Proceedings of the 40th International Conference on Machine Learning.

Parameters
  • n_estimators – int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.

  • learning_rate – float, default=0.1 The learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

  • subsample – float, default=1.0 The fraction of samples used to fit each base learner. If smaller than 1.0, this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.

  • subfeature – float, default=1.0 The fraction of features used to fit each base learner, analogous to feature subsampling in random forests. The keyword is spelled subfeaure in the signature.

  • train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply a custom criterion by subclassing the base class Loss.

  • min_samples_split – int, default=10 The minimum number of samples required to split an internal node.

  • max_depth – int, default=3 Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

  • lambd – float, default=5 The L2 regularization parameter (l2_penalty), following XGBoost.

  • coef – float, default=1 The regularization weight on the selection-bias term, following GBCT. The keyword is spelled coeff in the signature.

  • parallel_l2 – float, default=0

  • n_period (int, optional) – The number of time steps. Must be provided by the user.

  • treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Must be provided by the user.

  • init – The initial prediction. Defaults to 0.

  • random_state – int or RandomState, default=None Controls the random seed given to each Tree estimator at each boosting iteration. It also controls the random permutation of the features at each split.

  • verbose – bool, default=False. Enable verbose output.

  • n_iter_no_change – int, default=10 Decides whether early stopping terminates training when the validation score is not improving. Set it to None to disable early stopping.

  • tol – float, default=1e-4 Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.

  • nthreads – The number of threads to use for parallelization. Defaults to 32.

  • conf (ConfigTree, optional) – The configuration tree. Defaults to None.

  • bin_mapper (BinMapper, optional) – The BinMapper instance. Defaults to None.
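The difference-in-differences idea behind this estimator, and the roles of n_period and treat_dt, can be illustrated with the classic two-group estimate. Take the treated group's pre-to-post change in outcome and subtract the control group's change. Under the parallel-trends assumption, this removes time-invariant unmeasured confounding. This is a minimal sketch and not the tree-based estimator of [1]. did_estimate is a hypothetical helper. It assumes 0-based periods, treats periods before treat_dt as pre-treatment, and expects a binary D.

```python
import numpy as np


def did_estimate(Y, D, treat_dt):
    """Naive 2x2 difference-in-differences on Y of shape [n_instances, n_period]."""
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D).astype(bool)
    n_period = Y.shape[1]
    if not 0 < treat_dt < n_period:
        raise ValueError("need 0 < treat_dt < n_period (non-empty pre and post periods)")
    if D.all() or not D.any():
        raise ValueError("need both treated and control instances")
    pre = Y[:, :treat_dt].mean(axis=1)    # average outcome before treatment
    post = Y[:, treat_dt:].mean(axis=1)   # average outcome from treat_dt on
    change = post - pre
    # Treated change minus control change cancels the shared time trend
    return change[D].mean() - change[~D].mean()
```

In the test data below, both groups share a common trend of +1, and the treated group gains an extra +2. The estimate recovers 2.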

__annotations__ = {}
__init__(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]
__module__ = 'openasce.inference.tree.didtree'
effect(X: ndarray = None, *, data: Dataset = None)[source]

Predict the treatment effect.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • data – Dataset, optional. Defaults to None.

Returns

np.ndarray of shape [n_instances,]. The treatment effect.

fit(X: ndarray = None, Y: ndarray = None, D: ndarray = None, *, data: Dataset = None, data_test: Dataset = None)[source]

Train the DifferenceInDifferencesRegressionTree.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • Y – np.ndarray of shape [n_instances, n_period]. The outcome at each time step.

  • D – np.ndarray of shape [n_instances,]. The treatment.

  • data (Dataset, optional) – Dataset object containing the input features, targets, and treatment. Defaults to None.
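The shape requirements for fit can be checked before training, as sketched below. check_fit_inputs is a hypothetical helper built only from the documented shapes (X: [n_instances, n_features], Y: [n_instances, n_period], D: [n_instances,]) and the rule that treat_dt is less than n_period. It is not part of the library.

```python
import numpy as np


def check_fit_inputs(X, Y, D, n_period, treat_dt):
    """Hypothetical pre-flight check of the documented fit() input shapes."""
    X, Y, D = np.asarray(X), np.asarray(Y), np.asarray(D)
    if X.ndim != 2:
        raise ValueError("X must have shape [n_instances, n_features]")
    n_instances = X.shape[0]
    if Y.shape != (n_instances, n_period):
        raise ValueError(f"Y must have shape [{n_instances}, {n_period}], got {list(Y.shape)}")
    if D.shape != (n_instances,):
        raise ValueError(f"D must have shape [{n_instances}], got {list(D.shape)}")
    if not treat_dt < n_period:
        raise ValueError("treat_dt must be less than n_period")
    return X, Y, D
```

Inputs that pass this check have the shapes the docstring asks for. For example, they suit a DifferenceInDifferencesRegressionTree constructed with n_period=4 and treat_dt=2 and then fitted with fit(X, Y, D).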

predict(X: ndarray = None, key: str = 'effect', *, data: Dataset = None)[source]

Predict the treatment effect, leaf id, or outcome, as selected by key.

Parameters
  • X (np.ndarray) – The features.

  • key (str) – One of ‘effect’, ‘leaf_id’, or ‘outcome’.

  • data (Dataset, optional) – The dataset. Defaults to None.

Returns

The prediction selected by key.

openasce.inference.tree.gbct module

class openasce.inference.tree.gbct.GradientBoostingCausalRegressionTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]

Bases: Boosting

Gradient-boosted debiased causal tree for regression. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.

Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.

Parameters
  • learning_rate – float, default=0.1 The learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

  • n_estimators – int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.

  • subsample – float, default=1.0 The fraction of samples used to fit each base learner. If smaller than 1.0, this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.

  • subfeature – float, default=1.0 The fraction of features used to fit each base learner, analogous to feature subsampling in random forests. The keyword is spelled subfeaure in the signature.

  • train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply a custom criterion by subclassing the base class Loss.

  • min_samples_split – int, default=10 The minimum number of samples required to split an internal node.

  • max_depth – int, default=3 Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

  • lambd – float, default=5 The L2 regularization parameter (l2_penalty), following XGBoost.

  • coef – float, default=1 The regularization weight on the selection-bias term, following GBCT. The keyword is spelled coeff in the signature.

  • n_period (int, optional) – The number of time steps. Must be provided by the user.

  • treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Must be provided by the user.

  • init – The initial prediction. Defaults to 0.

  • random_state – int or RandomState, default=None Controls the random seed given to each Tree estimator at each boosting iteration. It also controls the random permutation of the features at each split.

  • verbose – bool, default=False. Enable verbose output.

  • n_iter_no_change – int, default=10 Decides whether early stopping terminates training when the validation score is not improving. Set it to None to disable early stopping.

  • tol – float, default=1e-4 Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.

  • nthreads – The number of threads to use for parallelization. Defaults to 32.

  • conf (ConfigTree or dict, optional) – The configuration, as a ConfigTree or dict. Defaults to None.

  • bin_mapper (BinMapper, optional) – The BinMapper instance. Defaults to None.
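The trade-off between learning_rate and n_estimators comes from how boosted trees are combined. Each tree's output is scaled by learning_rate before it is added to the running prediction, which starts at init. The sketch below shows this generic gradient-boosting combination. It assumes the per-tree outputs have already been computed, and boosted_prediction is a hypothetical helper.

```python
import numpy as np


def boosted_prediction(tree_outputs, learning_rate=0.1, init=0.0):
    """Sum per-tree outputs, each shrunk by learning_rate, on top of init."""
    tree_outputs = np.asarray(tree_outputs, dtype=float)  # [n_trees, n_instances]
    pred = np.full(tree_outputs.shape[1], init, dtype=float)
    for out in tree_outputs:
        pred += learning_rate * out  # each tree contributes only a fraction
    return pred
```

When every tree outputs the same values, halving learning_rate takes twice as many trees to reach the same prediction. In practice, a smaller learning rate is usually paired with a larger n_estimators.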

__annotations__ = {}
__init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]
__module__ = 'openasce.inference.tree.gbct'
effect(X: ndarray, *, data: Dataset = None)[source]

Predict the treatment effect.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • data – Dataset, optional. Defaults to None.

Returns

np.ndarray of shape [n_instances,]. The treatment effect.

fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]

Train the GradientBoostingCausalRegressionTree.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • Y – np.ndarray of shape [n_instances, n_period]. The outcome at each time step.

  • D – np.ndarray of shape [n_instances,]. The treatment.

  • data (Dataset, optional) – Dataset object containing the input features, targets, and treatment. Defaults to None.

predict(X: ndarray, key: str, *, data: Dataset = None)[source]

Predict the treatment effect or leaf id, as selected by key.

Parameters
  • X (np.ndarray) – The features.

  • key (str) – One of ‘effect’, ‘leaf_id’, or ‘effect-ND’.

  • data (Dataset, optional) – The dataset. Defaults to None.

Returns

The prediction selected by key.

class openasce.inference.tree.gbct.GradientBoostingUpliftTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]

Bases: Boosting

Gradient-boosted debiased causal tree for classification. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.

Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.

Parameters
  • learning_rate – float, default=0.1 The learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

  • n_estimators – int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.

  • subsample – float, default=1.0 The fraction of samples used to fit each base learner. If smaller than 1.0, this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.

  • subfeature – float, default=1.0 The fraction of features used to fit each base learner, analogous to feature subsampling in random forests. The keyword is spelled subfeaure in the signature.

  • train_loss – string, default=openasce.inference.tree.losses.BinaryCrossEntropy. The splitting criterion used by GBCT. Users can supply a custom criterion by subclassing the base class Loss.

  • min_samples_split – int, default=10 The minimum number of samples required to split an internal node.

  • max_depth – int, default=3 Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

  • lambd – float, default=5 The L2 regularization parameter (l2_penalty), following XGBoost.

  • coef – float, default=1 The regularization weight on the selection-bias term, following GBCT. The keyword is spelled coeff in the signature.

  • n_period (int, optional) – The number of time steps. Must be provided by the user.

  • treat_dt (int, optional) – The time step at which treatment is assigned; must be less than n_period. Must be provided by the user.

  • init – The initial prediction. Defaults to 0.

  • random_state – int or RandomState, default=None Controls the random seed given to each Tree estimator at each boosting iteration. It also controls the random permutation of the features at each split.

  • verbose – bool, default=False. Enable verbose output.

  • n_iter_no_change – int, default=10 Decides whether early stopping terminates training when the validation score is not improving. Set it to None to disable early stopping.

  • tol – float, default=1e-4 Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.

  • nthreads – The number of threads to use for parallelization. Defaults to 32.

  • conf (ConfigTree or dict, optional) – The configuration, as a ConfigTree or dict. Defaults to None.

  • bin_mapper (BinMapper, optional) – The BinMapper instance. Defaults to None.
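Gradient boosting fits each tree to the first and second derivatives of the training loss. The default loss here is binary cross-entropy. The sketch below assumes the standard form of that loss, evaluated on the raw logit f, with p = sigmoid(f). Under that assumption, the gradient is p - y and the hessian is p(1 - p). The library's BinaryCrossEntropy class may parameterize the loss differently, and bce_grad_hess is a hypothetical helper.

```python
import numpy as np


def bce_grad_hess(y, logit):
    """Gradient and hessian of binary cross-entropy w.r.t. the raw logit."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(logit, dtype=float)))  # sigmoid
    y = np.asarray(y, dtype=float)
    grad = p - y            # first derivative
    hess = p * (1.0 - p)    # second derivative, in (0, 0.25]
    return grad, hess
```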

__annotations__ = {}
__init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]
__module__ = 'openasce.inference.tree.gbct'
effect(X: ndarray, *, data: Dataset = None)[source]

Predict the treatment effect.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • data – Dataset, optional. Defaults to None.

Returns

np.ndarray of shape [n_instances,]. The treatment effect.

fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]

Train the GradientBoostingUpliftTree.

Parameters
  • X – np.ndarray of shape [n_instances, n_features]. The features.

  • Y – np.ndarray of shape [n_instances, n_period]. The outcome at each time step.

  • D – np.ndarray of shape [n_instances,]. The treatment.

  • data (Dataset, optional) – Dataset object containing the input features, targets, and treatment. Defaults to None.

predict(X: ndarray, key: str, *, data: Dataset = None)[source]

Predict the treatment effect or leaf id, as selected by key.

Parameters
  • X (np.ndarray) – The features.

  • key (str) – One of ‘effect’, ‘leaf_id’, or ‘effect-ND’.

  • data (Dataset, optional) – The dataset. Defaults to None.

Returns

The prediction selected by key.

openasce.inference.tree.gradient_causal_tree module

class openasce.inference.tree.gradient_causal_tree.GradientDebiasedCausalTree(conf: ConfigTree = None, bin_mapper: BinMapper = None, **kwargs)[source]

Bases: object

GradientDebiasedCausalTree is a class that represents a gradient-based debiased causal tree model.

Parameters
  • conf (ConfigTree) – The configuration tree.

  • bin_mapper (BinMapper) – The BinMapper instance.

  • kwargs – Additional keyword arguments.

__init__(conf: ConfigTree = None, bin_mapper: BinMapper = None, **kwargs)[source]

_predict(nodes, x, key='effect', out=None)[source]

Internal method to predict using the tree nodes.

Parameters
  • nodes – The tree nodes.

  • x – The input data.

  • key – The prediction key.

  • out – The output array.

Returns

The predicted values.

Return type

ndarray

Raises

NotImplementedError – If the prediction key is not implemented.

_split_cpp(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]

Split the tree nodes using C++ implementation.

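Parameters
  • leaves (List[GradientCausalTreeNode]) – The list of leaf nodes.

  • hist (Histogram) – The histogram object.

Returns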

The split conditions.

Return type

Dict

export()[source]

Export the tree model.

Returns

The exported C++ nodes and Python nodes.

Return type

Tuple[List[DidNode], List[Dict]]

fit(gradients, cgradients, data: Dataset, eta=None)[source]

Fit the GradientDebiasedCausalTree model.

Parameters
  • gradients – The gradients.

  • cgradients – The counterfactual gradients.

  • data (Dataset) – The training dataset.

  • eta – The eta values.

Returns

None

gradients(target, prediction, **kwargs)[source]

Compute the gradients of the loss function.

Parameters
  • target – The target values.

  • prediction – The predicted values.

  • kwargs – Additional keyword arguments.

Returns

The gradients.

Return type

ndarray

loss(grad, hess, y_hat=None, **kwargs)[source]

Compute the loss function.

Parameters
  • grad – The gradients.

  • hess – The hessians.

  • y_hat – The predicted values.

  • kwargs – Additional keyword arguments.

Returns

The loss values.

Return type

ndarray

postprocess()[source]

Perform post-processing steps after fitting the tree.

Returns

None

predict(x, w=None, key='effect', out=None)[source]

Predict the treatment effect or other values.

Parameters
  • x – The input data.

  • w – The treatment weights.

  • key – The prediction key.

  • out – The output array.

Returns

The predicted values.

Return type

ndarray

preprocess(gradients, cgradients, tr_data: Dataset, eta=None, subsample=1, subfeature=1)[source]

Preprocess the data before fitting the tree.

Parameters
  • gradients – The gradients.

  • cgradients – The counterfactual gradients.

  • tr_data (Dataset) – The training dataset.

  • eta – The eta values.

  • subsample – The subsample ratio.

  • subfeature – The subfeature ratio.

Returns

The histogram and index.

Return type

Tuple[Histogram, ndarray]

split(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]

Split the tree nodes.

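Parameters
  • leaves (List[GradientCausalTreeNode]) – The list of leaf nodes.

  • hist (Histogram) – The histogram object.

Returns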

The split conditions.

Return type

Dict

updater(split_conds: Dict, gradients, cgradients, tr_data, hist: Histogram, idx_map, leaves: List[GradientCausalTreeNode], leaves_range, eta=None)[source]

Update the tree nodes.

Parameters
  • split_conds (Dict) – The split conditions.

  • gradients – The gradients.

  • cgradients – The counterfactual gradients.

  • tr_data (Dataset) – The training dataset.

  • hist (Histogram) – The histogram.

  • idx_map – The index map.

  • leaves (List[GradientCausalTreeNode]) – The list of tree nodes.

  • leaves_range – The range of each leaf.

  • eta – The eta values.

Returns

The updated tree nodes and updated leaf ranges.

Return type

Tuple[List[GradientCausalTreeNode], ndarray]

class openasce.inference.tree.gradient_causal_tree.GradientDiDCausalTree(conf: ConfigTree = None, bin_mapper: BinMapper = None, **kwargs)[source]

Bases: GradientDebiasedCausalTree

GradientDiDCausalTree represents a gradient-based debiased causal tree model that uses difference-in-differences (DiD). It inherits from the GradientDebiasedCausalTree class.

Parameters
  • conf (ConfigTree) – The configuration tree.

  • bin_mapper (BinMapper) – The BinMapper instance.

  • kwargs – Additional keyword arguments.
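For orientation, the classic two-group, two-period difference-in-differences estimate that this model builds on can be sketched in a few lines of NumPy. This is a minimal illustration, not the tree's internal computation: the helper name did_estimate is hypothetical, and it assumes one pre-period and one post-period outcome per unit plus a binary treatment indicator.

```python
import numpy as np


def did_estimate(y_pre: np.ndarray, y_post: np.ndarray, treated: np.ndarray) -> float:
    """Classic 2x2 difference-in-differences estimate (hypothetical helper).

    y_pre, y_post: outcome of each unit before and after the treatment period.
    treated: binary array, 1 for the treated group and 0 for the control group.
    """
    treated = treated.astype(bool)
    # Change over time for every unit.
    delta = y_post - y_pre
    # Under the parallel trends assumption, the control group's change
    # estimates the trend the treated group would have followed untreated.
    return float(delta[treated].mean() - delta[~treated].mean())


y_pre = np.array([1.0, 2.0, 1.0, 2.0])
y_post = np.array([4.0, 5.0, 2.0, 3.0])
treated = np.array([1, 1, 0, 0])
print(did_estimate(y_pre, y_post, treated))  # 2.0
```

Both treated units gain 3 and both control units gain 1, so the estimated effect is 3 - 1 = 2.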

_predict(nodes, x, key='effect', out=None)[source]

Predict for the given data using the exported nodes.

Parameters
  • nodes – The exported nodes.

  • x – The input data.

  • key – The key specifying the prediction type (default: ‘effect’).

  • out – The output array (default: None).

Returns

The predictions.

_split_cpp(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]

Split the leaf nodes at the current level using C++ implementation.

Parameters
  • leaves – The list of leaf nodes.

  • hist – The histogram object.

Returns

The split conditions (split_conds).

Return type

Dict

export()[source]

Export the GradientDiDCausalTree.

Returns

A tuple (slim_cppnodes, slim_nodes): the exported nodes as C++ objects and as Python objects.

Return type

tuple

predict(x, w=None, key='effect', out=None)[source]

Predict the treatment effect or other outcomes for given data.

Parameters
  • x – The input data.

  • w – The treatment assignments (default: None).

  • key – The key specifying the prediction type (default: ‘effect’).

  • out – The output array (default: None).

Returns

A tuple (pred, cpred, eta): the predicted values, the counterfactual predicted values, and the optimal parallel interval between the treated and control groups.

Return type

tuple

preprocess(gradients, cgradients, tr_data: Dataset, eta=None, subsample=1, subfeature=1)[source]

Preprocesses the data for the GradientDiDCausalTree model.

Parameters
  • gradients – The gradients.

  • cgradients – The counterfactual gradients.

  • tr_data (Dataset) – The training dataset.

  • eta – The parallel interval between the treated and control group.

  • subsample (float) – The subsampling ratio for instances (default: 1).

  • subfeature (float) – The subsampling ratio for features (default: 1).

Returns

A tuple (hist, index): the constructed histogram and the permutation index.

Return type

Tuple[Histogram, ndarray]

split(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]

Split the leaf nodes.

Parameters
  • leaves – The list of leaf nodes.

  • hist – The histogram object.

Returns

The split conditions (split_conds).

Return type

Dict

updater(split_conds: Dict, gradients, cgradients, tr_data, hist: Histogram, idx_map, leaves: List[GradientCausalTreeNode], leaves_range, eta)[source]

Update the GradientCausalTree by performing splitting and updating histograms.

Parameters
  • split_conds (Dict) – The split conditions.

  • gradients – The gradients.

  • cgradients – The counterfactual gradients.

  • tr_data – The training dataset.

  • hist (Histogram) – The histogram object.

  • idx_map – The index mapping.

  • leaves (List[GradientCausalTreeNode]) – The list of leaves.

  • leaves_range – The range of leaves.

  • eta – The parallel interval between the treated and control group.

Returns

A tuple (leaves_new, leaves_range_new): the new leaves and their new data ranges.

Return type

Tuple[List[GradientCausalTreeNode], ndarray]

openasce.inference.tree.gradient_causal_tree._filter(leaves, leaves_range)[source]

openasce.inference.tree.histogram module

class openasce.inference.tree.histogram.Histogram(conf: ConfigTree)[source]

Bases: object

__getattr__(_Histogram__name: str)[source]

Get the attribute value.

Parameters

__name (str) – The name of the attribute.

Returns

The attribute value.

Return type

ndarray

Raises

AttributeError – If the attribute is not found.

__init__(conf: ConfigTree)[source]

classmethod new_instance(dataset: Dataset, conf: ConfigTree = None, **kwargs)[source]

Create a new instance of the histogram.

Parameters
  • dataset (Dataset) – The dataset.

  • conf (ConfigTree) – The configuration tree.

  • kwargs – Additional keyword arguments.

Returns

The new instance of the histogram.

Return type

Histogram

update_hists(target, index, leaves_range, treatment, bin_features, is_gradient, is_splitting, threads)[source]

Update the histograms of all nodes at the same level of a tree.

Parameters
  • target (ndarray) – Target array. Shape [n, n_outcome].

  • index (ndarray) – Index array. Shape [n].

  • leaves_range (ndarray) – Each leaf’s data range. Shape [n_leaf, 2]; each row looks like [st_pos, end_pos).

  • treatment (ndarray) – Treatment array. Shape [n].

  • bin_features (ndarray) – Binned feature array. Shape [n, n_feature].

  • is_gradient (bool)

  • is_splitting (bool)

  • threads (int) – The number of threads to use.

Raises

ValueError

openasce.inference.tree.histogram_test module

openasce.inference.tree.information module

class openasce.inference.tree.information.CausalDataInfo(conf, **kwargs)[source]

Bases: object

__init__(conf, **kwargs)[source]

openasce.inference.tree.losses module

class openasce.inference.tree.losses.BinaryCrossEntropy(**kwargs)[source]

Bases: GradLoss

__abstractmethods__ = frozenset({})
__init__(**kwargs)[source]
property const_hess

Check if the hessian is constant.

Returns

True if the hessian is constant, False otherwise.

gradient(target, prediction)[source]

Compute the gradient: gradient = prediction - target, where prediction is the positive probability.

Parameters
  • target – The target values.

  • prediction – The predicted probabilities.

Returns

The computed gradients.

Return type

ndarray

gradients(target, logit, treatment)[source]

Calculate the gradients and hessians.

Parameters
  • target (DataFrame) – The target values.

  • logit (DataFrame) – The predicted logits.

  • treatment (DataFrame) – The treatment assignments.

Returns

The gradients and hessians.

Return type

Union[Tuple, None]

hessian(target, prediction)[source]

Compute the hessian: hessian = prediction * (1 - prediction).

Parameters
  • target – The target values.

  • prediction – The predicted probabilities.

Returns

The computed hessians.

Return type

ndarray
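The gradient and hessian formulas above follow from differentiating the binary cross-entropy with respect to the logit. A minimal NumPy sketch, assuming the prediction is obtained by applying a sigmoid to a logit; the helper bce_grad_hess is hypothetical and defines its own sigmoid rather than relying on openasce.inference.tree.losses.sigmoid.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    # Logistic function: maps logits to positive-class probabilities.
    return 1.0 / (1.0 + np.exp(-x))


def bce_grad_hess(target: np.ndarray, logit: np.ndarray):
    """Gradient and hessian of binary cross-entropy w.r.t. the logit (hypothetical helper)."""
    p = sigmoid(logit)
    grad = p - target        # gradient = prediction - target
    hess = p * (1.0 - p)     # hessian = prediction * (1 - prediction)
    return grad, hess


grad, hess = bce_grad_hess(np.array([1.0, 0.0]), np.array([0.0, 0.0]))
print(grad, hess)  # [-0.5  0.5] [0.25 0.25]
```

Because the hessian depends on the prediction, it is not constant, unlike the hessian of a squared loss.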

loss(target, prediction, logit=True)[source]

Calculate the cross-entropy loss.

Parameters
  • target – Ground-truth labels.

  • prediction – Predicted logits.

  • logit (bool, optional) – Whether prediction contains logits rather than probabilities. Defaults to True.

class openasce.inference.tree.losses.GradLoss(**kwargs)[source]

Bases: Loss

Abstract base class for gradient-based loss functions.

__abstractmethods__ = frozenset({'gradient', 'gradients', 'hessian', 'loss'})
property const_hess

Check if the hessian is constant.

Returns

True if the hessian is constant, False otherwise.

abstract gradient(target, prediction)[source]

Calculate the gradient of the loss.

Parameters
  • target – Target values.

  • prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

The gradient.

abstract gradients(target, prediction) Tuple[source]

Calculate the gradients and hessians.

Parameters
  • target – Target values.

  • prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

Tuple containing the gradients and hessians.

abstract hessian(target, prediction)[source]

Calculate the hessian of the loss.

Parameters
  • target – Target values.

  • prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

The hessian.
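The abstract methods listed above (gradient, gradients, hessian, loss) are the contract a custom gradient-based loss has to fill in. The sketch below mirrors that contract with a local stand-in base class, GradLossSketch, so that it runs without openasce installed; in real code you would subclass openasce.inference.tree.losses.GradLoss instead, and its constructor and const_hess property are not modeled here. The pseudo-Huber loss is an illustrative choice, not a loss the library is documented to ship.

```python
from abc import ABC, abstractmethod
from typing import Tuple

import numpy as np


class GradLossSketch(ABC):
    """Hypothetical stand-in mirroring the abstract methods of GradLoss."""

    @abstractmethod
    def loss(self, target, prediction, *args): ...

    @abstractmethod
    def gradient(self, target, prediction): ...

    @abstractmethod
    def hessian(self, target, prediction): ...

    @abstractmethod
    def gradients(self, target, prediction) -> Tuple: ...


class PseudoHuber(GradLossSketch):
    """delta^2 * (sqrt(1 + (r / delta)^2) - 1) with residual r = prediction - target."""

    def __init__(self, delta: float = 1.0):
        self.delta = delta

    def _residual(self, target, prediction):
        r = np.asarray(prediction, dtype=float) - np.asarray(target, dtype=float)
        return r, 1.0 + (r / self.delta) ** 2

    def loss(self, target, prediction, *args):
        _, s = self._residual(target, prediction)
        return float(np.mean(self.delta**2 * (np.sqrt(s) - 1.0)))

    def gradient(self, target, prediction):
        r, s = self._residual(target, prediction)
        return r / np.sqrt(s)  # first derivative w.r.t. prediction

    def hessian(self, target, prediction):
        _, s = self._residual(target, prediction)
        return s ** -1.5  # second derivative, always positive

    def gradients(self, target, prediction) -> Tuple:
        return self.gradient(target, prediction), self.hessian(target, prediction)


g, h = PseudoHuber().gradients(np.array([0.0, 0.0]), np.array([0.0, 1.0]))
```

A strictly positive hessian matters for second-order boosting, since leaf values divide by summed hessians.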

class openasce.inference.tree.losses.Loss(**kwargs)[source]

Bases: object

Abstract base class for loss functions.

__abstractmethods__ = frozenset({'loss'})
__init__(**kwargs)[source]
abstract loss(target, prediction, *args)[source]

Calculate the loss.

Parameters
  • target – Target values.

  • prediction – Predicted values.

  • args – Additional arguments.

Raises

NotImplementedError – If the method is not implemented.

Returns

The loss value.

static new_instance(conf)[source]

Create a new instance of the loss function.

Parameters

conf – Configuration.

Returns

An instance of the loss function.

class openasce.inference.tree.losses.MeanSquaredError(**kwargs)[source]

Bases: GradLoss

__abstractmethods__ = frozenset({})
__init__(**kwargs)[source]
property const_hess

Check if the hessian is constant.

Returns

True if the hessian is constant, False otherwise.

gradient(target, prediction, **kwargs)[source]

Calculate the gradient of the loss.

Parameters
  • target – Target values.

  • prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

The gradient.

gradients(target, prediction, **kwargs) Tuple[source]

Calculate the gradients and hessians.

Parameters
  • target – Target values.

  • prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

Tuple containing the gradients and hessians.

hessian(target, prediction, **kwargs)[source]

Calculate the hessian of the loss.

Parameters
  • target – Target values.

  • prediction – Predicted values.

Raises

NotImplementedError – If the method is not implemented.

Returns

The hessian.
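For squared error, the gradient is the residual and the hessian is constant, which is presumably why this class exposes const_hess. The sketch assumes the half squared error convention 0.5 * (prediction - target)^2; the exact scaling used by MeanSquaredError is not documented here. It also illustrates the shape contract documented for loss below (target [n_instance, n_outcome], prediction either the same shape or [n_outcome]); raising ValueError on any other shape is an assumption about when that error occurs. Both helper names are hypothetical.

```python
import numpy as np


def mse_gradients(target: np.ndarray, prediction: np.ndarray):
    # Half squared error: gradient = prediction - target, hessian = 1 everywhere.
    grad = np.asarray(prediction, dtype=float) - np.asarray(target, dtype=float)
    hess = np.ones_like(grad)
    return grad, hess


def mse_loss(target: np.ndarray, prediction: np.ndarray) -> np.ndarray:
    """Mean squared loss per outcome column.

    target: [n_instance, n_outcome]; prediction: [n_instance, n_outcome] or [n_outcome].
    """
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    if target.ndim != 2:
        raise ValueError("target must have shape [n_instance, n_outcome]")
    if prediction.shape not in (target.shape, target.shape[1:]):
        raise ValueError(f"prediction shape {prediction.shape} does not match target {target.shape}")
    # A [n_outcome] prediction broadcasts across all instances.
    return np.mean((target - prediction) ** 2, axis=0)


target = np.array([[1.0, 2.0], [3.0, 4.0]])
print(mse_loss(target, np.array([2.0, 3.0])))  # [1. 1.]
```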

loss(target, prediction, *args, **kwargs)[source]

Calculate the mean squared loss.

Parameters
  • target – Target values. Shape [n_instance, n_outcome].

  • prediction – Predicted values. Shape [n_instance, n_outcome] or [n_outcome].

Raises

ValueError

Returns

The mean squared loss.

openasce.inference.tree.losses.sigmoid(x)[source]

openasce.inference.tree.reflect_utils module

openasce.inference.tree.reflect_utils.get_class(module_class)[source]

Get a class from its fully qualified name using import_module.

Parameters

module_class – module class name, full path a.b.class_name
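Dotted-path lookup like this is what lets configuration strings such as the train_loss default 'openasce.inference.tree.losses.MeanSquaredError' be turned into classes. A minimal sketch of how such a lookup typically works with importlib.import_module; it is not the library's code, and it resolves a standard-library class so it runs without openasce installed.

```python
import importlib


def get_class(module_class: str):
    """Resolve a dotted path such as 'a.b.class_name' to the class object."""
    module_name, _, class_name = module_class.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a fully qualified name, got {module_class!r}")
    module = importlib.import_module(module_name)  # imports a.b
    return getattr(module, class_name)             # fetches a.b.class_name


cls = get_class("collections.OrderedDict")
instance = cls(a=1)  # constructing the resolved class, as a new_instance helper would
```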

openasce.inference.tree.reflect_utils.get_class_defined_in_module(module_name, clazz)[source]
openasce.inference.tree.reflect_utils.get_object_defined_in_module(module_name, clazz, name=None)[source]
openasce.inference.tree.reflect_utils.new_instance(module_class, *args, **kwargs)[source]

Create a new instance using import_module

Parameters
  • module_class – module class name, full path a.b.class_name

  • args – Passed to the constructor of the class.

  • kwargs – Passed to the constructor of the class.

openasce.inference.tree.splitting_losses module

openasce.inference.tree.splitting_losses.causal_tree_splitting_losses(configs, bin_outcome_hist, bin_counts, parameters: dict)[source]

Calculate the splitting losses for the ordinary causal tree.

Parameters
  • configs – Configuration.

  • bin_outcome_hist – Histogram of outcome values.

  • bin_counts – Histogram counts.

  • parameters – Additional parameters.

Returns

The splitting losses.

openasce.inference.tree.splitting_losses.causal_tree_splitting_losses2(configs, bin_grad_hist, bin_hess_hist, bin_counts, parameters: dict)[source]

Calculate the splitting losses for the ordinary causal tree from gradient and Hessian histograms.

Parameters
  • configs – Configuration.

  • bin_grad_hist – Histogram of gradients.

  • bin_hess_hist – Histogram of Hessians.

  • bin_counts – Histogram counts.

  • parameters – Additional parameters.

Returns

The splitting losses.

openasce.inference.tree.splitting_losses.didtree_splitting_losses(configs, bin_grad_hist, bin_hess_hist, bin_cgrad_hist, bin_chess_hist, bin_eta_hist, bin_counts, parameters: dict)[source]

Calculate the splitting losses for the DiD-Tree model.

Parameters
  • configs – Configuration.

  • bin_grad_hist – Histogram of gradients.

  • bin_hess_hist – Histogram of Hessians.

  • bin_cgrad_hist – Histogram of counterfactual gradients.

  • bin_chess_hist – Histogram of counterfactual Hessians.

  • bin_eta_hist – Histogram of etas.

  • bin_counts – Histogram counts.

  • parameters – Additional parameters.

Returns

The splitting losses.

openasce.inference.tree.splitting_losses.gbct_splitting_losses(configs, bin_grad_hist, bin_hess_hist, bin_cgrad_hist, bin_chess_hist, bin_counts, parameters: dict)[source]

Calculate the splitting losses for the GBCT model.

Parameters
  • configs – Configuration.

  • bin_grad_hist – Histogram of gradients.

  • bin_hess_hist – Histogram of Hessians.

  • bin_cgrad_hist – Histogram of counterfactual gradients.

  • bin_chess_hist – Histogram of counterfactual Hessians.

  • bin_counts – Histogram counts.

  • parameters – Additional parameters.

Returns

The splitting losses.
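Histogram-based boosting scores every candidate threshold of a feature from cumulative sums of per-bin gradient and Hessian histograms. The sketch shows the standard second-order gain with an L2 penalty lambd, which presumably plays the role of the lambd constructor parameter; the constant 1/2 factor and any complexity penalty are omitted. GBCT's actual criterion also uses the counterfactual histograms (bin_cgrad_hist, bin_chess_hist) and is not reproduced here. The helper split_gains is hypothetical.

```python
import numpy as np


def split_gains(grad_hist: np.ndarray, hess_hist: np.ndarray, lambd: float = 5.0) -> np.ndarray:
    """Gain of splitting after each bin of one feature.

    grad_hist, hess_hist: per-bin sums of gradients and Hessians, shape [n_bins].
    Returns shape [n_bins - 1]; entry b sends bins 0..b to the left child.
    """
    def score(g, h):
        return g**2 / (h + lambd)

    g_left = np.cumsum(grad_hist)[:-1]
    h_left = np.cumsum(hess_hist)[:-1]
    g_total, h_total = grad_hist.sum(), hess_hist.sum()
    g_right, h_right = g_total - g_left, h_total - h_left
    return score(g_left, h_left) + score(g_right, h_right) - score(g_total, h_total)


grad_hist = np.array([-4.0, -4.0, 4.0, 4.0])
hess_hist = np.array([2.0, 2.0, 2.0, 2.0])
gains = split_gains(grad_hist, hess_hist, lambd=1.0)
best_bin = int(np.argmax(gains))  # 1: split between bins 1 and 2
```

Each threshold costs O(1) after the cumulative sums, which is why histograms make split finding cheap.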

openasce.inference.tree.tree_node module

class openasce.inference.tree.tree_node.CausalTreeNode(conf: ConfigTree = None, **kwargs)[source]

Bases: object

A class for a node in a Causal Tree maximizing heterogeneous treatment effect.

__init__(conf: ConfigTree = None, **kwargs)[source]

property children
effect(w)[source]

Compute the treatment effect.

Parameters

w – The treatment weights.

Returns

The computed treatment effect.

Return type

ndarray

estimate(outcome: ndarray, treatment: ndarray, weight: ndarray = None)[source]

Estimate the treatment effect given the outcome, treatment, and weight.

Parameters
  • outcome – The outcome values.

  • treatment – The treatment values.

  • weight – The weight values.

Returns

The estimated treatment effect.

Return type

ndarray
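A natural reading of estimate is the difference in (weighted) mean outcomes between treated and control instances within the node. The sketch below assumes a binary treatment and a one-dimensional outcome, and the helper node_effect is hypothetical; the actual method returns an ndarray and may handle multiple outcomes or treatments differently.

```python
import numpy as np


def node_effect(outcome, treatment, weight=None) -> float:
    """Weighted mean outcome of treated minus control units (hypothetical helper)."""
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treatment).astype(bool)
    weight = np.ones_like(outcome) if weight is None else np.asarray(weight, dtype=float)
    mu_treated = np.average(outcome[treated], weights=weight[treated])
    mu_control = np.average(outcome[~treated], weights=weight[~treated])
    return float(mu_treated - mu_control)


effect = node_effect([3.0, 5.0, 1.0, 1.0], [1, 1, 0, 0])                   # 4 - 1 = 3.0
weighted = node_effect([3.0, 5.0, 1.0, 1.0], [1, 1, 0, 0], [1, 3, 1, 1])   # 4.5 - 1 = 3.5
```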

estimate_by_hist(outcome: ndarray, treatment: ndarray, count: ndarray)[source]

Estimate the treatment effect using histogram-based method.

Parameters
  • outcome – The outcome values.

  • treatment – The treatment values.

  • count – The count values.

Returns

The estimated treatment effect.

Return type

ndarray

class openasce.inference.tree.tree_node.GradientCausalTreeNode(conf: ConfigTree = None, **kwargs)[source]

Bases: CausalTreeNode

A class for a node in a Gradient Boosting Causal Tree.

__init__(conf: ConfigTree = None, **kwargs)[source]
estimate(G, H, **kwargs)[source]

Estimate the treatment effect given the gradients and hessians.

Parameters
  • G – The gradients.

  • H – The hessians.

  • **kwargs – Additional keyword arguments.

Returns

The estimated treatment effect.

Return type

ndarray
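In second-order gradient boosting, the leaf value that minimizes the sum of g*v + 0.5*h*v^2 over the leaf, plus an L2 penalty 0.5*lambd*v^2, is v = -G / (H + lambd), where G and H are the summed gradients and hessians. The sketch applies that Newton step per column and, as an assumption, treats column 0 as the control arm and column 1 as the treated arm so their difference acts as an effect. The mapping of lambd to the constructor parameter and the per-arm layout are illustrative; GradientCausalTreeNode.estimate may also use counterfactual terms. The helper leaf_values is hypothetical.

```python
import numpy as np


def leaf_values(G: np.ndarray, H: np.ndarray, lambd: float = 5.0) -> np.ndarray:
    # Newton step: setting d/dv [G*v + 0.5*(H + lambd)*v^2] = 0 gives v = -G / (H + lambd).
    return -np.asarray(G, dtype=float) / (np.asarray(H, dtype=float) + lambd)


# Hypothetical layout: column 0 = control arm, column 1 = treated arm.
values = leaf_values(np.array([-4.0, -12.0]), np.array([3.0, 3.0]), lambd=1.0)  # [1.0, 3.0]
effect = values[1] - values[0]  # 2.0
```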

openasce.inference.tree.utils module

openasce.inference.tree.utils.DEBUG(msg, *args, **kwargs)[source]
openasce.inference.tree.utils.ERROR(msg, *args, **kwargs)[source]
openasce.inference.tree.utils.FATAL(msg, *args, **kwargs)[source]
openasce.inference.tree.utils.INFO(msg, *args, **kwargs)[source]
openasce.inference.tree.utils.TRACE(msg, *args, **kwargs)[source]
openasce.inference.tree.utils.WARN(msg, *args, **kwargs)[source]
openasce.inference.tree.utils._check_c_style_array(*args)[source]
openasce.inference.tree.utils._check_match(*args, axis=0)[source]
openasce.inference.tree.utils.find_bin_parallel(data, max_bin=64, min_data_in_bin=100, min_split_data=100, pre_filter=False, bin_type=0, use_missing=True, zero_as_missing=False, forced_upper_bounds=[])[source]

Find bins in parallel for the given data.

Parameters
  • data – The input data.

  • max_bin – The maximum number of bins. (default: 64)

  • min_data_in_bin – The minimum number of data points in a bin. (default: 100)

  • min_split_data – The minimum number of data points to split a bin. (default: 100)

  • pre_filter – Whether to pre-filter the data. (default: False)

  • bin_type – The type of binning. (default: 0)

  • use_missing – Whether to use missing values. (default: True)

  • zero_as_missing – Whether to treat zero as a missing value. (default: False)

  • forced_upper_bounds – The forced upper bounds for the bins. (default: [])

Returns

The bins found.

Raises

ValueError – If the data type is not supported.

openasce.inference.tree.utils.groupby(data: ndarray, by: ndarray, aggregator: str = 'mean', dropna=True)[source]
openasce.inference.tree.utils.indexbyarray(arr, idx, fact_outcome, counterfact_outcome=None, n_threads=-1)[source]

Index an outcome array (arr, with shape [n, 2]) by a binary treatment array (idx).

Parameters
  • arr (ndarray) – The input array.

  • idx (ndarray) – The index array.

  • fact_outcome (ndarray) – The outcome array to update.

  • counterfact_outcome (ndarray) – The counterfactual outcome array to update.

  • n_threads (int, optional) – The number of threads to use. Defaults to -1.

Returns

The updated outcome arrays.

Return type

ndarray
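The description suggests that, for each row i, the factual outcome is arr[i, idx[i]] and the counterfactual outcome is the other column, arr[i, 1 - idx[i]]. That interpretation is an assumption. The single-threaded NumPy sketch below uses fancy indexing and returns new arrays instead of updating fact_outcome and counterfact_outcome in place. The helper index_by_treatment is hypothetical.

```python
import numpy as np


def index_by_treatment(arr: np.ndarray, idx: np.ndarray):
    """Split an [n, 2] array into factual and counterfactual columns (hypothetical helper)."""
    rows = np.arange(arr.shape[0])
    idx = np.asarray(idx, dtype=np.intp)
    fact = arr[rows, idx]             # column chosen by the observed treatment
    counterfact = arr[rows, 1 - idx]  # the other column
    return fact, counterfact


arr = np.array([[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]])
fact, cf = index_by_treatment(arr, np.array([0, 1, 0]))  # fact [10, 21, 30], cf [11, 20, 31]
```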

openasce.inference.tree.utils.init_logger()[source]
openasce.inference.tree.utils.list_to_array(data: list, out=None, st_idx: int = 0, miss_value=0, threads=-1)[source]
openasce.inference.tree.utils.set_log_level_cpp(level)[source]
openasce.inference.tree.utils.t_or_f(arg)[source]
openasce.inference.tree.utils.to_row_major(x, dtype=None)[source]
openasce.inference.tree.utils.update_histogram(target, x_binned, index, leaves_range, treatment, out, leaves=[], n_treatment=2, n_bins=64, threads=-1)[source]

Update the histogram of each leaf.

Parameters
  • target (ndarray) – Target array. Shape [n, n_outcome].

  • x_binned (ndarray) – Binned feature array. Shape [n, n_feature].

  • index (ndarray) – Index array. Shape [n]. The end position in leaves_range must not exceed n.

  • leaves_range (ndarray) – List of each leaf’s data range. Shape [n_leaf, 2]. Each term looks like [st_pos, end_pos).

  • treatment (ndarray) – Treatment array. Shape [n].

  • out (ndarray) – Output histogram. Shape [n_leaf, n_features, n_bins, n_treatment, n_outcome].

  • leaves (list, optional) – List of leaf indices. Defaults to [].

  • n_treatment (int, optional) – The number of treatments. Defaults to 2.

  • n_bins (int, optional) – The number of bins. Defaults to 64.

  • threads (int, optional) – The number of threads to use. Defaults to -1.

Returns

The updated histogram array.

Return type

ndarray
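The documented shapes imply a simple accumulation: for every leaf, take the instances index[st_pos:end_pos] and add each instance's target row into the cell selected by (leaf, feature, bin, treatment). The pure NumPy sketch below is a single-threaded reference for that reading, not the compiled implementation. It assumes integer bin ids and integer treatment codes 0..n_treatment-1, allocates a fresh output instead of updating out, and ignores the leaves subset argument. The helper name is hypothetical.

```python
import numpy as np


def update_histogram_ref(target, x_binned, index, leaves_range, treatment,
                         n_treatment=2, n_bins=64):
    """Reference histogram of shape [n_leaf, n_feature, n_bins, n_treatment, n_outcome]."""
    n_leaf, n_feature, n_outcome = leaves_range.shape[0], x_binned.shape[1], target.shape[1]
    out = np.zeros((n_leaf, n_feature, n_bins, n_treatment, n_outcome))
    for leaf, (st, end) in enumerate(leaves_range):
        rows = index[st:end]  # instances currently assigned to this leaf
        for f in range(n_feature):
            # np.add.at is unbuffered, so repeated (bin, treatment) pairs accumulate.
            np.add.at(out[leaf, f], (x_binned[rows, f], treatment[rows]), target[rows])
    return out


target = np.array([[1.0], [2.0], [3.0]])
x_binned = np.array([[0], [1], [0]])
index = np.array([2, 0, 1])
leaves_range = np.array([[0, 2], [2, 3]])
treatment = np.array([1, 0, 1])
hist = update_histogram_ref(target, x_binned, index, leaves_range, treatment, n_bins=2)
```

Here leaf 0 holds instances 2 and 0, which share bin 0 and treatment 1, so hist[0, 0, 0, 1, 0] is 3 + 1 = 4.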

openasce.inference.tree.utils.update_histograms(targets, x_binned, index, leaves_range, treatment, outs, leaves=[], n_treatment=2, n_bins=64, threads=-1)[source]

Update the histogram of each leaf.

Parameters
  • targets (list) – List of target arrays, each of shape [n, n_outcome].

  • x_binned (ndarray) – Binned feature array. Shape [n, n_feature].

  • index (ndarray) – Index array. Shape [n]. Must satisfy that the end position in leaves_range is not greater than n.

  • leaves_range (ndarray) – List of each leaf’s data range. Shape [n_leaf, 2]. Each term looks like [st_pos, end_pos).

  • treatment (ndarray) – Treatment array. Shape [n].

  • outs (list) – List of output histogram arrays, each of shape [n_leaf, n_features, n_bins, n_treatment, n_outcome].

  • leaves (list, optional) – List of leaf indices. Defaults to [].

  • n_treatment (int, optional) – The number of treatments. Defaults to 2.

  • n_bins (int, optional) – The number of bins. Defaults to 64.

  • threads (int, optional) – The number of threads to use. Defaults to -1.

Returns

The updated histogram arrays.

Return type

ndarray

openasce.inference.tree.utils.update_x_map(x_binned, ins2leaf, split_infos, leaves_range, out, nthread=-1)[source]

Update the index of instances.

Parameters
  • x_binned (ndarray) – The binned feature array.

  • ins2leaf (ndarray) – The instance-to-leaf mapping array.

  • split_infos (ndarray) – The split information array.

  • leaves_range (ndarray) – The range of each leaf.

  • out (ndarray) – The output array to store the updated mapping.

  • nthread (int, optional) – The number of threads to use. Defaults to -1.

Returns

None

openasce.inference.tree.utils.value_bin_parallel(data, bin_mappers: List[openasce.inference.tree.gbct_utils.bin.BinMaper], out=None, threads=-1)[source]

Transform the input data to bin values in parallel.

Parameters
  • data – The input data.

  • bin_mappers – The bin mappers.

  • out – The output array to store the bin values. (default: None)

  • threads – The number of threads to use (-1 for maximum). (default: -1)

Returns

The transformed bin values.

Raises

ValueError – If the output dtype is not supported.
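find_bin_parallel and value_bin_parallel together turn raw feature values into small integer bin ids that the histograms index into. The sketch uses a swapped-in, simplified technique: equal-frequency (quantile) upper bounds per feature, followed by np.searchsorted to map values to bins. It ignores missing values, min_data_in_bin, min_split_data and forced_upper_bounds, and is not how the library's BinMapper is implemented. Both helper names are hypothetical.

```python
import numpy as np


def find_upper_bounds(column: np.ndarray, max_bin: int = 64) -> np.ndarray:
    """Equal-frequency bin upper bounds for one feature (simplified sketch)."""
    qs = np.linspace(0.0, 1.0, max_bin + 1)[1:-1]  # interior quantiles
    bounds = np.unique(np.quantile(column, qs))      # drop duplicate cut points
    return np.append(bounds, np.inf)                 # last bin is open-ended


def value_bin(column: np.ndarray, upper_bounds: np.ndarray) -> np.ndarray:
    # Bin b holds values v with upper_bounds[b - 1] < v <= upper_bounds[b].
    return np.searchsorted(upper_bounds, column, side="left").astype(np.int32)


column = np.arange(1.0, 9.0)  # 1, 2, ..., 8
bounds = find_upper_bounds(column, max_bin=4)  # [2.75, 4.5, 6.25, inf]
bins = value_bin(column, bounds)               # [0, 0, 1, 1, 2, 2, 3, 3]
```

Storing small integer bin ids instead of raw floats is what lets the histogram code above use bin ids directly as array indices.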

openasce.inference.tree.utils_test module