openasce.inference.tree package¶
- class openasce.inference.tree.DifferenceInDifferencesRegressionTree(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]¶
Bases: Boosting
Gradient Boosting debiased Causal Tree for regression, following "Difference-in-differences meets tree-based methods: heterogeneous treatment effects estimation with unmeasured confounding".
Reference: 1. Tang, C., Wang, H., Li, X., Qing, C., Li, L., & Zhou, J. (2023). Difference-in-Differences Meets Tree-Based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Proceedings of the 40th International Conference on Machine Learning.
- Parameters
n_estimators – int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
learning_rate – float, default=0.1 learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
subsample – float, default=1.0 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
subfeature – float, default=1.0 The fraction of features used to fit each individual base learner, as in random forests. Note that the constructor signature spells this keyword subfeaure.
train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply their own by subclassing the base class Loss.
min_samples_split – int, default=10 The minimum number of samples required to split an internal node.
max_depth – int, default=3 Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5 The L2 regularization parameter, as in XGBoost.
coef – float, default=1 The regularization parameter for the selection-bias term; see GBCT. Note that the constructor signature spells this keyword coeff.
parallel_l2 – float, default=0
n_period (int, optional) – The number of time steps. Must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned; it must be less than n_period. Must be provided by the user.
init – The initial prediction. Default to 0.
random_state – int or RandomState, default=None Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details).
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10 Decides whether early stopping is used to terminate training when the validation score is not improving. Set it to None to disable early stopping.
tol – float, default=1e-4 Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.
nthreads – The number of threads to use for parallelization. Default is 32.
conf (ConfigTree, optional) – Configuration object. Defaults to None.
bin_mapper (BinMapper, optional) – Bin mapper for binning numerical features. Defaults to None.
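To make the roles of n_period and treat_dt concrete, here is a minimal sketch of the classic two-group difference-in-differences estimate on a small outcome panel. It is not the tree-based estimator of this class; the helper did_effect is hypothetical, and treating treat_dt as the index of the first post-treatment period is an assumption made for illustration.

```python
from statistics import mean


def did_effect(Y, D, treat_dt):
    """Classic two-group difference-in-differences on a panel.

    Y: list of per-instance outcome rows, each of length n_period.
    D: list of 0/1 treatment indicators, one per instance.
    treat_dt: index of the first post-treatment period (assumed convention).
    """
    n_period = len(Y[0])
    if not 0 < treat_dt < n_period:
        raise ValueError("treat_dt must satisfy 0 < treat_dt < n_period")

    def pre_post_gap(rows):
        # Mean post-treatment outcome minus mean pre-treatment outcome.
        pre = mean(v for row in rows for v in row[:treat_dt])
        post = mean(v for row in rows for v in row[treat_dt:])
        return post - pre

    treated = [row for row, d in zip(Y, D) if d == 1]
    control = [row for row, d in zip(Y, D) if d == 0]
    # The control group's change removes the shared time trend.
    return pre_post_gap(treated) - pre_post_gap(control)


# Three periods (n_period=3), treatment assigned at period index 2.
Y = [
    [1.0, 1.0, 4.0],  # treated
    [2.0, 2.0, 5.0],  # treated
    [1.0, 1.0, 2.0],  # control
    [3.0, 3.0, 4.0],  # control
]
D = [1, 1, 0, 0]
effect = did_effect(Y, D, treat_dt=2)
print(effect)
```

The treated group changes by 3.0 and the control group by 1.0, so the estimate is their difference.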
- __init__(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]¶
- __module__ = 'openasce.inference.tree.didtree'¶
- effect(X: ndarray = None, *, data: Dataset = None)[source]¶
predict the treatment effect.
- Parameters
X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.
- Returns
np.ndarray, [n_instances,]. treatment effect.
- fit(X: ndarray = None, Y: ndarray = None, D: ndarray = None, *, data: Dataset = None, data_test: Dataset = None)[source]¶
Train the DifferenceInDifferencesRegressionTree.
- Parameters
X – np.ndarray, [n_instances, n_features]. Features.
Y – np.ndarray, [n_instances, n_period]. Outcome.
D – np.ndarray, [n_instances,]. Treatment.
data (Dataset, optional) – Training data as a Dataset object. Defaults to None.
data_test (Dataset, optional) – Defaults to None.
- predict(X: ndarray = None, key: str = 'effect', *, data: Dataset = None)[source]¶
predict the treatment effect or leaf id, which is determined by the parameter key.
- Parameters
X (np.ndarray) – features
key (str) – the key can be ‘effect’, ‘leaf_id’ or ‘outcome’.
data (Dataset, optional) – dataset. Defaults to None.
- Returns
The prediction result selected by key.
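The fit and effect methods above suggest a simple call pattern: fit on (X, Y, D), then ask for per-instance effects. The sketch below shows that pattern only. MeanGapModel is a hypothetical stand-in, not part of openasce, included so the snippet runs without the library; with openasce installed one would presumably pass an instance such as DifferenceInDifferencesRegressionTree(n_period=..., treat_dt=...) and NumPy arrays instead.

```python
def fit_and_estimate(model, X, Y, D):
    """Fit a model and return per-instance effects.

    Relies only on the documented call shapes fit(X, Y, D) and effect(X).
    """
    model.fit(X, Y, D)
    return model.effect(X)


class MeanGapModel:
    """Hypothetical stand-in with the same fit/effect call shape."""

    def __init__(self, n_period, treat_dt):
        self.n_period = n_period
        self.treat_dt = treat_dt
        self.effect_ = 0.0

    def fit(self, X, Y, D, *, data=None):
        # Compare last-period outcomes between treated and control rows.
        treated = [row[-1] for row, d in zip(Y, D) if d == 1]
        control = [row[-1] for row, d in zip(Y, D) if d == 0]
        self.effect_ = sum(treated) / len(treated) - sum(control) / len(control)
        return self

    def effect(self, X, *, data=None):
        # One effect value per instance, i.e. shape [n_instances].
        return [self.effect_ for _ in X]


X = [[0.1], [0.2], [0.3], [0.4]]             # [n_instances, n_features]
Y = [[1.0, 3.0], [1.0, 5.0], [1.0, 2.0], [1.0, 2.0]]  # [n_instances, n_period]
D = [1, 1, 0, 0]                             # [n_instances]
effects = fit_and_estimate(MeanGapModel(n_period=2, treat_dt=1), X, Y, D)
print(effects)
```

Keeping the driver function independent of the concrete class makes it easy to swap between the tree classes on this page, since they share the fit and effect call shapes.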
- class openasce.inference.tree.GradientBoostingCausalRegressionTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶
Bases: Boosting
Gradient Boosting debiased Causal Tree for regression. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.
Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.
- Parameters
learning_rate – float, default=0.1 learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators – int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
subsample – float, default=1.0 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
subfeature – float, default=1.0 The fraction of features used to fit each individual base learner, as in random forests. Note that the constructor signature spells this keyword subfeaure.
train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply their own by subclassing the base class Loss.
min_samples_split – int, default=10 The minimum number of samples required to split an internal node.
max_depth – int, default=3 Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5 The L2 regularization parameter, as in XGBoost.
coef – float, default=1 The regularization parameter for the selection-bias term; see GBCT. Note that the constructor signature spells this keyword coeff.
n_period (int, optional) – The number of time steps. Must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned; it must be less than n_period. Must be provided by the user.
init – The initial prediction. Default to 0.
random_state – int or RandomState, default=None Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details).
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10 Decides whether early stopping is used to terminate training when the validation score is not improving. Set it to None to disable early stopping.
tol – float, default=1e-4 Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.
nthreads – The number of threads to use for parallelization. Default is 32.
conf (ConfigTree or dict, optional) – Configuration object. Defaults to None.
bin_mapper (BinMapper, optional) – Bin mapper for binning numerical features. Defaults to None.
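The trade-off between learning_rate and n_estimators can be seen in a toy setting. The sketch below boosts a single constant toward the mean of the targets under squared error; it is a generic illustration of shrinkage, not this class's training loop, and boost_constant is a hypothetical helper.

```python
def boost_constant(targets, n_estimators=100, learning_rate=0.1, init=0.0):
    """Boost a constant predictor toward the mean of targets.

    Each stage fits the mean residual (the best constant under squared
    error) and adds it scaled by learning_rate.
    """
    prediction = init
    history = []
    for _ in range(n_estimators):
        residual_mean = sum(t - prediction for t in targets) / len(targets)
        prediction += learning_rate * residual_mean
        history.append(prediction)
    return history


targets = [2.0, 4.0, 6.0]  # mean is 4.0
slow = boost_constant(targets, n_estimators=10, learning_rate=0.1)
fast = boost_constant(targets, n_estimators=10, learning_rate=0.5)
print(slow[-1], fast[-1])
```

After k stages the remaining gap to the mean is (1 - learning_rate)^k times the initial gap, so a smaller learning_rate needs more stages to reach the same fit.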
- __init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶
- __module__ = 'openasce.inference.tree.gbct'¶
- effect(X: ndarray, *, data: Dataset = None)[source]¶
predict the treatment effect.
- Parameters
X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.
- Returns
np.ndarray, [n_instances,]. treatment effect.
- fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]¶
Train the GradientBoostingCausalRegressionTree.
- Parameters
X – np.ndarray, [n_instances, n_features]. Features.
Y – np.ndarray, [n_instances, n_period]. Outcome.
D – np.ndarray, [n_instances,]. Treatment.
data (Dataset, optional) – Training data as a Dataset object. Defaults to None.
- predict(X: ndarray, key: str, *, data: Dataset = None)[source]¶
predict the treatment effect or leaf id, which is determined by the parameter key.
- Parameters
X (np.ndarray) – features
key (str) – the key can be ‘effect’, ‘leaf_id’ or ‘effect-ND’.
data (Dataset, optional) – dataset. Defaults to None.
- Returns
The prediction result selected by key.
- class openasce.inference.tree.GradientBoostingUpliftTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶
Bases: Boosting
Gradient Boosting debiased Causal Tree for classification. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.
Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.
- Parameters
learning_rate – float, default=0.1 learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators – int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
subsample – float, default=1.0 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
subfeature – float, default=1.0 The fraction of features used to fit each individual base learner, as in random forests. Note that the constructor signature spells this keyword subfeaure.
train_loss – string, default=openasce.inference.tree.losses.BinaryCrossEntropy. The splitting criterion used by GBCT. Users can supply their own by subclassing the base class Loss.
min_samples_split – int, default=10 The minimum number of samples required to split an internal node.
max_depth – int, default=3 Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5 The L2 regularization parameter, as in XGBoost.
coef – float, default=1 The regularization parameter for the selection-bias term; see GBCT. Note that the constructor signature spells this keyword coeff.
n_period (int, optional) – The number of time steps. Must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned; it must be less than n_period. Must be provided by the user.
init – The initial prediction. Default to 0.
random_state – int or RandomState, default=None Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details).
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10 Decides whether early stopping is used to terminate training when the validation score is not improving. Set it to None to disable early stopping.
tol – float, default=1e-4 Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.
nthreads – The number of threads to use for parallelization. Default is 32.
conf (ConfigTree or dict, optional) – Configuration object. Defaults to None.
bin_mapper (BinMapper, optional) – Bin mapper for binning numerical features. Defaults to None.
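The lambd entry refers to XGBoost-style L2 regularization. In XGBoost, the optimal value of a leaf is w = -G / (H + lambda), where G and H are the sums of first and second derivatives of the loss over the samples in the leaf. The sketch below evaluates that formula for squared error; whether this package computes leaf values in exactly this way is an assumption, and leaf_value is a hypothetical helper.

```python
def leaf_value(gradients, hessians, lambd=5.0):
    """Regularized leaf value w = -G / (H + lambd), as in XGBoost."""
    G = sum(gradients)
    H = sum(hessians)
    return -G / (H + lambd)


# Squared-error loss 0.5 * (pred - y)^2: gradient = pred - y, hessian = 1.
y = [3.0, 5.0, 4.0]
pred = [0.0, 0.0, 0.0]
grads = [p - t for p, t in zip(pred, y)]
hess = [1.0] * len(y)

unregularized = leaf_value(grads, hess, lambd=0.0)  # plain residual mean
regularized = leaf_value(grads, hess, lambd=5.0)    # shrunk toward zero
print(unregularized, regularized)
```

With three samples and lambd=5, the leaf behaves as if five extra zero-residual samples were present, which is why small leaves are shrunk most strongly.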
- __init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶
- __module__ = 'openasce.inference.tree.gbct'¶
- effect(X: ndarray, *, data: Dataset = None)[source]¶
predict the treatment effect.
- Parameters
X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.
- Returns
np.ndarray, [n_instances,]. treatment effect.
- fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]¶
Train the GradientBoostingUpliftTree.
- Parameters
X – np.ndarray, [n_instances, n_features]. Features.
Y – np.ndarray, [n_instances, n_period]. Outcome.
D – np.ndarray, [n_instances,]. Treatment.
data (Dataset, optional) – Training data as a Dataset object. Defaults to None.
- predict(X: ndarray, key: str, *, data: Dataset = None)[source]¶
predict the treatment effect or leaf id, which is determined by the parameter key.
- Parameters
X (np.ndarray) – features
key (str) – the key can be ‘effect’, ‘leaf_id’ or ‘effect-ND’.
data (Dataset, optional) – data. Defaults to None.
- Returns
The prediction result selected by key.
Submodules¶
openasce.inference.tree.bin module¶
- class openasce.inference.tree.bin.BinMapper(conf: ConfigTree)[source]¶
Bases: KBinsDiscretizer
A class for binning numerical features.
- __module__ = 'openasce.inference.tree.bin'¶
- _sklearn_auto_wrap_output_keys = {'transform'}¶
- fit(X, y=None)[source]¶
Fit the bin mapper on the input features.
- Parameters
X – Input features.
y – The target variable (not used).
- Returns
The fitted bin mapper object.
- fit_dataset(data)[source]¶
Fit the bin mapper on the dataset.
- Parameters
data – Dataset object containing the input features.
- fit_transform(X, y=None, **fit_params)[source]¶
Fit the bin mapper on the input features and transform them.
- Parameters
X – Input features.
y – The target variable (not used).
fit_params – Additional parameters for fitting.
- Returns
The transformed features.
- inverse_transform(Xt, index: int = None)[source]¶
Inverse transform the transformed features to the original values.
- Parameters
Xt – Transformed features.
index – Index of the feature to inverse transform.
- Returns
The inverse transformed features.
- property is_fit¶
Check if the bin mapper is fit.
- Returns
True if the bin mapper is fit, False otherwise.
- transform(X)[source]¶
Transform the input features using the bin mapper.
- Parameters
X – Input features.
- Returns
The transformed features.
- property upper_bounds¶
Get the upper bounds of the bins.
- Returns
The upper bounds of the bins.
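To illustrate how per-bin upper bounds can drive a transform, the sketch below derives bounds at equally spaced ranks and maps each value to the first bin whose upper bound is not smaller than it. The quantile-style binning and both helper names are assumptions for illustration; BinMapper's actual strategy is determined by its configuration and its KBinsDiscretizer base.

```python
from bisect import bisect_left


def quantile_upper_bounds(values, n_bins):
    """Upper bound of each bin, taken at equally spaced ranks."""
    ordered = sorted(values)
    n = len(ordered)
    # The last bound is the maximum, so every training value falls in a bin.
    return [ordered[max(0, (i * n) // n_bins - 1)] for i in range(1, n_bins + 1)]


def to_bin(value, upper_bounds):
    """Index of the first bin whose upper bound is >= value."""
    # Values above the last bound are clipped into the last bin.
    return min(bisect_left(upper_bounds, value), len(upper_bounds) - 1)


values = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 2.0, 3.0]
bounds = quantile_upper_bounds(values, n_bins=4)
bins = [to_bin(v, bounds) for v in values]
print(bounds)
print(bins)
```

Binning turns each numerical feature into a small set of integer codes, so a tree only has to consider one candidate split per bin boundary.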
openasce.inference.tree.bin_test module¶
openasce.inference.tree.boosting module¶
- class openasce.inference.tree.boosting.Boosting(tree_cls, conf: ConfigTree, bin_mapper: BinMapper = None)[source]¶
Bases: object
- __module__ = 'openasce.inference.tree.boosting'¶
- early_stopping() bool[source]¶
Check if early stopping criteria is met based on the validation losses.
- Returns
True if early stopping criteria is met, False otherwise.
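One plausible reading of the early-stopping rule, combining n_iter_no_change and tol as described for the tree classes, is sketched below. The function should_stop and the exact comparison are assumptions; the library's own check may differ in detail.

```python
def should_stop(val_losses, n_iter_no_change=10, tol=1e-4):
    """Return True when the last n_iter_no_change losses brought no gain.

    A gain means beating the best earlier loss by at least tol.
    """
    # A falsy n_iter_no_change (for example None) disables early stopping.
    if not n_iter_no_change or len(val_losses) <= n_iter_no_change:
        return False
    best_before = min(val_losses[:-n_iter_no_change])
    recent_best = min(val_losses[-n_iter_no_change:])
    return recent_best > best_before - tol


improving = [1.0, 0.8, 0.6, 0.5, 0.4]
stalled = [1.0, 0.5, 0.50001, 0.49999, 0.5]

print(should_stop(improving, n_iter_no_change=3))  # still improving
print(should_stop(stalled, n_iter_no_change=3))    # no gain of at least tol
```

A larger tol stops training sooner on noisy validation curves, while a larger n_iter_no_change gives the model more patience before stopping.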
- fit(data: Dataset)[source]¶
Fit the causal forest model on the provided dataset.
- Parameters
data – Dataset object containing the input features, targets, and treatment.
- predict(X, key: str, *, data: Dataset = None)[source]¶
Predict the output using the trained model on the input data.
- Parameters
X – Feature matrix of the input data.
key – Type of prediction, can be ‘leaf_id’, ‘effect’, or ‘effect-ND’.
data – Dataset object containing the feature data.
- Returns
Prediction result based on the specified key.
- Raises
RuntimeError – If the specified key is unknown and not supported.
- preprocess(data: Dataset)[source]¶
Perform preprocessing steps on the provided dataset.
- Parameters
data – Dataset object containing the input features, targets, and treatment.
- split_counts(trees=None, feature_names=None)[source]¶
Count the number of splits made on each feature in the gradient boost causal trees.
- Parameters
trees – List of decision trees. If None, uses the trained trees.
feature_names – List of feature names. If None, uses the feature columns from conf.
- Returns
Dictionary with feature names as keys and the corresponding split counts as values.
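The idea behind split_counts can be sketched on a toy tree layout. The nested-dict representation below (a "feature" index plus "left" and "right" children, with leaves lacking a "feature" key) is hypothetical and chosen only for illustration; the library's trees are stored differently.

```python
from collections import Counter


def count_splits(trees, feature_names):
    """Count how often each feature is used as a split across trees."""
    counts = Counter({name: 0 for name in feature_names})

    def walk(node):
        # Leaves carry no "feature" key in this hypothetical layout.
        if node is None or "feature" not in node:
            return
        counts[feature_names[node["feature"]]] += 1
        walk(node.get("left"))
        walk(node.get("right"))

    for tree in trees:
        walk(tree)
    return dict(counts)


leaf = {"value": 0.0}
trees = [
    {"feature": 0, "left": {"feature": 2, "left": leaf, "right": leaf}, "right": leaf},
    {"feature": 0, "left": leaf, "right": leaf},
]
counts = count_splits(trees, ["age", "income", "tenure"])
print(counts)
```

Split counts are a rough importance signal: features that are never split on, such as "income" here, did not help separate treatment effects in any tree.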
- tr_val(data: Dataset, subsample=None, **kwargs)[source]¶
Split the dataset into training and validation sets.
- Parameters
data – Dataset object containing the input features, targets, and treatment.
subsample – Ratio of instances to include in the training set. If None, uses the instance_ratio from conf.
- Returns
Tuple containing the training dataset, validation dataset, indices of training instances, and indices of validation instances.
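A train/validation split by ratio can be sketched with index shuffling. The helper train_val_indices and its seed argument are hypothetical; the real method also returns the two sub-datasets, which are omitted here.

```python
import random


def train_val_indices(n_instances, subsample, seed=None):
    """Shuffle instance indices and split them by the subsample ratio."""
    if not 0.0 < subsample <= 1.0:
        raise ValueError("subsample must be in (0, 1]")
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_instances * subsample))
    # Sorting keeps each side in the original instance order.
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, val_idx = train_val_indices(10, subsample=0.8, seed=0)
print(train_idx, val_idx)
```

The two index lists are disjoint and together cover every instance, so validation losses are computed on data the current trees were not fitted on.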
openasce.inference.tree.cppnode module¶
- openasce.inference.tree.cppnode.create_didnode_from_dict(info)[source]¶
Create a CppDebiasNode from a dictionary.
- Parameters
info (Dict) – The node information.
- Returns
The CppDebiasNode instance.
- Return type
CppDebiasNode
- openasce.inference.tree.cppnode.predict(nodes: List[openasce.inference.tree.gbct_utils.common.CppDiDNode], x, out, key, threads=20)[source]¶
Predict using the tree nodes.
- Parameters
nodes (List) – The list of tree nodes.
x (ndarray) – The input data.
out (ndarray) – The output array.
key (ndarray) – The prediction key.
threads (int) – The number of threads.
- Returns
The predicted values.
- Return type
ndarray
- Raises
RuntimeError – If the number of nodes is less than or equal to 0.
ValueError – If the node type is not supported.
openasce.inference.tree.cppnode_test module¶
openasce.inference.tree.csv_dataset module¶
- class openasce.inference.tree.csv_dataset.CsvDataset(conf=None, **kwargs)[source]¶
Bases: Dataset
A Dataset interface for loading CSV data.
- __module__ = 'openasce.inference.tree.csv_dataset'¶
- property features¶
- sub_dataset(index=None, cols=None, cols_y=[]) Dataset[source]¶
Abstract interface of sub-sampling.
- Parameters
index (optional) – Indices of the samples to include. Defaults to None.
- Raises
NotImplementedError
- property targets¶
- property treatment¶
- property weight¶
openasce.inference.tree.dataset module¶
- class openasce.inference.tree.dataset.Dataset[source]¶
Bases: object
Abstract interface of class dataset.
- __module__ = 'openasce.inference.tree.dataset'¶
- __weakref__¶
list of weak references to the object (if defined)
- description(detail: bool = False) None[source]¶
Describe the dataset.
- Parameters
detail (bool, optional) – Whether to include detailed information. Defaults to False.
- property feature_columns¶
- property features¶
- sub_dataset(index=None)[source]¶
Abstract interface of sub-sampling
- Parameters
index (optional) – Indices of the samples to include in the sub-dataset. Defaults to None.
- Raises
NotImplementedError – If a subclass does not override this method.
- property targets¶
- property treatment¶
- class openasce.inference.tree.dataset.PsudoDataset(features: ndarray = None, outcome: ndarray = None, treatment: ndarray = None, conf=None)[source]¶
Bases: Dataset
A pseudo dataset that wraps data held in NumPy arrays.
- Parameters
features (np.ndarray, optional) – features. Defaults to None.
outcome (np.ndarray, optional) – outcome. Defaults to None.
treatment (np.ndarray, optional) – treatment. Defaults to None.
conf (optional) – The configuration. Defaults to None.
- __init__(features: ndarray = None, outcome: ndarray = None, treatment: ndarray = None, conf=None)[source]¶
- property feature_columns¶
- property features¶
- sub_dataset(index=None, cols=None) Dataset[source]¶
Create a sub-dataset.
- Parameters
index – Indices of the samples to include in the sub-dataset.
cols – Columns to include in the sub-dataset.
- Returns
The sub-dataset.
- property targets¶
- property treatment¶
- property weight¶
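To make the sub_dataset(index, cols) contract concrete, here is a minimal sketch of an in-memory dataset that selects rows by index and feature columns by cols. TinyDataset is a hypothetical stand-in written with plain lists; it is not PsudoDataset, and treating None as "keep everything" is an assumption.

```python
class TinyDataset:
    """Hypothetical in-memory dataset holding features, outcomes and treatment."""

    def __init__(self, features, outcome, treatment):
        self.features = [list(row) for row in features]
        self.targets = [list(row) for row in outcome]
        self.treatment = list(treatment)

    def __len__(self):
        return len(self.features)

    def sub_dataset(self, index=None, cols=None):
        # None is assumed to mean "keep all rows" / "keep all feature columns".
        rows = range(len(self)) if index is None else index
        n_cols = len(self.features[0]) if self.features else 0
        keep = range(n_cols) if cols is None else cols
        return TinyDataset(
            [[self.features[i][j] for j in keep] for i in rows],
            [self.targets[i] for i in rows],
            [self.treatment[i] for i in rows],
        )


data = TinyDataset(
    features=[[1, 2], [3, 4], [5, 6]],
    outcome=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    treatment=[0, 1, 1],
)
sub = data.sub_dataset(index=[0, 2], cols=[1])
```

Here sub keeps the first and third samples and only the second feature column, while outcomes and treatment are sliced by row only.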
openasce.inference.tree.didtree module¶
- class openasce.inference.tree.didtree.DifferenceInDifferencesRegressionTree(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]¶
Bases: Boosting
Gradient Boosting debiased Causal Tree for regression. Difference-in-differences meets tree-based methods: heterogeneous treatment effects estimation with unmeasured confounding.
Reference: 1. Tang, C., Wang, H., Li, X., Qing, C., Li, L., & Zhou, J. (2023). Difference-in-Differences Meets Tree-Based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Proceedings of the 40th International Conference on Machine Learning.
- Parameters
n_estimators – int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
learning_rate – float, default=0.1 learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
subsample – float, default=1.0 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
subfeature – float, default=1.0 The fraction of features used for fitting the individual base learners, as in random forests. The constructor signature spells this keyword subfeaure.
train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply their own by inheriting from the base class Loss.
min_samples_split – int, default=10 The minimum number of samples required to split an internal node
max_depth – int, default=3 maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5 The L2 regularization parameter, as in XGBoost.
coeff – float, default=1 The regularization parameter for selection bias; see GBCT.
parallel_l2 – float, default=0
n_period (int, optional) – The number of time steps. It must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned, which is less than n_period. It must be provided by the user.
init – The initial prediction. Defaults to 0.
random_state – int or RandomState, default=None Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details).
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10 Used to decide whether early stopping terminates training when the validation score is not improving. Set it to None to disable early stopping.
tol – float, default=1e-4 Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.
nthreads – The number of threads to use for parallelization. Default is 32.
conf (ConfigTree, optional) – The configuration tree. Defaults to None.
bin_mapper (BinMapper, optional) – The BinMapper instance. Defaults to None.
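The interaction between n_iter_no_change and tol can be sketched as a patience counter over a sequence of validation losses. This is a generic early-stopping rule written for illustration; the function name stopping_iteration and the exact comparison (an improvement counts only if it exceeds tol) are assumptions, not the library's code.

```python
def stopping_iteration(losses, n_iter_no_change=10, tol=1e-4):
    """Return the iteration index at which training would stop, or None."""
    if n_iter_no_change is None:
        return None  # early stopping disabled
    best = float("inf")
    stale = 0
    for i, loss in enumerate(losses):
        if loss < best - tol:
            # Improvement of more than tol: remember it and reset the counter.
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= n_iter_no_change:
                return i
    return None


losses = [1.0, 0.8, 0.7, 0.69999, 0.69998, 0.69997, 0.5]
stop_at = stopping_iteration(losses, n_iter_no_change=3, tol=1e-4)
```

With these values the three iterations after 0.7 each improve by less than tol, so training stops at index 5 and the later drop to 0.5 is never reached. A larger n_iter_no_change trades extra iterations for robustness against such plateaus.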
- __init__(n_estimators=100, learning_rate=0.1, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, parallel_l2=0, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: ConfigTree = None, bin_mapper: BinMapper = None)[source]¶
- effect(X: ndarray = None, *, data: Dataset = None)[source]¶
predict the treatment effect.
- Parameters
X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.
- Returns
np.ndarray, [n_instances,]. treatment effect.
- fit(X: ndarray = None, Y: ndarray = None, D: ndarray = None, *, data: Dataset = None, data_test: Dataset = None)[source]¶
Train the model.
- Parameters
X – np.ndarray, [n_instances, n_features]. features
Y – np.ndarray, [n_instances, n_period]. outcome
D – np.ndarray, [n_instances,]. treatment.
data (Dataset, optional) – The training dataset. Defaults to None.
Returns:
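The array shapes expected by fit (Y as [n_instances, n_period], D as [n_instances,]) together with treat_dt are enough to compute a classical difference-in-differences estimate. The sketch below computes that plain, non-tree estimate with lists. It assumes that treat_dt is the index of the first post-treatment period and that D is binary; it is not the tree-based estimator this class implements.

```python
def did_estimate(Y, D, treat_dt):
    """Plain difference-in-differences: (post - pre) of treated minus (post - pre) of control."""

    def mean(values):
        return sum(values) / len(values)

    def change(rows):
        # Average outcome before and after the assumed treatment time.
        pre = mean([mean(row[:treat_dt]) for row in rows])
        post = mean([mean(row[treat_dt:]) for row in rows])
        return post - pre

    treated = [y for y, d in zip(Y, D) if d == 1]
    control = [y for y, d in zip(Y, D) if d == 0]
    return change(treated) - change(control)


# Four instances observed over n_period = 4 time steps; treatment starts at step 2.
Y = [
    [1.0, 1.0, 4.0, 4.0],
    [2.0, 2.0, 5.0, 5.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
]
D = [1, 1, 0, 0]
tau = did_estimate(Y, D, treat_dt=2)
```

The treated group rises by 3 and the control group by 1, so the estimate is 2. Subtracting the control trend is what removes time-invariant differences between the groups.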
- predict(X: ndarray = None, key: str = 'effect', *, data: Dataset = None)[source]¶
predict the treatment effect or leaf id, which is determined by the parameter key.
- Parameters
X (np.ndarray) – features
key (str) – the key can be ‘effect’, ‘leaf_id’ or ‘outcome’.
data (Dataset, optional) – dataset. Defaults to None.
- Returns
The predictions selected by key.
openasce.inference.tree.gbct module¶
- class openasce.inference.tree.gbct.GradientBoostingCausalRegressionTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶
Bases: Boosting
Gradient Boosting debiased Causal Tree for regression. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.
Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.
- Parameters
learning_rate – float, default=0.1 learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators – int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
subsample – float, default=1.0 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
subfeature – float, default=1.0 The fraction of features used for fitting the individual base learners, as in random forests. The constructor signature spells this keyword subfeaure.
train_loss – string, default=openasce.inference.tree.losses.MeanSquaredError. The splitting criterion used by GBCT. Users can supply their own by inheriting from the base class Loss.
min_samples_split – int, default=10 The minimum number of samples required to split an internal node
max_depth – int, default=3 maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5 The L2 regularization parameter, as in XGBoost.
coeff – float, default=1 The regularization parameter for selection bias; see GBCT.
n_period (int, optional) – The number of time steps. It must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned, which is less than n_period. It must be provided by the user.
init – The initial prediction. Defaults to 0.
random_state – int or RandomState, default=None Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details).
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10 Used to decide whether early stopping terminates training when the validation score is not improving. Set it to None to disable early stopping.
tol – float, default=1e-4 Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.
nthreads – The number of threads to use for parallelization. Default is 32.
conf (ConfigTree or dict, optional) – The configuration tree. Defaults to None.
bin_mapper (BinMapper, optional) – The BinMapper instance. Defaults to None.
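The subsample, subfeature and random_state parameters describe row and column sampling per boosting stage. A minimal sketch of such sampling follows, using the standard library's random module. The helper name draw_subsets, the rounding rule and sampling without replacement are assumptions made for illustration, not the library's behavior.

```python
import random


def draw_subsets(n_samples, n_features, subsample=1.0, subfeature=1.0, random_state=None):
    """Draw sorted row and column indices for one boosting stage."""
    rng = random.Random(random_state)
    # Keep at least one row and one column (assumed rounding rule).
    n_rows = max(1, int(round(subsample * n_samples)))
    n_cols = max(1, int(round(subfeature * n_features)))
    rows = sorted(rng.sample(range(n_samples), n_rows))
    cols = sorted(rng.sample(range(n_features), n_cols))
    return rows, cols


rows, cols = draw_subsets(10, 4, subsample=0.5, subfeature=0.5, random_state=0)
```

Calling the helper twice with the same random_state yields the same indices, which is the reproducibility a fixed seed is meant to provide.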
- __init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.MeanSquaredError', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶
- effect(X: ndarray, *, data: Dataset = None)[source]¶
predict the treatment effect.
- Parameters
X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.
- Returns
np.ndarray, [n_instances,]. treatment effect.
- fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]¶
Train the model.
- Parameters
X – np.ndarray, [n_instances, n_features]. features
Y – np.ndarray, [n_instances, n_period]. outcome
D – np.ndarray, [n_instances,]. treatment.
data (Dataset, optional) – The training dataset. Defaults to None.
Returns:
- predict(X: ndarray, key: str, *, data: Dataset = None)[source]¶
predict the treatment effect or leaf id, which is determined by the parameter key.
- Parameters
X (np.ndarray) – features
key (str) – the key can be ‘effect’, ‘leaf_id’ or ‘effect-ND’.
data (Dataset, optional) – dataset. Defaults to None.
- Returns
The predictions selected by key.
- class openasce.inference.tree.gbct.GradientBoostingUpliftTree(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶
Bases: Boosting
Gradient Boosting debiased Causal Tree for classification. This class implements GBCT [1], which estimates heterogeneous treatment effects in the presence of unmeasured confounding using observational data and historical controls.
Reference: 1. Tang, C., Wang, H., Li, X., Cui, Q., Zhang, Y.-L., Zhu, F., Li, L., & Zhou, J. (2022). Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding. Advances in Neural Information Processing Systems 36, 16.
- Parameters
learning_rate – float, default=0.1 learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators – int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
subsample – float, default=1.0 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
subfeature – float, default=1.0 The fraction of features used for fitting the individual base learners, as in random forests. The constructor signature spells this keyword subfeaure.
train_loss – string, default=openasce.inference.tree.losses.BinaryCrossEntropy. The splitting criterion used by GBCT. Users can supply their own by inheriting from the base class Loss.
min_samples_split – int, default=10 The minimum number of samples required to split an internal node
max_depth – int, default=3 maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
lambd – float, default=5 The L2 regularization parameter, as in XGBoost.
coeff – float, default=1 The regularization parameter for selection bias; see GBCT.
n_period (int, optional) – The number of time steps. It must be provided by the user.
treat_dt (int, optional) – The time step at which treatment is assigned, which is less than n_period. It must be provided by the user.
init – The initial prediction. Defaults to 0.
random_state – int or RandomState, default=None Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details).
verbose – bool, default=False. Enable verbose output.
n_iter_no_change – int, default=10 Used to decide whether early stopping terminates training when the validation score is not improving. Set it to None to disable early stopping.
tol – float, default=1e-4 Tolerance for early stopping. When the loss does not improve by at least tol for n_iter_no_change iterations (if set to a number), training stops.
nthreads – The number of threads to use for parallelization. Default is 32.
conf (ConfigTree or dict, optional) – The configuration tree. Defaults to None.
bin_mapper (BinMapper, optional) – The BinMapper instance. Defaults to None.
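Because this class defaults to a binary cross-entropy loss, gradient boosting needs the first and second derivatives of that loss with respect to the raw score. The sketch below writes them out for a single sample, assuming the model outputs a logit z that is passed through a sigmoid. The function names are illustrative and do not come from openasce.inference.tree.losses.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(y, z):
    """Binary cross-entropy of label y in {0, 1} for raw score (logit) z."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bce_grad_hess(y, z):
    """First and second derivative of bce with respect to z."""
    p = sigmoid(z)
    return p - y, p * (1.0 - p)


# At z = 0 the predicted probability is 0.5.
g, h = bce_grad_hess(1.0, 0.0)
```

For a positive label at z = 0 the gradient is -0.5 and the Hessian is 0.25, so a boosting step pushes the score upward.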
- __init__(learning_rate=0.1, n_estimators=100, subsample=1.0, subfeaure=1.0, train_loss='openasce.inference.tree.losses.BinaryCrossEntropy', min_samples_split=10, max_depth=3, lambd=5, coeff=1, n_period=None, treat_dt=None, init=0, random_state=None, verbose=False, n_iter_no_change=10, tol=0.0001, nthreads=32, *, conf: Union[ConfigTree, dict] = None, bin_mapper: BinMapper = None)[source]¶
- effect(X: ndarray, *, data: Dataset = None)[source]¶
predict the treatment effect.
- Parameters
X – np.ndarray, [n_instances, n_features]. features
data – Dataset, optional. Defaults to None.
- Returns
np.ndarray, [n_instances,]. treatment effect.
- fit(X: ndarray, Y: ndarray, D: ndarray, *, data: Dataset = None)[source]¶
train the GradientBoostingUpliftTree
- Parameters
X – np.ndarray, [n_instances, n_features]. features
Y – np.ndarray, [n_instances, n_period]. outcome
D – np.ndarray, [n_instances]. treatment.
data (Dataset, optional) – The training dataset. Defaults to None.
Returns:
- predict(X: ndarray, key: str, *, data: Dataset = None)[source]¶
predict the treatment effect or leaf id, which is determined by the parameter key.
- Parameters
X (np.ndarray) – features
key (str) – the key can be ‘effect’, ‘leaf_id’ or ‘effect-ND’.
data (Dataset, optional) – data. Defaults to None.
- Returns
The predictions selected by key.
openasce.inference.tree.gradient_causal_tree module¶
- class openasce.inference.tree.gradient_causal_tree.GradientDebiasedCausalTree(conf: ConfigTree = None, bin_mapper: BinMapper = None, **kwargs)[source]¶
Bases: object
GradientDebiasedCausalTree is a class that represents a gradient-based debiased causal tree model.
- Parameters
conf (ConfigTree) – The configuration tree.
bin_mapper (BinMapper) – The BinMapper instance.
kwargs – Additional keyword arguments.
- _predict(nodes, x, key='effect', out=None)[source]¶
Internal method to predict using the tree nodes.
- Parameters
nodes – The tree nodes.
x – The input data.
key – The prediction key.
out – The output array.
- Returns
The predicted values.
- Return type
ndarray
- Raises
NotImplementedError – If the prediction key is not implemented.
- _split_cpp(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]¶
Split the tree nodes using C++ implementation.
- Parameters
leaves (List[GradientCausalTreeNode]) – The list of tree nodes.
hist (Histogram) – The histogram.
- Returns
The split conditions.
- Return type
Dict
- export()[source]¶
Export the tree model.
- Returns
The exported C++ nodes and Python nodes.
- Return type
Tuple[List[DidNode], List[Dict]]
- fit(gradients, cgradients, data: Dataset, eta=None)[source]¶
Fit the GradientDebiasedCausalTree model.
- Parameters
gradients – The gradients.
cgradients – The counterfactual gradients.
data (Dataset) – The training dataset.
eta – The eta values.
- Returns
None
- gradients(target, prediction, **kwargs)[source]¶
Compute the gradients of the loss function.
- Parameters
target – The target values.
prediction – The predicted values.
kwargs – Additional keyword arguments.
- Returns
The gradients.
- Return type
ndarray
- loss(grad, hess, y_hat=None, **kwargs)[source]¶
Compute the loss function.
- Parameters
grad – The gradients.
hess – The hessians.
y_hat – The predicted values.
kwargs – Additional keyword arguments.
- Returns
The loss values.
- Return type
ndarray
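A loss computed from gradients and Hessians is usually the second-order leaf objective popularized by XGBoost, which the lambd parameter refers to. The sketch below shows that standard form for one leaf: with gradient sum G, Hessian sum H and L2 penalty lambd, the optimal leaf weight is -G / (H + lambd) and the resulting objective is -0.5 * G**2 / (H + lambd). Whether this method uses exactly this expression is an assumption.

```python
def leaf_weight(grad_sum, hess_sum, lambd=5.0):
    """Weight minimizing G*w + 0.5*(H + lambd)*w**2."""
    return -grad_sum / (hess_sum + lambd)


def leaf_loss(grad_sum, hess_sum, lambd=5.0):
    """Value of the second-order objective at the optimal weight."""
    return -0.5 * grad_sum ** 2 / (hess_sum + lambd)


grads = [-2.0, -1.0, -3.0]
hess = [1.0, 1.0, 1.0]
G, H = sum(grads), sum(hess)
w = leaf_weight(G, H, lambd=5.0)
loss = leaf_loss(G, H, lambd=5.0)
```

Here G = -6 and H = 3, so w = 0.75 and the loss is -2.25. A larger lambd shrinks the weight toward zero.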
- predict(x, w=None, key='effect', out=None)[source]¶
Predict the treatment effect or other values.
- Parameters
x – The input data.
w – The treatment weights.
key – The prediction key.
out – The output array.
- Returns
The predicted values.
- Return type
ndarray
- preprocess(gradients, cgradients, tr_data: Dataset, eta=None, subsample=1, subfeature=1)[source]¶
Preprocess the data before fitting the tree.
- split(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]¶
Split the tree nodes.
- Parameters
leaves (List[GradientCausalTreeNode]) – The list of tree nodes.
hist (Histogram) – The histogram.
- Returns
The split conditions.
- Return type
Dict
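Histogram-based splitting accumulates gradient statistics per feature bin and then scans the bin boundaries for the best gain. The sketch below does this for a single feature with the standard XGBoost-style gain. The function best_split and its arguments are hypothetical, and the debiasing terms that GBCT adds to the criterion are not modeled.

```python
def best_split(bins, grads, hess, n_bins, lambd=5.0):
    """Return (bin, gain) of the best 'value <= bin' split for one binned feature."""
    g_hist = [0.0] * n_bins
    h_hist = [0.0] * n_bins
    for b, g, h in zip(bins, grads, hess):
        g_hist[b] += g
        h_hist[b] += h
    G, H = sum(g_hist), sum(h_hist)

    def score(g, h):
        return g * g / (h + lambd)

    best_gain, best_bin = 0.0, None
    g_left = h_left = 0.0
    # Scan boundaries left to right, keeping running sums for the left child.
    for b in range(n_bins - 1):
        g_left += g_hist[b]
        h_left += h_hist[b]
        gain = 0.5 * (score(g_left, h_left) + score(G - g_left, H - h_left) - score(G, H))
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain


bins = [0, 0, 1, 1, 2, 2]            # bin index of each sample
grads = [-1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
hess = [1.0] * 6
split_bin, gain = best_split(bins, grads, hess, n_bins=3, lambd=0.0)
```

The scan separates bin 0, whose gradients are negative, from bins 1 and 2. Working on per-bin sums means the cost of the scan depends on the number of bins rather than the number of samples.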
- updater(split_conds: Dict, gradients, cgradients, tr_data, hist: Histogram, idx_map, leaves: List[GradientCausalTreeNode], leaves_range, eta=None)[source]¶
Update the tree nodes.
- Parameters
split_conds (Dict) – The split conditions.
gradients – The gradients.
cgradients – The counterfactual gradients.
tr_data (Dataset) – The training dataset.
hist (Histogram) – The histogram.
idx_map – The index map.
leaves (List[GradientCausalTreeNode]) – The list of tree nodes.
leaves_range – The range of each leaf.
eta – The eta values.
- Returns
The updated tree nodes and updated leaf ranges.
- Return type
Tuple[List[GradientCausalTreeNode], ndarray]
- class openasce.inference.tree.gradient_causal_tree.GradientDiDCausalTree(conf: ConfigTree = None, bin_mapper: BinMapper = None, **kwargs)[source]¶
Bases:
GradientDebiasedCausalTree
GradientDiDCausalTree is a class that represents a gradient-based debiased causal tree model with difference-in-differences. It inherits from the GradientDebiasedCausalTree class.
- Parameters
conf (ConfigTree) – The configuration tree.
bin_mapper (BinMapper) – The BinMapper instance.
kwargs – Additional keyword arguments.
- _predict(nodes, x, key='effect', out=None)[source]¶
Predict for the given data using the exported nodes.
- Parameters
nodes – The exported nodes.
x – The input data.
key – The key specifying the prediction type (default: ‘effect’).
out – The output array (default: None).
- Returns
The predictions.
- _split_cpp(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]¶
Split the leaf nodes at the current level using C++ implementation.
- Parameters
leaves – The list of leaf nodes.
hist – The histogram object.
- Returns
The split conditions (split_conds).
- Return type
Dict
- export()[source]¶
Export the GradientDiDCausalTree.
- Returns
slim_cppnodes – The exported nodes as a C++ object.
slim_nodes – The exported nodes as a Python object.
- predict(x, w=None, key='effect', out=None)[source]¶
Predict the treatment effect or other outcomes for given data.
- Parameters
x – The input data.
w – The treatment assignments (default: None).
key – The key specifying the prediction type (default: ‘effect’).
out – The output array (default: None).
- Returns
pred – The predicted values.
cpred – The counterfactual predicted values.
eta – The optimal parallel interval between the treated and control groups.
- preprocess(gradients, cgradients, tr_data: Dataset, eta=None, subsample=1, subfeature=1)[source]¶
Preprocesses the data for the GradientDiDCausalTree model.
- Parameters
gradients – The gradients.
cgradients – The counterfactual gradients.
tr_data (Dataset) – The training dataset.
eta – The parallel interval between the treated and control groups.
subsample (float) – The subsampling ratio for instances (default: 1).
subfeature (float) – The subsampling ratio for features (default: 1).
- Returns
hist (Histogram) – The constructed histogram.
index (ndarray) – The permutation index.
- split(leaves: List[GradientCausalTreeNode], hist: Histogram)[source]¶
Split the leaf nodes.
- Parameters
leaves – The list of leaf nodes.
hist – The histogram object.
- Returns
The split conditions (split_conds).
- Return type
Dict
- updater(split_conds: Dict, gradients, cgradients, tr_data, hist: Histogram, idx_map, leaves: List[GradientCausalTreeNode], leaves_range, eta)[source]¶
Update the GradientCausalTree by performing splitting and updating histograms.
- Parameters
split_conds (Dict) – The split conditions.
gradients – The gradients.
cgradients – The counterfactual gradients.
tr_data – The training dataset.
hist (Histogram) – The histogram object.
idx_map – The index mapping.
leaves (List[GradientCausalTreeNode]) – The list of leaves.
leaves_range – The range of leaves.
eta – The parallel interval between the treated and control groups.
- Returns
leaves_new (List[GradientCausalTreeNode]) – The new leaves.
leaves_range_new – The new range of leaves.
openasce.inference.tree.histogram module¶
- class openasce.inference.tree.histogram.Histogram(conf: ConfigTree)[source]¶
Bases:
object
- __getattr__(_Histogram__name: str)[source]¶
Get the attribute value.
- Parameters
__name (str) – The name of the attribute.
- Returns
The attribute value.
- Return type
ndarray
- Raises
AttributeError – If the attribute is not found.
- classmethod new_instance(dataset: Dataset, conf: ConfigTree = None, **kwargs)[source]¶
Create a new instance of the histogram.
- update_hists(target, index, leaves_range, treatment, bin_features, is_gradient, is_splitting, threads)[source]¶
Update histograms for all nodes in the same level of a tree.
- Parameters
target
index
leaves_range
treatment
bin_features
is_gradient (bool)
is_splitting (bool)
threads
- Raises
ValueError
openasce.inference.tree.histogram_test module¶
openasce.inference.tree.information module¶
- class openasce.inference.tree.information.CausalDataInfo(conf, **kwargs)[source]¶
Bases:
object
openasce.inference.tree.losses module¶
- class openasce.inference.tree.losses.BinaryCrossEntropy(**kwargs)[source]¶
Bases:
GradLoss
- property const_hess¶
Check if the hessian is constant.
- Returns
True if the hessian is constant, False otherwise.
- gradient(target, prediction)[source]¶
Compute the gradient: gradient = prediction - target, where prediction is the positive probability.
- Parameters
target – The target values.
prediction – The predicted probabilities.
- Returns
The computed gradients.
- Return type
ndarray
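The rule gradient = prediction - target can be checked numerically. The snippet below is a minimal standalone NumPy sketch, not the library's implementation: `sigmoid` and `bce_gradient` are hypothetical helper names, and the hessian `p * (1 - p)` is the standard second derivative of binary cross-entropy with respect to the logit, which the documentation above does not state explicitly.

```python
import numpy as np


def sigmoid(logit):
    """Map raw scores (logits) to positive-class probabilities."""
    return 1.0 / (1.0 + np.exp(-logit))


def bce_gradient(target, prediction):
    """Gradient of binary cross-entropy w.r.t. the logit.

    `prediction` is the positive-class probability, as described above.
    """
    return prediction - target


target = np.array([1.0, 0.0, 1.0])
logit = np.array([0.0, 0.0, 2.0])

prob = sigmoid(logit)
grad = bce_gradient(target, prob)
# Assumption: standard second derivative of the loss w.r.t. the logit.
hess = prob * (1.0 - prob)
```

Because the hessian depends on the predicted probability, it is not constant for this loss.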
- gradients(target, logit, treatment)[source]¶
Calculate the gradient and hessian.
- Parameters
target (DataFrame)
logit (DataFrame)
treatment (DataFrame)
- Return type
Union[Tuple, None]
- class openasce.inference.tree.losses.GradLoss(**kwargs)[source]¶
Bases:
Loss
Abstract base class for gradient-based loss functions.
- __abstractmethods__ = frozenset({'gradient', 'gradients', 'hessian', 'loss'})¶
- property const_hess¶
Check if the hessian is constant.
- Returns
True if the hessian is constant, False otherwise.
- abstract gradient(target, prediction)[source]¶
Calculate the gradient of the loss.
- Parameters
target – Target values.
prediction – Predicted values.
- Raises
NotImplementedError – If the method is not implemented.
- Returns
The gradient.
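The abstract-method set listed for this class (`gradient`, `gradients`, `hessian`, `loss`) suggests what a custom loss has to provide. The following is a standalone sketch of that pattern using Python's `abc` module. `SketchGradLoss` and `LogCoshLoss` are hypothetical names introduced for illustration; the sketch does not import or subclass the real `GradLoss`, whose constructor and exact method signatures are not shown here.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchGradLoss(ABC):
    """Hypothetical stand-in for a gradient-based loss base class."""

    @property
    def const_hess(self):
        # Subclasses with a constant second derivative can override this.
        return False

    @abstractmethod
    def loss(self, target, prediction):
        ...

    @abstractmethod
    def gradient(self, target, prediction):
        ...

    @abstractmethod
    def hessian(self, target, prediction):
        ...

    @abstractmethod
    def gradients(self, target, prediction):
        ...


class LogCoshLoss(SketchGradLoss):
    """Log-cosh loss: smooth, with a bounded gradient."""

    def loss(self, target, prediction):
        return np.log(np.cosh(prediction - target))

    def gradient(self, target, prediction):
        return np.tanh(prediction - target)

    def hessian(self, target, prediction):
        return 1.0 - np.tanh(prediction - target) ** 2

    def gradients(self, target, prediction):
        # Return the gradient and hessian together.
        return self.gradient(target, prediction), self.hessian(target, prediction)


target = np.array([0.0, 1.0])
prediction = np.array([0.0, 3.0])
grad, hess = LogCoshLoss().gradients(target, prediction)
```

Instantiating `SketchGradLoss` directly raises `TypeError`, because its abstract methods are unimplemented.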
- class openasce.inference.tree.losses.Loss(**kwargs)[source]¶
Bases:
object
Abstract base class for loss functions.
- __abstractmethods__ = frozenset({'loss'})¶
- class openasce.inference.tree.losses.MeanSquaredError(**kwargs)[source]¶
Bases:
GradLoss
- property const_hess¶
Check if the hessian is constant.
- Returns
True if the hessian is constant, False otherwise.
- gradient(target, prediction, **kwargs)[source]¶
Calculate the gradient of the loss.
- Parameters
target – Target values.
prediction – Predicted values.
- Raises
NotImplementedError – If the method is not implemented.
- Returns
The gradient.
- gradients(target, prediction, **kwargs) → Tuple[source]¶
Calculate the gradients and hessians.
- Parameters
target – Target values.
prediction – Predicted values.
- Raises
NotImplementedError – If the method is not implemented.
- Returns
Tuple containing the gradients and hessians.
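For squared error, the gradient and hessian pair has a simple closed form. The sketch below assumes the loss is written as 0.5 * (prediction - target)^2, so the gradient is the residual and the hessian is constant at 1. The library may use a different scaling (for example a factor of 2), and `squared_error_gradients` is a hypothetical standalone helper, not the method documented above.

```python
import numpy as np


def squared_error_gradients(target, prediction):
    """Return (gradient, hessian) of 0.5 * (prediction - target) ** 2."""
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    grad = prediction - target
    hess = np.ones_like(grad)  # constant second derivative
    return grad, hess


target = np.array([1.0, 2.0, 6.0])
prediction = np.zeros_like(target)

grad, hess = squared_error_gradients(target, prediction)
# One Newton step from a constant prediction of 0.
newton_step = -grad.sum() / hess.sum()
```

With a constant hessian, a single Newton step from a zero prediction lands on the mean of the targets.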
openasce.inference.tree.reflect_utils module¶
- openasce.inference.tree.reflect_utils.get_class(module_class)[source]¶
Get the class using import_module
- Parameters
module_class – module class name, full path a.b.class_name
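Resolving a dotted path such as `a.b.class_name` with `import_module` usually takes two steps: import the module part, then look up the class attribute. The sketch below shows that pattern with the standard library only. `get_class_sketch` is a hypothetical name, and the real `get_class` may handle errors differently.

```python
from importlib import import_module


def get_class_sketch(module_class):
    """Resolve a fully qualified name like 'a.b.ClassName' to the class object."""
    module_name, _, class_name = module_class.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {module_class!r}")
    module = import_module(module_name)
    return getattr(module, class_name)


ordered_dict_cls = get_class_sketch("collections.OrderedDict")
instance = ordered_dict_cls(a=1)
```

The same lookup would apply to a string such as the `train_loss` default, provided the package is importable.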
openasce.inference.tree.splitting_losses module¶
- openasce.inference.tree.splitting_losses.causal_tree_splitting_losses(configs, bin_outcome_hist, bin_counts, parameters: dict)[source]¶
Calculate the splitting losses for the ordinary causal tree.
- Parameters
configs – Configuration.
bin_outcome_hist – Histogram of outcome values.
bin_counts – Histogram counts.
parameters – Additional parameters.
- Returns
The splitting losses.
- openasce.inference.tree.splitting_losses.causal_tree_splitting_losses2(configs, bin_grad_hist, bin_hess_hist, bin_counts, parameters: dict)[source]¶
Calculate the splitting losses for the ordinary causal tree.
- Parameters
configs – Configuration.
bin_grad_hist – Histogram of gradients.
bin_hess_hist – Histogram of Hessians.
bin_counts – Histogram counts.
parameters – Additional parameters.
- Returns
The splitting losses.
- openasce.inference.tree.splitting_losses.didtree_splitting_losses(configs, bin_grad_hist, bin_hess_hist, bin_cgrad_hist, bin_chess_hist, bin_eta_hist, bin_counts, parameters: dict)[source]¶
Calculate the splitting losses for the DiD-Tree model.
- Parameters
configs – Configuration.
bin_grad_hist – Histogram of gradients.
bin_hess_hist – Histogram of Hessians.
bin_cgrad_hist – Histogram of counterfactual gradients.
bin_chess_hist – Histogram of counterfactual Hessians.
bin_eta_hist – Histogram of etas.
bin_counts – Histogram counts.
parameters – Additional parameters.
- Returns
The splitting losses.
- openasce.inference.tree.splitting_losses.gbct_splitting_losses(configs, bin_grad_hist, bin_hess_hist, bin_cgrad_hist, bin_chess_hist, bin_counts, parameters: dict)[source]¶
Calculate the splitting losses for the GBCT model.
- Parameters
configs – Configuration.
bin_grad_hist – Histogram of gradients.
bin_hess_hist – Histogram of Hessians.
bin_cgrad_hist – Histogram of counterfactual gradients.
bin_chess_hist – Histogram of counterfactual Hessians.
bin_counts – Histogram counts.
parameters – Additional parameters.
- Returns
The splitting losses.
openasce.inference.tree.tree_node module¶
- class openasce.inference.tree.tree_node.CausalTreeNode(conf: ConfigTree = None, **kwargs)[source]¶
Bases:
object
A class for a node in a Causal Tree maximizing heterogeneous treatment effect.
- property children¶
- effect(w)[source]¶
Compute the treatment effect.
- Parameters
w – The treatment weights.
- Returns
The computed treatment effect.
- Return type
ndarray
- estimate(outcome: ndarray, treatment: ndarray, weight: ndarray = None)[source]¶
Estimate the treatment effect given the outcome, treatment, and weight.
- Parameters
outcome – The outcome values.
treatment – The treatment values.
weight – The weight values.
- Returns
The estimated treatment effect.
- Return type
ndarray
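As an illustration of estimating an effect from outcome, treatment, and weight arrays, the sketch below computes a weighted difference in means between treated and control samples. This is the textbook estimator and only an assumption about what a node-level estimate might look like; the documented `estimate` method may compute something different. `difference_in_means` is a hypothetical helper.

```python
import numpy as np


def difference_in_means(outcome, treatment, weight=None):
    """Weighted mean outcome of treated samples minus that of control samples."""
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treatment).astype(bool)
    if weight is None:
        weight = np.ones_like(outcome)
    weight = np.asarray(weight, dtype=float)
    treated_mean = np.average(outcome[treated], weights=weight[treated])
    control_mean = np.average(outcome[~treated], weights=weight[~treated])
    return treated_mean - control_mean


outcome = np.array([3.0, 5.0, 1.0, 2.0])
treatment = np.array([1, 1, 0, 0])
effect = difference_in_means(outcome, treatment)
```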
- estimate_by_hist(outcome: ndarray, treatment: ndarray, count: ndarray)[source]¶
Estimate the treatment effect using histogram-based method.
- Parameters
outcome – The outcome values.
treatment – The treatment values.
count – The count values.
- Returns
The estimated treatment effect.
- Return type
ndarray
- class openasce.inference.tree.tree_node.GradientCausalTreeNode(conf: ConfigTree = None, **kwargs)[source]¶
Bases:
CausalTreeNode
A class for a node in a Gradient Boosting Causal Tree.
openasce.inference.tree.utils module¶
- openasce.inference.tree.utils.find_bin_parallel(data, max_bin=64, min_data_in_bin=100, min_split_data=100, pre_filter=False, bin_type=0, use_missing=True, zero_as_missing=False, forced_upper_bounds=[])[source]¶
Find bins in parallel for the given data.
- Parameters
data – The input data.
max_bin – The maximum number of bins. (default: 64)
min_data_in_bin – The minimum number of data points in a bin. (default: 100)
min_split_data – The minimum number of data points to split a bin. (default: 100)
pre_filter – Whether to pre-filter the data. (default: False)
bin_type – The type of binning. (default: 0)
use_missing – Whether to use missing values. (default: True)
zero_as_missing – Whether to treat zero as a missing value. (default: False)
forced_upper_bounds – The forced upper bounds for the bins. (default: [])
- Returns
The bins found.
- Raises
ValueError – If the data type is not supported.
- openasce.inference.tree.utils.groupby(data: ndarray, by: ndarray, aggregator: str = 'mean', dropna=True)[source]¶
- openasce.inference.tree.utils.indexbyarray(arr, idx, fact_outcome, counterfact_outcome=None, n_threads=-1)[source]¶
Index an outcome array (arr, with shape [n, 2]) by a binary treatment array (idx).
- Parameters
arr (ndarray) – The input array.
idx (ndarray) – The index array.
fact_outcome (ndarray) – The outcome array to update.
counterfact_outcome (ndarray) – The counterfactual outcome array to update.
n_threads (int, optional) – The number of threads to use. Defaults to -1.
- Returns
The updated outcome arrays.
- Return type
ndarray
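The indexing described here can be pictured with plain NumPy. The sketch below assumes that the factual outcome is the column of `arr` selected by each row's treatment value and that the counterfactual outcome is the other column. That reading is inferred from the parameter names and is not confirmed by the documentation. `index_by_treatment` is a hypothetical helper that returns new arrays rather than writing into preallocated outputs.

```python
import numpy as np


def index_by_treatment(arr, idx):
    """Split an [n, 2] outcome array into factual and counterfactual columns."""
    arr = np.asarray(arr)
    idx = np.asarray(idx, dtype=np.intp)
    # Column chosen by the observed treatment.
    fact = np.take_along_axis(arr, idx[:, None], axis=1)[:, 0]
    # The other column (assumption: binary treatment coded as 0/1).
    counterfact = np.take_along_axis(arr, (1 - idx)[:, None], axis=1)[:, 0]
    return fact, counterfact


arr = np.array([[10, 11], [20, 21], [30, 31]])
idx = np.array([0, 1, 0])
fact, counterfact = index_by_treatment(arr, idx)
```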
- openasce.inference.tree.utils.list_to_array(data: list, out=None, st_idx: int = 0, miss_value=0, threads=-1)[source]¶
- openasce.inference.tree.utils.update_histogram(target, x_binned, index, leaves_range, treatment, out, leaves=[], n_treatment=2, n_bins=64, threads=-1)[source]¶
Update the histogram of each leaf.
- Parameters
target (ndarray) – Target array. Shape [n, n_outcome].
x_binned (ndarray) – Binned feature array. Shape [n, n_feature].
index (ndarray) – Index array. Shape [n]. The end position in leaves_range must not exceed n.
leaves_range (ndarray) – List of each leaf’s data range. Shape [n_leaf, 2]. Each term looks like [st_pos, end_pos).
treatment (ndarray) – Treatment array. Shape [n].
out (ndarray) – Output histogram. Shape [n_leaf, n_features, n_bins, n_treatment, n_outcome].
leaves (list, optional) – List of leaf indices. Defaults to [].
n_treatment (int, optional) – The number of treatments. Defaults to 2.
n_bins (int, optional) – The number of bins. Defaults to 64.
threads (int, optional) – The number of threads to use. Defaults to -1.
- Returns
The updated histogram array.
- Return type
ndarray
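The shapes listed above imply a simple accumulation: for every sample in a leaf, add its target row to the cell addressed by (leaf, feature, bin, treatment). The single-threaded NumPy sketch below illustrates this under two assumptions: `index` holds sample ids ordered so that each leaf owns the slice `[st_pos, end_pos)`, and the histogram starts at zero. `update_histogram_sketch` is a hypothetical name and says nothing about how the library's multi-threaded version is implemented.

```python
import numpy as np


def update_histogram_sketch(target, x_binned, index, leaves_range, treatment,
                            n_treatment=2, n_bins=64):
    """Accumulate targets into [n_leaf, n_features, n_bins, n_treatment, n_outcome]."""
    n_leaf = len(leaves_range)
    n_features = x_binned.shape[1]
    n_outcome = target.shape[1]
    out = np.zeros((n_leaf, n_features, n_bins, n_treatment, n_outcome))
    for leaf, (start, end) in enumerate(leaves_range):
        rows = index[start:end]  # sample ids that belong to this leaf
        for feature in range(n_features):
            # np.add.at accumulates correctly when (bin, treatment) pairs repeat.
            np.add.at(out[leaf, feature],
                      (x_binned[rows, feature], treatment[rows]),
                      target[rows])
    return out


target = np.array([[1.0], [2.0], [3.0], [4.0]])
x_binned = np.array([[0], [1], [0], [1]])
index = np.array([0, 1, 2, 3])
leaves_range = np.array([[0, 2], [2, 4]])
treatment = np.array([0, 1, 1, 1])

hist = update_histogram_sketch(target, x_binned, index, leaves_range,
                               treatment, n_bins=2)
```

`np.add.at` is used instead of fancy-indexed `+=` because the latter silently drops repeated index pairs.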
- openasce.inference.tree.utils.update_histograms(targets, x_binned, index, leaves_range, treatment, outs, leaves=[], n_treatment=2, n_bins=64, threads=-1)[source]¶
Update the histogram of each leaf.
- Parameters
targets (list) – List of target arrays, each with shape [n, n_outcome].
x_binned (ndarray) – Binned feature array. Shape [n, n_feature].
index (ndarray) – Index array. Shape [n]. Must satisfy that the end position in leaves_range is not greater than n.
leaves_range (ndarray) – List of each leaf’s data range. Shape [n_leaf, 2]. Each term looks like [st_pos, end_pos).
treatment (ndarray) – Treatment array. Shape [n].
outs (list) – List of output histogram arrays, each with shape [n_leaf, n_features, n_bins, n_treatment, n_outcome].
leaves (list, optional) – List of leaf indices. Defaults to [].
n_treatment (int, optional) – The number of treatments. Defaults to 2.
n_bins (int, optional) – The number of bins. Defaults to 64.
threads (int, optional) – The number of threads to use. Defaults to -1.
- Returns
The updated histogram arrays.
- Return type
ndarray
- openasce.inference.tree.utils.update_x_map(x_binned, ins2leaf, split_infos, leaves_range, out, nthread=-1)[source]¶
Update the index of instances.
- Parameters
x_binned (ndarray) – The binned feature array.
ins2leaf (ndarray) – The mapping array.
split_infos (ndarray) – The split information array.
leaves_range (ndarray) – The range of each leaf.
out (ndarray) – The output array to store the updated mapping.
nthread (int, optional) – The number of threads to use. Defaults to -1.
- Returns
None
- openasce.inference.tree.utils.value_bin_parallel(data, bin_mappers: List[openasce.inference.tree.gbct_utils.bin.BinMaper], out=None, threads=-1)[source]¶
Transform the input data to bin values in parallel.
- Parameters
data – The input data.
bin_mappers – The bin mappers.
out – The output array to store the bin values. (default: None)
threads – The number of threads to use (-1 for maximum). (default: -1)
- Returns
The transformed bin values.
- Raises
ValueError – If the output dtype is not supported.
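Mapping raw values to bin indices is commonly done by searching a sorted array of bin upper bounds. The sketch below shows that idea with `np.searchsorted`. Representing a bin mapper as a plain array of upper bounds is an assumption made for illustration, since the interface of the real bin mapper class is not shown here. `value_to_bin` and `bin_columns` are hypothetical helpers.

```python
import numpy as np


def value_to_bin(column, upper_bounds):
    """Return, for each value, the index of the first bin whose upper bound is >= value."""
    return np.searchsorted(upper_bounds, column, side="left")


def bin_columns(data, upper_bounds_per_feature):
    """Bin each feature column of an [n, n_feature] array independently."""
    data = np.asarray(data, dtype=float)
    out = np.empty(data.shape, dtype=np.int32)
    for feature, bounds in enumerate(upper_bounds_per_feature):
        out[:, feature] = value_to_bin(data[:, feature], bounds)
    return out


data = np.array([[0.5, 10.0],
                 [1.0, 25.0],
                 [1.5, 5.0],
                 [7.0, 40.0]])
upper_bounds_per_feature = [
    np.array([1.0, 2.0, np.inf]),  # feature 0: three bins
    np.array([20.0, np.inf]),      # feature 1: two bins
]
binned = bin_columns(data, upper_bounds_per_feature)
```

Using `np.inf` as the last upper bound guarantees that every finite value falls into a valid bin.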