dyn2sel.apply_dcs package¶

Submodules¶

dyn2sel.apply_dcs.DESDDMethod module¶

class dyn2sel.apply_dcs.DESDDMethod.DESDDMethod(base_ensemble, drift_detector=ADWIN(delta=0.002), ensemble_size=10, min_lambda=1, max_lambda=6)¶

Bases: dyn2sel.apply_dcs.base.DCSApplier

The Dynamic Ensemble Selection for Drift Detection (DESDD) method takes a different approach to Dynamic Selection. Methods such as KNORA-E and KNORA-U are also referred to as DES methods because they select a subset of the ensemble, which is, by definition, also an ensemble. DESDD, in contrast, creates a group of ensembles and selects the one with the highest accuracy to predict the instance.

base_ensemble : Scikit-Multiflow ensemble: The ensemble used for populating the ensemble of ensembles
drift_detector : Scikit-Multiflow Drift Detector, default=ADWIN(): The drift detector used for detecting drift on the ensemble. When a drift is detected the whole ensemble is discarded and its construction starts over again.
ensemble_size : integer, default=10: The number of ensembles used in the ensemble of ensembles.
min_lambda : integer, default=1: The minimum lambda value used for online bagging in the ensemble generation process.
max_lambda : integer, default=6: The maximum lambda value used for online bagging in the ensemble generation process.

Albuquerque, R. A. S., Costa, A. F. J., Santos, E. M. dos, Sabourin, R., Giusti, R. A. 2019. Decision-Based Dynamic Ensemble Selection Method for Concept Drift.
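The role of min_lambda and max_lambda can be illustrated with a minimal sketch of online bagging, in which each incoming sample is shown to a learner k times, with k drawn from a Poisson(lambda) distribution. The sketch assumes that the lambda values are spread evenly between min_lambda and max_lambda across the ensembles; that spacing and the helper names `spread_lambdas` and `online_bagging_weights` are hypothetical and are not taken from dyn2sel.

```python
import numpy as np


def spread_lambdas(min_lambda, max_lambda, ensemble_size):
    """Evenly spaced lambda values, one per ensemble (an assumption of this sketch)."""
    return np.linspace(min_lambda, max_lambda, ensemble_size)


def online_bagging_weights(n_samples, lam, rng):
    """Draw how many times each incoming sample is shown to a learner."""
    return rng.poisson(lam=lam, size=n_samples)


rng = np.random.default_rng(0)
lambdas = spread_lambdas(min_lambda=1, max_lambda=6, ensemble_size=10)

# One row of repetition counts per ensemble, one column per incoming sample.
weights = np.stack([online_bagging_weights(1000, lam, rng) for lam in lambdas])

# Larger lambda values replay each sample more often on average.
mean_repeats = weights.mean(axis=1)
```

Varying lambda in this way gives each ensemble a different view of the stream, which is one plausible way to obtain the diversity that an ensemble of ensembles relies on.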

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.
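The layout of the returned array can be illustrated with a small hand-written example. The values and the `target_values` array below are made up for illustration; they are not produced by a dyn2sel model.

```python
import numpy as np

# Hypothetical predict_proba output for 3 samples and 2 class labels.
proba = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.5, 0.5],
])
target_values = np.array([0, 1])  # assumed class labels, in column order

# Row i holds one probability per class label for the i-th sample.
n_samples, n_labels = proba.shape

# The most probable label per sample; ties resolve to the first column.
predicted = target_values[np.argmax(proba, axis=1)]
```

Taking the argmax over each row is a common way to turn such probabilities into hard labels.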

update_detector(X, y)¶

dyn2sel.apply_dcs.DYNSEMethod module¶

class dyn2sel.apply_dcs.DYNSEMethod.DYNSEMethod(clf, chunk_size, dcs_method, max_ensemble_size=-1)¶

Bases: dyn2sel.apply_dcs.base.DCSApplier

The Dynamic Selection Based Drift Handler (DYNSE) applies traditional offline Dynamic Selection techniques in online Machine Learning environments in order to deal with concept drift. It builds each classifier from batches of labeled data that arrive at once. Each batch is used to train a new classifier of the ensemble, and if a batch is not sufficiently large, multiple batches can be accumulated before training. Any base classifier can be used to compose the ensemble, even those intended for offline Machine Learning, since they receive all their training data at once. In the prediction step, when an instance x arrives to be predicted, a K-Nearest Neighbors search finds the instances most similar to x in the validation set, which is defined by the M latest supervised batches that arrived for training. Once the similar instances are gathered, any selection method that depends on them can be applied.

clf : Scikit-Multiflow Classifier: The base classifier used for populating the ensemble
chunk_size : integer: The size of the chunks to accumulate data before fitting a classifier.
dcs_method : DCSTechnique object: Dynamic selection technique to be used in the prediction process.
max_ensemble_size : integer, default=-1: The maximum size that an ensemble can grow. If -1, it grows indefinitely.

Almeida, P. R. L. D.; Oliveira, L. S.; Britto, A. D. S.; Sabourin, R. 2016. Handling concept drifts using dynamic selection of classifiers. In: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI). [S.l.: s.n.]. p. 989–995. ISSN 2375-0197.
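The prediction step described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the dyn2sel implementation: the validation set is a fixed toy array rather than the M latest batches, the classifiers are plain functions, and the selection rule is a simplified KNORA-Eliminate-style rule that falls back to the whole ensemble when no classifier labels every neighbor correctly. The names `k_nearest` and `select_and_predict` are hypothetical.

```python
import numpy as np


def k_nearest(X_val, x, k):
    """Indices of the k validation instances closest to x (Euclidean distance)."""
    distances = np.linalg.norm(X_val - x, axis=1)
    return np.argsort(distances)[:k]


def select_and_predict(ensemble, X_val, y_val, x, k=3):
    """Keep classifiers that label every neighbor of x correctly, then vote."""
    idx = k_nearest(X_val, x, k)
    competent = [
        clf for clf in ensemble
        if np.array_equal(clf(X_val[idx]), y_val[idx])
    ]
    # Fall back to the whole ensemble when no classifier is fully correct.
    voters = competent or ensemble
    votes = np.array([clf(x[np.newaxis, :])[0] for clf in voters])
    return np.bincount(votes).argmax()


# Toy validation set: the label is 1 when the first feature is positive.
X_val = np.array([[-2.0, 0.0], [-1.0, 1.0], [1.0, -1.0], [2.0, 0.5], [1.5, 1.0]])
y_val = np.array([0, 0, 1, 1, 1])

# Stand-in "classifiers": plain functions mapping a 2-D array to labels.
ensemble = [
    lambda X: (X[:, 0] > 0).astype(int),    # matches the toy concept
    lambda X: np.zeros(len(X), dtype=int),  # always predicts class 0
    lambda X: (X[:, 1] > 0).astype(int),    # looks at the wrong feature
]

prediction = select_and_predict(ensemble, X_val, y_val, np.array([1.2, 0.0]), k=3)
```

Here only the first function labels all three neighbors correctly, so it alone decides the prediction for the query instance.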

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.

dyn2sel.apply_dcs.MDEMethod module¶

class dyn2sel.apply_dcs.MDEMethod.MDEMethod(clf, chunk_size, max_ensemble_size=-1, alpha=0.3)¶

Bases: dyn2sel.apply_dcs.base.DCSApplier

The Minority Driven Ensemble (MDE) is not only a selection method but a whole framework that covers training an ensemble on a data stream and predicting new instances. It focuses on data affected by the class imbalance problem, which it addresses by replacing poorly performing classifiers with new ones. The method divides the stream into chunks of data with a fixed size. Each chunk is passed as a mini-batch for the ensemble to train on. For each data chunk, the instances belonging to the minority class are filtered in order to remove outliers. That is done using K-Nearest Neighbors on the current data chunk: if the nearest neighbors of a minority instance belong to the majority class, the instance is considered an outlier and removed from the chunk. When predicting, if even a single member predicts the instance as belonging to the minority class, the instance is assigned to that class.

clf : Scikit-Multiflow Classifier: The base classifier used for populating the ensemble
chunk_size : integer: The size of the chunks to accumulate data before fitting a classifier.
max_ensemble_size : integer, default=-1: The maximum size to which the ensemble can grow. If -1, it grows indefinitely.
alpha : float between 0 and 1, default=0.3: Value that composes the threshold for removing classifiers with low Balanced Class Accuracy (BAC). If a classifier's BAC is less than 0.5 + alpha, it is removed from the ensemble.

Zyblewski, P.; Ksieniewicz, P.; Woźniak, M. Classifier selection for highly imbalanced data streams with minority driven ensemble. In: Artificial Intelligence and Soft Computing. Cham: Springer International Publishing, 2019. p. 626–635. ISBN 978-3-030-20912-4
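The pruning threshold and the minority-driven decision rule described above can be sketched as follows. This is a minimal two-class illustration, not the dyn2sel code: the helper names are hypothetical, and the balanced accuracy is computed as the mean of the two per-class recalls, which is an assumption about how BAC is defined here.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred, minority, majority):
    """Mean of the per-class recalls for a two-class problem."""
    recall_min = np.mean(y_pred[y_true == minority] == minority)
    recall_maj = np.mean(y_pred[y_true == majority] == majority)
    return (recall_min + recall_maj) / 2.0


def prune(member_preds, y_true, minority, majority, alpha=0.3):
    """Indices of members whose BAC reaches the 0.5 + alpha threshold."""
    return [
        i for i, y_pred in enumerate(member_preds)
        if balanced_accuracy(y_true, y_pred, minority, majority) >= 0.5 + alpha
    ]


def minority_driven_vote(member_preds, minority, majority):
    """Predict the minority class when at least one member does."""
    any_minority = np.any(np.asarray(member_preds) == minority, axis=0)
    return np.where(any_minority, minority, majority)


y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # class 1 is the minority
member_preds = [
    np.array([0, 0, 0, 0, 0, 0, 1, 1]),  # BAC = 1.0
    np.array([0, 0, 0, 0, 0, 0, 0, 0]),  # BAC = 0.5
    np.array([0, 0, 0, 0, 0, 1, 1, 0]),  # BAC = (0.5 + 5/6) / 2
]

kept = prune(member_preds, y_true, minority=1, majority=0)
combined = minority_driven_vote(member_preds, minority=1, majority=0)
```

With alpha=0.3 the threshold is 0.8, so only the first member survives pruning, while the combined vote marks a sample as minority as soon as any member does.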

get_minority_class(y)¶

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.

dyn2sel.apply_dcs.base module¶

class dyn2sel.apply_dcs.base.DCSApplier¶

Bases: skmultiflow.core.base.ClassifierMixin

DCSApplier base class for compatibility with scikit-multiflow.

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.

score(X, y, sample_weight=None)¶

Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the full label set of each sample be correctly predicted.

Parameters
X : array-like, shape = (n_samples, n_features): Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X.
sample_weight : array-like, shape = [n_samples], optional: Sample weights.

Returns
score : float: Mean accuracy of self.predict(X) with respect to y.

Notes
- Description partially taken from scikit-multiflow.
- This method is overridden because the y parameter must be passed to the predict method for the oracle selector to work.
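The quantity returned by score can be reproduced by hand for the single-label case. The sketch below is an assumption-labeled illustration in plain NumPy; the function name `mean_accuracy` is hypothetical and the code does not call dyn2sel.

```python
import numpy as np


def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, optionally weighted per sample."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    return float(np.average(correct, weights=sample_weight))


y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])

# Three of the four predictions are correct.
unweighted = mean_accuracy(y_true, y_pred)

# The misclassified third sample counts double, which lowers the score.
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```

Passing sample weights shifts the score toward the samples that matter most, which is why a single error on a heavily weighted sample can dominate the result.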

Module contents¶

class dyn2sel.apply_dcs.DCSApplier¶

Bases: skmultiflow.core.base.ClassifierMixin

DCSApplier base class for compatibility with scikit-multiflow.

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.

score(X, y, sample_weight=None)¶

Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the full label set of each sample be correctly predicted.

Parameters
X : array-like, shape = (n_samples, n_features): Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X.
sample_weight : array-like, shape = [n_samples], optional: Sample weights.

Returns
score : float: Mean accuracy of self.predict(X) with respect to y.

Notes
- Description partially taken from scikit-multiflow.
- This method is overridden because the y parameter must be passed to the predict method for the oracle selector to work.

class dyn2sel.apply_dcs.DYNSEMethod(clf, chunk_size, dcs_method, max_ensemble_size=-1)¶

Bases: dyn2sel.apply_dcs.base.DCSApplier

The Dynamic Selection Based Drift Handler (DYNSE) applies traditional offline Dynamic Selection techniques in online Machine Learning environments in order to deal with concept drift. It builds each classifier from batches of labeled data that arrive at once. Each batch is used to train a new classifier of the ensemble, and if a batch is not sufficiently large, multiple batches can be accumulated before training. Any base classifier can be used to compose the ensemble, even those intended for offline Machine Learning, since they receive all their training data at once. In the prediction step, when an instance x arrives to be predicted, a K-Nearest Neighbors search finds the instances most similar to x in the validation set, which is defined by the M latest supervised batches that arrived for training. Once the similar instances are gathered, any selection method that depends on them can be applied.

clf : Scikit-Multiflow Classifier: The base classifier used for populating the ensemble
chunk_size : integer: The size of the chunks to accumulate data before fitting a classifier.
dcs_method : DCSTechnique object: Dynamic selection technique to be used in the prediction process.
max_ensemble_size : integer, default=-1: The maximum size that an ensemble can grow. If -1, it grows indefinitely.

Almeida, P. R. L. D.; Oliveira, L. S.; Britto, A. D. S.; Sabourin, R. 2016. Handling concept drifts using dynamic selection of classifiers. In: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI). [S.l.: s.n.]. p. 989–995. ISSN 2375-0197.

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.

class dyn2sel.apply_dcs.DESDDMethod(base_ensemble, drift_detector=ADWIN(delta=0.002), ensemble_size=10, min_lambda=1, max_lambda=6)¶

Bases: dyn2sel.apply_dcs.base.DCSApplier

The Dynamic Ensemble Selection for Drift Detection (DESDD) method takes a different approach to Dynamic Selection. Methods such as KNORA-E and KNORA-U are also referred to as DES methods because they select a subset of the ensemble, which is, by definition, also an ensemble. DESDD, in contrast, creates a group of ensembles and selects the one with the highest accuracy to predict the instance.

base_ensemble : Scikit-Multiflow ensemble: The ensemble used for populating the ensemble of ensembles
drift_detector : Scikit-Multiflow Drift Detector, default=ADWIN(): The drift detector used for detecting drift on the ensemble. When a drift is detected the whole ensemble is discarded and its construction starts over again.
ensemble_size : integer, default=10: The number of ensembles used in the ensemble of ensembles.
min_lambda : integer, default=1: The minimum lambda value used for online bagging in the ensemble generation process.
max_lambda : integer, default=6: The maximum lambda value used for online bagging in the ensemble generation process.

Albuquerque, R. A. S., Costa, A. F. J., Santos, E. M. dos, Sabourin, R., Giusti, R. A. 2019. Decision-Based Dynamic Ensemble Selection Method for Concept Drift.

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.

update_detector(X, y)¶

class dyn2sel.apply_dcs.MDEMethod(clf, chunk_size, max_ensemble_size=-1, alpha=0.3)¶

Bases: dyn2sel.apply_dcs.base.DCSApplier

The Minority Driven Ensemble (MDE) is not only a selection method but a whole framework that covers training an ensemble on a data stream and predicting new instances. It focuses on data affected by the class imbalance problem, which it addresses by replacing poorly performing classifiers with new ones. The method divides the stream into chunks of data with a fixed size. Each chunk is passed as a mini-batch for the ensemble to train on. For each data chunk, the instances belonging to the minority class are filtered in order to remove outliers. That is done using K-Nearest Neighbors on the current data chunk: if the nearest neighbors of a minority instance belong to the majority class, the instance is considered an outlier and removed from the chunk. When predicting, if even a single member predicts the instance as belonging to the minority class, the instance is assigned to that class.

clf : Scikit-Multiflow Classifier: The base classifier used for populating the ensemble
chunk_size : integer: The size of the chunks to accumulate data before fitting a classifier.
max_ensemble_size : integer, default=-1: The maximum size to which the ensemble can grow. If -1, it grows indefinitely.
alpha : float between 0 and 1, default=0.3: Value that composes the threshold for removing classifiers with low Balanced Class Accuracy (BAC). If a classifier's BAC is less than 0.5 + alpha, it is removed from the ensemble.

Zyblewski, P.; Ksieniewicz, P.; Woźniak, M. Classifier selection for highly imbalanced data streams with minority driven ensemble. In: Artificial Intelligence and Soft Computing. Cham: Springer International Publishing, 2019. p. 626–635. ISBN 978-3-030-20912-4

get_minority_class(y)¶

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.

class dyn2sel.apply_dcs.PDCESMethod(clf, chunk_size, max_ensemble_size=-1, bagging_size=5, dcs_method=<dyn2sel.dcs_techniques.from_deslib.knora_e.KNORAE object>, preprocess=SMOTE())¶

Bases: dyn2sel.apply_dcs.base.DCSApplier

The Preprocess Dynamic Classifier Ensemble Selection (PDCES) is not only a selection method but a whole framework that covers training an ensemble on a data stream and predicting new instances. It focuses on data affected by the class imbalance problem, which it addresses by replacing poorly performing classifiers with new ones. The method divides the stream into chunks of data with a fixed size. Each chunk is passed as a mini-batch for the ensemble to train on. Each data chunk is first used for prediction and then used to train a new classifier, which is added to the ensemble. The prediction step is performed using traditional DCS methods, with a validation set that is defined as the last trained chunk. As this method is focused on imbalanced problems, a preprocessing (over/undersampling) step is performed on the chunk before the validation set is updated. The base classifiers of this ensemble are all stratified bagging ensembles, and any classifier can serve as the base classifier of each bagging ensemble.

clf : Scikit-Multiflow Classifier: The base classifier for the bagging ensembles used for populating the ensemble.
chunk_size : integer: The size of the chunks to accumulate data before fitting a classifier.
max_ensemble_size : integer, default=-1: The maximum size that an ensemble can grow. If -1, it grows indefinitely.
bagging_size : integer, default=5: The size of the bagging classifiers.
dcs_method : DCSTechnique object: Dynamic selection technique to be used in the prediction process.
preprocess : SamplerMixin object from imblearn: Preprocess method to use when updating the validation set

Zyblewski, P., Sabourin, R., & Woźniak, M. (2021). Preprocessed dynamic classifier ensemble selection for highly imbalanced drifted data streams. Information Fusion, 66, 138–154. https://doi.org/10.1016/j.inffus.2020.09.004
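The chunking and preprocessing steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the dyn2sel implementation: random oversampling is swapped in for samplers such as SMOTE, the problem is assumed to have two classes, classifier training is left out, and the names `random_oversample` and `ChunkBuffer` are hypothetical.

```python
import numpy as np


def random_oversample(X, y, rng):
    """Duplicate minority samples until both classes have equal counts.

    A simple stand-in for samplers such as SMOTE; assumes two classes.
    """
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    extra = rng.choice(np.flatnonzero(y == minority), size=deficit, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])


class ChunkBuffer:
    """Accumulate samples and release them in fixed-size chunks."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.X, self.y = [], []

    def add(self, X, y):
        self.X.extend(X)
        self.y.extend(y)
        chunks = []
        while len(self.y) >= self.chunk_size:
            n = self.chunk_size
            chunks.append((np.array(self.X[:n]), np.array(self.y[:n])))
            self.X, self.y = self.X[n:], self.y[n:]
        return chunks


rng = np.random.default_rng(42)
buffer = ChunkBuffer(chunk_size=10)
validation_sets = []

# Stream of 25 samples arriving 5 at a time; every fifth sample is minority.
X_stream = np.arange(50, dtype=float).reshape(25, 2)
y_stream = np.array([1 if i % 5 == 0 else 0 for i in range(25)])

for start in range(0, 25, 5):
    batch_X = X_stream[start:start + 5]
    batch_y = y_stream[start:start + 5]
    for X_chunk, y_chunk in buffer.add(batch_X, batch_y):
        # Balance the chunk before it becomes the new validation set.
        validation_sets.append(random_oversample(X_chunk, y_chunk, rng))
```

Two full chunks are released from the 25 samples and the remaining 5 stay buffered; each released chunk is balanced before it would be used as the validation set.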

get_minority_class(y)¶

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.

class dyn2sel.apply_dcs.DPDESMethod(clf, chunk_size, max_ensemble_size=-1, dcs_method=<dyn2sel.dcs_techniques.from_deslib.knora_e.KNORAE object>, preprocess=SMOTE())¶

Bases: dyn2sel.apply_dcs.base.DCSApplier

PDCESMethod

The Preprocess Dynamic Classifier Ensemble Selection (PDCES) is not only a selection method but a whole framework that covers training an ensemble on a data stream and predicting new instances. It focuses on data affected by the class imbalance problem, which it addresses by replacing poorly performing classifiers with new ones. The method divides the stream into chunks of data with a fixed size. Each chunk is passed as a mini-batch for the ensemble to train on. Each data chunk is first used for prediction and then used to train a new classifier, which is added to the ensemble. The prediction step is performed using traditional DCS methods, with a validation set that is defined as the last trained chunk. As this method is focused on imbalanced problems, a preprocessing (over/undersampling) step is performed on the chunk before training and before the validation set is updated.

clf : Scikit-Multiflow Classifier: The base classifier used for populating the ensemble
chunk_size : integer: The size of the chunks to accumulate data before fitting a classifier.
max_ensemble_size : integer, default=-1: The maximum size that an ensemble can grow. If -1, it grows indefinitely.
dcs_method : DCSTechnique object: Dynamic selection technique to be used in the prediction process.
preprocess : SamplerMixin object from imblearn: Preprocess method to use when updating the validation set

Zyblewski, P., Sabourin, R., & Woźniak, M. (2021). Preprocessed dynamic classifier ensemble selection for highly imbalanced drifted data streams. Information Fusion, 66, 138–154. https://doi.org/10.1016/j.inffus.2020.09.004

get_minority_class(y)¶

partial_fit(X, y, classes=None, sample_weight=None)¶

Partially (incrementally) fit the model.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y : numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes : numpy.ndarray, optional (default=None): Array with all possible/known class labels. Usage varies depending on the learning method.
sample_weight : numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

Notes
Description taken from scikit-multiflow.

predict(X, y=None)¶

Predict classes for the passed data.

Parameters
X : numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the class labels for.
y : array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X. This parameter is only considered with the oracle selector.

Returns
A numpy.ndarray with all the predictions for the samples in X.

Notes
- Description partially taken from scikit-multiflow.
- This method signature is overridden because the y parameter must be passed to the predict method when using the oracle selector.

predict_proba(X)¶

Estimates the probability of each sample in X belonging to each of the class-labels.

X : numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The list at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.