cacp.comparison
- cacp.comparison.process_comparison(datasets: typing.List[cacp.dataset.ClassificationDatasetBase], classifiers: typing.List[typing.Tuple[str, typing.Callable]], result_dir: pathlib.Path, metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <function auc>), ('Accuracy', <function accuracy>), ('Precision', <function precision>), ('Recall', <function recall>), ('F1', <function f1>)), n_folds: typing.Literal[5, 10] = 10, custom_fold_modifiers: typing.Optional[typing.List[cacp.dataset.ClassificationFoldDataModifierBase]] = None, dob_scv: bool = True, categorical_to_numerical=True, normalized: bool = False, progress=<function <lambda>>)[source]¶
Runs comparison for provided datasets and classifiers.
- Parameters
datasets – dataset collection
classifiers – classifiers collection
result_dir – results directory
metrics – metrics collection
n_folds – number of folds {5,10}
custom_fold_modifiers – custom fold modifiers that can change fold data before usage
dob_scv – whether distribution optimally balanced stratified cross-validation (DOB-SCV) should be used to build the folds
categorical_to_numerical – whether categorical values in the dataset should be converted to numerical values
normalized – whether the data should be normalized to the range [0, 1]
progress – function that can be used to monitor progress
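As a rough illustration of what normalizing data to the range [0, 1] means, the sketch below applies min-max scaling to each feature column. It is a standalone example on plain Python lists, not CACP's own code; the function name `min_max_normalize` and the handling of constant columns are assumptions made for this example.

```python
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every feature column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


features = [[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]]
scaled_features = min_max_normalize(features)
```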
- cacp.comparison.process_comparison_single(classifier_factory, classifier_name, dataset: cacp.dataset.ClassificationDatasetBase, fold: cacp.dataset.ClassificationFoldData, metrics: Sequence[Tuple[str, Callable]]) dict[source]¶
Runs comparison on a single classifier and dataset.
- Parameters
classifier_factory – classifier factory
classifier_name – classifier name
dataset – single dataset
fold – fold data
metrics – metrics collection
- Returns
dictionary of calculated metrics and metadata
- cacp.comparison.process_incremental_comparison(datasets: typing.List[typing.Union[cacp.dataset.ClassificationDatasetBase, river.datasets.base.Dataset]], classifiers: typing.List[typing.Tuple[str, typing.Callable]], result_dir: pathlib.Path, metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <class 'river.metrics.roc_auc.ROCAUC'>), ('Accuracy', <class 'river.metrics.accuracy.Accuracy'>), ('Precision', <class 'river.metrics.precision.Precision'>), ('Recall', <class 'river.metrics.recall.Recall'>), ('F1', <class 'river.metrics.fbeta.F1'>)), progress=<function <lambda>>)[source]¶
Runs comparison for provided datasets and incremental classifiers.
- Parameters
datasets – dataset collection
classifiers – classifiers collection
result_dir – results directory
metrics – metrics collection
progress – function that can be used to monitor progress
- cacp.comparison.process_incremental_comparison_single(classifier_factory, classifier_name, dataset: typing.Union[cacp.dataset.ClassificationDatasetBase, river.datasets.base.Dataset], number_of_classes: int, incremental_comparison_dir: pathlib.Path, metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <class 'river.metrics.roc_auc.ROCAUC'>), ('Accuracy', <class 'river.metrics.accuracy.Accuracy'>), ('Precision', <class 'river.metrics.precision.Precision'>), ('Recall', <class 'river.metrics.recall.Recall'>), ('F1', <class 'river.metrics.fbeta.F1'>))) dict[source]¶
Runs incremental comparison on a single classifier and dataset.
- Parameters
classifier_factory – classifier factory
classifier_name – classifier name
dataset – single dataset
number_of_classes – number of classes
incremental_comparison_dir – incremental single results directory
metrics – metrics collection
- Returns
dictionary of calculated metrics and metadata
cacp.dataset
- class cacp.dataset.ClassificationDataset(name: Literal['abalone', 'appendicitis', 'australian', 'automobile', 'balance', 'banana', 'bands', 'breast', 'bupa', 'car', 'chess', 'cleveland', 'coil2000', 'contraceptive', 'crx', 'dermatology', 'ecoli', 'flare', 'german', 'glass', 'haberman', 'hayes-roth', 'heart', 'hepatitis', 'housevotes', 'ionosphere', 'iris', 'kr-vs-k', 'led7digit', 'letter', 'lymphography', 'magic', 'mammographic', 'marketing', 'monk-2', 'movement_libras', 'mushroom', 'newthyroid', 'nursery', 'optdigits', 'page-blocks', 'penbased', 'phoneme', 'pima', 'post-operative', 'ring', 'saheart', 'satimage', 'segment', 'shuttle', 'sonar', 'spambase', 'spectfheart', 'splice', 'tae', 'texture', 'thyroid', 'tic-tac-toe', 'titanic', 'twonorm', 'vehicle', 'vowel', 'wdbc', 'wine', 'winequality-red', 'winequality-white', 'wisconsin', 'yeast', 'zoo'], files_cache_path=PosixPath('/home/docs/cacp_files'), seed=1)[source]¶
Bases: cacp.dataset.ClassificationDatasetBase
Class that represents a single KEEL dataset.
- property classes: int¶
- property features: int¶
- folds(n_folds: Literal[5, 10] = 10, dob_scv: bool = True, categorical_to_numerical=True) Iterator[cacp.dataset.ClassificationFoldData][source]¶
- property instances: int¶
- property name: str¶
- property origin: str¶
- property output_name: str¶
- class cacp.dataset.ClassificationDatasetBase(seed=1)[source]¶
Bases: cacp.dataset.ClassificationDatasetMinimalBase
Base class for a classification dataset that represents a single dataset.
- abstract property classes: int¶
- abstract property features: int¶
- abstract property instances: int¶
- abstract property name: str¶
- class cacp.dataset.ClassificationDatasetMinimalBase(seed=1)[source]¶
Bases: abc.ABC
Minimal base class for a classification dataset that represents a single dataset.
- abstract folds(n_folds: Literal[5, 10] = 10, dob_scv: bool = True, categorical_to_numerical=True) Iterable[cacp.dataset.ClassificationFoldData][source]¶
- class cacp.dataset.ClassificationFoldData(index: int, labels: numpy.ndarray, x_train: numpy.ndarray, y_train: numpy.ndarray, x_test: numpy.ndarray, y_test: numpy.ndarray)[source]¶
Bases: object
Class that represents a single dataset fold.
- index: int¶
- labels: numpy.ndarray¶
- x_test: numpy.ndarray¶
- x_train: numpy.ndarray¶
- y_test: numpy.ndarray¶
- y_train: numpy.ndarray¶
- class cacp.dataset.ClassificationFoldDataModifierBase[source]¶
Bases: abc.ABC
- abstract modify(fold: cacp.dataset.ClassificationFoldData) cacp.dataset.ClassificationFoldData[source]¶
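A fold modifier receives one fold and returns a (possibly changed) fold. The sketch below shows that contract with stand-in classes so that it runs without CACP installed: `FoldData` mirrors the fields of cacp.dataset.ClassificationFoldData but uses plain lists instead of numpy arrays, `FoldModifierBase` mirrors cacp.dataset.ClassificationFoldDataModifierBase, and `KeepFirstFeatures` is a hypothetical modifier invented for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from typing import List


@dataclass
class FoldData:
    """Stand-in for cacp.dataset.ClassificationFoldData (lists replace numpy arrays)."""
    index: int
    labels: List[int]
    x_train: List[List[float]]
    y_train: List[int]
    x_test: List[List[float]]
    y_test: List[int]


class FoldModifierBase(ABC):
    """Stand-in for cacp.dataset.ClassificationFoldDataModifierBase."""

    @abstractmethod
    def modify(self, fold: FoldData) -> FoldData:
        ...


class KeepFirstFeatures(FoldModifierBase):
    """Hypothetical modifier that keeps only the first `n` feature columns."""

    def __init__(self, n: int):
        self.n = n

    def modify(self, fold: FoldData) -> FoldData:
        # Return a new fold; the input fold is left untouched.
        return replace(
            fold,
            x_train=[row[: self.n] for row in fold.x_train],
            x_test=[row[: self.n] for row in fold.x_test],
        )


fold = FoldData(
    index=0,
    labels=[0, 1],
    x_train=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    y_train=[0, 1],
    x_test=[[7.0, 8.0, 9.0]],
    y_test=[1],
)
modified = KeepFirstFeatures(2).modify(fold)
```

With the real classes, a modifier written this way would presumably be passed through the custom_fold_modifiers parameter.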
- class cacp.dataset.LocalClassificationDataset(name: str, dataset_directory: pathlib.Path)[source]¶
Bases: cacp.dataset.ClassificationDataset
Class that represents a single local dataset with a structure similar to a KEEL dataset.
- class cacp.dataset.LocalCsvClassificationDataset(name: str, dataset_path: pathlib.Path)[source]¶
Bases: cacp.dataset.ClassificationDatasetBase
Class that represents a single local dataset stored as a CSV file with a header.
- property classes: int¶
- property features: int¶
- folds(n_folds: Literal[5, 10] = 10, dob_scv: bool = True, categorical_to_numerical=True) Iterable[cacp.dataset.ClassificationFoldData][source]¶
- property instances: int¶
- property name: str¶
- cacp.dataset.all_datasets() List[cacp.dataset.ClassificationDataset][source]¶
Gets all available datasets
- Returns
all classification datasets
cacp.info
- cacp.info.classifier_info(classifiers: Iterable[Tuple[str, Callable]], result_dir: pathlib.Path)[source]¶
Produces results files with list of all classifiers used in experiment along with their attributes.
- Parameters
classifiers – classifiers collection
result_dir – results directory
- cacp.info.dataset_info(datasets: Iterable[Union[cacp.dataset.ClassificationDatasetBase, river.datasets.base.Dataset]], result_dir: pathlib.Path)[source]¶
Produces results files with list of all datasets used in experiment along with their attributes.
- Parameters
datasets – dataset collection
result_dir – results directory
cacp.plot
- class cacp.plot.Line(x: numpy.ndarray, y: numpy.ndarray, label: str = '')[source]¶
Bases: object
- label: str = ''¶
- x: numpy.ndarray¶
- y: numpy.ndarray¶
- cacp.plot.process_comparison_results_incremental_plot(file_name: str, y_label: str, lines: List[cacp.plot.Line], plot_dir: pathlib.Path)[source]¶
- cacp.plot.process_comparison_results_incremental_plots(result_dir: pathlib.Path, metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <class 'river.metrics.roc_auc.ROCAUC'>), ('Accuracy', <class 'river.metrics.accuracy.Accuracy'>), ('Precision', <class 'river.metrics.precision.Precision'>), ('Recall', <class 'river.metrics.recall.Recall'>), ('F1', <class 'river.metrics.fbeta.F1'>)))[source]¶
Generates plots from incremental comparison results.
- Parameters
result_dir – results directory
metrics – metrics collection
- cacp.plot.process_comparison_results_plots(result_dir: pathlib.Path, metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <function auc>), ('Accuracy', <function accuracy>), ('Precision', <function precision>), ('Recall', <function recall>), ('F1', <function f1>)))[source]¶
Generates plots from comparison results.
- Parameters
result_dir – results directory
metrics – metrics collection
- cacp.plot.process_comparison_results_single_incremental_plot(classifier_name: str, dataset_name: str, metric: str, df: pandas.core.frame.DataFrame, incremental_plot_dir: pathlib.Path)[source]¶
Generates plots from single incremental comparison results.
- Parameters
classifier_name – classifier name
dataset_name – dataset name
metric – metric name
df – result dataframe
incremental_plot_dir – output plot directory
cacp.result
- cacp.result.process_comparison_results(result_dir: pathlib.Path, metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <function auc>), ('Accuracy', <function accuracy>), ('Precision', <function precision>), ('Recall', <function recall>), ('F1', <function f1>)))[source]¶
Processes comparison results, computes mean values for all metrics.
- Parameters
result_dir – results directory
metrics – metrics collection
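The idea of computing mean values per metric can be sketched as a simple group-and-average step. The example below is an assumption-laden illustration, not CACP's implementation: the row layout and the keys `algorithm`, `dataset`, and `fold` are hypothetical, and the numbers are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def mean_metric_per_algorithm(rows: List[dict], metric: str) -> Dict[str, float]:
    """Average `metric` over all rows (datasets and folds) of each algorithm."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["algorithm"]].append(row[metric])
    return {name: mean(values) for name, values in grouped.items()}


# Hypothetical per-fold results; real results are read from the results directory.
fold_results = [
    {"algorithm": "KNN", "dataset": "iris", "fold": 0, "Accuracy": 0.90},
    {"algorithm": "KNN", "dataset": "iris", "fold": 1, "Accuracy": 1.00},
    {"algorithm": "DT", "dataset": "iris", "fold": 0, "Accuracy": 0.80},
    {"algorithm": "DT", "dataset": "iris", "fold": 1, "Accuracy": 0.90},
]
mean_accuracy = mean_metric_per_algorithm(fold_results, "Accuracy")
```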
cacp.run
- cacp.run.run_experiment(datasets: typing.List[cacp.dataset.ClassificationDatasetBase], classifiers: typing.List[typing.Tuple[str, typing.Callable]], results_directory: typing.Union[str, os.PathLike] = './result', metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <function auc>), ('Accuracy', <function accuracy>), ('Precision', <function precision>), ('Recall', <function recall>), ('F1', <function f1>)), n_folds: typing.Literal[5, 10] = 10, custom_fold_modifiers: typing.Optional[typing.List[cacp.dataset.ClassificationFoldDataModifierBase]] = None, dob_scv: bool = True, categorical_to_numerical=True, normalized: bool = False, seed: int = 1, progress=<function <lambda>>)[source]¶
[Main CACP Function] Runs an automatic comparison of supervised classification algorithms by evaluating metrics on multiple datasets.
- Parameters
datasets – dataset collection
classifiers – classifiers collection
results_directory – results directory
metrics – metrics collection
n_folds – number of folds {5,10}
custom_fold_modifiers – custom fold modifiers that can change fold data before usage
dob_scv – whether distribution optimally balanced stratified cross-validation (DOB-SCV) should be used to build the folds
categorical_to_numerical – whether categorical values in the dataset should be converted to numerical values
normalized – whether the data should be normalized to the range [0, 1]
seed – random seed value
progress – function that can be used to monitor progress
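A call to run_experiment might look like the sketch below. It relies on several assumptions: that each classifier factory is called with two arguments (named `n_inputs` and `n_classes` here), that scikit-learn is installed, and that the named KEEL datasets can be fetched. The helpers `build_classifiers` and `main` are hypothetical names used only in this example. Imports are done lazily inside the functions so that the classifier list can be built without importing CACP or scikit-learn up front.

```python
def build_classifiers():
    """Return (name, factory) pairs in the shape expected by `classifiers`."""

    # Assumption: factories receive the number of inputs and classes.
    def knn(n_inputs, n_classes):
        from sklearn.neighbors import KNeighborsClassifier
        return KNeighborsClassifier(n_neighbors=3)

    def decision_tree(n_inputs, n_classes):
        from sklearn.tree import DecisionTreeClassifier
        return DecisionTreeClassifier()

    return [("KNN_3", knn), ("DT", decision_tree)]


def main():
    """Run a small experiment; requires cacp and scikit-learn to be installed."""
    from cacp.dataset import ClassificationDataset
    from cacp.run import run_experiment

    datasets = [ClassificationDataset("iris"), ClassificationDataset("wisconsin")]
    run_experiment(
        datasets,
        build_classifiers(),
        results_directory="./example_result",
        n_folds=5,
    )


classifiers = build_classifiers()
```

Calling `main()` starts the experiment and writes its output under `./example_result`.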
- cacp.run.run_incremental_experiment(datasets: typing.List[typing.Union[cacp.dataset.ClassificationDatasetBase, river.datasets.base.Dataset]], classifiers: typing.List[typing.Tuple[str, typing.Callable]], results_directory: typing.Union[str, os.PathLike] = './result', metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <class 'river.metrics.roc_auc.ROCAUC'>), ('Accuracy', <class 'river.metrics.accuracy.Accuracy'>), ('Precision', <class 'river.metrics.precision.Precision'>), ('Recall', <class 'river.metrics.recall.Recall'>), ('F1', <class 'river.metrics.fbeta.F1'>)), seed: int = 1, progress=<function <lambda>>)[source]¶
[Main CACP Function] Runs an automatic comparison of incremental supervised classification algorithms by evaluating metrics on multiple datasets.
- Parameters
datasets – dataset collection
classifiers – classifiers collection
results_directory – results directory
metrics – metrics collection
seed – random seed value
progress – function that can be used to monitor progress
cacp.time
cacp.util
- cacp.util.auc_score(y_true: numpy.ndarray, y_pred: numpy.ndarray, average=None, multi_class=None, labels: Optional[numpy.ndarray] = None) float[source]¶
Calculates multiclass AUC score.
- Parameters
y_true – real labels
y_pred – predicted labels
average – sklearn roc_auc_score param
multi_class – sklearn roc_auc_score param
labels – sklearn roc_auc_score param
- Returns
AUC value
cacp.wilcoxon
- cacp.wilcoxon.bold_large_p_value(data: float, format_string='%.4f') str[source]¶
Makes a large p-value bold in a LaTeX table.
- Parameters
data – value
format_string – format string used to render the value
- Returns
bolded values string
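The general technique of bolding a table cell can be sketched as follows. This is not the library's code: the function name `bold_if_large` is hypothetical, and the 0.05 threshold for what counts as "large" is an assumption, since the cutoff is not stated here.

```python
def bold_if_large(p_value: float, format_string: str = "%.4f", threshold: float = 0.05) -> str:
    """Format `p_value` and wrap it in \\textbf{...} when it exceeds `threshold`."""
    text = format_string % p_value
    # Assumption: values strictly above the threshold are the ones to highlight.
    if p_value > threshold:
        return "\\textbf{%s}" % text
    return text


large_cell = bold_if_large(0.2)
small_cell = bold_if_large(0.01)
```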
- cacp.wilcoxon.process_wilcoxon(classifiers: typing.List[typing.Tuple[str, typing.Callable]], result_dir: pathlib.Path, metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <function auc>), ('Accuracy', <function accuracy>), ('Precision', <function precision>), ('Recall', <function recall>), ('F1', <function f1>)))[source]¶
Calculates the Wilcoxon signed-rank test for comparison results.
- Parameters
classifiers – classifiers collection
result_dir – results directory
metrics – metrics collection
- cacp.wilcoxon.process_wilcoxon_for_metric(current_algorithm: str, metric: str, result_dir: pathlib.Path) pandas.core.frame.DataFrame[source]¶
Calculates the Wilcoxon signed-rank test on the comparison results for a single metric.
- Parameters
current_algorithm – current algorithm
metric – comparison metric {auc, accuracy, precision, recall, f1}
result_dir – results directory
- Returns
DataFrame with Wilcoxon values for the metric
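For readers unfamiliar with the test, the sketch below computes the two rank sums at the heart of the Wilcoxon signed-rank test for paired scores. It is a teaching example in pure Python, not the code CACP runs, and it stops short of computing a p-value. The function name `signed_rank_sums` and the sample scores are made up for this example.

```python
from typing import Sequence, Tuple


def signed_rank_sums(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    # Indices of the differences, ordered by absolute value.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical accuracies (in percent) of two classifiers on five datasets.
scores_a = [90, 85, 70, 95, 80]
scores_b = [80, 85, 75, 90, 60]
w_plus, w_minus = signed_rank_sums(scores_a, scores_b)
```

The smaller of the two sums is the usual test statistic; a value far below what chance would produce suggests that one classifier is consistently better.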
cacp.winner
- cacp.winner.process_comparison_result_winners(result_dir: pathlib.Path, metrics: typing.Sequence[typing.Tuple[str, typing.Callable]] = (('AUC', <function auc>), ('Accuracy', <function accuracy>), ('Precision', <function precision>), ('Recall', <function recall>), ('F1', <function f1>)))[source]¶
Processes comparison results, finds winners.
- Parameters
result_dir – results directory
metrics – metrics collection
- cacp.winner.process_comparison_result_winners_for_metric(metric: str, result_dir: pathlib.Path) pandas.core.frame.DataFrame[source]¶
Processes comparison results, finds winners for metric.
- Parameters
metric – comparison metric {auc, accuracy, precision, recall, f1}
result_dir – results directory
- Returns
DataFrame with winners for the metric
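Finding winners for a metric amounts to picking, per dataset, the classifier with the best score and counting how often each classifier comes out on top. The sketch below illustrates that idea with plain dictionaries. It is not CACP's implementation: the input layout, the function name `count_winners`, and the choice to credit every classifier in a tie are assumptions.

```python
from collections import Counter
from typing import Dict


def count_winners(scores: Dict[str, Dict[str, float]]) -> Counter:
    """Count wins per classifier; `scores` maps dataset -> {classifier -> value}."""
    wins = Counter()
    for by_classifier in scores.values():
        best = max(by_classifier.values())
        # Assumption: every classifier that reaches the best value counts as a winner.
        for name, value in by_classifier.items():
            if value == best:
                wins[name] += 1
    return wins


# Hypothetical accuracy values for two classifiers on three datasets.
accuracy = {
    "iris": {"KNN": 0.95, "DT": 0.93},
    "wine": {"KNN": 0.90, "DT": 0.97},
    "zoo": {"KNN": 0.92, "DT": 0.92},
}
winners = count_winners(accuracy)
```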