MultiLabelMetric
- class mmpretrain.evaluation.MultiLabelMetric(thr=None, topk=None, items=('precision', 'recall', 'f1-score'), average='macro', collect_device='cpu', prefix=None)
- A collection of precision, recall, f1-score and support for multi-label tasks.

The collection of metrics is for multi-label classification. All of these metrics are based on the confusion matrix of every category, whose entries are the counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN).

All metrics can be formulated using the variables above:

- Precision is the fraction of correct predictions in all predictions: \[\text{Precision} = \frac{TP}{TP+FP}\]
- Recall is the fraction of correct predictions in all targets: \[\text{Recall} = \frac{TP}{TP+FN}\]
- F1-score is the harmonic mean of the precision and recall: \[\text{F1-score} = \frac{2\times\text{Recall}\times\text{Precision}}{\text{Recall}+\text{Precision}}\]
- Support is the number of samples: \[\text{Support} = TP + TN + FN + FP\]

- Parameters:
- thr (float, optional) – Predictions with scores under the threshold are considered negative. If None, the topk predictions are considered positive. If topk is also None, thr=0.5 is used by default. Defaults to None.
- topk (int, optional) – The predictions with the k highest scores are considered positive. If None, thr is used to determine positive predictions. If both thr and topk are not None, thr is used. Defaults to None.
- items (Sequence[str]) – The detailed metric items to evaluate, selected from “precision”, “recall”, “f1-score” and “support”. Defaults to ('precision', 'recall', 'f1-score').
- average (str | None) – How to calculate the final metrics from the confusion matrix of every category. It supports three modes:
- “macro”: Calculate metrics for each category, and calculate the mean value over all categories.
- “micro”: Average the confusion matrix over all categories and calculate metrics on the mean confusion matrix.
- None: Calculate metrics of every category and output directly.
 - Defaults to “macro”.
- collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’. 
- prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None. 
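
The formulas and the three average modes above can be sketched in plain Python. This is only an illustration under stated assumptions, not the mmpretrain implementation: the helper names per_class_counts, prf and multilabel_prf are hypothetical, inputs are assumed to be already binarized multi-hot rows, and a category with an empty denominator is assumed to score 0. Under those assumptions the macro result matches the one-hot example values documented for calculate (43.75, 31.25, 33.3333).

```python
from typing import List, Optional, Sequence, Tuple


def per_class_counts(pred: Sequence[Sequence[int]],
                     target: Sequence[Sequence[int]]
                     ) -> Tuple[List[int], List[int], List[int]]:
    """Count TP, TP+FP and TP+FN for every category (hypothetical helper)."""
    num_classes = len(target[0])
    tp = [0] * num_classes
    n_pred = [0] * num_classes    # TP + FP: everything predicted positive
    n_target = [0] * num_classes  # TP + FN: everything labelled positive
    for p_row, t_row in zip(pred, target):
        for c in range(num_classes):
            tp[c] += int(bool(p_row[c]) and bool(t_row[c]))
            n_pred[c] += int(bool(p_row[c]))
            n_target[c] += int(bool(t_row[c]))
    return tp, n_pred, n_target


def prf(tp: int, n_pred: int, n_target: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 in percent.

    Assumption: an empty denominator yields 0 instead of raising.
    """
    precision = 100.0 * tp / n_pred if n_pred else 0.0
    recall = 100.0 * tp / n_target if n_target else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def multilabel_prf(pred, target, average: Optional[str] = "macro"):
    tp, n_pred, n_target = per_class_counts(pred, target)
    if average == "micro":
        # Pooling the counts gives the same ratios as averaging the matrices.
        return prf(sum(tp), sum(n_pred), sum(n_target))
    per_class = [prf(*counts) for counts in zip(tp, n_pred, n_target)]
    if average is None:
        return per_class  # one (precision, recall, f1) tuple per category
    if average != "macro":
        raise ValueError(f"unsupported average mode: {average!r}")
    n = len(per_class)
    return tuple(sum(m[i] for m in per_class) / n for i in range(3))


y_pred = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
y_true = [[1, 1, 0, 0], [0, 0, 1, 0], [1, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(multilabel_prf(y_pred, y_true, average="macro"))
print(multilabel_prf(y_pred, y_true, average="micro"))
```

Macro averaging weights every category equally, so a rare category with no correct predictions pulls the score down as much as a frequent one; micro averaging pools the counts first and is therefore dominated by the frequent categories.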
 
 - Examples
>>> import torch
>>> from mmpretrain.evaluation import MultiLabelMetric
>>> # ------ The Basic Usage for category indices labels -------
>>> y_pred = [[0], [1], [0, 1], [3]]
>>> y_true = [[0, 3], [0, 2], [1], [3]]
>>> # Output precision, recall, f1-score and support
>>> MultiLabelMetric.calculate(
...     y_pred, y_true, pred_indices=True, target_indices=True, num_classes=4)
(tensor(50.), tensor(50.), tensor(45.8333), tensor(6))
>>> # ----------- The Basic Usage for one-hot labels -----------
>>> y_pred = torch.tensor([[1, 1, 0, 0],
...                        [1, 1, 0, 0],
...                        [0, 0, 1, 0],
...                        [0, 1, 0, 0],
...                        [0, 1, 0, 0]])
>>> y_true = torch.Tensor([[1, 1, 0, 0],
...                        [0, 0, 1, 0],
...                        [1, 1, 1, 0],
...                        [1, 0, 0, 0],
...                        [1, 0, 0, 0]])
>>> MultiLabelMetric.calculate(y_pred, y_true)
(tensor(43.7500), tensor(31.2500), tensor(33.3333), tensor(8))
>>> # --------- The Basic Usage for one-hot pred scores ---------
>>> y_pred = torch.rand(y_true.size())
>>> y_pred
tensor([[0.4575, 0.7335, 0.3934, 0.2572],
        [0.1318, 0.1004, 0.8248, 0.6448],
        [0.8349, 0.6294, 0.7896, 0.2061],
        [0.4037, 0.7308, 0.6713, 0.8374],
        [0.3779, 0.4836, 0.0313, 0.0067]])
>>> # Calculate with different threshold.
>>> MultiLabelMetric.calculate(y_pred, y_true, thr=0.1)
(tensor(42.5000), tensor(75.), tensor(53.1746), tensor(8))
>>> # Calculate with topk.
>>> MultiLabelMetric.calculate(y_pred, y_true, topk=1)
(tensor(62.5000), tensor(31.2500), tensor(39.1667), tensor(8))
>>>
>>> # ------------------- Use with Evaluator -------------------
>>> from mmpretrain.structures import DataSample
>>> from mmengine.evaluator import Evaluator
>>> data_samples = [
...     DataSample().set_pred_score(pred).set_gt_score(gt)
...     for pred, gt in zip(torch.rand(1000, 5), torch.randint(0, 2, (1000, 5)))]
>>> evaluator = Evaluator(metrics=MultiLabelMetric(thr=0.5))
>>> evaluator.process(data_samples)
>>> evaluator.evaluate(1000)
{
    'multi-label/precision': 50.72898037055408,
    'multi-label/recall': 50.06836461357571,
    'multi-label/f1-score': 50.384466955258475
}
>>> # Evaluate on each class by using topk strategy
>>> evaluator = Evaluator(metrics=MultiLabelMetric(topk=1, average=None))
>>> evaluator.process(data_samples)
>>> evaluator.evaluate(1000)
{
    'multi-label/precision_top1_classwise': [48.22, 50.54, 50.99, 44.18, 52.5],
    'multi-label/recall_top1_classwise': [18.92, 19.22, 19.92, 20.0, 20.27],
    'multi-label/f1-score_top1_classwise': [27.18, 27.85, 28.65, 27.54, 29.25]
}

- static calculate(pred, target, pred_indices=False, target_indices=False, average='macro', thr=None, topk=None, num_classes=None)
- Calculate the precision, recall, f1-score and support.
- Parameters:
- pred (torch.Tensor | np.ndarray | Sequence) – The prediction results. A torch.Tensor or np.ndarray with shape (N, num_classes), or a sequence of index/one-hot format labels.
- target (torch.Tensor | np.ndarray | Sequence) – The ground-truth targets of the predictions. A torch.Tensor or np.ndarray with shape (N, num_classes), or a sequence of index/one-hot format labels.
- pred_indices (bool) – Whether pred is a sequence of category index labels. If True, num_classes must be set. Defaults to False.
- target_indices (bool) – Whether target is a sequence of category index labels. If True, num_classes must be set. Defaults to False.
- average (str | None) – How to calculate the final metrics from the confusion matrix of every category. It supports three modes:
- “macro”: Calculate metrics for each category, and calculate the mean value over all categories.
- “micro”: Average the confusion matrix over all categories and calculate metrics on the mean confusion matrix.
- None: Calculate metrics of every category and output directly.
 - Defaults to “macro”.
- thr (float, optional) – Predictions with scores under the threshold are considered negative. Defaults to None.
- topk (int, optional) – The predictions with the k highest scores are considered positive. Defaults to None.
- num_classes (int, optional) – The number of classes. Required if pred or target is given as category indices instead of one-hot labels. Defaults to None.
 
- Returns:
- The tuple contains precision, recall, f1-score and support. The type of each item is torch.Tensor: a tensor for each metric. The shape is (1, ) if average is not None, and (C, ) if average is None.
 
- Return type:
- Tuple 
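
The pred_indices and target_indices flags described in the parameters above mean that each sample is a list of category indices rather than a row of length num_classes. A minimal sketch of the conversion this implies follows; the helper indices_to_multi_hot is hypothetical and only illustrates why num_classes has to be supplied: the row width cannot be recovered from the indices alone.

```python
from typing import List, Sequence


def indices_to_multi_hot(labels: Sequence[Sequence[int]],
                         num_classes: int) -> List[List[int]]:
    """Turn per-sample category indices into multi-hot rows (hypothetical helper)."""
    rows = []
    for sample in labels:
        row = [0] * num_classes
        for idx in sample:
            if not 0 <= idx < num_classes:
                raise ValueError(
                    f"index {idx} is out of range for {num_classes} classes")
            row[idx] = 1
        rows.append(row)
    return rows


# The index-format targets from the first example in this page.
y_true = [[0, 3], [0, 2], [1], [3]]
for row in indices_to_multi_hot(y_true, num_classes=4):
    print(row)
```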
 - Notes - If both thr and topk are set, thr is used to determine positive predictions. If neither is set, thr=0.5 is used by default.
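
The precedence between thr and topk can be sketched as follows. This is an illustration rather than the library code: select_positives is a hypothetical name, a score equal to the threshold is assumed to count as positive (the text only says scores under the threshold are negative), and ties in the top-k selection are assumed to be broken by column order.

```python
from typing import List, Optional, Sequence


def select_positives(scores: Sequence[Sequence[float]],
                     thr: Optional[float] = None,
                     topk: Optional[int] = None) -> List[List[int]]:
    """Binarize score rows: thr wins over topk, and thr=0.5 is the fallback."""
    if thr is None and topk is None:
        thr = 0.5
    if thr is not None:
        # Assumption: a score exactly at the threshold is treated as positive.
        return [[int(s >= thr) for s in row] for row in scores]
    positives = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        chosen = set(order[:topk])
        positives.append([int(c in chosen) for c in range(len(row))])
    return positives


# Scores copied from the example output shown in this page.
scores = [[0.4575, 0.7335, 0.3934, 0.2572],
          [0.1318, 0.1004, 0.8248, 0.6448],
          [0.8349, 0.6294, 0.7896, 0.2061],
          [0.4037, 0.7308, 0.6713, 0.8374],
          [0.3779, 0.4836, 0.0313, 0.0067]]
print(select_positives(scores, thr=0.1))
print(select_positives(scores, topk=1))
print(select_positives(scores, thr=0.9, topk=1))  # thr takes precedence
```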
 - compute_metrics(results)
- Compute the metrics from processed results.
- Parameters:
- results (list) – The processed results of each batch. 
- Returns:
- The computed metrics. The keys are the names of the metrics, and the values are corresponding results. 
- Return type:
- Dict 
 
 - process(data_batch, data_samples)
- Process one batch of data samples. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters:
- data_batch – A batch of data from the dataloader. 
- data_samples (Sequence[dict]) – A batch of outputs from the model.
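
The split between process and compute_metrics is an accumulate-then-reduce pattern: process is called once per batch and appends to self.results, and compute_metrics is called once at the end over everything collected. The toy class below sketches that flow without any mmpretrain or mmengine dependency. ToyMultiLabelMetric is hypothetical, the dictionary keys pred_score and gt_score are assumed for illustration, and only macro precision and recall are computed to keep it short.

```python
from typing import Dict, List, Sequence


class ToyMultiLabelMetric:
    """Hypothetical stand-in that mimics the process/compute_metrics flow."""

    def __init__(self, thr: float = 0.5, prefix: str = "multi-label") -> None:
        self.thr = thr
        self.prefix = prefix
        self.results: List[dict] = []

    def process(self, data_batch, data_samples: Sequence[dict]) -> None:
        # Store only what compute_metrics needs; data_batch is unused here.
        for sample in data_samples:
            self.results.append({
                "pred": [int(s >= self.thr) for s in sample["pred_score"]],
                "gt": [int(g) for g in sample["gt_score"]],
            })

    def compute_metrics(self, results: List[dict]) -> Dict[str, float]:
        num_classes = len(results[0]["gt"])
        precisions, recalls = [], []
        for c in range(num_classes):
            tp = sum(r["pred"][c] and r["gt"][c] for r in results)
            n_pred = sum(r["pred"][c] for r in results)
            n_gt = sum(r["gt"][c] for r in results)
            # Assumption: an empty denominator scores 0.
            precisions.append(100.0 * tp / n_pred if n_pred else 0.0)
            recalls.append(100.0 * tp / n_gt if n_gt else 0.0)
        return {
            f"{self.prefix}/precision": sum(precisions) / num_classes,
            f"{self.prefix}/recall": sum(recalls) / num_classes,
        }


metric = ToyMultiLabelMetric(thr=0.5)
batch_1 = [{"pred_score": [0.9, 0.2], "gt_score": [1, 0]},
           {"pred_score": [0.6, 0.7], "gt_score": [0, 1]}]
batch_2 = [{"pred_score": [0.1, 0.8], "gt_score": [1, 1]}]
metric.process(None, batch_1)
metric.process(None, batch_2)
print(metric.compute_metrics(metric.results))
```

Keeping the per-sample results small matters in practice because they are gathered across ranks (see collect_device) before the final reduction.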