meerkat.ml package#
Submodules#
meerkat.ml.activation module#
meerkat.ml.callbacks module#
meerkat.ml.embedding_column module#
- class EmbeddingColumn(data: Sequence | None = None, *args, **kwargs)[source]#
Bases: TensorColumn
meerkat.ml.huggingfacemodel module#
meerkat.ml.instances_column module#
meerkat.ml.metrics module#
- accuracy(predictions: list | array | Tensor, labels: list | array | Tensor)[source]#
Calculate accuracy.
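
As a rough illustration of what an accuracy computation involves, the sketch below uses NumPy and a hypothetical helper named accuracy_sketch. It is not the meerkat implementation, and it assumes predictions and labels are already class indices of the same shape.

```python
import numpy as np


def accuracy_sketch(predictions, labels):
    """Fraction of predictions equal to their label (illustrative helper, not meerkat's code)."""
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    if predictions.shape != labels.shape:
        raise ValueError("predictions and labels must have the same shape")
    # Element-wise comparison yields booleans; their mean is the accuracy.
    return float((predictions == labels).mean())


# Three of the four predictions match their labels.
acc = accuracy_sketch([0, 1, 1, 2], [0, 1, 0, 2])
```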
- class_distribution(labels: list | array | Tensor, num_classes: int | None = None, min_label: int = 0)[source]#
Calculate the aggregated class distribution.
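
The following sketch shows one plausible reading of the num_classes and min_label parameters, using NumPy. The helper name class_distribution_sketch is hypothetical, and returning fractions rather than raw counts is an assumption; the documentation above does not say which the library returns.

```python
import numpy as np


def class_distribution_sketch(labels, num_classes=None, min_label=0):
    """Share of examples per class (illustrative helper, not meerkat's code)."""
    # Shift labels so the smallest expected label maps to index 0.
    shifted = np.asarray(labels).ravel().astype(int) - min_label
    if num_classes is None:
        # Assumption: infer the class count from the largest label seen.
        num_classes = int(shifted.max()) + 1
    counts = np.bincount(shifted, minlength=num_classes)
    return counts / counts.sum()


# Labels start at 1 here, so min_label=1 maps them to indices 0..2.
dist = class_distribution_sketch([1, 1, 2, 3], min_label=1)
```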
- compute_metric(metric: str, predictions: Sequence | Tensor, labels: Sequence | Tensor, num_classes: int) float | ndarray | Tensor [source]#
Compute metric given predictions and target labels.
- Parameters:
metric (str) – name of metric
predictions (Union[Sequence, torch.Tensor]) – a sequence of predictions (rouge metrics) or a torch Tensor (other metrics) containing predictions
labels (Union[Sequence, torch.Tensor]) – a sequence of labels (rouge metrics) or a torch Tensor (other metrics) containing target labels
num_classes (int) – number of classes
- Returns:
the calculated metric value
- dice(predictions: list | array | Tensor, labels: list | array | Tensor)[source]#
Calculate Dice Score.
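
For reference, the Dice score of two binary masks is twice the size of their overlap divided by the sum of their sizes. The NumPy sketch below uses a hypothetical helper dice_sketch and assumes binary inputs; how the library handles multi-class inputs or empty masks is not stated above, so the empty-mask convention here is an assumption.

```python
import numpy as np


def dice_sketch(predictions, labels):
    """Dice = 2 * |P & L| / (|P| + |L|) for binary masks (illustrative helper)."""
    p = np.asarray(predictions).astype(bool)
    l = np.asarray(labels).astype(bool)
    denom = p.sum() + l.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(p, l).sum() / denom)


# Overlap is 1 element; mask sizes are 2 and 1.
score = dice_sketch([1, 1, 0, 0], [1, 0, 0, 0])
```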
- f1(predictions: list | array | Tensor, labels: list | array | Tensor)[source]#
Calculate F1 score for binary classification.
- f1_macro(predictions: list | array | Tensor, labels: list | array | Tensor)[source]#
Calculate macro F1 score for multi-class classification.
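
The two F1 variants above differ only in how per-class scores are combined. The sketch below uses NumPy and the hypothetical helpers f1_binary_sketch and f1_macro_sketch; it assumes label 1 is the positive class in the binary case and that the macro average is the unweighted mean of per-class F1 over every class that appears in the data. It is not the meerkat implementation.

```python
import numpy as np


def f1_binary_sketch(predictions, labels, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for one positive class (illustrative helper)."""
    p = np.asarray(predictions) == positive
    l = np.asarray(labels) == positive
    tp = np.sum(p & l)
    fp = np.sum(p & ~l)
    fn = np.sum(~p & l)
    denom = 2 * tp + fp + fn
    # Assumption: F1 is defined as 0 when the class never appears.
    return float(2 * tp / denom) if denom else 0.0


def f1_macro_sketch(predictions, labels):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    classes = np.unique(
        np.concatenate([np.asarray(predictions).ravel(), np.asarray(labels).ravel()])
    )
    return float(np.mean([f1_binary_sketch(predictions, labels, c) for c in classes]))


binary_f1 = f1_binary_sketch([1, 1, 0, 0], [1, 0, 0, 1])
macro_f1 = f1_macro_sketch([0, 1, 2, 2], [0, 1, 1, 2])
```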
meerkat.ml.model module#
- class Model(model: Module, is_classifier: bool | None = None, task: str | None = None, device: str | None = None)[source]#
Bases: Module
- activation(dataset: DataPanel, target_module: str, input_columns: List[str], batch_size=32) EmbeddingColumn [source]#
An Operation that stores model activations in a new Embedding column.
- Parameters:
dataset (DataPanel) – the meerkat DataPanel containing the model inputs.
target_module (str) – the name of the submodule of model (i.e. an intermediate layer) that outputs the activations we’d like to extract. For nested submodules, specify a path separated by “.” (e.g. ActivationCachedOp(model, “block4.conv”)).
input_columns (List[str]) – columns containing model inputs
- classification(dataset: DataPanel, input_columns: List[str], batch_size: int = 32, num_classes: int | None = None, multi_label: bool = False, one_hot: bool | None = None, threshold=0.5) DataPanel [source]#
- evaluate(dataset: DataPanel, target_column: List[str], pred_column: List[str], metrics: List[str], num_classes: int | None = None)[source]#
- forward(input_batch: Dict) Dict [source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- static remap_labels(output_dict: Dict, label_map: List[int]) Dict [source]#
Map the output labels of the model.
Example: 3-way classification, with label_map = [1, 2, 0] => (model label 0 -> dataset label 1, model label 1 -> dataset label 2, …).
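
The mapping in the example above amounts to an index lookup: entry i of label_map is the dataset label for model label i. A minimal NumPy sketch follows, applied to a plain array of predicted labels rather than to the output_dict the method takes, since the keys of that dictionary are not documented here.

```python
import numpy as np

# label_map[i] is the dataset label that corresponds to model label i.
label_map = [1, 2, 0]

# Hypothetical model predictions, expressed as model label indices.
model_preds = np.array([0, 1, 2, 0])

# Integer-array indexing performs the lookup for every prediction at once.
dataset_preds = np.asarray(label_map)[model_preds]
```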
- training: bool#
meerkat.ml.prediction_column module#
- class ClassificationOutputColumn(logits: Sequence | ndarray | Series | Tensor | None = None, probs: Sequence | ndarray | Series | Tensor | None = None, preds: Sequence | ndarray | Series | Tensor | None = None, num_classes: int | None = None, multi_label: bool = False, one_hot: bool | None = None, threshold=0.5, *args, **kwargs)[source]#
Bases: TensorColumn
- bincount() TensorColumn [source]#
Compute the count (cardinality) for each category.
Categories which are not available will have a count of 0.
If self.multi_label=True, the bincount will include the total number of times the category is seen. If an example is marked as 2 categories, the bincount will increase the count for both categories. Note that this means the sum of the counts can be more than the number of examples N.
- Returns:
A 1D tensor of length self.num_classes.
- Return type:
torch.Tensor
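
The counting behaviour described above can be sketched with NumPy as follows. The helper bincount_sketch is hypothetical, and it assumes multi-label predictions are stored as an (N, C) 0/1 indicator matrix; the library itself works on tensors and may store predictions differently.

```python
import numpy as np


def bincount_sketch(preds, num_classes, multi_label=False):
    """Per-category counts of length num_classes (illustrative helper)."""
    preds = np.asarray(preds)
    if multi_label:
        # Assumption: preds is an (N, C) indicator matrix; column sums count
        # how often each category is marked.
        return preds.sum(axis=0)
    # minlength keeps a 0 entry for categories that never occur.
    return np.bincount(preds, minlength=num_classes)


single = bincount_sketch([0, 2, 2], num_classes=4)
multi = bincount_sketch([[1, 1, 0], [0, 1, 0]], num_classes=3, multi_label=True)
```

In the multi-label call the counts sum to 3 even though there are only 2 examples, because the first example is marked with two categories.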
- entropy() TensorColumn [source]#
Compute the entropy for each example.
If self.multi_label is True, each category is treated as a binary classification problem. There will be an entropy calculation for each category as well. For example, if the probabilities are of shape (N, C), there will be NxC entropy values.
In the multi-dimensional case, this returns the entropy for each element. For example, if the probabilities are of shape (N, C, A, B), there will be NxAxB entropy values.
- Returns:
Tensor of entropies
- Return type:
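
The output shapes described for entropy can be checked with a small NumPy sketch. The helper entropy_sketch is hypothetical; it assumes the class axis is axis 1, uses the natural logarithm, and clips probabilities to avoid log(0). None of these details are confirmed by the documentation above.

```python
import numpy as np


def entropy_sketch(probs, multi_label=False, eps=1e-12):
    """Entropy of class probabilities (illustrative helper, not meerkat's code)."""
    probs = np.asarray(probs, dtype=float)
    p = np.clip(probs, eps, 1.0)
    if multi_label:
        # Each category is its own binary problem: H = -(p log p + q log q).
        q = np.clip(1.0 - probs, eps, 1.0)
        return -(p * np.log(p) + q * np.log(q))
    # Single-label case: sum over the class axis, keeping all other axes.
    return -(p * np.log(p)).sum(axis=1)


# Uniform two-class probabilities of shape (N, C, A, B) = (3, 2, 4, 5).
single = entropy_sketch(np.full((3, 2, 4, 5), 0.5))

# Multi-label probabilities of shape (N, C) = (2, 3).
multi = entropy_sketch(np.full((2, 3), 0.5), multi_label=True)
```

Under these assumptions the first result has shape (3, 4, 5) and the second has shape (2, 3), matching the NxAxB and NxC counts described above.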
- logits() ClassificationOutputColumn [source]#
- predictions() ClassificationOutputColumn [source]#
Compute predictions.
- preds() ClassificationOutputColumn #
Compute predictions.
- probabilities() ClassificationOutputColumn [source]#
- probs() ClassificationOutputColumn #