meerkat.ml package

Submodules

meerkat.ml.activation module

meerkat.ml.callbacks module

meerkat.ml.embedding_column module

class EmbeddingColumn(data: Optional[Sequence] = None, *args, **kwargs)[source]

Bases: TensorColumn

build_faiss_index(index=None, overwrite=False)[source]

pca(n_components=2)[source]

search(query, k: int)[source]

umap(n_neighbors=15, n_components=2)[source]

visualize_umap(n_neighbors=15, n_components=2, point_size=4)[source]

meerkat.ml.huggingfacemodel module

meerkat.ml.instances_column module

meerkat.ml.metrics module

accuracy(predictions: Union[list, array, Tensor], labels: Union[list, array, Tensor])[source]: Calculate accuracy.
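As a rough illustration of what this metric measures, the sketch below computes accuracy over plain Python lists. The helper name `accuracy_sketch` is hypothetical and is not part of meerkat; the library function also accepts arrays and tensors, which this sketch does not handle.

```python
def accuracy_sketch(predictions, labels):
    """Return the fraction of positions where the prediction equals the label.

    Hypothetical helper for illustration only; not meerkat's implementation.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy of an empty sequence")
    # Count exact matches, then normalise by the number of examples.
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Three of the four predictions match their labels.
print(accuracy_sketch([0, 1, 1, 2], [0, 1, 2, 2]))  # 0.75
```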

class_distribution(labels: Union[list, array, Tensor], num_classes: Optional[int] = None, min_label: int = 0)[source]: Calculate the aggregated class distribution.

compute_metric(metric: str, predictions: Union[Sequence, Tensor], labels: Union[Sequence, Tensor], num_classes: int) → Union[float, ndarray, Tensor][source]

Compute metric given predictions and target labels.

Parameters

metric (str) – name of metric
predictions (Union[Sequence, torch.Tensor]) – a sequence of predictions (rouge metrics) or a torch Tensor (other metrics) containing predictions
labels (Union[Sequence, torch.Tensor]) – a sequence of labels (rouge metrics) or a torch Tensor (other metrics) containing target labels
num_classes (int) – number of classes

Returns

the calculated metric value

dice(predictions: Union[list, array, Tensor], labels: Union[list, array, Tensor])[source]: Calculate Dice Score.
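For reference, the Dice score of two binary masks A and B is conventionally defined as 2|A ∩ B| / (|A| + |B|). The sketch below applies that definition to flat lists of 0/1 values. The name `dice_sketch` and the handling of two empty masks are assumptions made for illustration, not meerkat's documented behaviour.

```python
def dice_sketch(predictions, labels):
    """Dice score for two flat binary masks given as lists of 0/1 values.

    Hypothetical helper for illustration only; not meerkat's implementation.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    # Positions that are positive in both masks.
    intersection = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    # Total number of positives across both masks.
    total = sum(1 for p in predictions if p == 1) + sum(1 for y in labels if y == 1)
    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return 2 * intersection / total


# One overlapping positive, two positives in each mask: 2 * 1 / 4.
print(dice_sketch([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```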

f1(predictions: Union[list, array, Tensor], labels: Union[list, array, Tensor])[source]: Calculate F1 score for binary classification.

f1_macro(predictions: Union[list, array, Tensor], labels: Union[list, array, Tensor])[source]: Calculate macro F1 score for multi-class classification.

f1_micro(predictions: Union[list, array, Tensor], labels: Union[list, array, Tensor])[source]: Calculate micro F1 score for multi-class classification.
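The three F1 variants above differ only in how per-class counts are aggregated. The following sketch, using the standard textbook definitions, shows the difference on plain lists. All function names here are hypothetical, and the choice to average over the union of classes seen in predictions and labels is an assumption; meerkat's own functions may behave differently.

```python
def f1_binary_sketch(predictions, labels, positive=1):
    """F1 for one class treated as positive: 2TP / (2TP + FP + FN)."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro_sketch(predictions, labels):
    """Unweighted mean of the per-class F1 scores."""
    # Assumption: average over every class seen in predictions or labels.
    classes = sorted(set(predictions) | set(labels))
    return sum(f1_binary_sketch(predictions, labels, c) for c in classes) / len(classes)


def f1_micro_sketch(predictions, labels):
    """F1 computed from TP/FP/FN pooled over all classes."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == y)
    # In single-label classification each error is one FP and one FN.
    errors = len(labels) - tp
    denom = 2 * tp + 2 * errors
    return 2 * tp / denom if denom else 0.0


preds, gold = [0, 1, 1, 2], [0, 1, 2, 2]
print(f1_macro_sketch(preds, gold))  # (1 + 2/3 + 2/3) / 3, about 0.778
print(f1_micro_sketch(preds, gold))  # 0.75
```

Under these definitions, micro F1 equals plain accuracy whenever each example has exactly one label, whereas macro F1 gives every class equal weight regardless of its frequency.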

format_summary(x: str) → str[source]: Format summary text for computing rouge.

get_metric(name: str) → Callable[source]: Get metrics from string names.

iou_score(predictions: Union[list, array, Tensor], labels: Union[list, array, Tensor], num_classes: Optional[int] = None)[source]: Calculate IoU.
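Intersection over union (IoU) for a class c is the number of positions where both prediction and label equal c, divided by the number of positions where either equals c. The sketch below returns one score per class. The name `iou_per_class_sketch`, the inference of `num_classes` from the data, and the use of NaN for absent classes are all assumptions for illustration; the source does not say whether `iou_score` returns per-class values or an average.

```python
def iou_per_class_sketch(predictions, labels, num_classes=None):
    """Per-class IoU for flat lists of integer class labels.

    Hypothetical helper for illustration only; not meerkat's implementation.
    """
    if num_classes is None:
        # Assumption: classes are 0..max label observed in either sequence.
        num_classes = max(max(predictions), max(labels)) + 1
    pairs = list(zip(predictions, labels))
    scores = []
    for c in range(num_classes):
        intersection = sum(1 for p, y in pairs if p == c and y == c)
        union = sum(1 for p, y in pairs if p == c or y == c)
        # A class absent from both sequences has no defined IoU.
        scores.append(intersection / union if union else float("nan"))
    return scores


print(iou_per_class_sketch([0, 1, 1, 2], [0, 1, 2, 2]))  # [1.0, 0.5, 0.5]
```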

meerkat.ml.model module

class Model(model: Module, is_classifier: Optional[bool] = None, task: Optional[str] = None, device: Optional[str] = None)[source]

Bases: Module

activation(dataset: DataPanel, target_module: str, input_columns: List[str], batch_size=32) → EmbeddingColumn[source]

An Operation that stores model activations in a new Embedding column.

Parameters

dataset (DataPanel) – the meerkat DataPanel containing the model inputs.
target_module (str) – the name of the submodule of model (i.e. an intermediate layer) that outputs the activations we’d like to extract. For nested submodules, specify a path separated by “.” (e.g. ActivationCachedOp(model, “block4.conv”)).
input_columns (List[str]) – names of the columns containing the model inputs

classification(dataset: DataPanel, input_columns: List[str], batch_size: int = 32, num_classes: Optional[int] = None, multi_label: bool = False, one_hot: Optional[bool] = None, threshold=0.5) → DataPanel[source]

evaluate(dataset: DataPanel, target_column: List[str], pred_column: List[str], metrics: List[str], num_classes: Optional[int] = None)[source]

forward(input_batch: Dict) → Dict[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

static remap_labels(output_dict: Dict, label_map: List[int]) → Dict[source]

Map the output labels of the model.

Example: 3-way classification, with label_map = [1, 2, 0] => (model label 0 -> dataset label 1, model label 1 -> dataset label 2, …).
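To make the direction of the mapping concrete, the sketch below applies a `label_map` to predicted labels and to one row of class probabilities, reading `label_map[m]` as the dataset label for model label `m`. Both helper names are hypothetical. The real method operates on an `output_dict` whose keys are not documented here, so this only illustrates the index arithmetic.

```python
def remap_predictions_sketch(model_labels, label_map):
    """Translate model label indices into dataset label indices."""
    # label_map[m] is the dataset label that corresponds to model label m.
    return [label_map[m] for m in model_labels]


def remap_probabilities_sketch(probs_row, label_map):
    """Reorder one row of class probabilities into dataset label order."""
    remapped = [0.0] * len(label_map)
    for model_label, dataset_label in enumerate(label_map):
        # The probability of model class m moves to position label_map[m].
        remapped[dataset_label] = probs_row[model_label]
    return remapped


label_map = [1, 2, 0]
print(remap_predictions_sketch([0, 1, 2, 0], label_map))       # [1, 2, 0, 1]
print(remap_probabilities_sketch([0.7, 0.2, 0.1], label_map))  # [0.1, 0.7, 0.2]
```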

training: bool

meerkat.ml.prediction_column module

class ClassificationOutputColumn(logits: Optional[Union[Sequence, ndarray, Series, Tensor]] = None, probs: Optional[Union[Sequence, ndarray, Series, Tensor]] = None, preds: Optional[Union[Sequence, ndarray, Series, Tensor]] = None, num_classes: Optional[int] = None, multi_label: bool = False, one_hot: Optional[bool] = None, threshold=0.5, *args, **kwargs)[source]

Bases: TensorColumn

bincount() → TensorColumn[source]

Compute the count (cardinality) for each category.

Categories which are not available will have a count of 0.

If self.multi_label=True, the bincount counts every occurrence of each category. If an example is marked with 2 categories, the count increases for both categories. Note that this means the sum of the counts can exceed the number of examples N.

Returns: A 1D tensor of length self.num_classes.
Return type: torch.Tensor
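The counting rule described above can be sketched with plain lists. This is an assumption-laden illustration: `bincount_sketch` is a hypothetical helper, single-label predictions are assumed to be integer class indices, and multi-label predictions are assumed to be rows of 0/1 indicators of shape (N, C).

```python
def bincount_sketch(preds, num_classes, multi_label=False):
    """Count how often each category occurs in a list of predictions.

    Hypothetical helper for illustration only; not meerkat's implementation.
    """
    # Categories that never occur keep a count of 0.
    counts = [0] * num_classes
    if multi_label:
        # Each row holds one 0/1 indicator per category.
        for row in preds:
            for category, flag in enumerate(row):
                counts[category] += int(flag)
    else:
        # Each entry is a single class index.
        for category in preds:
            counts[category] += 1
    return counts


print(bincount_sketch([0, 2, 2], num_classes=4))                   # [1, 0, 2, 0]
# Two examples, three labels assigned in total, so the counts sum to 3.
print(bincount_sketch([[1, 1, 0], [0, 1, 0]], 3, multi_label=True))  # [1, 2, 0]
```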

entropy() → TensorColumn[source]

Compute the entropy for each example.

If self.multi_label is True, each category is treated as a binary classification problem. There will be an entropy calculation for each category as well. For example, if the probabilities are of shape (N, C), there will be NxC entropy values.

In the multi-dimensional case, this returns the entropy for each element. For example, if the probabilities are of shape (N, C, A, B), there will be NxAxB entropy values.

Returns: Tensor of entropies
Return type: TensorColumn
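The two cases for 2D probabilities can be sketched as follows, assuming the usual Shannon entropy with the natural logarithm (the source does not state the log base). The helper names are hypothetical, and the sketch covers only inputs of shape (N, C) given as nested lists.

```python
import math


def _neg_p_log_p(p):
    """Return -p * ln(p), using the convention that 0 * ln(0) = 0."""
    return -p * math.log(p) if p > 0 else 0.0


def entropy_sketch(probs, multi_label=False):
    """Entropy of class probabilities given as N rows of C values.

    Hypothetical helper for illustration only; not meerkat's implementation.
    """
    if multi_label:
        # Each category is its own binary problem: N x C entropy values.
        return [[_neg_p_log_p(p) + _neg_p_log_p(1 - p) for p in row] for row in probs]
    # Single-label: one entropy value per example, summed over categories.
    return [sum(_neg_p_log_p(p) for p in row) for row in probs]


single = entropy_sketch([[0.5, 0.5], [1.0, 0.0]])
print(single)  # first value is ln(2), second is 0 for a certain prediction

multi = entropy_sketch([[0.5, 1.0]], multi_label=True)
print(multi)   # one row with two per-category binary entropies
```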

logits() → ClassificationOutputColumn[source]

mode()[source]

predictions() → ClassificationOutputColumn[source]: Compute predictions.

preds() → ClassificationOutputColumn: Compute predictions.

probabilities() → ClassificationOutputColumn[source]

probs() → ClassificationOutputColumn