meerkat.contrib.wilds package

WILDS integration for Meerkat.

class WILDSInputColumn(dataset_name: str = 'fmow', version: str | None = None, root_dir: str | None = None, split: str | None = None, use_transform: bool = True, **kwargs)

Bases: AbstractColumn

get_metadata_columns()
get_y_column()

Get a NumpyArrayColumn holding the targets for the dataset.

Warning: WILDSDataset objects may remap indexes in arbitrary ways, so do not access the underlying data structures directly. Rely instead on the y_array and metadata_array properties, which are universal across WILDS datasets.
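To illustrate the warning, here is a minimal sketch using a hypothetical toy class (ToyRemappedDataset is not part of WILDS or Meerkat). It assumes a dataset that stores labels in one order and exposes examples in another, so only the public y_array view lines up with dataset indices.

```python
class ToyRemappedDataset:
    """Hypothetical stand-in for a dataset that reorders examples internally."""

    def __init__(self, raw_labels, order):
        self._raw_labels = raw_labels  # storage order, an implementation detail
        self._order = order            # maps dataset index -> storage index

    def __len__(self):
        return len(self._order)

    @property
    def y_array(self):
        # Public view: labels aligned with dataset indices.
        return [self._raw_labels[i] for i in self._order]


ds = ToyRemappedDataset(raw_labels=["cat", "dog", "bird"], order=[2, 0, 1])

# Correct: read targets through the public property.
targets = ds.y_array

# Incorrect: the private storage is in a different order than the dataset.
misaligned = ds._raw_labels
```

Here targets is aligned with the order in which the dataset yields examples, while misaligned is not, which is the failure mode the warning describes.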

get_wilds_datapanel(dataset_name: str, root_dir: str, version: str | None = None, column_names: List[str] | None = None, info: DatasetInfo | None = None, split: str | None = None, use_transform: bool = True, include_raw_input: bool = True)

Get a DataPanel that holds a WILDSInputColumn alongside NumpyArrayColumns for targets and metadata.

Example: Run inference on the dataset and store predictions alongside the data.

.. code-block:: python

    import torch
    from meerkat.contrib.wilds import get_wilds_datapanel

    dp = get_wilds_datapanel("fmow", root_dir="/datasets/", split="test")
    model = ...  # get the model
    model.to(0).eval()

    @torch.no_grad()
    def predict(batch: dict):
        # Class probabilities over the last dimension, computed on GPU 0.
        out = torch.softmax(model(batch["input"].to(0)), dim=-1)
        return {"pred": out.cpu().numpy().argmax(axis=-1)}

    # Apply predict batch by batch and add a "pred" column to the panel.
    dp = dp.update(function=predict, batch_size=128, is_batched_fn=True)

Parameters:
  • dataset_name (str) – name of the WILDS dataset, e.g., "fmow".

  • version (str, optional) – dataset version number, e.g., ‘1.0’. Defaults to the latest version.

  • root_dir (str) – the directory where the WILDS dataset is downloaded. See https://wilds.stanford.edu/ for download instructions.

  • split (str, optional) – the dataset split to load, e.g., "test". Defaults to None.

  • use_transform (bool, optional) – Whether to apply the transform from the WILDS example directory on load. Defaults to True.

  • column_names (List[str], optional) – [description]. Defaults to None.

  • info (DatasetInfo, optional) – [description]. Defaults to None.

  • include_raw_input (bool, optional) – include a column for the input without the transform applied – useful for visualizing images. Defaults to True.
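Once predictions are stored next to the targets and metadata, a per-group accuracy breakdown is a common next step. The sketch below uses plain Python lists as stand-ins for the columns. The group name "region" and all values are made up for illustration; only the "pred" column name comes from the example above.

```python
from collections import defaultdict


def group_accuracy(preds, targets, groups):
    """Return {group: fraction of correct predictions} for aligned sequences."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, t, g in zip(preds, targets, groups):
        total[g] += 1
        correct[g] += int(p == t)
    return {g: correct[g] / total[g] for g in total}


# Made-up stand-ins for the "pred" column, the target column,
# and one metadata column (hypothetically named "region").
preds = [0, 1, 1, 2, 0, 2]
targets = [0, 1, 2, 2, 1, 2]
regions = ["asia", "asia", "europe", "europe", "africa", "africa"]

acc = group_accuracy(preds, targets, regions)
```

With real data, the three lists would be replaced by the corresponding columns of the DataPanel, converted to arrays or lists as needed.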

Submodules

meerkat.contrib.wilds.config module

WILDS configuration defaults and operations.

All default configurations are integrated from the WILDS repository: p-lambda/wilds

populate_config(config, template: dict, force_compatibility=False)

Populates missing (key, val) pairs in config with (key, val) in template. Example usage: populate config with defaults.

Parameters:
  • config – namespace to populate.

  • template (dict) – dict of (key, val) pairs used to fill in missing entries.

  • force_compatibility – option to raise errors if config.key != template[key].
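The behavior described above can be sketched as follows. This is a simplified illustration, not the library's implementation: populate_config_sketch is a hypothetical name, and treating a key whose value is None as "missing" is an assumption.

```python
from argparse import Namespace


def populate_config_sketch(config, template, force_compatibility=False):
    """Simplified illustration of filling a namespace from a template dict."""
    values = vars(config)  # live view of the namespace's attributes
    for key, val in template.items():
        if values.get(key) is None:
            # Assumption: absent keys and keys set to None both count as missing.
            values[key] = val
        elif force_compatibility and values[key] != val:
            raise ValueError(f"Argument {key} must be set to {val}")
    return config


config = Namespace(lr=0.01, batch_size=None)
populate_config_sketch(config, {"lr": 0.001, "batch_size": 64, "n_epochs": 10})
# lr keeps its explicit value; batch_size and n_epochs come from the template.
```

With force_compatibility=True, the same call would raise instead, because the explicit lr differs from the template's value.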

populate_defaults(config)

Populates hyperparameters with defaults implied by choices of other hyperparameters.
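As a rough sketch of what "defaults implied by other hyperparameters" can mean, the example below chains two lookup tables: the dataset choice implies a model, and the model choice implies an optimizer. The tables, keys, values, and function names are all made up for illustration; the real defaults come from the WILDS repository.

```python
from argparse import Namespace

# Made-up tables for illustration only.
DATASET_DEFAULTS = {"toy_images": {"model": "toy_cnn", "batch_size": 32}}
MODEL_DEFAULTS = {"toy_cnn": {"optimizer": "sgd"}}


def fill_missing(config, template):
    """Set each template entry on config unless config already has a value."""
    for key, val in template.items():
        if getattr(config, key, None) is None:
            setattr(config, key, val)


def populate_defaults_sketch(config):
    # Dataset-level defaults first, since they may choose the model.
    fill_missing(config, DATASET_DEFAULTS.get(getattr(config, "dataset", None), {}))
    # Model-level defaults second, using the model chosen above if none was given.
    fill_missing(config, MODEL_DEFAULTS.get(getattr(config, "model", None), {}))
    return config


config = populate_defaults_sketch(
    Namespace(dataset="toy_images", model=None, batch_size=16)
)
```

The explicit batch_size of 16 is kept, while model and optimizer are filled in from the tables. The ordering matters: the model default has to be resolved before the model's own defaults can be looked up.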

meerkat.contrib.wilds.transforms module

getBertTokenizer(model)
initialize_bert_transform(config)
initialize_image_base_transform(config, dataset)
initialize_image_resize_and_center_crop_transform(config, dataset)

Resizes the image to a slightly larger square then crops the center.
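The geometry of "resize to a slightly larger square, then crop the center" can be worked out without any image library. In the sketch below, target_size and resize_scale are hypothetical parameter names, and the function is an illustration of the arithmetic rather than the transform's actual code.

```python
def resize_and_center_crop_geometry(target_size, resize_scale):
    """Return the intermediate square size and the centered crop box.

    The crop box is (left, top, right, bottom) in pixels of the resized image.
    """
    scaled_size = int(round(target_size * resize_scale))
    offset = (scaled_size - target_size) // 2
    crop_box = (offset, offset, offset + target_size, offset + target_size)
    return scaled_size, crop_box


# Example: a 224-pixel target resized via a 256-pixel intermediate square.
scaled_size, crop_box = resize_and_center_crop_geometry(224, 256 / 224)
# scaled_size is 256 and crop_box is (16, 16, 240, 240)
```

Cropping from a slightly larger image discards a thin border on every side, so the final image keeps the central content at the target resolution.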

initialize_poverty_train_transform()
initialize_transform(transform_name, config, dataset)