meerkat.contrib package

Subpackages

Submodules

meerkat.contrib.celeba module

build_celeba_df(dataset_dir: str)

Build the dataframe by joining on the attribute, split and identity CelebA CSVs.

download_celeba(dataset_dir: str)
get_celeba(dataset_dir: str, download: bool = False)

Build the dataframe by joining on the attribute, split and identity CelebA CSVs.
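The join described above can be illustrated without Meerkat or the real CelebA files. The sketch below is a minimal, standard-library illustration of joining three CSVs on a shared key. The column names (`file`, `Smiling`, `split`, `identity`), the sample rows, and both helper functions are hypothetical, and this is not the code used by build_celeba_df.

```python
import csv
import io


def read_keyed_csv(text, key):
    """Parse CSV text into {key value: remaining columns of that row}."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key_value = row.pop(key)
        rows[key_value] = row
    return rows


def join_on_key(*tables):
    """Inner-join several {key: row} mappings on the keys they all share."""
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    joined = []
    for key_value in sorted(shared):
        merged = {"file": key_value}
        for table in tables:
            merged.update(table[key_value])
        joined.append(merged)
    return joined


# Hypothetical miniature stand-ins for the attribute, split and identity CSVs.
ATTR_CSV = "file,Smiling\n000001.jpg,1\n000002.jpg,-1\n"
SPLIT_CSV = "file,split\n000001.jpg,train\n000002.jpg,valid\n"
IDENTITY_CSV = "file,identity\n000001.jpg,2880\n000002.jpg,2937\n"

rows = join_on_key(
    read_keyed_csv(ATTR_CSV, "file"),
    read_keyed_csv(SPLIT_CSV, "file"),
    read_keyed_csv(IDENTITY_CSV, "file"),
)
```

Each entry of `rows` carries the columns of all three tables for one image. A dataframe library would express the same join as a pair of merges on the key column.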

meerkat.contrib.dew module

build_dew_dp(dataset_dir: str, download: bool = True) → DataPanel

meerkat.contrib.imagenet module

build_imagenet_dps(dataset_dir: str, download: bool = False) → Dict[str, DataPanel]

meerkat.contrib.imagenette module

build_imagenette_dp(dataset_dir: str, download: bool = False, version: str = '160px') → DataPanel

Build DataPanel for the Imagenette dataset.

Parameters:
  • dataset_dir (str) – The directory path to save to or load from.

  • version (str, optional) – Imagenette version. Choices: "full", "320px", "160px".

  • download (bool, optional) – If True, download the dataset.

Returns:

A DataPanel corresponding to the dataset.

Return type:

mk.DataPanel

References

fastai/imagenette

download_imagenette(download_dir, version='160px', overwrite: bool = False, return_df: bool = False)

Download Imagenette dataset.

Parameters:
  • download_dir (str) – The directory path to save to.

  • version (str, optional) – Imagenette version. Choices: "full", "320px", "160px".

  • overwrite (bool, optional) – If True, redownload the dataset.

  • return_df (bool, optional) – If True, return a pd.DataFrame.

Returns:

If return_df=True, returns a pandas DataFrame.

Otherwise, returns the directory path where the data is stored.

Return type:

Union[str, pd.DataFrame]

References

fastai/imagenette
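The version and overwrite parameters suggest a download-or-reuse pattern. The following is a minimal sketch of that pattern under stated assumptions, not the implementation of download_imagenette: the function name, the `imagenette-<version>` directory layout, and the `fetch` callable that stands in for the real download step are all hypothetical.

```python
import os
import tempfile

VERSIONS = ("full", "320px", "160px")  # choices listed in the parameter docs


def download_if_needed(download_dir, version="160px", overwrite=False, fetch=None):
    """Return the data directory, calling fetch only when it is required.

    fetch is a hypothetical callable that stands in for download-and-extract.
    """
    if version not in VERSIONS:
        raise ValueError(f"version must be one of {VERSIONS}, got {version!r}")
    # Hypothetical layout: one sub-directory per version.
    target = os.path.join(download_dir, f"imagenette-{version}")
    if overwrite or not os.path.isdir(target):
        os.makedirs(target, exist_ok=True)
        if fetch is not None:
            fetch(target)
    return target


calls = []
with tempfile.TemporaryDirectory() as tmp:
    first = download_if_needed(tmp, fetch=calls.append)    # fetches
    second = download_if_needed(tmp, fetch=calls.append)   # reuses the directory
    third = download_if_needed(tmp, overwrite=True, fetch=calls.append)  # fetches again
```

In this sketch the second call skips the fetch because the target directory already exists, while overwrite=True forces it again.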

meerkat.contrib.registry module

class Registry(name: str)

Bases: Registry

Extension of fvcore’s registry that supports aliases.

get(name: str, dataset_dir: str | None = None, download: bool = True, *args, **kwargs) → Any
register(obj: object | None = None, aliases: Sequence[str] | None = None) → object | None

Register the given object under the name obj.__name__. Can be used either as a decorator or as a direct function call. See the docstring of this class for usage.

property catalog: DataPanel
property names: List[str]
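The idea of a registry that supports aliases and works both as a decorator and as a direct call can be shown with a small stand-alone class. This is a sketch only: `AliasRegistry`, its `lookup` method, and the toy functions are hypothetical and do not reproduce the Meerkat or fvcore code. In particular, `lookup` simply returns the registered object, whereas the get method documented above also accepts dataset_dir and download.

```python
class AliasRegistry:
    """Toy name-to-object registry with alias support (not Meerkat's class)."""

    def __init__(self, name):
        self.name = name
        self._objects = {}  # canonical name -> registered object
        self._aliases = {}  # alias -> canonical name

    def register(self, obj=None, aliases=None):
        aliases = list(aliases or [])

        def _add(target):
            # Refuse to overwrite an existing name or alias.
            for key in [target.__name__, *aliases]:
                if key in self._objects or key in self._aliases:
                    raise KeyError(f"{key!r} is already registered in {self.name!r}")
            self._objects[target.__name__] = target
            for alias in aliases:
                self._aliases[alias] = target.__name__
            return target

        if obj is None:
            # Decorator form: @registry.register(aliases=[...])
            return _add
        # Direct form: registry.register(obj)
        _add(obj)
        return None

    def lookup(self, name):
        """Resolve an alias to its canonical name, then return the object."""
        canonical = self._aliases.get(name, name)
        return self._objects[canonical]

    @property
    def names(self):
        return list(self._objects)


datasets = AliasRegistry("datasets")


@datasets.register(aliases=["toy"])
def toy_dataset():
    return ["a", "b"]


def other_dataset():
    return []


datasets.register(other_dataset)  # direct call, no aliases
```

After this, `datasets.lookup("toy")` and `datasets.lookup("toy_dataset")` resolve to the same function.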
celeba(dataset_dir: str | None = None, download: bool = True, **kwargs)
cifar10(dataset_dir: str | None = None, download: bool = True, **kwargs)

dew(dataset_dir: str | None = None, download: bool = True, **kwargs) → DataPanel

Date Estimation in the Wild Dataset (DEW) [1]

Columns:
  • image (ImageColumn): The image

  • img_id (SeriesColumn): Unique Flickr image id in the dataset.

  • GT (SeriesColumn): Ground truth acquisition year

  • date_taken (SeriesColumn): The time at which the photo was taken, according to Flickr.

  • date_granularity (SeriesColumn): The granularity to which the date is known, per Flickr: https://www.flickr.com/services/api/misc.dates.html

  • url (SeriesColumn): Weblink for the image.

  • username (SeriesColumn): Flickr username of the author

  • title (SeriesColumn): Image title on Flickr

  • licence (SeriesColumn): Image license according to Flickr

  • licence_url (SeriesColumn): Weblink for the license (if available)

[1] Müller, Eric; Springstein, Matthias; Ewerth, Ralph (2017): Date Estimation in the Wild Dataset. Müller, Eric; Springstein, Matthias; Ewerth, Ralph. DOI: 10.22000/43
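As an illustration of how the GT, date_taken and date_granularity columns relate, the sketch below works on plain dictionaries rather than a DataPanel. The granularity codes follow the Flickr dates page linked above as best understood here and should be treated as an assumption. The sample rows, the `YYYY-MM-DD HH:MM:SS` string format for date_taken, and the `year_error` helper are hypothetical.

```python
# Assumed meaning of Flickr's date granularity codes (see the linked page).
FLICKR_DATE_GRANULARITY = {
    0: "exact (Y-m-d H:i:s)",
    4: "month (Y-m)",
    6: "year (Y)",
    8: "circa",
}


def year_error(row):
    """Return the date_taken year minus the ground-truth (GT) year."""
    taken_year = int(str(row["date_taken"])[:4])
    return taken_year - int(row["GT"])


# Hypothetical rows using the column names listed above.
rows = [
    {"img_id": "a", "GT": 1965, "date_taken": "1965-06-01 00:00:00", "date_granularity": 4},
    {"img_id": "b", "GT": 1932, "date_taken": "1930-01-01 00:00:00", "date_granularity": 8},
]

summary = [
    (
        row["img_id"],
        FLICKR_DATE_GRANULARITY.get(row["date_granularity"], "unknown"),
        year_error(row),
    )
    for row in rows
]
```

A coarse granularity such as "circa" is a hint that date_taken and GT may legitimately disagree by a few years.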

enron(dataset_dir: str | None = None, download: bool = True, **kwargs)
imagenet(dataset_dir: str | None = None, download: bool = True, **kwargs)
imagenette(dataset_dir: str | None = None, download: bool = True, **kwargs)
inaturalist(dataset_dir: str | None = None, download: bool = True, **kwargs) → DataPanel

iNaturalist 2021 Dataset [1]

Columns:
  • image (ImageColumn): The image

  • image_id (SeriesColumn): Unique image id

  • date (SeriesColumn): The time at which the photo was taken.

  • latitude (SeriesColumn): Latitude at which the photo was taken

  • longitude (SeriesColumn): Longitude at which the photo was taken

  • location_uncertainty (SeriesColumn): Uncertainty in the location

  • license (SeriesColumn): License of the photo

  • rights_holder (SeriesColumn): Rights holder of the photo

  • width (SeriesColumn): Width of the image

  • height (SeriesColumn): Height of the image

  • file_name (SeriesColumn): Filepath relative to dataset_dir where the image is stored.

[1] visipedia/inat_comp

waterbirds(dataset_dir: str | None = None, download: bool = True, **kwargs)
yesno(dataset_dir: str | None = None, download: bool = True, **kwargs)

meerkat.contrib.siim_cxr module

cxr_transform(volume: MedicalVolumeCell)
cxr_transform_pil(volume: MedicalVolumeCell)
download_siim_cxr(dataset_dir: str, kaggle_username: str, kaggle_key: str, download_gaze_data: bool = True, include_mock_reports: bool = True)

Download the dataset from the SIIM-ACR Pneumothorax Segmentation challenge: https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/data

Parameters:
  • dataset_dir (str) – Path to directory where the dataset will be downloaded.

  • kaggle_username (str) – Your Kaggle username.

  • kaggle_key (str) – A Kaggle API key. To use Kaggle’s public API, you must first authenticate with an API token. From the site header, click your user profile picture, then “My Account” in the dropdown menu. This takes you to your account settings at https://www.kaggle.com/account. Scroll down to the section labelled API and click the “Create New API Token” button. This downloads a JSON file with a “username” and a “key” field. Pass the value of the “key” field as kaggle_key. Instructions adapted from the Kaggle API docs: https://www.kaggle.com/docs/api

  • download_gaze_data (bool) – If True, download a pkl file containing eye-tracking data collected from a radiologist interpreting the X-ray.
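Rather than copying the key by hand, the token file can be read programmatically. The sketch below assumes only what is stated above, namely that the downloaded JSON file has a "username" and a "key" field. The helper name `load_kaggle_credentials`, the file name `kaggle.json`, and the sample values are hypothetical.

```python
import json
import os
import tempfile


def load_kaggle_credentials(path):
    """Read a Kaggle API token file and return (username, key)."""
    with open(path, encoding="utf-8") as f:
        token = json.load(f)
    return token["username"], token["key"]


# Demonstration with a throwaway token file holding made-up values.
with tempfile.TemporaryDirectory() as tmp:
    token_path = os.path.join(tmp, "kaggle.json")
    with open(token_path, "w", encoding="utf-8") as f:
        json.dump({"username": "example-user", "key": "0123abcd"}, f)
    username, key = load_kaggle_credentials(token_path)
```

The two returned values correspond to the kaggle_username and kaggle_key arguments of download_siim_cxr.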

meerkat.contrib.visual_genome module

build_visual_genome_dps(dataset_dir: str, write: bool = False) → Dict[str, DataPanel]
read_visual_genome_dps(dataset_dir: str) → Dict[str, DataPanel]
write_visual_genome_dps(dps: Mapping[str, DataPanel], dataset_dir: str)