meerkat.contrib package

Subpackages

Submodules

meerkat.contrib.celeba module

build_celeba_df(dataset_dir: str)

Build the dataframe by joining the CelebA attribute, split, and identity CSVs.

download_celeba(dataset_dir: str)

get_celeba(dataset_dir: str, download: bool = False)

Build the dataframe by joining the CelebA attribute, split, and identity CSVs.
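A minimal sketch of the three-way join that build_celeba_df describes, using only the standard library on tiny in-memory CSVs. The column names (image_id, partition, identity), the toy rows, and the helper names are illustrative assumptions, not meerkat's implementation; the only real-world convention relied on is CelebA's evaluation partition coding (0 = train, 1 = valid, 2 = test).

```python
import csv
import io

# Hypothetical toy stand-ins for the three CelebA CSVs; the real attribute
# table has about 40 attribute columns and many more rows.
ATTR_CSV = """image_id,Smiling,Male
000001.jpg,1,-1
000002.jpg,-1,1
"""
SPLIT_CSV = """image_id,partition
000001.jpg,0
000002.jpg,2
"""
IDENTITY_CSV = """image_id,identity
000001.jpg,2880
000002.jpg,2937
"""

# CelebA's evaluation partition codes.
SPLIT_NAMES = {"0": "train", "1": "valid", "2": "test"}


def read_rows(text: str) -> dict:
    """Index a CSV's rows by their image_id column."""
    return {row["image_id"]: row for row in csv.DictReader(io.StringIO(text))}


def build_celeba_rows(attr_csv: str, split_csv: str, identity_csv: str) -> list:
    """Inner-join the attribute, split, and identity tables on image_id."""
    attrs = read_rows(attr_csv)
    splits = read_rows(split_csv)
    identities = read_rows(identity_csv)
    rows = []
    for image_id, attr_row in attrs.items():
        if image_id not in splits or image_id not in identities:
            continue  # inner join: drop images missing from any table
        row = dict(attr_row)
        row["split"] = SPLIT_NAMES[splits[image_id]["partition"]]
        row["identity"] = int(identities[image_id]["identity"])
        rows.append(row)
    return rows


rows = build_celeba_rows(ATTR_CSV, SPLIT_CSV, IDENTITY_CSV)
print(rows[0]["split"], rows[1]["identity"])  # train 2937
```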

meerkat.contrib.dew module

build_dew_dp(dataset_dir: str, download: bool = True) → DataPanel

meerkat.contrib.imagenet module

build_imagenet_dps(dataset_dir: str, download: bool = False) → Dict[str, DataPanel]

meerkat.contrib.imagenette module

build_imagenette_dp(dataset_dir: str, download: bool = False, version: str = '160px') → DataPanel

Build a DataPanel for the Imagenette dataset.

Parameters:

dataset_dir (str) – The directory path to save to or load from.
download (bool, optional) – If True, download the dataset into dataset_dir.
version (str, optional) – Imagenette version. Choices: "full", "320px", "160px".

Returns:

A DataPanel corresponding to the dataset.

Return type:

mk.DataPanel

References

fastai/imagenette
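The version argument selects among three Imagenette variants. The sketch below validates the choice and maps it to an archive URL, assuming the imagenette2 archives published by the fastai/imagenette project. The helper imagenette_url is hypothetical, and whether meerkat downloads exactly these files is an assumption.

```python
# Assumed base URL of the fastai/imagenette archives (imagenette2 release).
BASE_URL = "https://s3.amazonaws.com/fast-ai-imageclas/"

# Map each documented `version` choice to an archive file name.
ARCHIVES = {
    "full": "imagenette2.tgz",
    "320px": "imagenette2-320.tgz",
    "160px": "imagenette2-160.tgz",
}


def imagenette_url(version: str = "160px") -> str:
    """Return the archive URL for a version, rejecting unknown choices."""
    if version not in ARCHIVES:
        choices = ", ".join(sorted(ARCHIVES))
        raise ValueError(f"Unknown Imagenette version {version!r}; choose from {choices}")
    return BASE_URL + ARCHIVES[version]


print(imagenette_url())  # https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
```

Failing early on an unknown version avoids a confusing network error later in the download.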

download_imagenette(download_dir, version='160px', overwrite: bool = False, return_df: bool = False)

Download the Imagenette dataset.

Parameters:

download_dir (str) – The directory path to save to.
version (str, optional) – Imagenette version. Choices: "full", "320px", "160px".
overwrite (bool, optional) – If True, redownload the dataset.
return_df (bool, optional) – If True, return a pd.DataFrame.

Returns:

If return_df=True, returns a pandas DataFrame. Otherwise, returns the directory path where the data is stored.

Return type:

Union[str, pd.DataFrame]

References

fastai/imagenette
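The overwrite flag implies a skip-if-present download. A minimal sketch of that control flow, assuming a non-empty download_dir means the data is already there. ensure_dataset and the fetch callback are hypothetical stand-ins for the real download step, and the return_df branch is not modeled.

```python
import os
import tempfile
from typing import Callable, List


def ensure_dataset(download_dir: str, overwrite: bool, fetch: Callable[[str], None]) -> str:
    """Run `fetch` unless data already exists in download_dir and overwrite is False."""
    # Assumption: a non-empty directory means a previous download completed.
    already_present = os.path.isdir(download_dir) and bool(os.listdir(download_dir))
    if overwrite or not already_present:
        os.makedirs(download_dir, exist_ok=True)
        fetch(download_dir)  # hypothetical download-and-extract step
    return download_dir


calls: List[str] = []


def fake_fetch(path: str) -> None:
    """Stand-in for a real download: record the call and write a marker file."""
    calls.append(path)
    with open(os.path.join(path, "marker.txt"), "w") as f:
        f.write("ok")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "imagenette")
    ensure_dataset(target, overwrite=False, fetch=fake_fetch)  # first call downloads
    ensure_dataset(target, overwrite=False, fetch=fake_fetch)  # data present: skipped
    ensure_dataset(target, overwrite=True, fetch=fake_fetch)   # forced re-download

print(len(calls))  # 2
```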

meerkat.contrib.registry module

class Registry(name: str)

Bases: fvcore.common.registry.Registry

Extension of fvcore’s registry that supports aliases.

get(name: str, dataset_dir: str | None = None, download: bool = True, *args, **kwargs) → Any

register(obj: object | None = None, aliases: Sequence[str] | None = None) → object | None

Register the given object under the name obj.__name__. Can be used either as a decorator or as a plain function call. See the docstring of fvcore's Registry class for usage.

property catalog: DataPanel

property names: List[str]
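To illustrate how a registry with aliases can work, here is a standalone sketch that does not depend on fvcore. The class AliasRegistry is hypothetical, and it assumes that get resolves an alias to its canonical name and then calls the registered builder with dataset_dir, download, and any extra keyword arguments (the documented get signature also accepts *args, omitted here).

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class AliasRegistry:
    """Hypothetical standalone registry mapping names (and aliases) to builders."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._obj_map: Dict[str, Callable[..., Any]] = {}
        self._aliases: Dict[str, str] = {}  # alias -> canonical name

    def _do_register(self, obj: Callable[..., Any], aliases: Sequence[str]) -> None:
        keys = [obj.__name__, *aliases]
        # Check every key first so a clash leaves no half-registered entry.
        for key in keys:
            if key in self._obj_map or key in self._aliases:
                raise KeyError(f"{key!r} is already registered in {self.name!r}")
        self._obj_map[obj.__name__] = obj
        for alias in aliases:
            self._aliases[alias] = obj.__name__

    def register(self, obj: Optional[Callable[..., Any]] = None,
                 aliases: Optional[Sequence[str]] = None) -> Optional[Callable[..., Any]]:
        alias_list = list(aliases or [])
        if obj is None:
            # Decorator form: @registry.register(aliases=[...])
            def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
                self._do_register(fn, alias_list)
                return fn
            return decorator
        # Direct-call form: registry.register(fn)
        self._do_register(obj, alias_list)
        return None

    def get(self, name: str, dataset_dir: Optional[str] = None,
            download: bool = True, **kwargs: Any) -> Any:
        # Resolve an alias to its canonical name, then call the builder.
        canonical = self._aliases.get(name, name)
        if canonical not in self._obj_map:
            raise KeyError(f"No object named {name!r} in registry {self.name!r}")
        return self._obj_map[canonical](dataset_dir=dataset_dir, download=download, **kwargs)

    @property
    def names(self) -> List[str]:
        return list(self._obj_map)


datasets = AliasRegistry("datasets")


@datasets.register(aliases=["nette"])  # "nette" is a made-up alias
def imagenette(dataset_dir=None, download=True, **kwargs):
    return f"imagenette from {dataset_dir} (download={download})"


def yesno(dataset_dir=None, download=True, **kwargs):
    return "yesno"


datasets.register(yesno)
print(datasets.get("nette", dataset_dir="/data", download=False))
# imagenette from /data (download=False)
print(datasets.names)  # ['imagenette', 'yesno']
```

Keeping aliases in a separate map means names lists only canonical entries, while get accepts either form.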

celeba(dataset_dir: str | None = None, download: bool = True, **kwargs)

cifar10(dataset_dir: str | None = None, download: bool = True, **kwargs)

dew(dataset_dir: str | None = None, download: bool = True, **kwargs) → DataPanel

Date Estimation in the Wild Dataset (DEW) [1]

Columns:

image (ImageColumn): The image.
img_id (SeriesColumn): Unique Flickr image ID in the dataset.
GT (SeriesColumn): Ground-truth acquisition year.
date_taken (SeriesColumn): The time at which the photo was taken, according to Flickr.
date_granularity (SeriesColumn): How precisely date_taken is known, according to Flickr; see https://www.flickr.com/services/api/misc.dates.html
url (SeriesColumn): Web link to the image.
username (SeriesColumn): Flickr username of the author.
title (SeriesColumn): Image title on Flickr.
licence (SeriesColumn): Image license according to Flickr.
licence_url (SeriesColumn): Web link to the license (if available).

[1] Müller, Eric; Springstein, Matthias; Ewerth, Ralph (2017): Date Estimation in the Wild Dataset. DOI: 10.22000/43
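A small sketch for decoding date_granularity and keeping only precisely dated photos. The integer codes follow Flickr's dates documentation linked in the column list; that the DEW column stores these raw codes, the helper names, and the toy rows are all assumptions.

```python
from typing import Dict, List

# Flickr "taken" date granularity codes, per Flickr's API dates documentation.
# Assumption: DEW's date_granularity column stores these integer codes.
FLICKR_GRANULARITY: Dict[int, str] = {
    0: "full datetime (Y-m-d H:i:s)",
    4: "year and month (Y-m)",
    6: "year only (Y)",
    8: "circa",
}


def describe_granularity(code: int) -> str:
    """Human-readable meaning of a granularity code."""
    return FLICKR_GRANULARITY.get(code, "unknown")


def precise_rows(rows: List[dict]) -> List[dict]:
    """Keep rows whose Flickr date is known to full-datetime precision."""
    return [r for r in rows if r["date_granularity"] == 0]


# Hypothetical rows mirroring a few DEW columns.
rows = [
    {"img_id": 1, "GT": 1955, "date_granularity": 0},
    {"img_id": 2, "GT": 1962, "date_granularity": 6},
]
print([r["img_id"] for r in precise_rows(rows)])  # [1]
print(describe_granularity(6))  # year only (Y)
```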

enron(dataset_dir: str | None = None, download: bool = True, **kwargs)

imagenet(dataset_dir: str | None = None, download: bool = True, **kwargs)

imagenette(dataset_dir: str | None = None, download: bool = True, **kwargs)

inaturalist(dataset_dir: str | None = None, download: bool = True, **kwargs) → DataPanel

iNaturalist 2021 Dataset [1]

Columns:

image (ImageColumn): The image.
image_id (SeriesColumn): Unique image ID.
date (SeriesColumn): The time at which the photo was taken.
latitude (SeriesColumn): Latitude at which the photo was taken.
longitude (SeriesColumn): Longitude at which the photo was taken.
location_uncertainty (SeriesColumn): Uncertainty in the location.
license (SeriesColumn): License of the photo.
rights_holder (SeriesColumn): Rights holder of the photo.
width (SeriesColumn): Width of the image.
height (SeriesColumn): Height of the image.
file_name (SeriesColumn): Path, relative to dataset_dir, where the image is stored.

[1] visipedia/inat_comp
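Because file_name is relative to dataset_dir, absolute image paths are obtained by joining the two. A minimal sketch; the helper name, directory, and example file name are hypothetical.

```python
from pathlib import Path
from typing import List


def resolve_image_paths(dataset_dir: str, file_names: List[str]) -> List[Path]:
    """Join each relative file_name onto dataset_dir."""
    root = Path(dataset_dir)
    return [root / name for name in file_names]


# Hypothetical directory and file name, for illustration only.
paths = resolve_image_paths("/data/inat2021", ["train/00000_example/photo.jpg"])
print(paths[0].as_posix())  # /data/inat2021/train/00000_example/photo.jpg
```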

waterbirds(dataset_dir: str | None = None, download: bool = True, **kwargs)

yesno(dataset_dir: str | None = None, download: bool = True, **kwargs)

meerkat.contrib.siim_cxr module

cxr_transform(volume: MedicalVolumeCell)

cxr_transform_pil(volume: MedicalVolumeCell)

download_siim_cxr(dataset_dir: str, kaggle_username: str, kaggle_key: str, download_gaze_data: bool = True, include_mock_reports: bool = True)

Download the dataset from the SIIM-ACR Pneumothorax Segmentation challenge: https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/data

Parameters:

dataset_dir (str) – Path to the directory where the dataset will be downloaded.
kaggle_username (str) – Your Kaggle username.
kaggle_key (str) – A Kaggle API key. To use Kaggle’s public API, you must first authenticate using an API token. From the site header, click on your user profile picture, then on “My Account” from the dropdown menu. This will take you to your account settings at https://www.kaggle.com/account. Scroll down to the section of the page labelled API. To create a new token, click the “Create New API Token” button. This downloads a JSON file with “username” and “key” fields. Copy the value of the “key” field and pass it in as kaggle_key. Instructions copied from the Kaggle API docs: https://www.kaggle.com/docs/api
download_gaze_data (bool) – If True, also download a .pkl file containing eye-tracking data collected from a radiologist interpreting the X-rays.
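The JSON token described above can be read programmatically instead of copied by hand. A minimal sketch, assuming the token was saved to Kaggle's default location ~/.kaggle/kaggle.json; load_kaggle_credentials is a hypothetical helper, and the usage example writes fake credentials to a temporary file.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Tuple


def load_kaggle_credentials(path: str = "~/.kaggle/kaggle.json") -> Tuple[str, str]:
    """Return the "username" and "key" fields of a Kaggle API token file."""
    with open(Path(path).expanduser()) as f:
        token = json.load(f)
    return token["username"], token["key"]


# Usage with a throwaway token file holding fake credentials.
with tempfile.TemporaryDirectory() as tmp:
    token_path = os.path.join(tmp, "kaggle.json")
    with open(token_path, "w") as f:
        json.dump({"username": "alice", "key": "0123abcd"}, f)
    username, key = load_kaggle_credentials(token_path)

print(username, key)  # alice 0123abcd
```

The returned values can then be passed as kaggle_username and kaggle_key to download_siim_cxr, which keeps the key out of source code.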

meerkat.contrib.visual_genome module

build_visual_genome_dps(dataset_dir: str, write: bool = False) → Dict[str, DataPanel]

read_visual_genome_dps(dataset_dir: str) → Dict[str, DataPanel]

write_visual_genome_dps(dps: Mapping[str, DataPanel], dataset_dir: str)