meerkat.contrib.video_corruptions package#

Submodules#

meerkat.contrib.video_corruptions.transforms module#

class TemporalCrop(n_clips: int, clip_length: int, time_dim: int | None = 1, clip_spacing: str | None = 'equal', padding_mode: str | None = 'loop', sample_starting_location: bool | None = False, stack_clips: bool | None = True)[source]#

Bases: object

Video transformation for performing “temporal cropping”: sampling a pre-defined number of clips, each of a pre-defined length, from a full video. Can be used with Compose in torchvision.

When used with TemporalDownsampling, it is highly recommended to apply TemporalCrop after TemporalDownsampling. In fact, since TemporalCrop can change the number of dimensions of the output tensor due to clip selection, it is best placed at the end of a video transformation pipeline.

Parameters:
  • n_clips (int) – the number of clips that should be sampled.

  • clip_length (int) – the length of each clip (in the number of frames)

  • time_dim (int) – the index of the time dimension of the video

  • clip_spacing (Optional; default "equal") – how to choose starting locations for sampling clips. Keyword “equal” means that clip starting locations are sampled from each 1/n_clips segment of the video. The other option, “anywhere”, places no restrictions on where clip starting locations can be sampled.

  • padding_mode (Optional; default “loop”) – behavior if a requested clip length would result in a clip exceeding the end of the video. Keyword “loop” results in a wrap-around to the start of the video. The other option, “freeze”, repeats the final frame until the requested clip length is achieved (illustrated in the sketch after the examples).

  • sample_starting_location (Optional; default False) – whether to randomly sample the starting location of each clip (usually used for training). Can be used in tandem with “equal” to sample clips whose random starting locations are distributed across time. Redundant if clip_spacing is “anywhere”.

  • stack_clips (Optional; default True) – whether to stack clips along a new dimension (used by 3D action recognition backbones) or to concatenate them along the time dimension (used by 2D action recognition backbones); see the sketch after this list. If True, the output shape is (n_clips, *video_shape). If False, the output has the same number of dimensions as the original video, but the time dimension is extended by a factor of n_clips.
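For intuition about the two output layouts, here is a rough sketch in PyTorch (illustrative only; the clip count and shapes are made up, and this is not the library's implementation):

>>> import torch
>>> clips = [torch.randn(3, 16, 224, 224) for _ in range(10)]  # 10 clips, each (C, T, H, W)
>>> torch.stack(clips, dim=0).shape   # stack_clips=True: clips stacked in a new dimension
torch.Size([10, 3, 16, 224, 224])
>>> torch.cat(clips, dim=1).shape     # stack_clips=False: clips concatenated along time
torch.Size([3, 160, 224, 224])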

Examples

# Create a VideoCell from "/path/to/video.mp4" with time in dimension one,
# sampling 10 clips each of length 16, sampling clips equally across the video
>>> cell = VideoCell(
...     "/path/to/video.mp4",
...     time_dim=1,
...     transform=TemporalCrop(10, 16, time_dim=1),
... )
# output shape: (10, n_channels, 16, H, W)

# Create a VideoCell from "/path/to/video.mp4" with time in dimension one,
# sampling 8 clips each of length 8, sampling clips from arbitrary video
# locations and freezing the last frame if a clip exceeds the video length
>>> cell = VideoCell(
...     "/path/to/video.mp4",
...     time_dim=1,
...     transform=TemporalCrop(8, 8, time_dim=1, clip_spacing="anywhere",
...                            padding_mode="freeze"),
... )
# output shape: (8, n_channels, 8, H, W)

# Create a VideoCell from "/path/to/video.mp4" with time in dimension one,
# sampling one frame from each third of the video, concatenating the frames
# in the time dimension
>>> cell = VideoCell(
...     "/path/to/video.mp4",
...     time_dim=1,
...     transform=TemporalCrop(3, 1, time_dim=1, clip_spacing="equal",
...                            sample_starting_location=True, stack_clips=False),
... )
# output shape: (n_channels, 3, H, W)

Note that time_dim in the TemporalCrop call must match the time_dim in the VideoCell constructor!
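For intuition about padding_mode, the following sketch computes the frame indices of a single clip that overruns the end of the video (pad_clip_indices is a hypothetical helper, not part of the library):

>>> import torch
>>> def pad_clip_indices(start, clip_length, n_frames, padding_mode="loop"):
...     idx = torch.arange(start, start + clip_length)
...     if padding_mode == "loop":
...         return idx % n_frames           # wrap around to the video start
...     return idx.clamp(max=n_frames - 1)  # "freeze": repeat the final frame
>>> pad_clip_indices(8, 4, 10, "loop")      # 10-frame video, length-4 clip from frame 8
tensor([8, 9, 0, 1])
>>> pad_clip_indices(8, 4, 10, "freeze")
tensor([8, 9, 9, 9])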

class TemporalDownsampling(downsample_factor: int, time_dim: int | None = 1)[source]#

Bases: object

Video transformation for performing temporal downsampling (i.e. reading in every Nth frame only). This can be used in tandem with VideoCell by passing it into the transform keyword in the constructor. Can be used with Compose in torchvision.

When used with TemporalCrop, it is highly recommended to apply TemporalDownsampling first and TemporalCrop second, as in the sketch below.
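For example, a combined pipeline might look like the following sketch (the path and clip parameters are illustrative):

>>> from torchvision.transforms import Compose
>>> cell = VideoCell(
...     "/path/to/video.mp4",
...     time_dim=1,
...     transform=Compose([
...         TemporalDownsampling(2, time_dim=1),  # first: keep every other frame
...         TemporalCrop(10, 16, time_dim=1),     # then: sample 10 clips of 16 frames
...     ]),
... )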

Parameters:
  • downsample_factor (int) – the factor by which the input video should be downsampled. Must be a strictly positive integer.

  • time_dim (int) – the time dimension of the input video.

Examples

# Create a VideoCell from "/path/to/video.mp4" with time in dimension one,
# showing every other frame
>>> cell = VideoCell(
...     "/path/to/video.mp4",
...     time_dim=1,
...     transform=TemporalDownsampling(2, time_dim=1),
... )

Note that time_dim in the TemporalDownsampling call must match the time_dim in the VideoCell constructor!
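Conceptually, the transformation amounts to strided indexing along the time dimension, as in this sketch (not necessarily the library's exact implementation):

>>> import torch
>>> video = torch.randn(3, 32, 224, 224)      # (C, T, H, W), time_dim=1
>>> idx = torch.arange(0, video.shape[1], 2)  # keep every other frame
>>> video.index_select(1, idx).shape
torch.Size([3, 16, 224, 224])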

meerkat.contrib.video_corruptions.utils module#

class stderr_suppress[source]#

Bases: object

A context manager for doing a “deep suppression” of stdout and stderr in Python.

This is necessary when reading in a corrupted video; otherwise, ffmpeg emits tens of thousands of errors on stderr. That output is great when decoding a single video interactively, but not when loading hundreds of corrupted videos.
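A minimal sketch of how such a context manager can be written with file-descriptor redirection (a common recipe, shown here under the hypothetical name stderr_suppress_sketch; not necessarily the library's exact implementation):

import os

class stderr_suppress_sketch:
    # Redirect the OS-level stdout/stderr file descriptors to /dev/null,
    # silencing output even from native libraries such as ffmpeg.
    def __enter__(self):
        self.null_fds = [os.open(os.devnull, os.O_RDWR) for _ in range(2)]
        self.saved_fds = [os.dup(1), os.dup(2)]  # keep the real stdout/stderr
        os.dup2(self.null_fds[0], 1)             # fd 1 -> /dev/null
        os.dup2(self.null_fds[1], 2)             # fd 2 -> /dev/null
        return self

    def __exit__(self, *exc):
        os.dup2(self.saved_fds[0], 1)  # restore the original descriptors
        os.dup2(self.saved_fds[1], 2)
        for fd in self.null_fds + self.saved_fds:
            os.close(fd)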