Lambda Columns and Lazy Selection

Lambda Columns

If you check out the implementation of ImageColumn, you’ll notice that it’s a super simple subclass of LambdaColumn.

What’s a LambdaColumn? In Meerkat, high-dimensional data types like images and videos are typically stored in a LambdaColumn. A LambdaColumn wraps another column and applies a function to its content as it is indexed.

Consider the following example, where we create a simple Meerkat column…

In [1]: import meerkat as mk

In [2]: col = mk.NumpyArrayColumn([0,1,2])

In [3]: col[1]
Out[3]: 1

…and wrap it in a lambda column.

In [4]: lambda_col = col.to_lambda(fn=lambda x: x + 10)

In [5]: lambda_col[1]  # the function is only called at this point!
Out[5]: 11

Critically, the function inside a lambda column is only called at the time the column is indexed! This is very useful for columns whose items are large, like images, and which we don’t want to load into memory all at once. For example, we could create a LambdaColumn that lazily loads images…

In [6]: filepath_col = mk.PandasSeriesColumn(["path/to/image0.jpg", ...])

In [7]: img_col = filepath_col.to_lambda(fn=load_image)  # load_image: assumed to be a user-defined function that reads an image from a filepath

An ImageColumn is just a LambdaColumn like this one, with a few more bells and whistles!
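To make the lazy behavior concrete, here is a minimal pure-Python sketch of the idea. It is not Meerkat's actual implementation: `LazyColumn` and `fake_load_image` are hypothetical names invented for this illustration, and the "loader" only records which paths it was asked for instead of reading files.

```python
from typing import Any, Callable, List, Sequence


class LazyColumn:
    """Toy stand-in for a lambda column (hypothetical, not Meerkat's class)."""

    def __init__(self, data: Sequence[Any], fn: Callable[[Any], Any]):
        self.data = data  # the wrapped "column", e.g. a list of filepaths
        self.fn = fn      # stored, but not called yet

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Any:
        # fn runs here, at indexing time, and only on the requested element.
        return self.fn(self.data[index])


calls: List[str] = []  # records every path the loader is asked to load


def fake_load_image(path: str) -> str:
    """Hypothetical loader: logs the call and returns a placeholder string."""
    calls.append(path)
    return f"<pixels of {path}>"


paths = ["img0.jpg", "img1.jpg", "img2.jpg"]
img_col = LazyColumn(paths, fn=fake_load_image)

print(calls)   # [] because nothing has been loaded yet
item = img_col[1]
print(calls)   # ['img1.jpg'] because only the indexed element was loaded
```

Creating the column costs nothing beyond storing the paths; the loading cost is paid per element, when that element is accessed.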

Lazy Selection

Todo

Fill in this stub.