meerkat.ops package

Submodules

meerkat.ops.concat module

concat(objs: Union[Sequence[meerkat.datapanel.DataPanel], Sequence[meerkat.columns.abstract.AbstractColumn]], axis: Union[str, int] = 'rows', suffixes: Tuple[str] = None, overwrite: bool = False) → Union[meerkat.datapanel.DataPanel, meerkat.columns.abstract.AbstractColumn]

Concatenate a sequence of columns or a sequence of `DataPanel`s. If the sequence is empty, returns an empty `DataPanel`.

  • If concatenating columns, all columns must be of the same type (e.g. all ListColumn).

  • If concatenating `DataPanel`s along axis 0 (rows), all `DataPanel`s must have the same set of columns.

  • If concatenating `DataPanel`s along axis 1 (columns), all `DataPanel`s must have the same length and cannot have any of the same column names.

Parameters
  • objs (Union[Sequence[DataPanel], Sequence[AbstractColumn]]) – sequence of columns or DataPanels.

  • axis (Union[str, int]) – The axis along which to concatenate. Ignored if concatenating columns.

Returns

concatenated DataPanel or column

Return type

Union[DataPanel, AbstractColumn]
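The rules listed above for concatenating DataPanels can be illustrated with a minimal, library-free sketch. It is not meerkat's implementation: `concat_panels` is a hypothetical helper, a panel is modeled as a plain dict mapping column names to equal-length lists, the string `"columns"` as an axis value is an assumption, and the `suffixes` and `overwrite` parameters are not modeled.

```python
from typing import Dict, Sequence, Union

Panel = Dict[str, list]  # stand-in for a DataPanel: column name -> values


def concat_panels(panels: Sequence[Panel], axis: Union[str, int] = "rows") -> Panel:
    """Hypothetical sketch of the documented concat rules for panels."""
    if len(panels) == 0:
        return {}  # empty input -> empty panel

    if axis in ("rows", 0):
        # Rule: every panel must have the same set of columns.
        columns = set(panels[0])
        if any(set(p) != columns for p in panels):
            raise ValueError("row-wise concat requires identical column sets")
        return {name: [v for p in panels for v in p[name]] for name in panels[0]}

    if axis in ("columns", 1):  # "columns" as a string value is an assumption
        # Rule: same length, and no column name may appear twice.
        lengths = {len(col) for p in panels for col in p.values()}
        if len(lengths) > 1:
            raise ValueError("column-wise concat requires equal lengths")
        out: Panel = {}
        for p in panels:
            clash = set(out) & set(p)
            if clash:
                raise ValueError(f"duplicate column names: {sorted(clash)}")
            out.update(p)
        return out

    raise ValueError(f"unknown axis: {axis!r}")


top = {"a": [1, 2], "b": [3, 4]}
bottom = {"a": [5], "b": [6]}

stacked = concat_panels([top, bottom], axis="rows")  # appends rows per column
side = concat_panels([top, {"c": [7, 8]}], axis=1)   # adds column "c"
```

Row-wise concatenation fails here if the column sets differ, and column-wise concatenation fails on a length mismatch or a repeated column name, which mirrors the constraints stated in the description.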

meerkat.ops.groupby module

class GroupBy(data: meerkat.datapanel.DataPanel, indices: Dict[Union[str, Tuple[str]], numpy.ndarray], by: Union[List[str], str])

Bases: object

mean(*args, **kwargs)

groupby(data: meerkat.datapanel.DataPanel, by: Optional[Union[str, Sequence[str]]] = None) → meerkat.ops.groupby.GroupBy

Perform a groupby operation on a DataPanel or Column (similar to the DataFrame.groupby and Series.groupby operations in pandas).

TODO (Sam): I put down a very rough scaffolding of how you could set up the class hierarchy for this. It is inspired by the way pandas has things set up; check out https://github.com/pandas-dev/pandas/tree/a8968bfa696d51f73769c54f2630a9530488236a/pandas/core/groupby for some inspiration.

I’d recommend starting with small simple datapanels, e.g. a datapanel of all numpy array columns. For example:

```
dp = DataPanel({
    'a': NumpyArrayColumn([1, 2, 2, 1, 3, 2, 3]),
    'b': NumpyArrayColumn([1, 2, 3, 4, 5, 6, 7]),
    'c': NumpyArrayColumn([1.0, 3.2, 2.1, 4.3, 5.4, 6.5, 7.6])
})

groupby(dp, by="a")["c"].mean()
```

Eventually we’ll want to support a bunch of different aggregations, but for the time being let’s just focus on mean, sum, and count.

Note: we’ll also want to implement methods DataPanel.groupby or AbstractColumn.groupby eventually, but we also want a functional version that could be called like mk.groupby(dp, by="class"). I’d suggest putting most of the implementation here, and then making the methods just wrappers. See merge as an example.

Parameters
  • data (Union[DataPanel, AbstractColumn]) – The data to group.

  • by (Union[str, Sequence[str]]) – The column(s) to group by. Ignored if data is a Column.

Returns

A GroupBy object.

Return type

Union[DataPanelGroupBy, AbstractColumnGroupBy]
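The grouped mean described in the developer note can be traced by hand with a small, library-free sketch. It uses plain lists in place of `NumpyArrayColumn`, and `group_indices` and `group_mean` are hypothetical helpers, not part of meerkat. The first mirrors the shape of the `indices` argument of `GroupBy` (a group key mapped to row positions); the second applies the mean aggregation within each group.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def group_indices(keys: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Map each distinct key to the row positions where it occurs."""
    indices: Dict[Hashable, List[int]] = defaultdict(list)
    for row, key in enumerate(keys):
        indices[key].append(row)
    return dict(indices)


def group_mean(
    values: Sequence[float], indices: Dict[Hashable, List[int]]
) -> Dict[Hashable, float]:
    """Average `values` within each group of row positions."""
    return {
        key: sum(values[i] for i in rows) / len(rows)
        for key, rows in indices.items()
    }


# Same values as columns 'a' and 'c' in the note's example datapanel.
a = [1, 2, 2, 1, 3, 2, 3]
c = [1.0, 3.2, 2.1, 4.3, 5.4, 6.5, 7.6]

indices = group_indices(a)      # {1: [0, 3], 2: [1, 2, 5], 3: [4, 6]}
means = group_mean(c, indices)  # roughly {1: 2.65, 2: 3.93, 3: 6.5}
```

Separating index computation from aggregation means that sum and count, the other two aggregations the note mentions, would only need a different reduction over the same `indices` mapping.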

meerkat.ops.merge module

merge(left: meerkat.datapanel.DataPanel, right: meerkat.datapanel.DataPanel, how: str = 'inner', on: Union[str, List[str]] = None, left_on: Union[str, List[str]] = None, right_on: Union[str, List[str]] = None, sort: bool = False, suffixes: Sequence[str] = ('_x', '_y'), validate=None)

Module contents