meerkat.ops package
Submodules
meerkat.ops.concat module
- concat(objs: Union[Sequence[meerkat.datapanel.DataPanel], Sequence[meerkat.columns.abstract.AbstractColumn]], axis: Union[str, int] = 'rows', suffixes: Tuple[str] = None, overwrite: bool = False) -> Union[meerkat.datapanel.DataPanel, meerkat.columns.abstract.AbstractColumn]
Concatenate a sequence of columns or a sequence of `DataPanel`s. If the sequence is empty, returns an empty `DataPanel`.
- If concatenating columns, all columns must be of the same type (e.g. all ListColumn).
- If concatenating `DataPanel`s along axis 0 (rows), all `DataPanel`s must have the same set of columns.
- If concatenating `DataPanel`s along axis 1 (columns), all `DataPanel`s must have the same length and cannot have any of the same column names.
- Parameters
objs (Union[Sequence[DataPanel], Sequence[AbstractColumn]]) – sequence of columns or DataPanels.
axis (Union[str, int]) – The axis along which to concatenate. Ignored if concatenating columns.
- Returns
concatenated DataPanel or column
- Return type
Union[DataPanel, AbstractColumn]
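The rules above can be sketched in plain Python. This is only an illustration, not meerkat's implementation: `Table` (a dict mapping column names to lists) is a hypothetical stand-in for a `DataPanel`, `concat_tables` is a made-up name, and accepting `'columns'` as the string form of axis 1 is an assumption.

```python
from typing import Dict, Sequence, Union

# Hypothetical stand-in for a DataPanel: column name -> list of values.
Table = Dict[str, list]


def concat_tables(tables: Sequence[Table], axis: Union[str, int] = "rows") -> Table:
    """Concatenate dict-of-lists tables, enforcing the rules listed above."""
    if len(tables) == 0:
        return {}  # an empty sequence yields an empty table

    if axis in ("rows", 0):
        # Row-wise: every table must have the same set of columns.
        names = set(tables[0])
        if any(set(t) != names for t in tables):
            raise ValueError("row-wise concat needs identical column sets")
        return {name: [v for t in tables for v in t[name]] for name in tables[0]}

    if axis in ("columns", 1):
        # Column-wise: equal lengths and no shared column names.
        lengths = {len(col) for t in tables for col in t.values()}
        if len(lengths) > 1:
            raise ValueError("column-wise concat needs equal lengths")
        out: Table = {}
        for t in tables:
            overlap = set(out) & set(t)
            if overlap:
                raise ValueError(f"duplicate column names: {sorted(overlap)}")
            out.update(t)
        return out

    raise ValueError(f"unknown axis: {axis!r}")
```

For instance, `concat_tables([{"x": [1, 2]}, {"x": [3]}])` stacks the rows of `x`, while `concat_tables([{"x": [1, 2]}, {"y": [3, 4]}], axis=1)` places `x` and `y` side by side.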
meerkat.ops.groupby module
- class GroupBy(data: meerkat.datapanel.DataPanel, indices: Dict[Union[str, Tuple[str]], numpy.ndarray], by: Union[List[str], str])
Bases: object
- groupby(data: meerkat.datapanel.DataPanel, by: Optional[Union[str, Sequence[str]]] = None) -> meerkat.ops.groupby.GroupBy
Perform a groupby operation on a DataPanel or Column (similar to the DataFrame.groupby and Series.groupby operations in pandas).
TODO (Sam): I put down a very rough scaffolding of how you could set up the class hierarchy for this. It is inspired by the way pandas has things set up: check out https://github.com/pandas-dev/pandas/tree/a8968bfa696d51f73769c54f2630a9530488236a/pandas/core/groupby for some inspiration.
I’d recommend starting with small, simple datapanels, e.g. a datapanel of all numpy array columns. For example,
```
dp = DataPanel({
    "a": NumpyArrayColumn([1, 2, 2, 1, 3, 2, 3]),
    "b": NumpyArrayColumn([1, 2, 3, 4, 5, 6, 7]),
    "c": NumpyArrayColumn([1.0, 3.2, 2.1, 4.3, 5.4, 6.5, 7.6]),
})
groupby(dp, by="a")["c"].mean()
```
Eventually we’ll want to support a bunch of different aggregations, but for the time being let’s just focus on mean, sum, and count.
Note: we’ll also want to implement methods DataPanel.groupby or AbstractColumn.groupby eventually, but we also want a functional version that could be called like mk.groupby(dp, by="class"). I’d suggest putting most of the implementation here, and then making the methods just wrappers. See merge as an example.
- Parameters
data (Union[DataPanel, AbstractColumn]) – The data to group.
by (Union[str, Sequence[str]]) – The column(s) to group by. Ignored if data is a Column.
- Returns
A GroupBy object.
- Return type
Union[DataPanelGroupBy, AbstractColumnGroupBy]
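To make the intended behaviour of the docstring example concrete, here is a minimal plain-Python sketch of the three aggregations named above (mean, sum, and count). It is not meerkat's implementation: `group_indices` and `aggregate` are hypothetical helpers, and the assumption that `indices` maps each group key to the row positions belonging to it is inferred only from the `GroupBy` constructor signature.

```python
from collections import defaultdict
from statistics import fmean
from typing import Callable, Dict, Hashable, List, Sequence

# Same values as the DataPanel in the docstring example, as plain lists.
data = {
    "a": [1, 2, 2, 1, 3, 2, 3],
    "b": [1, 2, 3, 4, 5, 6, 7],
    "c": [1.0, 3.2, 2.1, 4.3, 5.4, 6.5, 7.6],
}


def group_indices(keys: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Map each distinct key to the row positions where it occurs."""
    indices = defaultdict(list)
    for row, key in enumerate(keys):
        indices[key].append(row)
    return dict(indices)


def aggregate(values: Sequence, indices: Dict[Hashable, List[int]],
              fn: Callable[[list], float]) -> Dict[Hashable, float]:
    """Apply fn to the values of each group and collect one result per key."""
    return {key: fn([values[i] for i in rows]) for key, rows in indices.items()}


indices = group_indices(data["a"])             # {1: [0, 3], 2: [1, 2, 5], 3: [4, 6]}
means = aggregate(data["c"], indices, fmean)   # analogue of groupby(dp, by="a")["c"].mean()
sums = aggregate(data["b"], indices, sum)
counts = aggregate(data["b"], indices, len)
```

Separating the index-building step from the aggregation keeps each new aggregation a one-line addition, which is one way the method wrappers mentioned in the note could stay thin.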
meerkat.ops.merge module
- merge(left: meerkat.datapanel.DataPanel, right: meerkat.datapanel.DataPanel, how: str = 'inner', on: Union[str, List[str]] = None, left_on: Union[str, List[str]] = None, right_on: Union[str, List[str]] = None, sort: bool = False, suffixes: Sequence[str] = ('_x', '_y'), validate=None)