mapped_array module ¶

Base class for working with mapped arrays.

This class takes the mapped array and the corresponding column and (optionally) index arrays, and offers features to directly process the mapped array without converting it to pandas; for example, to compute various statistics by column, such as standard deviation.

Consider the following example:

>>> import numpy as np
>>> import pandas as pd
>>> from numba import njit
>>> import vectorbt as vbt

>>> a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
>>> col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> idx_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y', 'z'],
...     columns=['a', 'b', 'c'], ndim=2, freq='1 day')
>>> ma = vbt.MappedArray(wrapper, a, col_arr, idx_arr=idx_arr)

Reducing¶

Using MappedArray, you can then reduce by column as follows:

Use already provided reducers such as MappedArray.mean():

>>> ma.mean()
a    11.0
b    14.0
c    17.0
dtype: float64

Use MappedArray.to_pd() to map to pandas and then reduce manually (expensive):

>>> ma.to_pd().mean()
a    11.0
b    14.0
c    17.0
dtype: float64

Use MappedArray.reduce() to reduce using a custom function:

>>> @njit
... def pow_mean_reduce_nb(col, a, pow):
...     return np.mean(a ** pow)

>>> ma.reduce(pow_mean_reduce_nb, 2)
a    121.666667
b    196.666667
c    289.666667
dtype: float64

>>> @njit
... def min_max_reduce_nb(col, a):
...     return np.array([np.min(a), np.max(a)])

>>> ma.reduce(min_max_reduce_nb, returns_array=True, index=['min', 'max'])
        a     b     c
min  10.0  13.0  16.0
max  12.0  15.0  18.0

>>> @njit
... def idxmin_idxmax_reduce_nb(col, a):
...     return np.array([np.argmin(a), np.argmax(a)])

>>> ma.reduce(idxmin_idxmax_reduce_nb, returns_array=True,
...     returns_idx=True, index=['idxmin', 'idxmax'])
        a  b  c
idxmin  x  x  x
idxmax  z  z  z
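To make the semantics concrete, here is a minimal NumPy-only sketch of what a per-column reduction computes. It is not vectorbt's Numba implementation: `reduce_by_column` is a hypothetical helper, and the sketch assumes the reducer returns one number per column.

```python
import numpy as np

def reduce_by_column(values, col_arr, n_cols, reduce_func, fill_value=np.nan):
    """Hypothetical helper: reduce the values of each column to one number."""
    out = np.full(n_cols, fill_value, dtype=float)
    for col in range(n_cols):
        col_values = values[col_arr == col]  # all values mapped to this column
        if col_values.size > 0:              # empty columns keep fill_value
            out[col] = reduce_func(col_values)
    return out

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

col_means = reduce_by_column(a, col_arr, 3, np.mean)
pow_means = reduce_by_column(a, col_arr, 3, lambda x: np.mean(x ** 2))
print(col_means)
print(pow_means)
```

The two results correspond to the `ma.mean()` and `pow_mean_reduce_nb` examples above, minus the pandas wrapping.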

Mapping¶

Use MappedArray.apply() to apply a function on each column/group:

>>> @njit
... def cumsum_apply_nb(idxs, col, a):
...     return np.cumsum(a)

>>> ma.apply(cumsum_apply_nb)
<vectorbt.records.mapped_array.MappedArray at 0x7ff061382198>

>>> ma.apply(cumsum_apply_nb).values
array([10., 21., 33., 13., 27., 42., 16., 33., 51.])

>>> group_by = np.array(['first', 'first', 'second'])
>>> ma.apply(cumsum_apply_nb, group_by=group_by, apply_per_group=True).values
array([10., 21., 33., 46., 60., 75., 16., 33., 51.])

Notice how cumsum resets at each column in the first example and at each group in the second example.
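The reset behavior can be reproduced with plain NumPy. The sketch below is an illustration only; `apply_by_column` is a hypothetical helper, and it assumes the applied function returns an array of the same length as its input.

```python
import numpy as np

def apply_by_column(values, col_arr, n_cols, apply_func):
    """Hypothetical helper: apply a same-length transform to each column's values."""
    out = np.empty_like(values)
    for col in range(n_cols):
        mask = col_arr == col
        out[mask] = apply_func(values[mask])  # write results back to the same slots
    return out

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# Per column: the running sum restarts at every column boundary.
per_column = apply_by_column(a, col_arr, 3, np.cumsum)

# Per group: columns 0 and 1 form group 0 ('first'), column 2 forms group 1 ('second').
col_to_group = np.array([0, 0, 1])
group_arr = col_to_group[col_arr]
per_group = apply_by_column(a, group_arr, 2, np.cumsum)
print(per_column)
print(per_group)
```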

Conversion¶

You can expand any MappedArray instance to pandas:

Given idx_arr was provided:

>>> ma.to_pd()
      a     b     c
x  10.0  13.0  16.0
y  11.0  14.0  17.0
z  12.0  15.0  18.0

Note

Will raise an error if there are multiple values pointing to the same position.

If group_by was provided, the index can be ignored, or there are position conflicts, pass ignore_index=True to stack the values instead:

>>> ma.to_pd(group_by=np.array(['first', 'first', 'second']), ignore_index=True)
   first  second
0   10.0    16.0
1   11.0    17.0
2   12.0    18.0
3   13.0     NaN
4   14.0     NaN
5   15.0     NaN
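The two expansion modes can be sketched in plain NumPy as follows. `expand` and `stack_expand` are hypothetical helpers written for illustration; they are not the library's expand_mapped_nb() or stack_expand_mapped_nb().

```python
import numpy as np

def expand(values, col_arr, idx_arr, shape, fill_value=np.nan):
    """Hypothetical helper: place each value at (idx_arr[i], col_arr[i])."""
    out = np.full(shape, fill_value, dtype=float)
    filled = np.zeros(shape, dtype=bool)
    for value, row, col in zip(values, idx_arr, col_arr):
        if filled[row, col]:
            raise ValueError("multiple values point to the same position")
        out[row, col] = value
        filled[row, col] = True
    return out

def stack_expand(values, col_arr, n_cols, fill_value=np.nan):
    """Hypothetical helper: ignore the index and stack values per column."""
    counts = np.bincount(col_arr, minlength=n_cols)
    out = np.full((counts.max(), n_cols), fill_value, dtype=float)
    next_row = np.zeros(n_cols, dtype=int)  # next free row in each column
    for value, col in zip(values, col_arr):
        out[next_row[col], col] = value
        next_row[col] += 1
    return out

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
idx_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])

expanded = expand(a, col_arr, idx_arr, (3, 3))
# Group columns 0 and 1 together, then stack without using the index.
stacked = stack_expand(a, np.array([0, 0, 1])[col_arr], 2)
print(expanded)
print(stacked)
```

Stacking never conflicts because each value gets the next free row of its column, which is why it is the fallback when several values share a position.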

Filtering¶

Use MappedArray.apply_mask() to filter elements per column/group:

>>> mask = [True, False, True, False, True, False, True, False, True]
>>> filtered_ma = ma.apply_mask(mask)
>>> filtered_ma.count()
a    2
b    1
c    2
dtype: int64

>>> filtered_ma.id_arr
array([0, 2, 4, 6, 8])
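Conceptually, filtering applies the same boolean mask to every parallel array. The NumPy-only sketch below illustrates this under the assumption that ids default to a simple range; it does not call vectorbt.

```python
import numpy as np

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
idx_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
id_arr = np.arange(a.size)  # assumed default: ids are a simple range

mask = np.array([True, False, True, False, True, False, True, False, True])

# The same mask selects matching entries from all parallel arrays.
filtered_values = a[mask]
filtered_cols = col_arr[mask]
filtered_idx = idx_arr[mask]
filtered_ids = id_arr[mask]

# Number of surviving values per column.
counts = np.bincount(filtered_cols, minlength=3)
print(filtered_ids, counts)
```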

Plotting¶

You can build histograms and boxplots of MappedArray directly:

>>> ma.boxplot()

To use scatterplots or any other plots that require index, convert to pandas first:

>>> ma.to_pd().vbt.plot()

Grouping¶

One of the key features of MappedArray is that you can perform reducing operations on a group of columns as if they were a single column. Groups can be specified by group_by, which can be anything from positions or names of column levels, to a NumPy array with actual groups.

There are multiple ways to define grouping:

When creating MappedArray, pass group_by to ArrayWrapper:

>>> group_by = np.array(['first', 'first', 'second'])
>>> grouped_wrapper = wrapper.replace(group_by=group_by)
>>> grouped_ma = vbt.MappedArray(grouped_wrapper, a, col_arr, idx_arr=idx_arr)

>>> grouped_ma.mean()
first     12.5
second    17.0
dtype: float64

Regroup an existing MappedArray:

>>> ma.regroup(group_by).mean()
first     12.5
second    17.0
dtype: float64

Pass group_by directly to the reducing method:

>>> ma.mean(group_by=group_by)
first     12.5
second    17.0
dtype: float64

In the same way, you can disable or modify any existing grouping:

>>> grouped_ma.mean(group_by=False)
a    11.0
b    14.0
c    17.0
dtype: float64

Note

Grouping applies only to reducing operations; the underlying arrays are not changed.
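A rough way to picture grouping is as a lookup from column positions to group positions that is applied just before reducing. The sketch below uses plain NumPy and is only an approximation of the idea; note that np.unique sorts the labels, which happens to match the order of appearance here.

```python
import numpy as np

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
group_by = np.array(['first', 'first', 'second'])

# Map every column to the position of its group.
groups, col_to_group = np.unique(group_by, return_inverse=True)

# Translate the column array into a group array; `a` and `col_arr` stay untouched.
group_arr = col_to_group[col_arr]

group_means = np.array([a[group_arr == g].mean() for g in range(len(groups))])
print(dict(zip(groups.tolist(), group_means.tolist())))
```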

Operators¶

MappedArray implements arithmetic, comparison and logical operators. You can perform basic operations (such as addition) on mapped arrays as if they were NumPy arrays.

>>> ma ** 2
<vectorbt.records.mapped_array.MappedArray at 0x7f97bfc49358>

>>> ma * np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
<vectorbt.records.mapped_array.MappedArray at 0x7f97bfc65e80>

>>> ma + ma
<vectorbt.records.mapped_array.MappedArray at 0x7fd638004d30>

Note

You should ensure that your MappedArray operand is on the left if the other operand is an array.

If two MappedArray operands have different metadata, the result copies its metadata from the first one; at a minimum, their id_arr and col_arr must match.
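The matching rule can be illustrated with a small NumPy sketch. `combine` is a hypothetical helper, and representing a second mapped array as a `(values, id_arr, col_arr)` tuple is an assumption made only for this example.

```python
import numpy as np

def combine(values, id_arr, col_arr, other, np_func):
    """Hypothetical helper: element-wise combination with a compatibility check."""
    if isinstance(other, tuple):  # stand-in for another mapped array
        other_values, other_ids, other_cols = other
        if not (np.array_equal(id_arr, other_ids)
                and np.array_equal(col_arr, other_cols)):
            raise ValueError("id_arr and col_arr of both operands must match")
        other = other_values
    # Scalars and plain arrays are handled by NumPy broadcasting.
    return np_func(values, other)

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
id_arr = np.arange(a.size)

squared = combine(a, id_arr, col_arr, 2, np.power)
scaled = combine(a, id_arr, col_arr, np.arange(1, 10), np.multiply)
doubled = combine(a, id_arr, col_arr, (a, id_arr, col_arr), np.add)
print(doubled)
```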

Indexing¶

Like any other class that subclasses Wrapping, MappedArray supports pandas indexing, which forwards the indexing operation to each object with columns:

>>> ma['a'].values
array([10., 11., 12.])

>>> grouped_ma['first'].values
array([10., 11., 12., 13., 14., 15.])

Note

Changing index (time axis) is not supported. The object should be treated as a Series rather than a DataFrame; for example, use some_field.iloc[0] instead of some_field.iloc[:, 0].

Indexing behavior depends solely upon ArrayWrapper. For example, if group_select is enabled indexing will be performed on groups, otherwise on single columns.

Caching¶

MappedArray supports caching. If a method or a property requires heavy computation, it's wrapped with cached_method() and cached_property respectively. Caching can be disabled globally via caching in settings.

Note

Because of caching, the class is meant to be immutable and all properties are read-only. To change any attribute, use the copy method and pass the attribute as a keyword argument.
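The reason caching and immutability go together can be shown with a standard-library analogy. The sketch below uses `functools.cached_property` and `dataclasses.replace` as stand-ins; `FrozenValues` is a hypothetical class, and vectorbt's own cached_method() and cached_property decorators are not used here.

```python
from dataclasses import dataclass, replace
from functools import cached_property

calls = []  # records every time the expensive computation actually runs

@dataclass(frozen=True)
class FrozenValues:
    """Hypothetical stand-in for an immutable object with cached results."""
    values: tuple

    @cached_property
    def total(self):
        calls.append("total")
        return sum(self.values)

original = FrozenValues((10.0, 11.0, 12.0))
first = original.total
second = original.total  # served from the cache, no recomputation

# Changing an attribute means building a new instance with a fresh cache.
modified = replace(original, values=(13.0, 14.0, 15.0))
print(first, second, modified.total, len(calls))
```

If attributes could be mutated in place, the cached total would silently go stale, which is why a new instance is created instead.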

Saving and loading¶

Like any other class that subclasses Pickleable, a MappedArray instance can be saved to disk with Pickleable.save() and loaded with Pickleable.load().

Stats¶

Hint

See StatsBuilderMixin.stats() and MappedArray.metrics.

Metrics for mapped arrays are similar to those for GenericAccessor.

>>> ma.stats(column='a')
Start                      x
End                        z
Period       3 days 00:00:00
Count                      3
Mean                    11.0
Std                      1.0
Min                     10.0
Median                  11.0
Max                     12.0
Min Index                  x
Max Index                  z
Name: a, dtype: object

The main difference unfolds once the mapped array has a mapping: values are then considered as categorical and usual statistics are meaningless to compute. For this case, StatsBuilderMixin.stats() returns the value counts:

>>> mapping = {v: "test_" + str(v) for v in np.unique(ma.values)}
>>> ma.stats(column='a', settings=dict(mapping=mapping))
Start                                    x
End                                      z
Period                     3 days 00:00:00
Count                                    3
Value Counts: test_10.0                  1
Value Counts: test_11.0                  1
Value Counts: test_12.0                  1
Value Counts: test_13.0                  0
Value Counts: test_14.0                  0
Value Counts: test_15.0                  0
Value Counts: test_16.0                  0
Value Counts: test_17.0                  0
Value Counts: test_18.0                  0
Name: a, dtype: object
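The zero rows appear because every key of the mapping is counted, not only the values present in the selected column. A minimal NumPy sketch of that counting logic, written for illustration and independent of vectorbt:

```python
import numpy as np

column_values = np.array([10., 11., 12.])  # values of column 'a'
all_values = np.arange(10., 19.)           # every value in the mapped array
mapping = {float(v): "test_" + str(float(v)) for v in all_values}

# Count every key of the mapping, so keys absent from the column yield 0.
value_counts = {
    label: int(np.count_nonzero(column_values == key))
    for key, label in mapping.items()
}
print(value_counts)
```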

MappedArray.stats() also supports (re-)grouping:

>>> grouped_ma.stats(column='first')
Start                      x
End                        z
Period       3 days 00:00:00
Count                      6
Mean                    12.5
Std                 1.870829
Min                     10.0
Median                  12.5
Max                     15.0
Min Index                  x
Max Index                  z
Name: first, dtype: object

Plots¶

Hint

See PlotsBuilderMixin.plots() and MappedArray.subplots.

MappedArray class has a single subplot based on MappedArray.to_pd() and GenericAccessor.plot():

>>> ma.plots()

combine_mapped_with_other function ¶

combine_mapped_with_other(
    other,
    np_func
)

Combine MappedArray with other compatible object.

If other object is also MappedArray, their id_arr and col_arr must match.

MappedArray class ¶

MappedArray(
    wrapper,
    mapped_arr,
    col_arr,
    id_arr=None,
    idx_arr=None,
    mapping=None,
    col_mapper=None,
    **kwargs
)

Exposes methods for reducing, converting, and plotting arrays mapped by Records class.

Args

wrapper : ArrayWrapper

Array wrapper.

See ArrayWrapper.

mapped_arr : array_like

A one-dimensional array of mapped record values.

col_arr : array_like

A one-dimensional column array.

Must be of the same size as mapped_arr.

id_arr : array_like

A one-dimensional id array. Defaults to simple range.

Must be of the same size as mapped_arr.

idx_arr : array_like

A one-dimensional index array. Optional.

Must be of the same size as mapped_arr.

mapping : namedtuple, dict or callable

Mapping.

col_mapper : ColumnMapper

Column mapper if already known.

Note

It depends upon wrapper and col_arr, so make sure to invalidate col_mapper upon creating a MappedArray instance with a modified wrapper or col_arr.

MappedArray.replace() does it automatically.

**kwargs

Custom keyword arguments passed to the config.

Useful if any subclass wants to extend the config.

apply method ¶

MappedArray.apply(
    apply_func_nb,
    *args,
    group_by=None,
    apply_per_group=False,
    dtype=None,
    **kwargs
)

Apply function on mapped array per column/group. Returns mapped array.

Applies per group if apply_per_group is True.

See apply_on_mapped_nb().

**kwargs are passed to MappedArray.replace().

apply_mapping method ¶

MappedArray.apply_mapping(
    mapping=None,
    **kwargs
)

Apply mapping on each element.

apply_mask method ¶

MappedArray.apply_mask(
    mask,
    idx_arr=None,
    group_by=None,
    **kwargs
)

Return a new class instance, filtered by mask.

**kwargs are passed to MappedArray.replace().

bottom_n method ¶

MappedArray.bottom_n(
    n,
    **kwargs
)

Filter bottom N elements from each column/group.

bottom_n_mask method ¶

MappedArray.bottom_n_mask(
    n,
    **kwargs
)

Return mask of bottom N elements in each column/group.

boxplot method ¶

MappedArray.boxplot(
    group_by=None,
    **kwargs
)

Plot box plot by column/group.

col_arr property ¶

Column array.

col_mapper property ¶

Column mapper.

See ColumnMapper.

count method ¶

MappedArray.count(
    group_by=None,
    wrap_kwargs=None
)

Return number of values by column/group.

describe method ¶

MappedArray.describe(
    percentiles=None,
    ddof=1,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return statistics by column/group.

histplot method ¶

MappedArray.histplot(
    group_by=None,
    **kwargs
)

Plot histogram by column/group.

id_arr property ¶

Id array.

idx_arr property ¶

Index array.

idxmax method ¶

MappedArray.idxmax(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return index of max by column/group.

idxmin method ¶

MappedArray.idxmin(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return index of min by column/group.

indexing_func method ¶

MappedArray.indexing_func(
    pd_indexing_func,
    **kwargs
)

Perform indexing on MappedArray.

indexing_func_meta method ¶

MappedArray.indexing_func_meta(
    pd_indexing_func,
    **kwargs
)

Perform indexing on MappedArray and return metadata.

is_expandable method ¶

MappedArray.is_expandable(
    idx_arr=None,
    group_by=None
)

See is_mapped_expandable_nb().

is_sorted method ¶

MappedArray.is_sorted(
    incl_id=False
)

Check whether mapped array is sorted.

map_to_mask method ¶

MappedArray.map_to_mask(
    inout_map_func_nb,
    *args,
    group_by=None
)

Map mapped array to a mask.

See mapped_to_mask_nb().

mapped_arr property ¶

Mapped array.

mapping property ¶

Mapping.

max method ¶

MappedArray.max(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return max by column/group.

mean method ¶

MappedArray.mean(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return mean by column/group.

median method ¶

MappedArray.median(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return median by column/group.

metrics class variable ¶

Metrics supported by MappedArray.

Config({
    "start": {
        "title": "Start",
        "calc_func": "<function MappedArray.<lambda> at 0x119964b80>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "end": {
        "title": "End",
        "calc_func": "<function MappedArray.<lambda> at 0x119964c20>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "period": {
        "title": "Period",
        "calc_func": "<function MappedArray.<lambda> at 0x119964cc0>",
        "apply_to_timedelta": true,
        "agg_func": null,
        "tags": "wrapper"
    },
    "count": {
        "title": "Count",
        "calc_func": "count",
        "tags": "mapped_array"
    },
    "mean": {
        "title": "Mean",
        "calc_func": "mean",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "std": {
        "title": "Std",
        "calc_func": "std",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "min": {
        "title": "Min",
        "calc_func": "min",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "median": {
        "title": "Median",
        "calc_func": "median",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "max": {
        "title": "Max",
        "calc_func": "max",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "idx_min": {
        "title": "Min Index",
        "calc_func": "idxmin",
        "inv_check_has_mapping": true,
        "agg_func": null,
        "tags": [
            "mapped_array",
            "index"
        ]
    },
    "idx_max": {
        "title": "Max Index",
        "calc_func": "idxmax",
        "inv_check_has_mapping": true,
        "agg_func": null,
        "tags": [
            "mapped_array",
            "index"
        ]
    },
    "value_counts": {
        "title": "Value Counts",
        "calc_func": "<function MappedArray.<lambda> at 0x119964d60>",
        "resolve_value_counts": true,
        "check_has_mapping": true,
        "tags": [
            "mapped_array",
            "value_counts"
        ]
    }
})

Returns MappedArray._metrics, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.

To change metrics, you can either change the config in-place, override this property, or overwrite the instance variable MappedArray._metrics.

min method ¶

MappedArray.min(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return min by column/group.

nth method ¶

MappedArray.nth(
    n,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return n-th element of each column/group.

nth_index method ¶

MappedArray.nth_index(
    n,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return index of n-th element of each column/group.

plots_defaults property ¶

Defaults for PlotsBuilderMixin.plots().

Merges PlotsBuilderMixin.plots_defaults and mapped_array.plots from settings.

reduce method ¶

MappedArray.reduce(
    reduce_func_nb,
    *args,
    idx_arr=None,
    returns_array=False,
    returns_idx=False,
    to_index=True,
    fill_value=nan,
    group_by=None,
    wrap_kwargs=None
)

Reduce mapped array by column/group.

If returns_array is False and returns_idx is False, see reduce_mapped_nb(). If returns_array is False and returns_idx is True, see reduce_mapped_to_idx_nb(). If returns_array is True and returns_idx is False, see reduce_mapped_to_array_nb(). If returns_array is True and returns_idx is True, see reduce_mapped_to_idx_array_nb().

If returns_idx is True, must pass idx_arr. Set to_index to False to return raw positions instead of labels. Use fill_value to set the default value. Set group_by to False to disable grouping.
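The role of idx_arr and to_index when reducing to an index can be sketched in plain NumPy. The data below is made up for illustration, and the loop is an assumption about the semantics rather than the library's reduce_mapped_to_idx_nb().

```python
import numpy as np

values = np.array([5., 9., 7., 8., 4.])
col_arr = np.array([0, 0, 0, 1, 1])
idx_arr = np.array([0, 2, 1, 1, 2])  # row position of each value
index = np.array(['x', 'y', 'z'])    # row labels of the wrapper

positions = np.empty(2, dtype=int)
for col in range(2):
    mask = col_arr == col
    local = np.argmax(values[mask])        # position within the column's values
    positions[col] = idx_arr[mask][local]  # translate to a row position

labels = index[positions]  # what to_index=True would correspond to
print(positions, labels)
```

With to_index set to False, the result would correspond to `positions`; otherwise the positions are translated into the labels of the wrapper's index.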

replace method ¶

MappedArray.replace(
    **kwargs
)

See Configured.replace().

Also, makes sure that MappedArray.col_mapper is not passed to the new instance.

sort method ¶

MappedArray.sort(
    incl_id=False,
    idx_arr=None,
    group_by=None,
    **kwargs
)

Sort mapped array by column array (primary) and id array (secondary, optional).

**kwargs are passed to MappedArray.replace().
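A two-key sort of this kind can be expressed with `np.lexsort`, as in the sketch below. This is an illustration on made-up data, not the library's implementation.

```python
import numpy as np

values = np.array([1., 2., 3., 4., 5.])
col_arr = np.array([1, 0, 1, 0, 2])
id_arr = np.array([4, 3, 0, 1, 2])

# np.lexsort treats the LAST key as the primary key:
# sort by column first, then by id within each column.
order_with_id = np.lexsort((id_arr, col_arr))

# Without ids, a stable sort by column keeps the original order within a column.
order_without_id = np.argsort(col_arr, kind="stable")

sorted_values = values[order_with_id]
sorted_cols = col_arr[order_with_id]
sorted_ids = id_arr[order_with_id]
print(order_with_id, order_without_id)
```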

stats_defaults property ¶

Defaults for StatsBuilderMixin.stats().

Merges StatsBuilderMixin.stats_defaults and mapped_array.stats from settings.

std method ¶

MappedArray.std(
    ddof=1,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return std by column/group.

subplots class variable ¶

Subplots supported by MappedArray.

Config({
    "to_pd_plot": {
        "check_is_not_grouped": true,
        "plot_func": "to_pd.vbt.plot",
        "pass_trace_names": false,
        "tags": "mapped_array"
    }
})

Returns MappedArray._subplots, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.

To change subplots, you can either change the config in-place, override this property, or overwrite the instance variable MappedArray._subplots.

sum method ¶

MappedArray.sum(
    fill_value=0.0,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return sum by column/group.

to_index method ¶

MappedArray.to_index()

Convert to index.

to_pd method ¶

MappedArray.to_pd(
    idx_arr=None,
    ignore_index=False,
    fill_value=nan,
    group_by=None,
    wrap_kwargs=None
)

Expand mapped array to a Series/DataFrame.

If ignore_index, will ignore the index and stack data points on top of each other in every column/group (see stack_expand_mapped_nb()). Otherwise, see expand_mapped_nb().

Note

Will raise an error if there are multiple values pointing to the same position. Set ignore_index to True in this case.

Warning

Mapped arrays represent information in the most memory-friendly format. Mapping back to pandas may occupy lots of memory if records are sparse.

top_n method ¶

MappedArray.top_n(
    n,
    **kwargs
)

Filter top N elements from each column/group.

top_n_mask method ¶

MappedArray.top_n_mask(
    n,
    **kwargs
)

Return mask of top N elements in each column/group.
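As an illustration of a per-column top-N mask, consider the NumPy sketch below. `top_n_mask` here is a hypothetical free function, not the method itself, and its tie-breaking follows a stable ascending sort, which may differ from the library's behavior.

```python
import numpy as np

def top_n_mask(values, col_arr, n_cols, n):
    """Hypothetical helper: mark the n largest values within each column."""
    mask = np.zeros(values.size, dtype=bool)
    if n <= 0:
        return mask  # nothing to select
    for col in range(n_cols):
        positions = np.flatnonzero(col_arr == col)
        # argsort is ascending, so the last n entries are the largest
        top = positions[np.argsort(values[positions], kind="stable")[-n:]]
        mask[top] = True
    return mask

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

mask = top_n_mask(a, col_arr, 3, 2)
top_values = a[mask]  # filtering with the mask keeps the top 2 per column
print(mask)
print(top_values)
```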

value_counts method ¶

MappedArray.value_counts(
    normalize=False,
    sort_uniques=True,
    sort=False,
    ascending=False,
    dropna=False,
    group_by=None,
    mapping=None,
    incl_all_keys=False,
    wrap_kwargs=None,
    **kwargs
)

See GenericAccessor.value_counts().

Note

Does not take into account missing values.

values property ¶

Mapped array.

MetaMappedArray class ¶

MetaMappedArray(
    *args,
    **kwargs
)

Meta class that exposes a read-only class property StatsBuilderMixin.metrics.
