mapped_array module

Base class for working with mapped arrays.

This class takes the mapped array and the corresponding column and (optionally) index arrays, and offers features to directly process the mapped array without converting it to pandas; for example, to compute various statistics by column, such as standard deviation.

Consider the following example:

>>> import numpy as np
>>> import pandas as pd
>>> from numba import njit
>>> import vectorbt as vbt

>>> a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
>>> col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> idx_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y', 'z'],
...     columns=['a', 'b', 'c'], ndim=2, freq='1 day')
>>> ma = vbt.MappedArray(wrapper, a, col_arr, idx_arr=idx_arr)

Reducing

Using MappedArray, you can then reduce by column as follows:

>>> ma.mean()
a    11.0
b    14.0
c    17.0
dtype: float64
>>> ma.to_pd().mean()
a    11.0
b    14.0
c    17.0
dtype: float64
>>> @njit
... def pow_mean_reduce_nb(col, a, pow):
...     return np.mean(a ** pow)

>>> ma.reduce(pow_mean_reduce_nb, 2)
a    121.666667
b    196.666667
c    289.666667
dtype: float64

>>> @njit
... def min_max_reduce_nb(col, a):
...     return np.array([np.min(a), np.max(a)])

>>> ma.reduce(min_max_reduce_nb, returns_array=True, index=['min', 'max'])
        a     b     c
min  10.0  13.0  16.0
max  12.0  15.0  18.0

>>> @njit
... def idxmin_idxmax_reduce_nb(col, a):
...     return np.array([np.argmin(a), np.argmax(a)])

>>> ma.reduce(idxmin_idxmax_reduce_nb, returns_array=True,
...     returns_idx=True, index=['idxmin', 'idxmax'])
        a  b  c
idxmin  x  x  x
idxmax  z  z  z
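
To build intuition for what a per-column reduction does, here is a minimal NumPy-only sketch. It is not how vectorbt implements MappedArray.reduce() (which runs Numba-compiled functions); the helper reduce_by_column is hypothetical and only mirrors the idea of selecting values by col_arr before applying a function.

```python
import numpy as np

def reduce_by_column(mapped_arr, col_arr, n_cols, reduce_func, fill_value=np.nan):
    """Apply reduce_func to the values of each column (hypothetical helper)."""
    out = np.full(n_cols, fill_value, dtype=float)
    for col in range(n_cols):
        values = mapped_arr[col_arr == col]  # values mapped to this column
        if values.size > 0:  # columns without values keep fill_value
            out[col] = reduce_func(values)
    return out

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

col_means = reduce_by_column(a, col_arr, 3, np.mean)
col_pow_means = reduce_by_column(a, col_arr, 3, lambda x: np.mean(x ** 2))
```

With the example data, col_means and col_pow_means hold the same numbers as ma.mean() and ma.reduce(pow_mean_reduce_nb, 2) above, just without the column labels.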

Mapping

Use MappedArray.apply() to apply a function on each column/group:

>>> @njit
... def cumsum_apply_nb(idxs, col, a):
...     return np.cumsum(a)

>>> ma.apply(cumsum_apply_nb)
<vectorbt.records.mapped_array.MappedArray at 0x7ff061382198>

>>> ma.apply(cumsum_apply_nb).values
array([10., 21., 33., 13., 27., 42., 16., 33., 51.])

>>> group_by = np.array(['first', 'first', 'second'])
>>> ma.apply(cumsum_apply_nb, group_by=group_by, apply_per_group=True).values
array([10., 21., 33., 46., 60., 75., 16., 33., 51.])

Notice how cumsum resets at each column in the first example and at each group in the second example.
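
The difference between the two calls can be reproduced with plain NumPy. The sketch below is an illustration only, not vectorbt's apply_on_mapped_nb(); the helper apply_per_segment and the integer group codes are assumptions made for this example.

```python
import numpy as np

def apply_per_segment(values, seg_arr, func):
    """Apply func separately to the values of each segment (hypothetical helper)."""
    out = np.empty_like(values)
    for seg in np.unique(seg_arr):
        mask = seg_arr == seg
        out[mask] = func(values[mask])
    return out

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# Per column: the cumulative sum restarts at every column
per_column = apply_per_segment(a, col_arr, np.cumsum)

# Per group: columns 0 and 1 form group 0 ('first'), column 2 forms group 1 ('second')
col_to_group = np.array([0, 0, 1])
group_arr = col_to_group[col_arr]
per_group = apply_per_segment(a, group_arr, np.cumsum)
```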

Conversion

You can expand any MappedArray instance to pandas:

  • Given idx_arr was provided:
>>> ma.to_pd()
      a     b     c
x  10.0  13.0  16.0
y  11.0  14.0  17.0
z  12.0  15.0  18.0

Note

Will raise an error if there are multiple values pointing to the same position.

  • In case group_by was provided, index can be ignored, or there are position conflicts:
>>> ma.to_pd(group_by=np.array(['first', 'first', 'second']), ignore_index=True)
   first  second
0   10.0    16.0
1   11.0    17.0
2   12.0    18.0
3   13.0     NaN
4   14.0     NaN
5   15.0     NaN
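
Both expansion modes can be sketched with NumPy. This is a simplified illustration under the assumption that col_arr and idx_arr hold integer positions; the helpers expand and stack_expand are hypothetical names and do not reproduce expand_mapped_nb() or stack_expand_mapped_nb() exactly.

```python
import numpy as np

def expand(mapped_arr, col_arr, idx_arr, shape, fill_value=np.nan):
    """Place each value at (idx, col); fail on position conflicts."""
    flat_pos = idx_arr * shape[1] + col_arr
    if np.unique(flat_pos).size != flat_pos.size:
        raise ValueError("Multiple values are pointing to the same position")
    out = np.full(shape, fill_value, dtype=float)
    out[idx_arr, col_arr] = mapped_arr
    return out

def stack_expand(mapped_arr, col_arr, n_cols, fill_value=np.nan):
    """Ignore the index and stack values on top of each other per column."""
    counts = np.bincount(col_arr, minlength=n_cols)
    out = np.full((counts.max(), n_cols), fill_value, dtype=float)
    for col in range(n_cols):
        values = mapped_arr[col_arr == col]
        out[:values.size, col] = values
    return out

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
idx_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])

expanded = expand(a, col_arr, idx_arr, (3, 3))

# Group columns 0 and 1 together, then stack while ignoring the index
group_arr = np.array([0, 0, 1])[col_arr]
stacked = stack_expand(a, group_arr, 2)
```

Grouping the first two columns makes six values compete for three index positions, which is why the grouped case needs the stacked form.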

Filtering

Use MappedArray.apply_mask() to filter elements per column/group:

>>> mask = [True, False, True, False, True, False, True, False, True]
>>> filtered_ma = ma.apply_mask(mask)
>>> filtered_ma.count()
a    2
b    1
c    2
dtype: int64

>>> filtered_ma.id_arr
array([0, 2, 4, 6, 8])
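
Conceptually, filtering applies the same boolean mask to every underlying array. The following NumPy sketch shows that idea; it assumes the default id array (a simple range) and is not the library's actual code path.

```python
import numpy as np

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
id_arr = np.arange(a.size)  # assumed default: ids form a simple range
mask = np.array([True, False, True, False, True, False, True, False, True])

# The mask is applied to values, columns, and ids alike, so they stay aligned
filtered_values = a[mask]
filtered_col_arr = col_arr[mask]
filtered_id_arr = id_arr[mask]

# Number of remaining values per column
counts = np.bincount(filtered_col_arr, minlength=3)
```

Because ids are filtered rather than regenerated, each remaining element can still be traced back to its original position.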

Plotting

You can build histograms and boxplots of MappedArray directly:

>>> ma.boxplot()

To use scatterplots or any other plots that require index, convert to pandas first:

>>> ma.to_pd().vbt.plot()

Grouping

One of the key features of MappedArray is that you can perform reducing operations on a group of columns as if they were a single column. Groups can be specified by group_by, which can be anything from positions or names of column levels, to a NumPy array with actual groups.

There are multiple ways to define grouping:

  • When creating MappedArray, pass group_by to ArrayWrapper:
>>> group_by = np.array(['first', 'first', 'second'])
>>> grouped_wrapper = wrapper.replace(group_by=group_by)
>>> grouped_ma = vbt.MappedArray(grouped_wrapper, a, col_arr, idx_arr=idx_arr)

>>> grouped_ma.mean()
first     12.5
second    17.0
dtype: float64
  • Regroup an existing MappedArray:
>>> ma.regroup(group_by).mean()
first     12.5
second    17.0
dtype: float64
  • Pass group_by directly to the reducing method:
>>> ma.mean(group_by=group_by)
first     12.5
second    17.0
dtype: float64

In the same way, you can disable or modify any existing grouping:

>>> grouped_ma.mean(group_by=False)
a    11.0
b    14.0
c    17.0
dtype: float64

Note

Grouping applies only to reducing operations; the underlying arrays are not changed.
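
A grouped reduction can be thought of as relabeling each element with the group of its column and reducing over those labels. The NumPy sketch below illustrates this under the assumption that group labels are encoded with np.unique, which orders them alphabetically; vectorbt's own grouping logic may order groups differently.

```python
import numpy as np

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
group_by = np.array(['first', 'first', 'second'])

# Encode group labels as integer codes, one code per column
groups, col_to_group = np.unique(group_by, return_inverse=True)

# Translate each element's column into its group; a and col_arr stay untouched
group_arr = col_to_group[col_arr]

sums = np.bincount(group_arr, weights=a, minlength=groups.size)
counts = np.bincount(group_arr, minlength=groups.size)
group_means = sums / counts
```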

Operators

MappedArray implements arithmetic, comparison and logical operators. You can perform basic operations (such as addition) on mapped arrays as if they were NumPy arrays.

>>> ma ** 2
<vectorbt.records.mapped_array.MappedArray at 0x7f97bfc49358>

>>> ma * np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
<vectorbt.records.mapped_array.MappedArray at 0x7f97bfc65e80>

>>> ma + ma
<vectorbt.records.mapped_array.MappedArray at 0x7fd638004d30>

Note

If the other operand is an array, make sure that the MappedArray operand is on the left.

If two MappedArray operands have different metadata, the metadata of the first operand is used for the result. In any case, their id_arr and col_arr must match.
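
The matching rule can be illustrated with a small stand-in class. TinyMapped below is hypothetical and far simpler than MappedArray; it only sketches how an operator might check id_arr and col_arr before combining values and keep the metadata of the left operand.

```python
import numpy as np
from dataclasses import dataclass

@dataclass(frozen=True)
class TinyMapped:
    """Hypothetical stand-in that holds values plus column and id arrays."""
    values: np.ndarray
    col_arr: np.ndarray
    id_arr: np.ndarray

    def combine(self, other, np_func):
        if isinstance(other, TinyMapped):
            # Both operands must describe the same elements
            if not np.array_equal(self.id_arr, other.id_arr):
                raise ValueError("Both operands must have the same id_arr")
            if not np.array_equal(self.col_arr, other.col_arr):
                raise ValueError("Both operands must have the same col_arr")
            other = other.values
        # Metadata is taken from the left operand
        return TinyMapped(np_func(self.values, other), self.col_arr, self.id_arr)

    def __add__(self, other):
        return self.combine(other, np.add)

    def __mul__(self, other):
        return self.combine(other, np.multiply)

tm = TinyMapped(np.array([10., 11., 12.]), np.array([0, 0, 1]), np.arange(3))
doubled = tm + tm
scaled = tm * np.array([1, 2, 3])
```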

Indexing

Like any other class subclassing Wrapping, MappedArray supports pandas indexing, which forwards the indexing operation to each object with columns:

>>> ma['a'].values
array([10., 11., 12.])

>>> grouped_ma['first'].values
array([10., 11., 12., 13., 14., 15.])

Note

Changing index (time axis) is not supported. The object should be treated as a Series rather than a DataFrame; for example, use some_field.iloc[0] instead of some_field.iloc[:, 0].

Indexing behavior depends solely upon ArrayWrapper. For example, if group_select is enabled, indexing will be performed on groups; otherwise, on single columns.

Caching

MappedArray supports caching. If a method or a property requires heavy computation, it's wrapped with cached_method() and cached_property respectively. Caching can be disabled globally via caching in settings.

Note

Because of caching, the class is meant to be immutable and all properties are read-only. To change any attribute, use the copy method and pass the attribute as a keyword argument.

Saving and loading

Like any other class subclassing Pickleable, we can save a MappedArray instance to the disk with Pickleable.save() and load it with Pickleable.load().

Stats

Metrics for mapped arrays are similar to those for GenericAccessor.

>>> ma.stats(column='a')
Start                      x
End                        z
Period       3 days 00:00:00
Count                      3
Mean                    11.0
Std                      1.0
Min                     10.0
Median                  11.0
Max                     12.0
Min Index                  x
Max Index                  z
Name: a, dtype: object

The main difference appears once the mapped array has a mapping: values are then treated as categorical, so the usual statistics are meaningless. In this case, StatsBuilderMixin.stats() returns the value counts instead:

>>> mapping = {v: "test_" + str(v) for v in np.unique(ma.values)}
>>> ma.stats(column='a', settings=dict(mapping=mapping))
Start                                    x
End                                      z
Period                     3 days 00:00:00
Count                                    3
Value Counts: test_10.0                  1
Value Counts: test_11.0                  1
Value Counts: test_12.0                  1
Value Counts: test_13.0                  0
Value Counts: test_14.0                  0
Value Counts: test_15.0                  0
Value Counts: test_16.0                  0
Value Counts: test_17.0                  0
Value Counts: test_18.0                  0
Name: a, dtype: object

MappedArray.stats() also supports (re-)grouping:

>>> grouped_ma.stats(column='first')
Start                      x
End                        z
Period       3 days 00:00:00
Count                      6
Mean                    12.5
Std                 1.870829
Min                     10.0
Median                  12.5
Max                     15.0
Min Index                  x
Max Index                  z
Name: first, dtype: object

Plots

The MappedArray class has a single subplot based on MappedArray.to_pd() and GenericAccessor.plot():

>>> ma.plots()


combine_mapped_with_other function

combine_mapped_with_other(
    other,
    np_func
)

Combine MappedArray with other compatible object.

If other object is also MappedArray, their id_arr and col_arr must match.


MappedArray class

MappedArray(
    wrapper,
    mapped_arr,
    col_arr,
    id_arr=None,
    idx_arr=None,
    mapping=None,
    col_mapper=None,
    **kwargs
)

Exposes methods for reducing, converting, and plotting arrays mapped by the Records class.

Args

wrapper : ArrayWrapper

Array wrapper.

See ArrayWrapper.

mapped_arr : array_like

A one-dimensional array of mapped record values.

col_arr : array_like

A one-dimensional column array.

Must be of the same size as mapped_arr.

id_arr : array_like

A one-dimensional id array. Defaults to simple range.

Must be of the same size as mapped_arr.

idx_arr : array_like

A one-dimensional index array. Optional.

Must be of the same size as mapped_arr.

mapping : namedtuple, dict or callable

Mapping.

col_mapper : ColumnMapper

Column mapper if already known.

Note

It depends upon wrapper and col_arr, so make sure to invalidate col_mapper when creating a MappedArray instance with a modified wrapper or col_arr.

MappedArray.replace() does it automatically.

**kwargs

Custom keyword arguments passed to the config.

Useful if any subclass wants to extend the config.


apply method

MappedArray.apply(
    apply_func_nb,
    *args,
    group_by=None,
    apply_per_group=False,
    dtype=None,
    **kwargs
)

Apply function on mapped array per column/group. Returns mapped array.

Applies per group if apply_per_group is True.

See apply_on_mapped_nb().

**kwargs are passed to MappedArray.replace().


apply_mapping method

MappedArray.apply_mapping(
    mapping=None,
    **kwargs
)

Apply mapping on each element.


apply_mask method

MappedArray.apply_mask(
    mask,
    idx_arr=None,
    group_by=None,
    **kwargs
)

Return a new class instance, filtered by mask.

**kwargs are passed to MappedArray.replace().


bottom_n method

MappedArray.bottom_n(
    n,
    **kwargs
)

Filter bottom N elements from each column/group.


bottom_n_mask method

MappedArray.bottom_n_mask(
    n,
    **kwargs
)

Return mask of bottom N elements in each column/group.


boxplot method

MappedArray.boxplot(
    group_by=None,
    **kwargs
)

Plot box plot by column/group.


col_arr property

Column array.


col_mapper property

Column mapper.

See ColumnMapper.


count method

MappedArray.count(
    group_by=None,
    wrap_kwargs=None
)

Return number of values by column/group.


describe method

MappedArray.describe(
    percentiles=None,
    ddof=1,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return statistics by column/group.


histplot method

MappedArray.histplot(
    group_by=None,
    **kwargs
)

Plot histogram by column/group.


id_arr property

Id array.


idx_arr property

Index array.


idxmax method

MappedArray.idxmax(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return index of max by column/group.


idxmin method

MappedArray.idxmin(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return index of min by column/group.


indexing_func method

MappedArray.indexing_func(
    pd_indexing_func,
    **kwargs
)

Perform indexing on MappedArray.


indexing_func_meta method

MappedArray.indexing_func_meta(
    pd_indexing_func,
    **kwargs
)

Perform indexing on MappedArray and return metadata.


is_expandable method

MappedArray.is_expandable(
    idx_arr=None,
    group_by=None
)

See is_mapped_expandable_nb().


is_sorted method

MappedArray.is_sorted(
    incl_id=False
)

Check whether mapped array is sorted.
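
As a rough illustration of what such a check involves, the NumPy sketch below tests that columns are non-decreasing and, optionally, that ids are non-decreasing within each column. The function check_sorted is hypothetical, and the exact comparison rules used by the library are an assumption here.

```python
import numpy as np

def check_sorted(col_arr, id_arr=None):
    """Return True if col_arr (and optionally id_arr within columns) is sorted."""
    col_diff = np.diff(col_arr)
    if np.any(col_diff < 0):
        return False
    if id_arr is not None:
        same_col = col_diff == 0  # neighbouring elements in the same column
        if np.any(np.diff(id_arr)[same_col] < 0):
            return False
    return True

sorted_by_col = check_sorted(np.array([0, 0, 1]))
sorted_incl_id = check_sorted(np.array([0, 0, 1]), np.array([1, 0, 2]))
unsorted_cols = check_sorted(np.array([1, 0, 1]))
```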


map_to_mask method

MappedArray.map_to_mask(
    inout_map_func_nb,
    *args,
    group_by=None
)

Map mapped array to a mask.

See mapped_to_mask_nb().


mapped_arr property

Mapped array.


mapping property

Mapping.


max method

MappedArray.max(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return max by column/group.


mean method

MappedArray.mean(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return mean by column/group.


median method

MappedArray.median(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return median by column/group.


metrics class variable

Metrics supported by MappedArray.

Config({
    "start": {
        "title": "Start",
        "calc_func": "<function MappedArray.<lambda> at 0x13619d760>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "end": {
        "title": "End",
        "calc_func": "<function MappedArray.<lambda> at 0x13619d800>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "period": {
        "title": "Period",
        "calc_func": "<function MappedArray.<lambda> at 0x13619d8a0>",
        "apply_to_timedelta": true,
        "agg_func": null,
        "tags": "wrapper"
    },
    "count": {
        "title": "Count",
        "calc_func": "count",
        "tags": "mapped_array"
    },
    "mean": {
        "title": "Mean",
        "calc_func": "mean",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "std": {
        "title": "Std",
        "calc_func": "std",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "min": {
        "title": "Min",
        "calc_func": "min",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "median": {
        "title": "Median",
        "calc_func": "median",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "max": {
        "title": "Max",
        "calc_func": "max",
        "inv_check_has_mapping": true,
        "tags": [
            "mapped_array",
            "describe"
        ]
    },
    "idx_min": {
        "title": "Min Index",
        "calc_func": "idxmin",
        "inv_check_has_mapping": true,
        "agg_func": null,
        "tags": [
            "mapped_array",
            "index"
        ]
    },
    "idx_max": {
        "title": "Max Index",
        "calc_func": "idxmax",
        "inv_check_has_mapping": true,
        "agg_func": null,
        "tags": [
            "mapped_array",
            "index"
        ]
    },
    "value_counts": {
        "title": "Value Counts",
        "calc_func": "<function MappedArray.<lambda> at 0x13619d940>",
        "resolve_value_counts": true,
        "check_has_mapping": true,
        "tags": [
            "mapped_array",
            "value_counts"
        ]
    }
})

Returns MappedArray._metrics, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.

To change metrics, you can either change the config in-place, override this property, or overwrite the instance variable MappedArray._metrics.


min method

MappedArray.min(
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return min by column/group.


nth method

MappedArray.nth(
    n,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return n-th element of each column/group.


nth_index method

MappedArray.nth_index(
    n,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return index of n-th element of each column/group.


plots_defaults property

Defaults for PlotsBuilderMixin.plots().

Merges PlotsBuilderMixin.plots_defaults and mapped_array.plots from settings.


reduce method

MappedArray.reduce(
    reduce_func_nb,
    *args,
    idx_arr=None,
    returns_array=False,
    returns_idx=False,
    to_index=True,
    fill_value=nan,
    group_by=None,
    wrap_kwargs=None
)

Reduce mapped array by column/group.

The function used depends on returns_array and returns_idx:

  • returns_array=False, returns_idx=False: see reduce_mapped_nb().
  • returns_array=False, returns_idx=True: see reduce_mapped_to_idx_nb().
  • returns_array=True, returns_idx=False: see reduce_mapped_to_array_nb().
  • returns_array=True, returns_idx=True: see reduce_mapped_to_idx_array_nb().

If returns_idx is True, idx_arr must be provided, either here or when creating the instance. Set to_index to False to return raw positions instead of labels. Use fill_value to set the value for columns/groups without any elements. Set group_by to False to disable grouping.
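
The role of returns_idx and to_index can be sketched in NumPy: the reducing function returns a position within the column, which is translated into a row position through idx_arr and then, optionally, into an index label. The helper reduce_to_idx and the sample data are hypothetical and only illustrate this translation.

```python
import numpy as np

def reduce_to_idx(mapped_arr, col_arr, idx_arr, n_cols, reduce_func, index=None):
    """Reduce each column to a position and map it to a row position or label."""
    out = []
    for col in range(n_cols):
        col_mask = col_arr == col
        if not col_mask.any():
            out.append(None)  # no elements in this column
            continue
        local_pos = reduce_func(mapped_arr[col_mask])  # position within the column
        row_pos = int(idx_arr[col_mask][local_pos])    # position within the index
        out.append(row_pos if index is None else str(index[row_pos]))
    return out

values = np.array([1., 5., 3., 2.])
col_arr = np.array([0, 0, 1, 1])
idx_arr = np.array([0, 2, 1, 2])
index = np.array(['x', 'y', 'z'])

argmin_labels = reduce_to_idx(values, col_arr, idx_arr, 2, np.argmin, index=index)
argmax_positions = reduce_to_idx(values, col_arr, idx_arr, 2, np.argmax)
```

Passing index=None in this sketch plays the role of to_index=False: raw row positions are returned instead of labels.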


replace method

MappedArray.replace(
    **kwargs
)

See Configured.replace().

Also, makes sure that MappedArray.col_mapper is not passed to the new instance.


sort method

MappedArray.sort(
    incl_id=False,
    idx_arr=None,
    group_by=None,
    **kwargs
)

Sort mapped array by column array (primary) and id array (secondary, optional).

**kwargs are passed to MappedArray.replace().
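
The two sort orders can be illustrated with NumPy. This sketch assumes a stable sort by column and a lexicographic sort by column and id; it is not taken from the library's implementation, and the sample arrays are made up.

```python
import numpy as np

values = np.array([3., 1., 2., 5., 4.])
col_arr = np.array([1, 0, 1, 0, 0])
id_arr = np.array([4, 3, 0, 2, 1])

# Sort by column only; a stable sort keeps the original order within each column
order_by_col = np.argsort(col_arr, kind="stable")

# Sort by column (primary) and id (secondary); lexsort treats the last key as primary
order_by_col_and_id = np.lexsort((id_arr, col_arr))

values_by_col = values[order_by_col]
values_by_col_and_id = values[order_by_col_and_id]
```

The same ordering would be applied to col_arr, id_arr, and idx_arr so that all arrays stay aligned.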


stats_defaults property

Defaults for StatsBuilderMixin.stats().

Merges StatsBuilderMixin.stats_defaults and mapped_array.stats from settings.


std method

MappedArray.std(
    ddof=1,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return std by column/group.


subplots class variable

Subplots supported by MappedArray.

Config({
    "to_pd_plot": {
        "check_is_not_grouped": true,
        "plot_func": "to_pd.vbt.plot",
        "pass_trace_names": false,
        "tags": "mapped_array"
    }
})

Returns MappedArray._subplots, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.

To change subplots, you can either change the config in-place, override this property, or overwrite the instance variable MappedArray._subplots.


sum method

MappedArray.sum(
    fill_value=0.0,
    group_by=None,
    wrap_kwargs=None,
    **kwargs
)

Return sum by column/group.


to_index method

MappedArray.to_index()

Convert to index.


to_pd method

MappedArray.to_pd(
    idx_arr=None,
    ignore_index=False,
    fill_value=nan,
    group_by=None,
    wrap_kwargs=None
)

Expand mapped array to a Series/DataFrame.

If ignore_index is True, the index is ignored and data points are stacked on top of each other in every column/group (see stack_expand_mapped_nb()). Otherwise, see expand_mapped_nb().

Note

Will raise an error if there are multiple values pointing to the same position. Set ignore_index to True in this case.

Warning

Mapped arrays represent information in the most memory-friendly format. Mapping back to pandas may occupy lots of memory if records are sparse.
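
To put the warning into numbers, the following NumPy sketch compares the memory of three sparse records in mapped form against the same data expanded to a dense 1000 x 100 array. The shapes and values are made up for illustration, and 64-bit dtypes are assumed.

```python
import numpy as np

n_rows, n_cols = 1000, 100

# Three records stored in mapped form: values plus column and index positions
values = np.array([1.5, 2.5, 3.5], dtype=np.float64)
col_arr = np.array([0, 42, 99], dtype=np.int64)
idx_arr = np.array([10, 500, 999], dtype=np.int64)
mapped_bytes = values.nbytes + col_arr.nbytes + idx_arr.nbytes

# The same records expanded to a dense array filled with NaN
dense = np.full((n_rows, n_cols), np.nan)
dense[idx_arr, col_arr] = values
dense_bytes = dense.nbytes
```

Here the mapped form takes 72 bytes while the dense array takes 800,000 bytes, since every empty cell still stores a NaN.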


top_n method

MappedArray.top_n(
    n,
    **kwargs
)

Filter top N elements from each column/group.


top_n_mask method

MappedArray.top_n_mask(
    n,
    **kwargs
)

Return mask of top N elements in each column/group.
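
A top-N mask can be sketched in NumPy by ranking values within each column. The function top_n_mask_sketch is hypothetical; tie handling and the treatment of missing values in the library may differ.

```python
import numpy as np

def top_n_mask_sketch(mapped_arr, col_arr, n):
    """Mark the n largest values of each column with True."""
    mask = np.zeros(mapped_arr.size, dtype=bool)
    for col in np.unique(col_arr):
        positions = np.flatnonzero(col_arr == col)
        order = np.argsort(mapped_arr[positions])  # ascending order
        top = order[max(order.size - n, 0):]       # last n entries are the largest
        mask[positions[top]] = True
    return mask

a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

mask = top_n_mask_sketch(a, col_arr, 2)
top_values = a[mask]
```

A bottom-N mask follows the same pattern by taking the first n entries of the ascending order instead.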


value_counts method

MappedArray.value_counts(
    normalize=False,
    sort_uniques=True,
    sort=False,
    ascending=False,
    dropna=False,
    group_by=None,
    mapping=None,
    incl_all_keys=False,
    wrap_kwargs=None,
    **kwargs
)

See GenericAccessor.value_counts().

Note

Does not take into account missing values.
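
Counting values per column can be illustrated with NumPy. The sketch below builds a table with one row per unique value and one column per column index; the sample data and the mapping dict are made up, and the real method returns a labeled pandas object rather than a raw array.

```python
import numpy as np

values = np.array([1, 2, 2, 1, 1, 3])
col_arr = np.array([0, 0, 0, 1, 1, 1])
n_cols = 2

# Encode each value as the position of its unique value
uniques, codes = np.unique(values, return_inverse=True)

# Accumulate one count per (unique value, column) pair
counts = np.zeros((uniques.size, n_cols), dtype=int)
np.add.at(counts, (codes, col_arr), 1)

# Optional mapping from raw values to readable labels (hypothetical)
mapping = {1: "low", 2: "mid", 3: "high"}
labels = [mapping[int(u)] for u in uniques]
```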


values property

Mapped array.


MetaMappedArray class

MetaMappedArray(
    *args,
    **kwargs
)

Meta class that exposes a read-only class property StatsBuilderMixin.metrics.
