mapped_array module¶
Base class for working with mapped arrays.
This class takes the mapped array and the corresponding column and (optionally) index arrays, and offers features to directly process the mapped array without converting it to pandas; for example, to compute various statistics by column, such as standard deviation.
Consider the following example:
>>> import numpy as np
>>> import pandas as pd
>>> from numba import njit
>>> import vectorbt as vbt
>>> a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
>>> col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> idx_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y', 'z'],
... columns=['a', 'b', 'c'], ndim=2, freq='1 day')
>>> ma = vbt.MappedArray(wrapper, a, col_arr, idx_arr=idx_arr)
Reducing¶
Using MappedArray, you can then reduce by column as follows:
- Use already provided reducers such as MappedArray.mean():
>>> ma.mean()
a 11.0
b 14.0
c 17.0
dtype: float64
- Use MappedArray.to_pd() to map to pandas and then reduce manually (expensive):
>>> ma.to_pd().mean()
a 11.0
b 14.0
c 17.0
dtype: float64
- Use MappedArray.reduce() to reduce using a custom function:
>>> @njit
... def pow_mean_reduce_nb(col, a, pow):
... return np.mean(a ** pow)
>>> ma.reduce(pow_mean_reduce_nb, 2)
a 121.666667
b 196.666667
c 289.666667
dtype: float64
>>> @njit
... def min_max_reduce_nb(col, a):
... return np.array([np.min(a), np.max(a)])
>>> ma.reduce(min_max_reduce_nb, returns_array=True, index=['min', 'max'])
a b c
min 10.0 13.0 16.0
max 12.0 15.0 18.0
>>> @njit
... def idxmin_idxmax_reduce_nb(col, a):
... return np.array([np.argmin(a), np.argmax(a)])
>>> ma.reduce(idxmin_idxmax_reduce_nb, returns_array=True,
... returns_idx=True, index=['idxmin', 'idxmax'])
a b c
idxmin x x x
idxmax z z z
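All of the reducers above follow one pattern: gather the values that belong to each column, then apply a function to them. The plain-NumPy sketch below mimics that pattern without vectorbt or Numba. `reduce_per_column` is a hypothetical helper, not the library's reduce_mapped_nb(), and it assumes col_arr holds integer column positions 0..n_cols-1.

```python
import numpy as np

values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
idx_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
index = np.array(['x', 'y', 'z'])


def reduce_per_column(values, col_arr, n_cols, func, *args, fill_value=np.nan):
    """Hypothetical helper: apply func(col, col_values, *args) to each column.

    Columns without any values receive fill_value.
    """
    out = np.full(n_cols, fill_value, dtype=np.float64)
    for col in range(n_cols):
        col_values = values[col_arr == col]
        if col_values.size > 0:
            out[col] = func(col, col_values, *args)
    return out


def pow_mean(col, a, pow):
    return np.mean(a ** pow)


means = reduce_per_column(values, col_arr, 3, lambda col, a: np.mean(a))
pow_means = reduce_per_column(values, col_arr, 3, pow_mean, 2)

# idxmin: reduce to a position inside the column, then translate that
# position into a row label through idx_arr and the wrapper index
argmins = reduce_per_column(values, col_arr, 3, lambda col, a: np.argmin(a))
idxmin_labels = [index[idx_arr[col_arr == col][int(pos)]] for col, pos in enumerate(argmins)]
```

The library compiles such functions with Numba and handles grouping and wrapping into pandas; the sketch only shows the per-column gather-and-reduce idea.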
Mapping¶
Use MappedArray.apply() to apply a function on each column/group:
>>> @njit
... def cumsum_apply_nb(idxs, col, a):
... return np.cumsum(a)
>>> ma.apply(cumsum_apply_nb)
<vectorbt.records.mapped_array.MappedArray at 0x7ff061382198>
>>> ma.apply(cumsum_apply_nb).values
array([10., 21., 33., 13., 27., 42., 16., 33., 51.])
>>> group_by = np.array(['first', 'first', 'second'])
>>> ma.apply(cumsum_apply_nb, group_by=group_by, apply_per_group=True).values
array([10., 21., 33., 46., 60., 75., 16., 33., 51.])
Notice how cumsum resets at each column in the first example and at each group in the second example.
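The reset behavior can be reproduced with a small NumPy sketch: treat each column (or each group) as a segment and apply the function to every segment separately, writing results back to the same positions. `apply_per_segment` and `col_to_group` are hypothetical names chosen for illustration; this is not apply_on_mapped_nb().

```python
import numpy as np

values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])


def apply_per_segment(values, segment_arr, func):
    """Hypothetical helper: apply func to each segment, keeping positions.

    Assumes func returns an array of the same length as its input.
    """
    out = np.empty_like(values)
    for seg in np.unique(segment_arr):
        mask = segment_arr == seg
        out[mask] = func(values[mask])
    return out


# Per column: every column is its own segment, so cumsum restarts per column
per_column = apply_per_segment(values, col_arr, np.cumsum)

# Per group: columns 0 and 1 form group 0, column 2 forms group 1
col_to_group = np.array([0, 0, 1])
per_group = apply_per_segment(values, col_to_group[col_arr], np.cumsum)
```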
Conversion¶
You can expand any MappedArray instance to pandas:
- Given idx_arr was provided:
>>> ma.to_pd()
a b c
x 10.0 13.0 16.0
y 11.0 14.0 17.0
z 12.0 15.0 18.0
Note
Will raise an error if there are multiple values pointing to the same position.
- In case group_by was provided, the index should be ignored, or there are position conflicts, stack the values with ignore_index=True:
>>> ma.to_pd(group_by=np.array(['first', 'first', 'second']), ignore_index=True)
first second
0 10.0 16.0
1 11.0 17.0
2 12.0 18.0
3 13.0 NaN
4 14.0 NaN
5 15.0 NaN
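Both conversions can be sketched in plain NumPy: with an index, each value lands at its (row, column) position and a duplicate position is an error; without an index, the values of each column or group are stacked from the top and the remainder is padded with NaN. `expand` and `stack_expand` are hypothetical helpers, not expand_mapped_nb() or stack_expand_mapped_nb().

```python
import numpy as np

values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
idx_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])


def expand(values, col_arr, idx_arr, n_rows, n_cols, fill_value=np.nan):
    """Hypothetical helper: place each value at (idx, col) in a 2-D array."""
    out = np.full((n_rows, n_cols), fill_value)
    seen = np.zeros((n_rows, n_cols), dtype=bool)
    for v, c, i in zip(values, col_arr, idx_arr):
        if seen[i, c]:
            raise ValueError("Multiple values point to the same position")
        seen[i, c] = True
        out[i, c] = v
    return out


def stack_expand(values, seg_arr, n_segs, fill_value=np.nan):
    """Hypothetical helper: ignore the index and stack values per segment."""
    counts = np.bincount(seg_arr, minlength=n_segs)
    out = np.full((counts.max(), n_segs), fill_value)
    for seg in range(n_segs):
        seg_values = values[seg_arr == seg]
        out[:len(seg_values), seg] = seg_values
    return out


dense = expand(values, col_arr, idx_arr, n_rows=3, n_cols=3)
col_to_group = np.array([0, 0, 1])  # 'first', 'first', 'second'
stacked = stack_expand(values, col_to_group[col_arr], n_segs=2)
```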
Filtering¶
Use MappedArray.apply_mask() to filter elements per column/group:
>>> mask = [True, False, True, False, True, False, True, False, True]
>>> filtered_ma = ma.apply_mask(mask)
>>> filtered_ma.count()
a 2
b 1
c 2
dtype: int64
>>> filtered_ma.id_arr
array([0, 2, 4, 6, 8])
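Conceptually, filtering applies one boolean mask to every parallel array at once, so values, columns and ids stay aligned, and the surviving ids still point at the original records. The sketch below is plain NumPy under that assumption, not the library's apply_mask().

```python
import numpy as np

values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
id_arr = np.arange(len(values))  # default id array: a simple range

mask = np.array([True, False, True, False, True, False, True, False, True])

# The same mask is applied to every parallel array
filtered_values = values[mask]
filtered_cols = col_arr[mask]
filtered_ids = id_arr[mask]

# Count per column; minlength keeps columns that end up empty
counts = np.bincount(filtered_cols, minlength=3)
```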
Plotting¶
You can build histograms and boxplots of MappedArray directly:
>>> ma.boxplot()
To use scatterplots or any other plots that require an index, convert to pandas first:
>>> ma.to_pd().vbt.plot()
Grouping¶
One of the key features of MappedArray is that you can perform reducing operations on a group of columns as if they were a single column. Groups can be specified by group_by, which can be anything from positions or names of column levels to a NumPy array with actual groups.
There are multiple ways to define grouping:
- When creating MappedArray, pass group_by to ArrayWrapper:
>>> group_by = np.array(['first', 'first', 'second'])
>>> grouped_wrapper = wrapper.replace(group_by=group_by)
>>> grouped_ma = vbt.MappedArray(grouped_wrapper, a, col_arr, idx_arr=idx_arr)
>>> grouped_ma.mean()
first 12.5
second 17.0
dtype: float64
- Regroup an existing MappedArray:
>>> ma.regroup(group_by).mean()
first 12.5
second 17.0
dtype: float64
- Pass group_by directly to the reducing method:
>>> ma.mean(group_by=group_by)
first 12.5
second 17.0
dtype: float64
In the same way, you can disable or modify any existing grouping:
>>> grouped_ma.mean(group_by=False)
a 11.0
b 14.0
c 17.0
dtype: float64
Note
Grouping applies only to reducing operations; the arrays themselves are not changed.
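Grouping can be pictured as a lookup from columns to groups: each element's column is translated into a group id, and the reduction runs over group ids while the values stay untouched. The NumPy sketch below assumes this picture; note that np.unique sorts the group labels, which happens to match the order used here but is not necessarily how vectorbt orders groups.

```python
import numpy as np

values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
group_by = np.array(['first', 'first', 'second'])

# Map each column to an integer group id (labels come back sorted)
group_names, col_to_group = np.unique(group_by, return_inverse=True)

# Translate the column of every element into its group; values are not modified
group_arr = col_to_group[col_arr]

# Mean per group: sum of values divided by the number of values
sums = np.bincount(group_arr, weights=values)
counts = np.bincount(group_arr)
group_means = sums / counts
```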
Operators¶
MappedArray implements arithmetic, comparison and logical operators. You can perform basic operations (such as addition) on mapped arrays as if they were NumPy arrays.
>>> ma ** 2
<vectorbt.records.mapped_array.MappedArray at 0x7f97bfc49358>
>>> ma * np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
<vectorbt.records.mapped_array.MappedArray at 0x7f97bfc65e80>
>>> ma + ma
<vectorbt.records.mapped_array.MappedArray at 0x7fd638004d30>
Note
You should ensure that your MappedArray operand is on the left if the other operand is an array.
If two MappedArray operands have different metadata, the metadata is copied from the first one, but at least their id_arr and col_arr must match.
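The rules in the note can be sketched as follows: a scalar or array operand is handed straight to the NumPy function (so an array must broadcast against the nine mapped values), while a second mapped array is accepted only if its id_arr and col_arr match. `combine_mapped` and the tuple representation of a mapped array are hypothetical, not the signature of combine_mapped_with_other().

```python
import numpy as np


def combine_mapped(vals1, ids1, cols1, other, np_func):
    """Hypothetical helper mirroring the operator rules described above.

    `other` is either a tuple (values, id_arr, col_arr) standing in for a
    second mapped array, or anything NumPy can broadcast against vals1.
    """
    if isinstance(other, tuple):
        vals2, ids2, cols2 = other
        if not (np.array_equal(ids1, ids2) and np.array_equal(cols1, cols2)):
            raise ValueError("id_arr and col_arr must match")
        other = vals2
    return np_func(vals1, other)


values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
id_arr = np.arange(9)
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

squared = combine_mapped(values, id_arr, col_arr, 2, np.power)
scaled = combine_mapped(values, id_arr, col_arr, np.arange(1, 10), np.multiply)
doubled = combine_mapped(values, id_arr, col_arr, (values, id_arr, col_arr), np.add)
```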
Indexing¶
Like any other class subclassing Wrapping, we can do pandas indexing on a MappedArray instance, which forwards the indexing operation to each object with columns:
>>> ma['a'].values
array([10., 11., 12.])
>>> grouped_ma['first'].values
array([10., 11., 12., 13., 14., 15.])
Note
Changing index (time axis) is not supported. The object should be treated as a Series rather than a DataFrame; for example, use some_field.iloc[0] instead of some_field.iloc[:, 0].
Indexing behavior depends solely upon ArrayWrapper. For example, if group_select is enabled, indexing will be performed on groups; otherwise, on single columns.
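Selecting a column, or a group of columns, amounts to keeping the elements whose column is selected and renumbering the kept columns. The NumPy sketch below assumes that reading; `select_columns` is a hypothetical helper, not MappedArray.indexing_func().

```python
import numpy as np

values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
columns = np.array(['a', 'b', 'c'])
group_by = np.array(['first', 'first', 'second'])


def select_columns(values, col_arr, selected_cols):
    """Hypothetical helper: keep elements of the selected columns and renumber them."""
    selected_cols = np.asarray(selected_cols)
    mask = np.isin(col_arr, selected_cols)
    # Renumber kept columns as 0..k-1 in the order they were selected
    new_col = {old: new for new, old in enumerate(selected_cols)}
    new_col_arr = np.array([new_col[c] for c in col_arr[mask]])
    return values[mask], new_col_arr


# Selecting column 'a'
a_values, a_cols = select_columns(values, col_arr, np.flatnonzero(columns == 'a'))

# Selecting group 'first' keeps every column that belongs to it
first_values, first_cols = select_columns(values, col_arr, np.flatnonzero(group_by == 'first'))
```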
Caching¶
MappedArray supports caching. If a method or a property requires heavy computation, it's wrapped with cached_method() and cached_property respectively. Caching can be disabled globally via caching in settings.
Note
Because of caching, the class is meant to be immutable and all properties are read-only. To change any attribute, use the copy method and pass the attribute as a keyword argument.
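Why immutability and caching go together can be illustrated with Python's standard library: a cached value is only safe if the attributes it was computed from never change, so "changing" an attribute means building a new instance with an empty cache. This sketch uses dataclasses and functools.cached_property as a stand-in; vectorbt uses its own cached_method()/cached_property and Configured.copy(), and `TinyMapped` is a hypothetical class.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from functools import cached_property

import numpy as np


@dataclass(frozen=True)
class TinyMapped:
    """Hypothetical, heavily simplified stand-in for an immutable mapped array."""
    values: np.ndarray
    col_arr: np.ndarray

    @cached_property
    def col_counts(self):
        # Computed on first access, then served from the instance's __dict__
        return np.bincount(self.col_arr)


tm = TinyMapped(np.array([10., 11., 12., 13.]), np.array([0, 0, 1, 1]))

try:
    tm.values = np.array([0.0])
except FrozenInstanceError:
    pass  # attributes are read-only; build a new instance instead

# replace() calls __init__ again, so the new instance starts with an empty cache
tm2 = replace(tm, col_arr=np.array([0, 1, 1, 1]))
```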
Saving and loading¶
Like any other class subclassing Pickleable, we can save a MappedArray instance to the disk with Pickleable.save() and load it with Pickleable.load().
Stats¶
Hint
Metrics for mapped arrays are similar to those for GenericAccessor.
>>> ma.stats(column='a')
Start x
End z
Period 3 days 00:00:00
Count 3
Mean 11.0
Std 1.0
Min 10.0
Median 11.0
Max 12.0
Min Index x
Max Index z
Name: a, dtype: object
The main difference unfolds once the mapped array has a mapping: values are then treated as categorical, and the usual statistics are meaningless to compute. In this case, StatsBuilderMixin.stats() returns the value counts:
>>> mapping = {v: "test_" + str(v) for v in np.unique(ma.values)}
>>> ma.stats(column='a', settings=dict(mapping=mapping))
Start x
End z
Period 3 days 00:00:00
Count 3
Value Counts: test_10.0 1
Value Counts: test_11.0 1
Value Counts: test_12.0 1
Value Counts: test_13.0 0
Value Counts: test_14.0 0
Value Counts: test_15.0 0
Value Counts: test_16.0 0
Value Counts: test_17.0 0
Value Counts: test_18.0 0
Name: a, dtype: object
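The value counts above can be reproduced in plain NumPy: every key of the mapping becomes a row, even if a column never contains it, which is why column a shows zeros for test_13.0 through test_18.0. This is a sketch of the counting idea only, not GenericAccessor.value_counts().

```python
import numpy as np

values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
mapping = {v: "test_" + str(v) for v in np.unique(values)}

# One row per mapping key, one column per mapped-array column
keys = np.array(sorted(mapping))
counts = np.zeros((len(keys), 3), dtype=np.int64)
for v, c in zip(values, col_arr):
    counts[np.searchsorted(keys, v), c] += 1

labels = [mapping[k] for k in keys]
col_a = dict(zip(labels, counts[:, 0]))
```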
MappedArray.stats() also supports (re-)grouping:
>>> grouped_ma.stats(column='first')
Start x
End z
Period 3 days 00:00:00
Count 6
Mean 12.5
Std 1.870829
Min 10.0
Median 12.5
Max 15.0
Min Index x
Max Index z
Name: first, dtype: object
Plots¶
Hint
MappedArray class has a single subplot based on MappedArray.to_pd() and GenericAccessor.plot():
>>> ma.plots()
combine_mapped_with_other function¶
combine_mapped_with_other(
other,
np_func
)
Combine MappedArray with other compatible object.
If the other object is also a MappedArray, their id_arr and col_arr must match.
MappedArray class¶
MappedArray(
wrapper,
mapped_arr,
col_arr,
id_arr=None,
idx_arr=None,
mapping=None,
col_mapper=None,
**kwargs
)
Exposes methods for reducing, converting, and plotting arrays mapped by Records class.
Args
wrapper : ArrayWrapper
    Array wrapper.
    See ArrayWrapper.
mapped_arr : array_like
    A one-dimensional array of mapped record values.
col_arr : array_like
    A one-dimensional column array.
    Must be of the same size as mapped_arr.
id_arr : array_like
    A one-dimensional id array. Defaults to a simple range.
    Must be of the same size as mapped_arr.
idx_arr : array_like
    A one-dimensional index array. Optional.
    Must be of the same size as mapped_arr.
mapping : namedtuple, dict or callable
    Mapping.
col_mapper : ColumnMapper
    Column mapper if already known.
    Note
    It depends upon wrapper and col_arr, so make sure to invalidate col_mapper upon creating a MappedArray instance with a modified wrapper or col_arr. MappedArray.replace() does it automatically.
**kwargs
    Custom keyword arguments passed to the config.
    Useful if any subclass wants to extend the config.
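The size requirements in the Args section can be summarized as a small check: every optional array must match mapped_arr in shape, and id_arr falls back to a simple range. `prepare_mapped_args` is a hypothetical helper written for illustration; it is not how MappedArray.__init__ is implemented.

```python
import numpy as np


def prepare_mapped_args(mapped_arr, col_arr, id_arr=None, idx_arr=None):
    """Hypothetical helper: normalize and check arrays as the Args section describes.

    All arrays must be one-dimensional and as long as mapped_arr;
    id_arr defaults to a simple range.
    """
    mapped_arr = np.asarray(mapped_arr)
    col_arr = np.asarray(col_arr)
    if mapped_arr.ndim != 1:
        raise ValueError("mapped_arr must be one-dimensional")
    if id_arr is None:
        id_arr = np.arange(len(mapped_arr))
    arrays = {"col_arr": col_arr, "id_arr": np.asarray(id_arr)}
    if idx_arr is not None:
        arrays["idx_arr"] = np.asarray(idx_arr)
    for name, arr in arrays.items():
        if arr.shape != mapped_arr.shape:
            raise ValueError(f"{name} must be of the same size as mapped_arr")
    return mapped_arr, arrays["col_arr"], arrays["id_arr"], arrays.get("idx_arr")


m, c, i, x = prepare_mapped_args([10., 11., 12.], [0, 0, 1])
```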
Superclasses
- AttrResolver
- Configured
- Documented
- IndexingBase
- PandasIndexer
- Pickleable
- PlotsBuilderMixin
- StatsBuilderMixin
- Wrapping
Inherited members
- AttrResolver.deep_getattr()
- AttrResolver.post_resolve_attr()
- AttrResolver.pre_resolve_attr()
- AttrResolver.resolve_attr()
- Configured.copy()
- Configured.dumps()
- Configured.loads()
- Configured.to_doc()
- Configured.update_config()
- PandasIndexer.xs()
- Pickleable.load()
- Pickleable.save()
- PlotsBuilderMixin.build_subplots_doc()
- PlotsBuilderMixin.override_subplots_doc()
- PlotsBuilderMixin.plots()
- StatsBuilderMixin.build_metrics_doc()
- StatsBuilderMixin.override_metrics_doc()
- StatsBuilderMixin.stats()
- Wrapping.config
- Wrapping.iloc
- Wrapping.indexing_kwargs
- Wrapping.loc
- Wrapping.regroup()
- Wrapping.resolve_self()
- Wrapping.select_one()
- Wrapping.select_one_from_obj()
- Wrapping.self_aliases
- Wrapping.wrapper
- Wrapping.writeable_attrs
apply method¶
MappedArray.apply(
apply_func_nb,
*args,
group_by=None,
apply_per_group=False,
dtype=None,
**kwargs
)
Apply a function on the mapped array per column/group. Returns a new mapped array.
Applies per group if apply_per_group is True.
See apply_on_mapped_nb().
**kwargs are passed to MappedArray.replace().
apply_mapping method¶
MappedArray.apply_mapping(
mapping=None,
**kwargs
)
Apply mapping on each element.
apply_mask method¶
MappedArray.apply_mask(
mask,
idx_arr=None,
group_by=None,
**kwargs
)
Return a new class instance, filtered by mask.
**kwargs are passed to MappedArray.replace().
bottom_n method¶
MappedArray.bottom_n(
n,
**kwargs
)
Filter bottom N elements from each column/group.
bottom_n_mask method¶
MappedArray.bottom_n_mask(
n,
**kwargs
)
Return mask of bottom N elements in each column/group.
boxplot method¶
MappedArray.boxplot(
group_by=None,
**kwargs
)
Plot box plot by column/group.
col_arr property¶
Column array.
col_mapper property¶
Column mapper.
See ColumnMapper.
count method¶
MappedArray.count(
group_by=None,
wrap_kwargs=None
)
Return number of values by column/group.
describe method¶
MappedArray.describe(
percentiles=None,
ddof=1,
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return statistics by column/group.
histplot method¶
MappedArray.histplot(
group_by=None,
**kwargs
)
Plot histogram by column/group.
id_arr property¶
Id array.
idx_arr property¶
Index array.
idxmax method¶
MappedArray.idxmax(
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return index of max by column/group.
idxmin method¶
MappedArray.idxmin(
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return index of min by column/group.
indexing_func method¶
MappedArray.indexing_func(
pd_indexing_func,
**kwargs
)
Perform indexing on MappedArray.
indexing_func_meta method¶
MappedArray.indexing_func_meta(
pd_indexing_func,
**kwargs
)
Perform indexing on MappedArray and return metadata.
is_expandable method¶
MappedArray.is_expandable(
idx_arr=None,
group_by=None
)
See is_mapped_expandable_nb().
is_sorted method¶
MappedArray.is_sorted(
incl_id=False
)
Check whether mapped array is sorted.
map_to_mask method¶
MappedArray.map_to_mask(
inout_map_func_nb,
*args,
group_by=None
)
Map mapped array to a mask.
See mapped_to_mask_nb().
mapped_arr property¶
Mapped array.
mapping property¶
Mapping.
max method¶
MappedArray.max(
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return max by column/group.
mean method¶
MappedArray.mean(
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return mean by column/group.
median method¶
MappedArray.median(
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return median by column/group.
metrics class variable¶
Metrics supported by MappedArray.
Config({
"start": {
"title": "Start",
"calc_func": "<function MappedArray.<lambda> at 0x13619d760>",
"agg_func": null,
"tags": "wrapper"
},
"end": {
"title": "End",
"calc_func": "<function MappedArray.<lambda> at 0x13619d800>",
"agg_func": null,
"tags": "wrapper"
},
"period": {
"title": "Period",
"calc_func": "<function MappedArray.<lambda> at 0x13619d8a0>",
"apply_to_timedelta": true,
"agg_func": null,
"tags": "wrapper"
},
"count": {
"title": "Count",
"calc_func": "count",
"tags": "mapped_array"
},
"mean": {
"title": "Mean",
"calc_func": "mean",
"inv_check_has_mapping": true,
"tags": [
"mapped_array",
"describe"
]
},
"std": {
"title": "Std",
"calc_func": "std",
"inv_check_has_mapping": true,
"tags": [
"mapped_array",
"describe"
]
},
"min": {
"title": "Min",
"calc_func": "min",
"inv_check_has_mapping": true,
"tags": [
"mapped_array",
"describe"
]
},
"median": {
"title": "Median",
"calc_func": "median",
"inv_check_has_mapping": true,
"tags": [
"mapped_array",
"describe"
]
},
"max": {
"title": "Max",
"calc_func": "max",
"inv_check_has_mapping": true,
"tags": [
"mapped_array",
"describe"
]
},
"idx_min": {
"title": "Min Index",
"calc_func": "idxmin",
"inv_check_has_mapping": true,
"agg_func": null,
"tags": [
"mapped_array",
"index"
]
},
"idx_max": {
"title": "Max Index",
"calc_func": "idxmax",
"inv_check_has_mapping": true,
"agg_func": null,
"tags": [
"mapped_array",
"index"
]
},
"value_counts": {
"title": "Value Counts",
"calc_func": "<function MappedArray.<lambda> at 0x13619d940>",
"resolve_value_counts": true,
"check_has_mapping": true,
"tags": [
"mapped_array",
"value_counts"
]
}
})
Returns MappedArray._metrics, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change metrics, you can either change the config in-place, override this property, or overwrite the instance variable MappedArray._metrics.
min method¶
MappedArray.min(
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return min by column/group.
nth method¶
MappedArray.nth(
n,
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return n-th element of each column/group.
nth_index method¶
MappedArray.nth_index(
n,
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return index of n-th element of each column/group.
plots_defaults property¶
Defaults for PlotsBuilderMixin.plots().
Merges PlotsBuilderMixin.plots_defaults and mapped_array.plots from settings.
reduce method¶
MappedArray.reduce(
reduce_func_nb,
*args,
idx_arr=None,
returns_array=False,
returns_idx=False,
to_index=True,
fill_value=nan,
group_by=None,
wrap_kwargs=None
)
Reduce mapped array by column/group.
- If returns_array is False and returns_idx is False, see reduce_mapped_nb().
- If returns_array is False and returns_idx is True, see reduce_mapped_to_idx_nb().
- If returns_array is True and returns_idx is False, see reduce_mapped_to_array_nb().
- If returns_array is True and returns_idx is True, see reduce_mapped_to_idx_array_nb().
If returns_idx is True, an index array is required: pass idx_arr, or create the instance with one. Set to_index to False to return raw positions instead of labels. Use fill_value to set the default value. Set group_by to False to disable grouping.
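The interplay of returns_idx, to_index and fill_value can be sketched in plain NumPy: a column with no values receives fill_value, a position found inside a column is mapped to a row through idx_arr, and only then turned into a label. `reduce_to_idx` is a hypothetical helper, not reduce_mapped_to_idx_nb(), and the label conversion only approximates what to_index=True does.

```python
import numpy as np

index = np.array(['x', 'y', 'z'])
# Column 1 has no values at all, e.g. after filtering
values = np.array([10., 12., 16., 17.])
col_arr = np.array([0, 0, 2, 2])
idx_arr = np.array([0, 2, 0, 1])


def reduce_to_idx(values, col_arr, idx_arr, n_cols, func, fill_value=np.nan):
    """Hypothetical helper: reduce each column to a row position taken from idx_arr."""
    out = np.full(n_cols, fill_value)
    for col in range(n_cols):
        mask = col_arr == col
        if mask.any():
            # func returns a position inside the column; idx_arr turns it into a row
            out[col] = idx_arr[mask][func(values[mask])]
    return out


raw_positions = reduce_to_idx(values, col_arr, idx_arr, 3, np.argmax)
# Converting row positions to labels; empty columns stay missing
labels = [index[int(p)] if not np.isnan(p) else None for p in raw_positions]
```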
replace method¶
MappedArray.replace(
**kwargs
)
See Configured.replace().
Also, makes sure that MappedArray.col_mapper is not passed to the new instance.
sort method¶
MappedArray.sort(
incl_id=False,
idx_arr=None,
group_by=None,
**kwargs
)
Sort mapped array by column array (primary) and id array (secondary, optional).
**kwargs are passed to MappedArray.replace().
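Sorting by column (primary) and id (secondary) maps directly onto np.lexsort, which treats its last key as the primary one. The is_sorted check below is a hypothetical helper that mirrors the idea of MappedArray.is_sorted(incl_id=...); it is not the library's implementation and it assumes ids are unique.

```python
import numpy as np

values = np.array([16., 10., 13., 11., 17., 12.])
col_arr = np.array([2, 0, 1, 0, 2, 0])
id_arr = np.array([6, 0, 3, 1, 7, 2])

# np.lexsort sorts by the LAST key first: col_arr is primary, id_arr secondary
order = np.lexsort((id_arr, col_arr))
sorted_values = values[order]
sorted_cols = col_arr[order]
sorted_ids = id_arr[order]


def is_sorted(col_arr, id_arr=None):
    """Hypothetical check: columns non-decreasing; ids increasing within a column if given."""
    col_diff = np.diff(col_arr)
    if np.any(col_diff < 0):
        return False
    if id_arr is not None:
        same_col = col_diff == 0
        if np.any(np.diff(id_arr)[same_col] <= 0):
            return False
    return True
```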
stats_defaults property¶
Defaults for StatsBuilderMixin.stats().
Merges StatsBuilderMixin.stats_defaults and mapped_array.stats from settings.
std method¶
MappedArray.std(
ddof=1,
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return std by column/group.
subplots class variable¶
Subplots supported by MappedArray.
Config({
"to_pd_plot": {
"check_is_not_grouped": true,
"plot_func": "to_pd.vbt.plot",
"pass_trace_names": false,
"tags": "mapped_array"
}
})
Returns MappedArray._subplots, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change subplots, you can either change the config in-place, override this property, or overwrite the instance variable MappedArray._subplots.
sum method¶
MappedArray.sum(
fill_value=0.0,
group_by=None,
wrap_kwargs=None,
**kwargs
)
Return sum by column/group.
to_index method¶
MappedArray.to_index()
Convert to index.
to_pd method¶
MappedArray.to_pd(
idx_arr=None,
ignore_index=False,
fill_value=nan,
group_by=None,
wrap_kwargs=None
)
Expand mapped array to a Series/DataFrame.
If ignore_index is True, ignores the index and stacks data points on top of each other in every column/group (see stack_expand_mapped_nb()). Otherwise, see expand_mapped_nb().
Note
Will raise an error if there are multiple values pointing to the same position. Set ignore_index to True in this case.
Warning
Mapped arrays represent information in the most memory-friendly format. Mapping back to pandas may occupy lots of memory if records are sparse.
top_n method¶
MappedArray.top_n(
n,
**kwargs
)
Filter top N elements from each column/group.
top_n_mask method¶
MappedArray.top_n_mask(
n,
**kwargs
)
Return mask of top N elements in each column/group.
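A per-column top/bottom-N mask can be sketched with np.argsort inside each column. One plausible reading is that top_n() and bottom_n() then filter with such a mask, as apply_mask() does, but the helpers below (`top_n_mask`, `bottom_n_mask`) are hypothetical and are not the library's implementation.

```python
import numpy as np

values = np.array([10., 15., 12., 13., 11., 14., 18., 16., 17.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])


def top_n_mask(values, col_arr, n):
    """Hypothetical helper: mark the n largest values in each column."""
    mask = np.zeros(len(values), dtype=bool)
    if n <= 0:
        return mask  # guard: a [-0:] slice would otherwise select everything
    for col in np.unique(col_arr):
        positions = np.flatnonzero(col_arr == col)
        # argsort is ascending, so the last n entries are the largest
        top = positions[np.argsort(values[positions])[-n:]]
        mask[top] = True
    return mask


def bottom_n_mask(values, col_arr, n):
    """Hypothetical helper: mark the n smallest values in each column."""
    mask = np.zeros(len(values), dtype=bool)
    for col in np.unique(col_arr):
        positions = np.flatnonzero(col_arr == col)
        bottom = positions[np.argsort(values[positions])[:n]]
        mask[bottom] = True
    return mask


top2 = top_n_mask(values, col_arr, 2)
bottom1 = bottom_n_mask(values, col_arr, 1)
```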
value_counts method¶
MappedArray.value_counts(
normalize=False,
sort_uniques=True,
sort=False,
ascending=False,
dropna=False,
group_by=None,
mapping=None,
incl_all_keys=False,
wrap_kwargs=None,
**kwargs
)
See GenericAccessor.value_counts().
Note
Does not take into account missing values.
values property¶
Mapped array.
MetaMappedArray class¶
MetaMappedArray(
*args,
**kwargs
)
Meta class that exposes a read-only class property StatsBuilderMixin.metrics.
Superclasses
- MetaPlotsBuilderMixin
- MetaStatsBuilderMixin
- builtins.type
Inherited members