base module¶
Base class for working with records.
vectorbt works with two different representations of data: matrices and records.
A matrix, in this context, is just an array of one-dimensional arrays, each corresponding to a separate feature. The matrix itself holds only one kind of information (one attribute). For example, one can create a matrix for entry signals, with columns being different strategy configurations. But what if the matrix is huge and sparse? What if there is more information we would like to represent by each element? Creating multiple matrices would be a waste of memory.
Records make it possible to represent complex, sparse information in a dense format. They are essentially a one-dimensional structured array in which every element (record) follows the same fixed schema. You can think of records as a DataFrame in which each row represents a record and each column represents a specific attribute.
               a     b
         0   1.0   5.0
attr1 =  1   2.0   NaN
         2   NaN   7.0
         3   4.0   8.0

               a     b
         0   9.0  13.0
attr2 =  1  10.0   NaN
         2   NaN  15.0
         3  12.0  16.0

            |
            v

      id  col  idx  attr1  attr2
0      0    0    0      1      9
1      1    0    1      2     10
2      2    0    3      4     12
3      3    1    0      5     13
4      4    1    2      7     15
5      5    1    3      8     16
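The transformation in the diagram can be sketched with plain NumPy. This is an illustration only, not how vectorbt builds its records; the dtype `record_dt` and the rule that a record is kept wherever both matrices hold a value are assumptions of this sketch.

```python
import numpy as np

# The two sparse matrices from the diagram: rows are time steps, columns are 'a' and 'b'
attr1 = np.array([[1.0, 5.0], [2.0, np.nan], [np.nan, 7.0], [4.0, 8.0]])
attr2 = np.array([[9.0, 13.0], [10.0, np.nan], [np.nan, 15.0], [12.0, 16.0]])

# Fixed schema shared by every record (hypothetical dtype for this sketch)
record_dt = np.dtype([
    ('id', np.int64),
    ('col', np.int64),
    ('idx', np.int64),
    ('attr1', np.float64),
    ('attr2', np.float64),
])

# Keep only the cells that hold data; transposing first orders records by column, then by row
valid = ~np.isnan(attr1) & ~np.isnan(attr2)
col, idx = np.nonzero(valid.T)

records_arr = np.empty(len(col), dtype=record_dt)
records_arr['id'] = np.arange(len(col))
records_arr['col'] = col
records_arr['idx'] = idx
records_arr['attr1'] = attr1[idx, col]
records_arr['attr2'] = attr2[idx, col]

print(records_arr)
```

The NaN cells are simply absent from the result, and the `col`/`idx` fields record where each surviving value came from.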
Another advantage of records is that they are not constrained by the shape of a matrix: multiple records can map to the same element. For example, one can define multiple orders at the same time step, which is impossible to represent in matrix form without resorting to complex data types.
Consider the following example:
>>> import numpy as np
>>> import pandas as pd
>>> from numba import njit
>>> from collections import namedtuple
>>> import vectorbt as vbt
>>> example_dt = np.dtype([
... ('id', np.int_),
... ('col', np.int_),
... ('idx', np.int_),
... ('some_field', np.float64)
... ])
>>> records_arr = np.array([
... (0, 0, 0, 10.),
... (1, 0, 1, 11.),
... (2, 0, 2, 12.),
... (3, 1, 0, 13.),
... (4, 1, 1, 14.),
... (5, 1, 2, 15.),
... (6, 2, 0, 16.),
... (7, 2, 1, 17.),
... (8, 2, 2, 18.)
... ], dtype=example_dt)
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y', 'z'],
... columns=['a', 'b', 'c'], ndim=2, freq='1 day')
>>> records = vbt.Records(wrapper, records_arr)
Printing¶
There are two ways to print records:
- Raw dataframe that preserves field names and data types:
>>> records.records
id col idx some_field
0 0 0 0 10.0
1 1 0 1 11.0
2 2 0 2 12.0
3 3 1 0 13.0
4 4 1 1 14.0
5 5 1 2 15.0
6 6 2 0 16.0
7 7 2 1 17.0
8 8 2 2 18.0
- Readable dataframe that takes Records.field_config into account:
>>> records.records_readable
Id Column Timestamp some_field
0 0 a x 10.0
1 1 a y 11.0
2 2 a z 12.0
3 3 b x 13.0
4 4 b y 14.0
5 5 b z 15.0
6 6 c x 16.0
7 7 c y 17.0
8 8 c z 18.0
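Roughly speaking, the readable view replaces the integer `col` and `idx` fields with the labels they point to, following the `mapping` entries (`"columns"` and `"index"`) and the titles in Records.field_config. The pandas sketch below reproduces the output above under that assumption; it is not vectorbt's implementation.

```python
import numpy as np
import pandas as pd

example_dt = np.dtype([
    ('id', np.int64),
    ('col', np.int64),
    ('idx', np.int64),
    ('some_field', np.float64),
])
# Same layout as the example: records 0-2 in column 0, 3-5 in column 1, 6-8 in column 2
records_arr = np.array(
    [(i, i // 3, i % 3, 10.0 + i) for i in range(9)],
    dtype=example_dt,
)
index = np.array(['x', 'y', 'z'])
columns = np.array(['a', 'b', 'c'])

raw = pd.DataFrame(records_arr)  # one DataFrame column per record field
readable = pd.DataFrame({
    'Id': raw['id'],
    'Column': columns[raw['col'].to_numpy()],   # field "col" uses the "columns" mapping
    'Timestamp': index[raw['idx'].to_numpy()],  # field "idx" uses the "index" mapping
    'some_field': raw['some_field'],            # no setting in field_config, kept as-is
})
print(readable)
```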
Mapping¶
A Records instance is essentially a structured array with a set of methods and properties for processing it. Its main feature is mapping the records array and reducing it by column (similar to the MapReduce paradigm). The main advantage is that all of this happens without converting to matrix form, so no memory is wasted on empty cells.
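To illustrate reducing by column without a matrix, here is a NumPy-only sketch: the mapped values stay a flat array, and `np.bincount` uses the `col` field as the bucket for each value. It mirrors the example data but is not vectorbt's implementation.

```python
import numpy as np

# Flat "mapped" values and the column each one belongs to (same data as the example)
values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
n_cols = 3

# Reduce: sum and count per column, without allocating an (index x columns) matrix
sums = np.bincount(col_arr, weights=values, minlength=n_cols)
counts = np.bincount(col_arr, minlength=n_cols)
col_means = sums / counts
print(col_means)  # [11. 14. 17.]
```

A column without any records would produce a zero count here, and therefore a NaN (with a division warning); a real reducer would need to handle that case explicitly.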
Records can be mapped to MappedArray in several ways:
- Use Records.map_field() to map a record field:
>>> records.map_field('some_field')
<vectorbt.records.mapped_array.MappedArray at 0x7ff49bd31a58>
>>> records.map_field('some_field').values
array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
- Use Records.map() to map records using a custom function.
>>> @njit
... def power_map_nb(record, pow):
... return record.some_field ** pow
>>> records.map(power_map_nb, 2)
<vectorbt.records.mapped_array.MappedArray at 0x7ff49c990cf8>
>>> records.map(power_map_nb, 2).values
array([100., 121., 144., 169., 196., 225., 256., 289., 324.])
- Use Records.map_array() to convert an array to MappedArray.
>>> records.map_array(records_arr['some_field'] ** 2)
<vectorbt.records.mapped_array.MappedArray object at 0x7fe9bccf2978>
>>> records.map_array(records_arr['some_field'] ** 2).values
array([100., 121., 144., 169., 196., 225., 256., 289., 324.])
- Use Records.apply() to apply a function on each column/group:
>>> @njit
... def cumsum_apply_nb(records):
... return np.cumsum(records.some_field)
>>> records.apply(cumsum_apply_nb)
<vectorbt.records.mapped_array.MappedArray at 0x7ff49c990cf8>
>>> records.apply(cumsum_apply_nb).values
array([10., 21., 33., 13., 27., 42., 16., 33., 51.])
>>> group_by = np.array(['first', 'first', 'second'])
>>> records.apply(cumsum_apply_nb, group_by=group_by, apply_per_group=True).values
array([10., 21., 33., 46., 60., 75., 16., 33., 51.])
Notice how cumsum resets at each column in the first example and at each group in the second example.
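The reset behavior can be reproduced in plain NumPy, assuming the records are contiguous per column (or group), as they are in the example: take a global cumulative sum and subtract the running total reached just before each segment starts. This is a sketch of the semantics only; `segmented_cumsum` is a hypothetical helper, not part of vectorbt.

```python
import numpy as np

def segmented_cumsum(values, segment_ids):
    """Cumulative sum that restarts whenever segment_ids changes (segments must be contiguous)."""
    total = np.cumsum(values)
    # Positions where a new segment begins
    starts = np.flatnonzero(np.r_[True, segment_ids[1:] != segment_ids[:-1]])
    # Running total just before each segment starts (0 for the first segment)
    offsets = np.r_[0.0, total[:-1]][starts]
    # Spread each segment's offset over all of its elements
    seg_lengths = np.diff(np.r_[starts, len(values)])
    return total - np.repeat(offsets, seg_lengths)

values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
group_arr = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])  # columns a, b -> 'first'; c -> 'second'

print(segmented_cumsum(values, col_arr))    # resets at each column
print(segmented_cumsum(values, group_arr))  # resets at each group
```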
Filtering¶
Use Records.apply_mask() to filter elements per column/group:
>>> mask = [True, False, True, False, True, False, True, False, True]
>>> filtered_records = records.apply_mask(mask)
>>> filtered_records.count()
a 2
b 1
c 2
dtype: int64
>>> filtered_records.values['id']
array([0, 2, 4, 6, 8])
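On the underlying structured array, filtering boils down to boolean indexing, after which the per-column counts can be recomputed from the `col` field. A NumPy sketch that reproduces the numbers above (an illustration, not vectorbt's code):

```python
import numpy as np

example_dt = np.dtype([('id', np.int64), ('col', np.int64), ('idx', np.int64), ('some_field', np.float64)])
records_arr = np.array([(i, i // 3, i % 3, 10.0 + i) for i in range(9)], dtype=example_dt)

mask = np.array([True, False, True, False, True, False, True, False, True])
filtered_arr = records_arr[mask]  # boolean indexing keeps the fixed schema

# Count records per column; minlength keeps columns that lost all of their records
counts = np.bincount(filtered_arr['col'], minlength=3)
print(counts)              # [2 1 2]
print(filtered_arr['id'])  # [0 2 4 6 8]
```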
Grouping¶
One of the key features of Records is that you can perform reducing operations on a group of columns as if they were a single column. Groups can be specified by group_by, which can be anything from positions or names of column levels to a NumPy array with actual groups.
There are multiple ways to define grouping:
- When creating Records, pass group_by to ArrayWrapper:
>>> group_by = np.array(['first', 'first', 'second'])
>>> grouped_wrapper = wrapper.replace(group_by=group_by)
>>> grouped_records = vbt.Records(grouped_wrapper, records_arr)
>>> grouped_records.map_field('some_field').mean()
first 12.5
second 17.0
dtype: float64
- Regroup an existing Records instance:
>>> records.regroup(group_by).map_field('some_field').mean()
first 12.5
second 17.0
dtype: float64
- Pass group_by directly to the mapping method:
>>> records.map_field('some_field', group_by=group_by).mean()
first 12.5
second 17.0
dtype: float64
- Pass group_by directly to the reducing method:
>>> records.map_field('some_field').mean(group_by=group_by)
first 12.5
second 17.0
dtype: float64
Note
Grouping applies only to reducing operations; the arrays themselves are not changed.
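Because grouping only affects reduction, a group-level mean can be computed by translating each record's column into its group and reducing over the group instead; the records themselves stay untouched. A NumPy sketch with the same group_by labels (using `np.unique` to turn labels into integer group ids is a choice of this sketch, not necessarily what vectorbt does):

```python
import numpy as np

values = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
group_by = np.array(['first', 'first', 'second'])  # one label per column

# Integer group id per column, then per record via the col field
group_names, col_to_group = np.unique(group_by, return_inverse=True)
group_arr = col_to_group[col_arr]

group_means = np.bincount(group_arr, weights=values) / np.bincount(group_arr)
print(dict(zip(group_names.tolist(), group_means.tolist())))  # {'first': 12.5, 'second': 17.0}
```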
Indexing¶
Like any other class subclassing Wrapping, we can do pandas indexing on a Records instance, which forwards the indexing operation to each object with columns:
>>> records['a'].records
id col idx some_field
0 0 0 0 10.0
1 1 0 1 11.0
2 2 0 2 12.0
>>> grouped_records['first'].records
id col idx some_field
0 0 0 0 10.0
1 1 0 1 11.0
2 2 0 2 12.0
3 3 1 0 13.0
4 4 1 1 14.0
5 5 1 2 15.0
Note
Changing the index (time axis) is not supported. The object should be treated as a Series rather than a DataFrame; for example, use some_field.iloc[0] instead of some_field.iloc[:, 0].
Indexing behavior depends solely on ArrayWrapper. For example, if group_select is enabled, indexing is performed on groups; otherwise, it is performed on individual columns.
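At the array level, selecting a column amounts to keeping the records whose `col` matches it, and selecting a group keeps the records of all its member columns. The sketch below mimics the `records['a']` and `grouped_records['first']` outputs with `np.isin`; `select_cols` is a hypothetical helper, and the sketch leaves the `col` values unchanged.

```python
import numpy as np

example_dt = np.dtype([('id', np.int64), ('col', np.int64), ('idx', np.int64), ('some_field', np.float64)])
records_arr = np.array([(i, i // 3, i % 3, 10.0 + i) for i in range(9)], dtype=example_dt)
columns = np.array(['a', 'b', 'c'])
group_by = np.array(['first', 'first', 'second'])

def select_cols(arr, col_idxs):
    """Keep the records belonging to any of the given column positions (hypothetical helper)."""
    return arr[np.isin(arr['col'], col_idxs)]

a_records = select_cols(records_arr, np.flatnonzero(columns == 'a'))
first_records = select_cols(records_arr, np.flatnonzero(group_by == 'first'))
print(a_records['id'])      # [0 1 2]
print(first_records['id'])  # [0 1 2 3 4 5]
```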
Caching¶
Records supports caching. If a method or a property requires heavy computation, it's wrapped with cached_method() and cached_property respectively. Caching can be disabled globally via caching in settings.
Note
Because of caching, the class is meant to be immutable and all properties are read-only. To change any attribute, use the copy method and pass the attribute as a keyword argument.
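Why immutability matters for caching can be shown with a generic Python class built on `functools.cached_property`: once a property has been computed, mutating the underlying data in place would leave a stale cached result, whereas building a new instance with the changed data starts with an empty cache. `CachedRecords` and its `with_values` method are hypothetical stand-ins for illustration, not vectorbt classes.

```python
from functools import cached_property

import numpy as np

class CachedRecords:
    """Hypothetical immutable wrapper around a field array with one cached property."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype=float)
        self.n_computed = 0  # counts how often the expensive computation actually ran

    @cached_property
    def total(self):
        self.n_computed += 1
        return float(self._values.sum())

    def with_values(self, values):
        # Instead of mutating self, return a new instance (with an empty cache)
        return type(self)(values)

r = CachedRecords([10.0, 11.0, 12.0])
print(r.total, r.total, r.n_computed)  # 33.0 33.0 1: computed once, then served from the cache
r2 = r.with_values([1.0, 2.0])
print(r2.total, r.total)               # 3.0 33.0: the new instance recomputes, the old one is unchanged
```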
Saving and loading¶
Like any other class subclassing Pickleable, we can save a Records instance to disk with Pickleable.save() and load it with Pickleable.load().
Stats¶
Hint
See StatsBuilderMixin.stats() and Records.metrics.
>>> records.stats(column='a')
Start x
End z
Period 3 days 00:00:00
Count 3
Name: a, dtype: object
StatsBuilderMixin.stats() also supports (re-)grouping:
>>> grouped_records.stats(column='first')
Start x
End z
Period 3 days 00:00:00
Count 6
Name: first, dtype: object
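The numbers above can be checked by hand with pandas: start and end are the first and last index labels, the period is the number of index entries times the frequency (three daily steps give 3 days), and the count is the number of records in the column or group. This is a sketch of the arithmetic, assuming the period is computed as `len(index) * freq`; it does not call vectorbt.

```python
import numpy as np
import pandas as pd

index = pd.Index(['x', 'y', 'z'])
freq = pd.Timedelta('1 day')
col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

start, end = index[0], index[-1]
period = len(index) * freq
count_a = int(np.sum(col_arr == 0))                   # column 'a'
count_first = int(np.sum(np.isin(col_arr, [0, 1])))  # group 'first' = columns 'a' and 'b'

print(start, end, period, count_a, count_first)  # x z 3 days 00:00:00 3 6
```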
Plots¶
Hint
This class is too generic to have any subplots, but feel free to add custom subplots to your subclass.
Extending¶
The Records class can be extended by subclassing.
In case some of our fields have the same meaning but different naming (such as the base field idx) or other properties, we can override field_config using override_field_config(). It will look for configs of all base classes and merge our config on top of them. This preserves any base class property that is not explicitly listed in our config.
>>> from vectorbt.records.decorators import override_field_config
>>> my_dt = np.dtype([
... ('my_id', np.int_),
... ('my_col', np.int_),
... ('my_idx', np.int_)
... ])
>>> my_fields_config = dict(
... dtype=my_dt,
... settings=dict(
... id=dict(name='my_id'),
... col=dict(name='my_col'),
... idx=dict(name='my_idx')
... )
... )
>>> @override_field_config(my_fields_config)
... class MyRecords(vbt.Records):
... pass
>>> records_arr = np.array([
... (0, 0, 0),
... (1, 0, 1),
... (2, 1, 0),
... (3, 1, 1)
... ], dtype=my_dt)
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y'],
... columns=['a', 'b'], ndim=2, freq='1 day')
>>> my_records = MyRecords(wrapper, records_arr)
>>> my_records.id_arr
array([0, 1, 2, 3])
>>> my_records.col_arr
array([0, 0, 1, 1])
>>> my_records.idx_arr
array([0, 1, 0, 1])
Alternatively, we can override the _field_config class attribute.
>>> @override_field_config
... class MyRecords(vbt.Records):
... _field_config = dict(
... dtype=my_dt,
... settings=dict(
... id=dict(name='my_id'),
... idx=dict(name='my_idx'),
... col=dict(name='my_col')
... )
... )
Note
Don't forget to decorate the class with @override_field_config to inherit configs from base classes.
You can stop inheritance by not decorating the class or by passing merge_configs=False to the decorator.
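The merging behavior described above (base config first, subclass config on top, unlisted base keys preserved) can be sketched as a recursive dictionary merge. `deep_merge` is a hypothetical helper for illustration; vectorbt uses its own config utilities.

```python
import copy

def deep_merge(base, override):
    """Return a new dict with `override` merged recursively on top of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

base_config = {
    'dtype': None,
    'settings': {
        'id': {'name': 'id', 'title': 'Id'},
        'col': {'name': 'col', 'title': 'Column', 'mapping': 'columns'},
        'idx': {'name': 'idx', 'title': 'Timestamp', 'mapping': 'index'},
    },
}
my_config = {
    'settings': {
        'id': {'name': 'my_id'},
        'col': {'name': 'my_col'},
        'idx': {'name': 'my_idx'},
    },
}

merged = deep_merge(base_config, my_config)
print(merged['settings']['col'])  # {'name': 'my_col', 'title': 'Column', 'mapping': 'columns'}
```

Only the names change; titles and mappings inherited from the base config survive because they are not listed in `my_config`.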
MetaFields class¶
MetaFields(
*args,
**kwargs
)
Meta class that exposes a read-only class property MetaFields.field_config.
Superclasses
- builtins.type
Subclasses
field_config property¶
Field config.
MetaRecords class¶
MetaRecords(
*args,
**kwargs
)
Meta class that exposes a read-only class property StatsBuilderMixin.metrics.
Superclasses
- MetaFields
- MetaPlotsBuilderMixin
- MetaStatsBuilderMixin
- builtins.type
Inherited members
Records class¶
Records(
wrapper,
records_arr,
col_mapper=None,
**kwargs
)
Wraps the actual records array (such as trades) and exposes methods for mapping it to some array of values (such as PnL of each trade).
Args
wrapper : ArrayWrapper
    Array wrapper.
    See ArrayWrapper.
records_arr : array_like
    A structured NumPy array of records.
    Must have the fields id (record index) and col (column index).
col_mapper : ColumnMapper
    Column mapper if already known.
    Note: It depends on records_arr, so make sure to invalidate col_mapper upon creating a Records instance with a modified records_arr. Records.replace() does it automatically.
**kwargs
    Custom keyword arguments passed to the config.
    Useful if any subclass wants to extend the config.
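A column mapper is, in essence, an index from columns to the positions of their records. The NumPy sketch below builds such an index (record positions ordered by column, plus the number of records per column) and shows why it goes stale once records_arr changes. The representation and the `build_col_map` helper are assumptions for illustration, not necessarily what ColumnMapper stores.

```python
import numpy as np

def build_col_map(col_arr, n_cols):
    """Hypothetical column map: record positions grouped by column and per-column counts."""
    col_idxs = np.argsort(col_arr, kind='stable')  # positions of records, ordered by column
    col_lens = np.bincount(col_arr, minlength=n_cols)
    return col_idxs, col_lens

col_arr = np.array([0, 1, 0, 2, 1])
col_idxs, col_lens = build_col_map(col_arr, n_cols=3)
print(col_idxs, col_lens)  # [0 2 1 4 3] [2 2 1]

# After dropping a record, the old map no longer matches the array and must be rebuilt
new_col_arr = col_arr[[0, 1, 3, 4]]
print(build_col_map(new_col_arr, n_cols=3)[1])  # [1 2 1]
```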
Superclasses
- AttrResolver
- Configured
- Documented
- IndexingBase
- PandasIndexer
- Pickleable
- PlotsBuilderMixin
- RecordsWithFields
- StatsBuilderMixin
- Wrapping
Inherited members
- AttrResolver.deep_getattr()
- AttrResolver.post_resolve_attr()
- AttrResolver.pre_resolve_attr()
- AttrResolver.resolve_attr()
- Configured.copy()
- Configured.dumps()
- Configured.loads()
- Configured.to_doc()
- Configured.update_config()
- PandasIndexer.xs()
- Pickleable.load()
- Pickleable.save()
- PlotsBuilderMixin.build_subplots_doc()
- PlotsBuilderMixin.override_subplots_doc()
- PlotsBuilderMixin.plots()
- StatsBuilderMixin.build_metrics_doc()
- StatsBuilderMixin.override_metrics_doc()
- StatsBuilderMixin.stats()
- Wrapping.config
- Wrapping.iloc
- Wrapping.indexing_kwargs
- Wrapping.loc
- Wrapping.regroup()
- Wrapping.resolve_self()
- Wrapping.select_one()
- Wrapping.select_one_from_obj()
- Wrapping.self_aliases
- Wrapping.wrapper
- Wrapping.writeable_attrs
Subclasses
apply method¶
Records.apply(
apply_func_nb,
*args,
group_by=None,
apply_per_group=False,
dtype=None,
**kwargs
)
Apply function on records per column/group. Returns mapped array.
Applies per group if apply_per_group is True.
**kwargs are passed to Records.map_array().
apply_mask method¶
Records.apply_mask(
mask,
group_by=None,
**kwargs
)
Return a new class instance, filtered by mask.
build_field_config_doc class method¶
Records.build_field_config_doc(
source_cls=None
)
Build field config documentation.
col_arr property¶
Get column array.
col_mapper property¶
Column mapper.
See ColumnMapper.
count method¶
Records.count(
group_by=None,
wrap_kwargs=None
)
Return count by column.
field_config class variable¶
Field config of Records.
Config({
"dtype": null,
"settings": {
"id": {
"name": "id",
"title": "Id"
},
"col": {
"name": "col",
"title": "Column",
"mapping": "columns"
},
"idx": {
"name": "idx",
"title": "Timestamp",
"mapping": "index"
}
}
})
get_apply_mapping_arr method¶
Records.get_apply_mapping_arr(
field,
**kwargs
)
Resolve the mapped array on the field, with mapping applied. Uses Records.field_config.
get_by_col_idxs method¶
Records.get_by_col_idxs(
col_idxs
)
Get records corresponding to column indices.
Returns new records array.
get_field_arr method¶
Records.get_field_arr(
field
)
Resolve the array of the field. Uses Records.field_config.
get_field_mapping method¶
Records.get_field_mapping(
field
)
Resolve the mapping of the field. Uses Records.field_config.
get_field_name method¶
Records.get_field_name(
field
)
Resolve the name of the field. Uses Records.field_config.
get_field_setting method¶
Records.get_field_setting(
field,
setting,
default=None
)
Resolve any setting of the field. Uses Records.field_config.
get_field_title method¶
Records.get_field_title(
field
)
Resolve the title of the field. Uses Records.field_config.
get_map_field method¶
Records.get_map_field(
field,
**kwargs
)
Resolve the mapped array of the field. Uses Records.field_config.
get_map_field_to_index method¶
Records.get_map_field_to_index(
field,
**kwargs
)
Resolve the mapped array on the field, with index applied. Uses Records.field_config.
id_arr property¶
Get id array.
idx_arr property¶
Get index array.
indexing_func method¶
Records.indexing_func(
pd_indexing_func,
**kwargs
)
Perform indexing on Records.
indexing_func_meta method¶
Records.indexing_func_meta(
pd_indexing_func,
**kwargs
)
Perform indexing on Records and return metadata.
is_sorted method¶
Records.is_sorted(
incl_id=False
)
Check whether records are sorted.
map method¶
Records.map(
map_func_nb,
*args,
dtype=None,
**kwargs
)
Map each record to a scalar value. Returns mapped array.
See map_records_nb().
**kwargs are passed to Records.map_array().
map_array method¶
Records.map_array(
a,
idx_arr=None,
mapping=None,
group_by=None,
**kwargs
)
Convert array to mapped array.
The length of the array should match that of the records.
map_field method¶
Records.map_field(
field,
**kwargs
)
Convert field to mapped array.
**kwargs are passed to Records.map_array().
metrics class variable¶
Metrics supported by Records.
Config({
"start": {
"title": "Start",
"calc_func": "<function Records.<lambda> at 0x13619fc40>",
"agg_func": null,
"tags": "wrapper"
},
"end": {
"title": "End",
"calc_func": "<function Records.<lambda> at 0x13619fce0>",
"agg_func": null,
"tags": "wrapper"
},
"period": {
"title": "Period",
"calc_func": "<function Records.<lambda> at 0x13619fd80>",
"apply_to_timedelta": true,
"agg_func": null,
"tags": "wrapper"
},
"count": {
"title": "Count",
"calc_func": "count",
"tags": "records"
}
})
Returns Records._metrics, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change metrics, you can either change the config in-place, override this property, or overwrite the instance variable Records._metrics.
override_field_config_doc class method¶
Records.override_field_config_doc(
__pdoc__,
source_cls=None
)
Call this method on each subclass that overrides field_config.
plots_defaults property¶
Defaults for PlotsBuilderMixin.plots().
Merges PlotsBuilderMixin.plots_defaults and records.plots from settings.
recarray property¶
records property¶
Records as a raw DataFrame.
records_arr property¶
Records array.
records_readable property¶
Records in readable format.
replace method¶
Records.replace(
**kwargs
)
See Configured.replace().
Also, makes sure that Records.col_mapper is not passed to the new instance.
sort method¶
Records.sort(
incl_id=False,
group_by=None,
**kwargs
)
Sort records by columns (primary) and ids (secondary, optional).
Note
Sorting is expensive. A better approach is to append records already in the correct order.
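Sorting by column first and by id second is a lexicographic sort, which NumPy provides through `np.lexsort` (the last key passed is the primary one). The sketch below also includes a matching check in the spirit of Records.is_sorted(); both `sort_records` and `records_sorted` are hypothetical illustrations of the described ordering, not vectorbt's implementations.

```python
import numpy as np

example_dt = np.dtype([('id', np.int64), ('col', np.int64), ('idx', np.int64)])
records_arr = np.array([(2, 1, 0), (0, 0, 0), (3, 1, 1), (1, 0, 1)], dtype=example_dt)

def sort_records(arr, incl_id=False):
    """Sort by col (primary) and optionally by id (secondary)."""
    if incl_id:
        order = np.lexsort((arr['id'], arr['col']))  # last key passed is the primary key
    else:
        order = np.argsort(arr['col'], kind='stable')  # keep existing order within each column
    return arr[order]

def records_sorted(arr, incl_id=False):
    """Check the same ordering without sorting."""
    col = arr['col']
    if np.any(col[1:] < col[:-1]):
        return False
    if incl_id:
        same_col = col[1:] == col[:-1]
        return not np.any(same_col & (arr['id'][1:] < arr['id'][:-1]))
    return True

sorted_arr = sort_records(records_arr, incl_id=True)
print(sorted_arr['id'])  # [0 1 2 3]
print(records_sorted(records_arr), records_sorted(sorted_arr, incl_id=True))  # False True
```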
stats_defaults property¶
Defaults for StatsBuilderMixin.stats().
Merges StatsBuilderMixin.stats_defaults and records.stats from settings.
subplots class variable¶
Subplots supported by Records.
Config({})
Returns Records._subplots, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change subplots, you can either change the config in-place, override this property, or overwrite the instance variable Records._subplots.
values property¶
Records array.
RecordsWithFields class¶
RecordsWithFields()
Class that exposes a read-only class property RecordsWithFields.field_config.
Subclasses
field_config function¶
Field config of ${cls_name}.
${field_config}