base module

Base class for working with records.

vectorbt works with two different representations of data: matrices and records.

A matrix, in this context, is just an array of one-dimensional arrays, each corresponding to a separate feature. The matrix itself holds only one kind of information (one attribute). For example, one can create a matrix for entry signals, with columns being different strategy configurations. But what if the matrix is huge and sparse? What if there is more information we would like to represent by each element? Creating multiple matrices would be a waste of memory.

Records make it possible to represent complex, sparse information in a dense format. They are just an array of one-dimensional arrays of a fixed schema. You can think of records as a DataFrame, where each row represents a record and each column represents a specific attribute.

               a     b
         0   1.0   5.0
attr1 =  1   2.0   NaN
         2   NaN   7.0
         3   4.0   8.0
               a     b
         0   9.0  13.0
attr2 =  1  10.0   NaN
         2   NaN  15.0
         3  12.0  16.0
            |
            v
      id  col  idx  attr1  attr2
0      0    0    0      1      9
1      1    0    1      2     10
2      2    0    3      4     12
3      3    1    0      5     13
4      4    1    2      7     15
5      5    1    3      8     16
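
The conversion illustrated above can be sketched in plain Python. This is only an illustration of the idea, not how vectorbt builds records; the helper name `matrices_to_records` and the tuple layout `(id, col, idx, attr1, attr2)` are assumptions made for this example.

```python
import math

# Two dense matrices stored as lists of rows; math.nan marks a missing element.
nan = math.nan
attr1 = [[1.0, 5.0], [2.0, nan], [nan, 7.0], [4.0, 8.0]]
attr2 = [[9.0, 13.0], [10.0, nan], [nan, 15.0], [12.0, 16.0]]


def matrices_to_records(attr1, attr2):
    """Collect non-missing elements into (id, col, idx, attr1, attr2) tuples.

    Hypothetical helper: walks the matrices column by column and skips
    every element that is missing in attr1.
    """
    records = []
    n_rows, n_cols = len(attr1), len(attr1[0])
    for col in range(n_cols):
        for idx in range(n_rows):
            if math.isnan(attr1[idx][col]):
                continue  # a missing element produces no record
            records.append(
                (len(records), col, idx, attr1[idx][col], attr2[idx][col])
            )
    return records


records = matrices_to_records(attr1, attr2)
for rec in records:
    print(rec)
```

Each missing element simply produces no record, so the dense form stores six rows instead of two 4x2 matrices.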

Another advantage of records is that they are not constrained by size. Multiple records can map to a single element in a matrix. For example, one can define multiple orders at the same time step, which is impossible to represent in a matrix form without using complex data types.

Consider the following example:

>>> import numpy as np
>>> import pandas as pd
>>> from numba import njit
>>> from collections import namedtuple
>>> import vectorbt as vbt

>>> example_dt = np.dtype([
...     ('id', np.int_),
...     ('col', np.int_),
...     ('idx', np.int_),
...     ('some_field', np.float64)
... ])
>>> records_arr = np.array([
...     (0, 0, 0, 10.),
...     (1, 0, 1, 11.),
...     (2, 0, 2, 12.),
...     (3, 1, 0, 13.),
...     (4, 1, 1, 14.),
...     (5, 1, 2, 15.),
...     (6, 2, 0, 16.),
...     (7, 2, 1, 17.),
...     (8, 2, 2, 18.)
... ], dtype=example_dt)
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y', 'z'],
...     columns=['a', 'b', 'c'], ndim=2, freq='1 day')
>>> records = vbt.Records(wrapper, records_arr)

Printing

There are two ways to print records:

  • Raw dataframe that preserves field names and data types:
>>> records.records
   id  col  idx  some_field
0   0    0    0        10.0
1   1    0    1        11.0
2   2    0    2        12.0
3   3    1    0        13.0
4   4    1    1        14.0
5   5    1    2        15.0
6   6    2    0        16.0
7   7    2    1        17.0
8   8    2    2        18.0
  • Readable dataframe that applies the titles and mappings from field_config:
>>> records.records_readable
   Id Column Timestamp  some_field
0   0      a         x        10.0
1   1      a         y        11.0
2   2      a         z        12.0
3   3      b         x        13.0
4   4      b         y        14.0
5   5      b         z        15.0
6   6      c         x        16.0
7   7      c         y        17.0
8   8      c         z        18.0

Mapping

Records are structured arrays with a set of methods and properties for processing them. Their main feature is mapping the records array and reducing it by column (similar to the MapReduce paradigm). The main advantage is that all of this happens without converting to matrix form, which would waste memory.
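
To make the map-then-reduce idea concrete, here is a minimal plain-Python sketch. It is not vectorbt's implementation; the functions `map_records` and `reduce_by_col` and the tuple layout are hypothetical names chosen for this illustration.

```python
from collections import defaultdict

# Hypothetical records as (id, col, idx, some_field) tuples.
records = [
    (0, 0, 0, 10.0), (1, 0, 1, 11.0), (2, 0, 2, 12.0),
    (3, 1, 0, 13.0), (4, 1, 1, 14.0), (5, 1, 2, 15.0),
]


def map_records(records, map_func):
    """Map step: produce one value per record, paired with its column."""
    return [(rec[1], map_func(rec)) for rec in records]


def reduce_by_col(mapped, reduce_func):
    """Reduce step: collect mapped values per column and reduce each list."""
    by_col = defaultdict(list)
    for col, value in mapped:
        by_col[col].append(value)
    return {col: reduce_func(values) for col, values in sorted(by_col.items())}


squared = map_records(records, lambda rec: rec[3] ** 2)
totals = reduce_by_col(squared, sum)
print(totals)
```

No intermediate matrix is ever built: the reduction only needs the mapped values and the column of each record.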

Records can be mapped to MappedArray in several ways:

  • Use Records.map_field() to map a record field:
>>> records.map_field('some_field')
<vectorbt.records.mapped_array.MappedArray at 0x7ff49bd31a58>

>>> records.map_field('some_field').values
array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
  • Use Records.map() to map records using a custom function:
>>> @njit
... def power_map_nb(record, pow):
...     return record.some_field ** pow

>>> records.map(power_map_nb, 2)
<vectorbt.records.mapped_array.MappedArray at 0x7ff49c990cf8>

>>> records.map(power_map_nb, 2).values
array([100., 121., 144., 169., 196., 225., 256., 289., 324.])
  • Use Records.map_array() to convert an array to MappedArray:
>>> records.map_array(records_arr['some_field'] ** 2)
<vectorbt.records.mapped_array.MappedArray object at 0x7fe9bccf2978>

>>> records.map_array(records_arr['some_field'] ** 2).values
array([100., 121., 144., 169., 196., 225., 256., 289., 324.])
  • Use Records.apply() to apply a function on each column/group:
>>> @njit
... def cumsum_apply_nb(records):
...     return np.cumsum(records.some_field)

>>> records.apply(cumsum_apply_nb)
<vectorbt.records.mapped_array.MappedArray at 0x7ff49c990cf8>

>>> records.apply(cumsum_apply_nb).values
array([10., 21., 33., 13., 27., 42., 16., 33., 51.])

>>> group_by = np.array(['first', 'first', 'second'])
>>> records.apply(cumsum_apply_nb, group_by=group_by, apply_per_group=True).values
array([10., 21., 33., 46., 60., 75., 16., 33., 51.])

Notice how cumsum resets at each column in the first example and at each group in the second example.
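
The reset behavior can be reproduced with a small plain-Python sketch. It assumes records are already ordered by column and uses a hypothetical helper `cumsum_by`; it only mirrors the numbers above and says nothing about how vectorbt computes them internally.

```python
from itertools import accumulate, groupby

cols = [0, 0, 0, 1, 1, 1, 2, 2, 2]
values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
col_to_group = ['first', 'first', 'second']  # column position -> group label


def cumsum_by(keys, values):
    """Cumulative sum that restarts whenever the key changes.

    Hypothetical helper; keys are assumed to be sorted so that equal
    keys are adjacent.
    """
    out = []
    for _, chunk in groupby(zip(keys, values), key=lambda pair: pair[0]):
        out.extend(accumulate(value for _, value in chunk))
    return out


per_column = cumsum_by(cols, values)
per_group = cumsum_by([col_to_group[c] for c in cols], values)
print(per_column)
print(per_group)
```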

Filtering

Use Records.apply_mask() to filter elements per column/group:

>>> mask = [True, False, True, False, True, False, True, False, True]
>>> filtered_records = records.apply_mask(mask)
>>> filtered_records.count()
a    2
b    1
c    2
dtype: int64

>>> filtered_records.values['id']
array([0, 2, 4, 6, 8])
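
As a rough plain-Python equivalent of the filtering above (an illustration only, with records modeled as hypothetical `(id, col, idx, some_field)` tuples), a boolean mask selects records and the per-column count is computed from what remains:

```python
from collections import Counter

# Nine records spread over three columns, mirroring the example above.
records = [(i, i // 3, i % 3, 10.0 + i) for i in range(9)]
mask = [True, False, True, False, True, False, True, False, True]

# Keep only the records whose mask entry is True.
filtered = [rec for rec, keep in zip(records, mask) if keep]

# Count the remaining records per column (columns without records count 0).
counts = Counter(rec[1] for rec in filtered)
count_by_col = [counts.get(col, 0) for col in range(3)]

print([rec[0] for rec in filtered])
print(count_by_col)
```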

Grouping

One of the key features of Records is that you can perform reducing operations on a group of columns as if they were a single column. Groups can be specified by group_by, which can be anything from positions or names of column levels, to a NumPy array with actual groups.

There are multiple ways to define grouping:

  • Pass group_by to ArrayWrapper when creating Records:
>>> group_by = np.array(['first', 'first', 'second'])
>>> grouped_wrapper = wrapper.replace(group_by=group_by)
>>> grouped_records = vbt.Records(grouped_wrapper, records_arr)

>>> grouped_records.map_field('some_field').mean()
first     12.5
second    17.0
dtype: float64
  • Regroup an existing Records instance:
>>> records.regroup(group_by).map_field('some_field').mean()
first     12.5
second    17.0
dtype: float64
  • Pass group_by directly to the mapping method:
>>> records.map_field('some_field', group_by=group_by).mean()
first     12.5
second    17.0
dtype: float64
  • Pass group_by directly to the reducing method:
>>> records.map_field('some_field').mean(group_by=group_by)
first     12.5
second    17.0
dtype: float64

Note

Grouping applies only to reducing operations; the underlying arrays are not changed.
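
A minimal plain-Python sketch of a grouped reduction is given below. The function `mean_by_group` is a hypothetical name for this illustration; the point is that the column and value arrays stay untouched and only the reduction consults the column-to-group mapping.

```python
from statistics import mean

cols = [0, 0, 0, 1, 1, 1, 2, 2, 2]
values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
group_by = ['first', 'first', 'second']  # one label per column


def mean_by_group(cols, values, group_by):
    """Reduce values per group without modifying cols or values."""
    groups = {}
    for col, value in zip(cols, values):
        groups.setdefault(group_by[col], []).append(value)
    return {label: mean(group_values) for label, group_values in groups.items()}


grouped_mean = mean_by_group(cols, values, group_by)
print(grouped_mean)
```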

Indexing

As with any other subclass of Wrapping, we can perform pandas indexing on a Records instance, which forwards the indexing operation to each object with columns:

>>> records['a'].records
   id  col  idx  some_field
0   0    0    0        10.0
1   1    0    1        11.0
2   2    0    2        12.0

>>> grouped_records['first'].records
   id  col  idx  some_field
0   0    0    0        10.0
1   1    0    1        11.0
2   2    0    2        12.0
3   3    1    0        13.0
4   4    1    1        14.0
5   5    1    2        15.0

Note

Changing the index (time axis) is not supported. The object should be treated as a Series rather than a DataFrame; for example, use some_field.iloc[0] instead of some_field.iloc[:, 0].

Indexing behavior depends solely upon ArrayWrapper. For example, if group_select is enabled, indexing will be performed on groups; otherwise, on single columns.

Caching

Records supports caching. If a method or a property requires heavy computation, it's wrapped with cached_method() or cached_property, respectively. Caching can be disabled globally via caching in settings.

Note

Because of caching, the class is meant to be immutable and all properties are read-only. To change any attribute, use the replace method and pass the attribute as a keyword argument.
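
The link between caching and immutability can be illustrated with the standard library. This sketch uses `functools.cached_property` and `dataclasses.replace` as stand-ins; it is an analogy under those assumptions, not vectorbt's own caching machinery, and `TinyRecords` is a hypothetical class.

```python
from dataclasses import dataclass, replace
from functools import cached_property

calls = []  # records how many times the expensive computation actually ran


@dataclass(frozen=True)
class TinyRecords:
    """Hypothetical immutable container with one cached computation."""
    values: tuple

    @cached_property
    def total(self):
        calls.append(1)  # track real computations
        return sum(self.values)


a = TinyRecords((1.0, 2.0, 3.0))
a.total
a.total  # served from the cache, no recomputation

# Instead of mutating `a` (which would leave a stale cached total),
# build a modified copy that starts with an empty cache.
b = replace(a, values=(1.0, 2.0, 3.0, 4.0))
print(a.total, b.total, len(calls))
```

If instances were mutable, changing `values` after `total` had been cached would silently return an outdated result, which is why a fresh instance is created instead.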

Saving and loading

As with any other subclass of Pickleable, we can save a Records instance to disk with Pickleable.save() and load it with Pickleable.load().

Stats

>>> records.stats(column='a')
Start                          x
End                            z
Period           3 days 00:00:00
Total Records                  3
Name: a, dtype: object

StatsBuilderMixin.stats() also supports (re-)grouping:

>>> grouped_records.stats(column='first')
Start                          x
End                            z
Period           3 days 00:00:00
Total Records                  6
Name: first, dtype: object

Plots

This class is too generic to have any subplots, but feel free to add custom subplots to your subclass.

Extending

The Records class can be extended by subclassing.

In case some of our fields have the same meaning but different naming (such as the base field idx) or other properties, we can override field_config using override_field_config(). It will look for configs of all base classes and merge our config on top of them. This preserves any base class property that is not explicitly listed in our config.

>>> from vectorbt.records.decorators import override_field_config

>>> my_dt = np.dtype([
...     ('my_id', np.int_),
...     ('my_col', np.int_),
...     ('my_idx', np.int_)
... ])

>>> my_fields_config = dict(
...     dtype=my_dt,
...     settings=dict(
...         id=dict(name='my_id'),
...         col=dict(name='my_col'),
...         idx=dict(name='my_idx')
...     )
... )
>>> @override_field_config(my_fields_config)
... class MyRecords(vbt.Records):
...     pass

>>> records_arr = np.array([
...     (0, 0, 0),
...     (1, 0, 1),
...     (2, 1, 0),
...     (3, 1, 1)
... ], dtype=my_dt)
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y'],
...     columns=['a', 'b'], ndim=2, freq='1 day')
>>> my_records = MyRecords(wrapper, records_arr)

>>> my_records.id_arr
array([0, 1, 2, 3])

>>> my_records.col_arr
array([0, 0, 1, 1])

>>> my_records.idx_arr
array([0, 1, 0, 1])

Alternatively, we can override the _field_config class attribute.

>>> @override_field_config
... class MyRecords(vbt.Records):
...     _field_config = dict(
...         dtype=my_dt,
...         settings=dict(
...             id=dict(name='my_id'),
...             idx=dict(name='my_idx'),
...             col=dict(name='my_col')
...         )
...     )

Note

Don't forget to decorate the class with @override_field_config to inherit configs from base classes.

You can stop inheritance by not decorating or passing merge_configs=False to the decorator.
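
The effect of merging a field config on top of the base config can be sketched with nested dictionaries. The helper `merge_nested` is hypothetical and only illustrates the described behavior (unlisted base properties survive); it is not the decorator's actual implementation. The base config mirrors the field config of Records listed further below.

```python
import copy

base_field_config = {
    "dtype": None,
    "settings": {
        "id": {"name": "id", "title": "Id"},
        "col": {"name": "col", "title": "Column", "mapping": "columns"},
        "idx": {"name": "idx", "title": "Timestamp", "mapping": "index"},
    },
}

# Only the name of idx is overridden; title and mapping are not listed.
override = {"settings": {"idx": {"name": "my_idx"}}}


def merge_nested(base, override):
    """Recursively merge override on top of a deep copy of base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_nested(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


merged = merge_nested(base_field_config, override)
print(merged["settings"]["idx"])
```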


MetaFields class

MetaFields(
    *args,
    **kwargs
)

Meta class that exposes a read-only class property MetaFields.field_config.

Superclasses

  • builtins.type


field_config property

Field config.


MetaRecords class

MetaRecords(
    *args,
    **kwargs
)

Meta class that exposes a read-only class property StatsBuilderMixin.metrics.


Records class

Records(
    wrapper,
    records_arr,
    col_mapper=None,
    **kwargs
)

Wraps the actual records array (such as trades) and exposes methods for mapping it to some array of values (such as PnL of each trade).

Args

wrapper : ArrayWrapper

Array wrapper.

See ArrayWrapper.

records_arr : array_like

A structured NumPy array of records.

Must have the fields id (record index) and col (column index).

col_mapper : ColumnMapper

Column mapper if already known.

Note

It depends on records_arr, so make sure to invalidate col_mapper upon creating a Records instance with a modified records_arr.

Records.replace() does it automatically.

**kwargs

Custom keyword arguments passed to the config.

Useful if any subclass wants to extend the config.


apply method

Records.apply(
    apply_func_nb,
    *args,
    group_by=None,
    apply_per_group=False,
    dtype=None,
    **kwargs
)

Apply function on records per column/group. Returns mapped array.

Applies per group if apply_per_group is True.

See apply_on_records_nb().

**kwargs are passed to Records.map_array().


apply_mask method

Records.apply_mask(
    mask,
    group_by=None,
    **kwargs
)

Return a new class instance, filtered by mask.


build_field_config_doc class method

Records.build_field_config_doc(
    source_cls=None
)

Build field config documentation.


col_arr property

Get column array.


col_mapper property

Column mapper.

See ColumnMapper.
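
Why a column mapper must be rebuilt when the records array changes can be shown with a simplified model. The sketch assumes, purely for illustration, that a column mapper stores the positions of the records belonging to each column; `build_col_map` is a hypothetical helper and does not reflect the actual internals of ColumnMapper.

```python
def build_col_map(col_arr, n_cols):
    """Map each column to the positions of its records (simplified model)."""
    col_map = [[] for _ in range(n_cols)]
    for pos, col in enumerate(col_arr):
        col_map[col].append(pos)
    return col_map


col_arr = [0, 0, 1, 1, 2]
col_map = build_col_map(col_arr, n_cols=3)

# Dropping the first record shifts every later position by one,
# so the old map now points at the wrong (or non-existent) records.
new_col_arr = col_arr[1:]
fresh_col_map = build_col_map(new_col_arr, n_cols=3)

print(col_map)
print(fresh_col_map)
```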


count method

Records.count(
    group_by=None,
    wrap_kwargs=None
)

Return count by column.


field_config class variable

Field config of Records.

Config({
    "dtype": null,
    "settings": {
        "id": {
            "name": "id",
            "title": "Id"
        },
        "col": {
            "name": "col",
            "title": "Column",
            "mapping": "columns"
        },
        "idx": {
            "name": "idx",
            "title": "Timestamp",
            "mapping": "index"
        }
    }
})

get_apply_mapping_arr method

Records.get_apply_mapping_arr(
    field,
    **kwargs
)

Resolve the mapped array on the field, with mapping applied. Uses Records.field_config.


get_by_col_idxs method

Records.get_by_col_idxs(
    col_idxs
)

Get records corresponding to column indices.

Returns new records array.


get_field_arr method

Records.get_field_arr(
    field
)

Resolve the array of the field. Uses Records.field_config.


get_field_mapping method

Records.get_field_mapping(
    field
)

Resolve the mapping of the field. Uses Records.field_config.


get_field_name method

Records.get_field_name(
    field
)

Resolve the name of the field. Uses Records.field_config.


get_field_setting method

Records.get_field_setting(
    field,
    setting,
    default=None
)

Resolve any setting of the field. Uses Records.field_config.


get_field_title method

Records.get_field_title(
    field
)

Resolve the title of the field. Uses Records.field_config.


get_map_field method

Records.get_map_field(
    field,
    **kwargs
)

Resolve the mapped array of the field. Uses Records.field_config.


get_map_field_to_index method

Records.get_map_field_to_index(
    field,
    **kwargs
)

Resolve the mapped array on the field, with index applied. Uses Records.field_config.


id_arr property

Get id array.


idx_arr property

Get index array.


indexing_func method

Records.indexing_func(
    pd_indexing_func,
    **kwargs
)

Perform indexing on Records.


indexing_func_meta method

Records.indexing_func_meta(
    pd_indexing_func,
    **kwargs
)

Perform indexing on Records and return metadata.


is_sorted method

Records.is_sorted(
    incl_id=False
)

Check whether records are sorted.


map method

Records.map(
    map_func_nb,
    *args,
    dtype=None,
    **kwargs
)

Map each record to a scalar value. Returns mapped array.

See map_records_nb().

**kwargs are passed to Records.map_array().


map_array method

Records.map_array(
    a,
    idx_arr=None,
    mapping=None,
    group_by=None,
    **kwargs
)

Convert array to mapped array.

The length of the array should match that of the records.


map_field method

Records.map_field(
    field,
    **kwargs
)

Convert field to mapped array.

**kwargs are passed to Records.map_array().


metrics class variable

Metrics supported by Records.

Config({
    "start": {
        "title": "Start",
        "calc_func": "<function Records.<lambda> at 0x7f9549873160>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "end": {
        "title": "End",
        "calc_func": "<function Records.<lambda> at 0x7f95498731f0>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "period": {
        "title": "Period",
        "calc_func": "<function Records.<lambda> at 0x7f9549873280>",
        "apply_to_timedelta": true,
        "agg_func": null,
        "tags": "wrapper"
    },
    "count": {
        "title": "Count",
        "calc_func": "count",
        "tags": "records"
    }
})

Returns Records._metrics, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.

To change metrics, you can either change the config in-place, override this property, or overwrite the instance variable Records._metrics.
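
The copy-on-instantiation behavior described above can be sketched as follows. `TinyBase` is a hypothetical class used only to illustrate the pattern of deep-copying a class-level config into each instance; it is not vectorbt code.

```python
import copy


class TinyBase:
    """Hypothetical class whose config is deep-copied for every instance."""

    _metrics = {
        "count": {"title": "Count", "calc_func": "count", "tags": "records"},
    }

    def __init__(self):
        # Shadow the class-level config with a per-instance deep copy.
        self._metrics = copy.deepcopy(type(self)._metrics)

    @property
    def metrics(self):
        return self._metrics


a = TinyBase()
b = TinyBase()

# Changing the config of one instance in place...
a.metrics["count"]["title"] = "Total Records"

# ...affects neither the class nor other instances.
print(a.metrics["count"]["title"])
print(b.metrics["count"]["title"])
print(TinyBase._metrics["count"]["title"])
```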


override_field_config_doc class method

Records.override_field_config_doc(
    __pdoc__,
    source_cls=None
)

Call this method on each subclass that overrides field_config.


plots_defaults property

Defaults for PlotsBuilderMixin.plots().

Merges PlotsBuilderMixin.plots_defaults and records.plots from settings.


recarray property


records property

Records.


records_arr property

Records array.


records_readable property

Records in readable format.


replace method

Records.replace(
    **kwargs
)

See Configured.replace().

Also, makes sure that Records.col_mapper is not passed to the new instance.


sort method

Records.sort(
    incl_id=False,
    group_by=None,
    **kwargs
)

Sort records by columns (primary) and ids (secondary, optional).

Note

Sorting is expensive. A better approach is to append records already in the correct order.
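
The sort order (column first, id optionally second) can be sketched in plain Python. The helpers below are hypothetical and operate on `(id, col, idx)` tuples; they illustrate the ordering only, not the performance characteristics of the real methods.

```python
def sort_key(incl_id):
    """Build a key function for (id, col, idx) tuples."""
    if incl_id:
        return lambda rec: (rec[1], rec[0])  # column first, then id
    return lambda rec: rec[1]  # column only


def is_sorted(records, incl_id=False):
    """Check that keys never decrease from one record to the next."""
    keys = [sort_key(incl_id)(rec) for rec in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def sort_records(records, incl_id=False):
    """Return a sorted copy; sorted() is stable, so ties keep their order."""
    return sorted(records, key=sort_key(incl_id))


records = [(3, 1, 0), (0, 0, 0), (2, 0, 2), (1, 0, 1), (4, 1, 1)]

by_col = sort_records(records)
by_col_and_id = sort_records(records, incl_id=True)

print(is_sorted(records), is_sorted(by_col), is_sorted(by_col, incl_id=True))
print(by_col_and_id)
```

Appending records in column order from the start makes `is_sorted` hold without ever paying for a sort.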


stats_defaults property

Defaults for StatsBuilderMixin.stats().

Merges StatsBuilderMixin.stats_defaults and records.stats from settings.


subplots class variable

Subplots supported by Records.

Config({})

Returns Records._subplots, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.

To change subplots, you can either change the config in-place, override this property, or overwrite the instance variable Records._subplots.


values property

Records array.


RecordsWithFields class

RecordsWithFields()

Class that exposes a read-only class property RecordsWithFields.field_config.


field_config function

Field config of ${cls_name}.

${field_config}