base module

Base data class.

Class Data allows storing, downloading, updating, and managing data. It stores data as a dictionary of Series/DataFrames keyed by symbol, and makes sure that all pandas objects have the same index and columns by aligning them.

Downloading

Data can be downloaded by overriding the Data.download_symbol() class method. Under the hood, Data iterates over all symbols and calls this method for each one.

Let's create a simple data class RandomData that generates a price series from random returns with a given mean and standard deviation:

>>> import numpy as np
>>> import pandas as pd
>>> import vectorbt as vbt

>>> class RandomData(vbt.Data):
...     @classmethod
...     def download_symbol(cls, symbol, mean=0., stdev=0.1, start_value=100,
...                         start_dt='2021-01-01', end_dt='2021-01-10'):
...         index = pd.date_range(start_dt, end_dt)
...         rand_returns = np.random.normal(mean, stdev, size=len(index))
...         rand_price = start_value + np.cumprod(rand_returns + 1)
...         return pd.Series(rand_price, index=index)

>>> rand_data = RandomData.download(['RANDNX1', 'RANDNX2'])
>>> rand_data.get()
symbol         RANDNX1     RANDNX2
2021-01-01  101.042956  100.920462
2021-01-02  100.987327  100.956455
2021-01-03  101.022333  100.955128
2021-01-04  101.084243  100.791793
2021-01-05  101.158619  100.781000
2021-01-06  101.172688  100.786198
2021-01-07  101.311609  100.848192
2021-01-08  101.331841  100.861500
2021-01-09  101.440530  100.944935
2021-01-10  101.585689  100.993223

To provide different keyword arguments for different symbols, we can use symbol_dict:

>>> start_value = vbt.symbol_dict({'RANDNX2': 200})
>>> rand_data = RandomData.download(['RANDNX1', 'RANDNX2'], start_value=start_value)
>>> rand_data.get()
symbol         RANDNX1     RANDNX2
2021-01-01  101.083324  200.886078
2021-01-02  101.113405  200.791934
2021-01-03  101.169194  200.852877
2021-01-04  101.164033  200.820111
2021-01-05  101.326248  201.060448
2021-01-06  101.394482  200.876984
2021-01-07  101.494227  200.845519
2021-01-08  101.422012  200.963474
2021-01-09  101.493162  200.790369
2021-01-10  101.606052  200.752296

If two symbols have different indexes or columns, they are automatically aligned based on missing_index and missing_columns respectively (see data in settings):

>>> start_dt = vbt.symbol_dict({'RANDNX2': '2021-01-03'})
>>> end_dt = vbt.symbol_dict({'RANDNX2': '2021-01-07'})
>>> rand_data = RandomData.download(
...     ['RANDNX1', 'RANDNX2'], start_value=start_value,
...     start_dt=start_dt, end_dt=end_dt)
>>> rand_data.get()
symbol         RANDNX1     RANDNX2
2021-01-01  101.028054         NaN
2021-01-02  101.032090         NaN
2021-01-03  101.038531  200.936283
2021-01-04  101.068265  200.926764
2021-01-05  100.878492  200.898898
2021-01-06  100.857444  200.922368
2021-01-07  100.933123  200.987094
2021-01-08  100.938034         NaN
2021-01-09  101.044736         NaN
2021-01-10  101.098133         NaN

Updating

Updating can be implemented by overriding the Data.update_symbol() instance method, which takes the same arguments as Data.download_symbol(). Unlike the download method, the update method is an instance method and can therefore access the data downloaded earlier. It can also read the keyword arguments initially passed to the download method, which are stored under Data.download_kwargs. Those arguments can serve as defaults and be overridden by arguments passed directly to the update method, using merge_dicts().

Let's define an update method that updates the latest data point and adds two new data points. Note that updating data always returns a new Data instance.

>>> from datetime import timedelta
>>> from vectorbt.utils.config import merge_dicts

>>> class RandomData(vbt.Data):
...     @classmethod
...     def download_symbol(cls, symbol, mean=0., stdev=0.1, start_value=100,
...                         start_dt='2021-01-01', end_dt='2021-01-10'):
...         index = pd.date_range(start_dt, end_dt)
...         rand_returns = np.random.normal(mean, stdev, size=len(index))
...         rand_price = start_value + np.cumprod(rand_returns + 1)
...         return pd.Series(rand_price, index=index)
...
...     def update_symbol(self, symbol, **kwargs):
...         download_kwargs = self.select_symbol_kwargs(symbol, self.download_kwargs)
...         download_kwargs['start_dt'] = self.data[symbol].index[-1]
...         download_kwargs['end_dt'] = download_kwargs['start_dt'] + timedelta(days=2)
...         kwargs = merge_dicts(download_kwargs, kwargs)
...         return self.download_symbol(symbol, **kwargs)

>>> rand_data = RandomData.download(['RANDNX1', 'RANDNX2'], end_dt='2021-01-05')
>>> rand_data.get()
symbol         RANDNX1     RANDNX2
2021-01-01  100.956601  100.970865
2021-01-02  100.919011  100.987026
2021-01-03  101.062733  100.835376
2021-01-04  100.960535  100.820817
2021-01-05  100.834387  100.866549

>>> rand_data = rand_data.update()
>>> rand_data.get()
symbol         RANDNX1     RANDNX2
2021-01-01  100.956601  100.970865
2021-01-02  100.919011  100.987026
2021-01-03  101.062733  100.835376
2021-01-04  100.960535  100.820817
2021-01-05  101.011255  100.887049 < updated from here
2021-01-06  101.004149  100.808410
2021-01-07  101.023673  100.714583

>>> rand_data = rand_data.update()
>>> rand_data.get()
symbol         RANDNX1     RANDNX2
2021-01-01  100.956601  100.970865
2021-01-02  100.919011  100.987026
2021-01-03  101.062733  100.835376
2021-01-04  100.960535  100.820817
2021-01-05  101.011255  100.887049
2021-01-06  101.004149  100.808410
2021-01-07  100.883400  100.874922 < updated from here
2021-01-08  101.011738  100.780188
2021-01-09  100.912639  100.934014
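
The effect of an update on a single symbol can be approximated in plain pandas: the freshly downloaded series starts at the last stored timestamp, so that point is replaced and the later points are appended. The following is a minimal sketch under that assumption, not vectorbt's internal implementation; merge_update is a hypothetical helper and the values are made up.

```python
import pandas as pd

def merge_update(old, new):
    """Hypothetical helper: rows of `new` replace rows of `old` with the same timestamp."""
    combined = pd.concat([old, new])
    # Keep the last occurrence of every duplicated timestamp, i.e. the one from `new`
    return combined[~combined.index.duplicated(keep='last')]

old = pd.Series([1.0, 2.0, 3.0], index=pd.date_range('2021-01-01', '2021-01-03'))
new = pd.Series([30.0, 4.0, 5.0], index=pd.date_range('2021-01-03', '2021-01-05'))

merged = merge_update(old, new)
print(merged)
```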

Merging

You can merge symbols from different Data instances either by subclassing Data and defining custom download and update methods, or by manually merging their data dicts into one data dict and passing it to the Data.from_data() class method.

>>> rand_data1 = RandomData.download('RANDNX1', mean=0.2)
>>> rand_data2 = RandomData.download('RANDNX2', start_value=200, start_dt='2021-01-05')
>>> merged_data = vbt.Data.from_data(vbt.merge_dicts(rand_data1.data, rand_data2.data))
>>> merged_data.get()
symbol         RANDNX1     RANDNX2
2021-01-01  101.160718         NaN
2021-01-02  101.421020         NaN
2021-01-03  101.959176         NaN
2021-01-04  102.076670         NaN
2021-01-05  102.447234  200.916198
2021-01-06  103.195187  201.033907
2021-01-07  103.595915  200.908229
2021-01-08  104.332550  201.000497
2021-01-09  105.159708  201.019157
2021-01-10  106.729495  200.910210

Indexing

Like any other class subclassing Wrapping, we can do pandas indexing on a Data instance, which forwards the indexing operation to each Series/DataFrame:

>>> rand_data.loc['2021-01-07':'2021-01-09']
<__main__.RandomData at 0x7fdba4e36198>

>>> rand_data.loc['2021-01-07':'2021-01-09'].get()
symbol         RANDNX1     RANDNX2
2021-01-07  100.883400  100.874922
2021-01-08  101.011738  100.780188
2021-01-09  100.912639  100.934014
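
Conceptually, forwarding an indexing operation means applying the same pandas indexer to every object in the data dict. A minimal sketch of that idea follows; index_each is a hypothetical helper, and the real class additionally rebuilds its wrapper, which is omitted here.

```python
import pandas as pd

index = pd.date_range('2021-01-01', '2021-01-10')
data = {
    'RANDNX1': pd.Series(range(10), index=index, dtype=float),
    'RANDNX2': pd.Series(range(10, 20), index=index, dtype=float),
}

def index_each(data, pd_indexing_func):
    """Hypothetical helper: apply one indexing function to every symbol's object."""
    return {symbol: pd_indexing_func(obj) for symbol, obj in data.items()}

# The same label-based slice is applied to each symbol
sliced = index_each(data, lambda obj: obj.loc['2021-01-07':'2021-01-09'])
```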

Saving and loading

Like any other class subclassing Pickleable, we can save a Data instance to disk with Pickleable.save() and load it with Pickleable.load():

>>> rand_data.save('rand_data')
>>> rand_data = RandomData.load('rand_data')
>>> rand_data.get()
symbol         RANDNX1     RANDNX2
2021-01-01  100.956601  100.970865
2021-01-02  100.919011  100.987026
2021-01-03  101.062733  100.835376
2021-01-04  100.960535  100.820817
2021-01-05  101.011255  100.887049
2021-01-06  101.004149  100.808410
2021-01-07  100.883400  100.874922
2021-01-08  101.011738  100.780188
2021-01-09  100.912639  100.934014

Stats

>>> rand_data = RandomData.download(['RANDNX1', 'RANDNX2'])

>>> rand_data.stats(column='a')
Start                   2021-01-01 00:00:00+00:00
End                     2021-01-10 00:00:00+00:00
Period                           10 days 00:00:00
Total Symbols                                   2
Null Counts: RANDNX1                            0
Null Counts: RANDNX2                            0
dtype: object

StatsBuilderMixin.stats() also supports (re-)grouping:

>>> rand_data.stats(group_by=True)
Start                   2021-01-01 00:00:00+00:00
End                     2021-01-10 00:00:00+00:00
Period                           10 days 00:00:00
Total Symbols                                   2
Null Counts: RANDNX1                            0
Null Counts: RANDNX2                            0
Name: group, dtype: object

Plots

The Data class has a single subplot based on Data.plot():

>>> rand_data.plots(settings=dict(base=100)).show_svg()


Data class

Data(
    wrapper,
    data,
    tz_localize,
    tz_convert,
    missing_index,
    missing_columns,
    download_kwargs,
    **kwargs
)

Class that downloads, updates, and manages data coming from a data source.

Superclasses

Inherited members

Subclasses


align_columns class method

Data.align_columns(
    data,
    missing='raise'
)

Align data to have the same columns.

See Data.align_index() for missing.


align_index class method

Data.align_index(
    data,
    missing='nan'
)

Align data to have the same index.

The argument missing accepts the following values:

  • 'nan': set missing data points to NaN
  • 'drop': remove missing data points
  • 'raise': raise an error
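
The three modes can be illustrated with plain pandas. The function below is a hypothetical re-implementation written for this explanation only; it assumes that 'nan' reindexes to the union of all indexes and 'drop' to their intersection, and it is not the library's code.

```python
import pandas as pd

def align_index(data, missing='nan'):
    """Hypothetical sketch of the three `missing` modes."""
    indexes = [obj.index for obj in data.values()]
    if all(idx.equals(indexes[0]) for idx in indexes):
        return dict(data)  # already aligned
    if missing == 'nan':
        # Union of all indexes; absent points become NaN
        target = indexes[0]
        for idx in indexes[1:]:
            target = target.union(idx)
    elif missing == 'drop':
        # Intersection of all indexes; points absent anywhere are removed
        target = indexes[0]
        for idx in indexes[1:]:
            target = target.intersection(idx)
    elif missing == 'raise':
        raise ValueError("Symbols have mismatching index")
    else:
        raise ValueError(f"Unknown value for missing: {missing!r}")
    return {symbol: obj.reindex(target) for symbol, obj in data.items()}

data = {
    'A': pd.Series([1.0, 2.0, 3.0], index=pd.date_range('2021-01-01', periods=3)),
    'B': pd.Series([20.0, 30.0, 40.0], index=pd.date_range('2021-01-02', periods=3)),
}
with_nan = align_index(data, missing='nan')
dropped = align_index(data, missing='drop')
```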

concat method

Data.concat(
    level_name='symbol'
)

Return a dict of Series/DataFrames with symbols as columns, keyed by column name.
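
The shape of the result can be sketched with pandas alone: a dict keyed by symbol is turned into a dict keyed by column name, where each value has one column per symbol. concat_by_column is a hypothetical helper and the numbers are made up; the sketch assumes every symbol holds a DataFrame with the same columns.

```python
import pandas as pd

index = pd.date_range('2021-01-01', periods=3)
data = {
    'RANDNX1': pd.DataFrame({'Open': [1.0, 2.0, 3.0], 'Close': [1.5, 2.5, 3.5]}, index=index),
    'RANDNX2': pd.DataFrame({'Open': [10.0, 20.0, 30.0], 'Close': [15.0, 25.0, 35.0]}, index=index),
}

def concat_by_column(data, level_name='symbol'):
    """Hypothetical helper: one DataFrame per column name, with symbols as columns."""
    first = next(iter(data.values()))
    result = {}
    for col in first.columns:
        # Dict keys (symbols) become the column labels of the concatenated frame
        df = pd.concat({symbol: obj[col] for symbol, obj in data.items()}, axis=1)
        df.columns.name = level_name
        result[col] = df
    return result

by_column = concat_by_column(data)
print(by_column['Close'])
```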


data property

Data dictionary keyed by symbol.


download class method

Data.download(
    symbols,
    tz_localize=None,
    tz_convert=None,
    missing_index=None,
    missing_columns=None,
    wrapper_kwargs=None,
    **kwargs
)

Download data using Data.download_symbol().

Args

symbols : hashable or sequence of hashable

One or multiple symbols.

Note

A tuple is treated as a single symbol because it is hashable.

tz_localize : any
See Data.from_data().
tz_convert : any
See Data.from_data().
missing_index : str
See Data.from_data().
missing_columns : str
See Data.from_data().
wrapper_kwargs : dict
See Data.from_data().
**kwargs

Passed to Data.download_symbol().

If two symbols require different keyword arguments, pass symbol_dict for each argument.
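
The rule that a hashable value counts as one symbol can be sketched with the standard library. to_symbol_list is a hypothetical helper that only mirrors the documented behavior; how the library performs this check internally is not shown here.

```python
from collections.abc import Hashable, Sequence

def to_symbol_list(symbols):
    """Hypothetical helper: a hashable is one symbol, a non-hashable sequence is many."""
    if isinstance(symbols, Hashable):
        return [symbols]  # str, int, and tuple all land here
    if isinstance(symbols, Sequence):
        return list(symbols)
    raise TypeError("symbols must be hashable or a sequence of hashables")

single = to_symbol_list('RANDNX1')
tuple_symbol = to_symbol_list(('RANDNX1', 'RANDNX2'))  # one symbol, not two
multiple = to_symbol_list(['RANDNX1', 'RANDNX2'])
```

To download several symbols, pass them as a list rather than a tuple.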


download_kwargs property

Keyword arguments initially passed to Data.download_symbol().


download_symbol class method

Data.download_symbol(
    symbol,
    **kwargs
)

Abstract method to download a symbol.


from_data class method

Data.from_data(
    data,
    tz_localize=None,
    tz_convert=None,
    missing_index=None,
    missing_columns=None,
    wrapper_kwargs=None,
    **kwargs
)

Create a new Data instance from (aligned) data.

Args

data : dict
Dictionary of array-like objects keyed by symbol.
tz_localize : timezone_like

If the index is tz-naive, localize it to this timezone.

See to_timezone().

tz_convert : timezone_like

Convert the index from one timezone to another.

See to_timezone().

missing_index : str
See Data.align_index().
missing_columns : str
See Data.align_columns().
wrapper_kwargs : dict
Keyword arguments passed to ArrayWrapper.
**kwargs
Keyword arguments passed to the __init__ method.

For defaults, see data in settings.
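
The difference between the two timezone arguments corresponds to the pandas operations of the same name. The sketch below uses only pandas and an arbitrarily chosen target timezone; it assumes the library applies localization before conversion.

```python
import pandas as pd

index = pd.date_range('2021-01-01', periods=3)  # tz-naive daily index

# tz_localize attaches a timezone; the wall-clock times stay the same
localized = index.tz_localize('UTC')

# tz_convert keeps the same instants but shows them in another timezone
converted = localized.tz_convert('Europe/Berlin')

print(localized[0], converted[0])
```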


get method

Data.get(
    column=None,
    **kwargs
)

Get column data.

If there is one symbol, returns the data for that symbol. If there are multiple symbols, performs concatenation first and returns a DataFrame if one column is passed and a tuple of DataFrames if a list of columns is passed.


indexing_func method

Data.indexing_func(
    pd_indexing_func,
    **kwargs
)

Perform indexing on Data.


metrics class variable

Metrics supported by Data.

Config({
    "start": {
        "title": "Start",
        "calc_func": "<function Data.<lambda> at 0x11996dda0>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "end": {
        "title": "End",
        "calc_func": "<function Data.<lambda> at 0x11996de40>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "period": {
        "title": "Period",
        "calc_func": "<function Data.<lambda> at 0x11996dee0>",
        "apply_to_timedelta": true,
        "agg_func": null,
        "tags": "wrapper"
    },
    "total_symbols": {
        "title": "Total Symbols",
        "calc_func": "<function Data.<lambda> at 0x11996df80>",
        "agg_func": null,
        "tags": "data"
    },
    "null_counts": {
        "title": "Null Counts",
        "calc_func": "<function Data.<lambda> at 0x11996e020>",
        "tags": "data"
    }
})

Returns Data._metrics, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.

To change metrics, you can either change the config in-place, override this property, or overwrite the instance variable Data._metrics.
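
The copy-on-creation behavior described above follows a common pattern, sketched here with a plain dict in place of Config. WithMetrics is a hypothetical class used only to show why an in-place change on one instance leaves the class-level defaults untouched.

```python
import copy

class WithMetrics:
    # Class-level defaults (a plain dict stands in for a Config object)
    _metrics = {'start': {'title': 'Start'}}

    def __init__(self):
        # Each instance gets its own deep copy of the class-level config
        self._metrics = copy.deepcopy(type(self)._metrics)

    @property
    def metrics(self):
        return self._metrics

obj = WithMetrics()
obj.metrics['start']['title'] = 'First Index'  # in-place change on one instance

other = WithMetrics()  # created afterwards, still sees the class defaults
```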


missing_columns property

missing_columns initially passed to Data.download().


missing_index property

missing_index initially passed to Data.download().


plot method

Data.plot(
    column=None,
    base=None,
    **kwargs
)

Plot a column of the data.

Args

column : str
Name of the column to plot.
base : float

Rebase all series of a column to a given initial base.

Note

The column should contain prices.

**kwargs
Keyword arguments passed to GenericAccessor.plot().
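
Rebasing can be read as scaling each series so that its first value equals base, which makes price series with different magnitudes comparable. The sketch below assumes exactly that definition; rebase is a hypothetical helper and the prices are made up.

```python
import pandas as pd

def rebase(prices, base):
    """Hypothetical helper: scale every column so that its first value equals `base`."""
    return prices / prices.iloc[0] * base

close = pd.DataFrame(
    {'A': [100.0, 110.0, 120.0], 'B': [10.0, 15.0, 12.0]},
    index=pd.date_range('2021-01-01', periods=3),
)
rebased = rebase(close, base=1)
print(rebased)
```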

Usage

>>> import vectorbt as vbt

>>> start = '2021-01-01 UTC'  # crypto is in UTC
>>> end = '2021-06-01 UTC'
>>> data = vbt.YFData.download(['BTC-USD', 'ETH-USD', 'ADA-USD'], start=start, end=end)

>>> data.plot(column='Close', base=1)


plots_defaults property

Defaults for PlotsBuilderMixin.plots().

Merges PlotsBuilderMixin.plots_defaults and data.plots from settings.


select_symbol_kwargs class method

Data.select_symbol_kwargs(
    symbol,
    kwargs
)

Select keyword arguments belonging to symbol.
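
The selection logic can be sketched in plain Python. SymbolDict is a stand-in for symbol_dict, and the function is a hypothetical re-implementation that assumes a per-symbol argument is dropped when the symbol has no entry, so that the default of the download method applies.

```python
class SymbolDict(dict):
    """Stand-in for symbol_dict: a dict whose keys are symbols."""

def select_symbol_kwargs(symbol, kwargs):
    selected = {}
    for key, value in kwargs.items():
        if isinstance(value, SymbolDict):
            # Per-symbol argument: use it only if this symbol has an entry
            if symbol in value:
                selected[key] = value[symbol]
        else:
            # Ordinary argument: shared by all symbols
            selected[key] = value
    return selected

kwargs = dict(start_value=SymbolDict({'RANDNX2': 200}), stdev=0.1)
first = select_symbol_kwargs('RANDNX1', kwargs)
second = select_symbol_kwargs('RANDNX2', kwargs)
```

This matches the download example earlier on this page, where RANDNX1 falls back to the default start_value while RANDNX2 starts at 200.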


stats_defaults property

Defaults for StatsBuilderMixin.stats().

Merges StatsBuilderMixin.stats_defaults and data.stats from settings.


subplots class variable

Subplots supported by Data.

Config({
    "plot": {
        "check_is_not_grouped": true,
        "plot_func": "plot",
        "pass_add_trace_kwargs": true,
        "tags": "data"
    }
})

Returns Data._subplots, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.

To change subplots, you can either change the config in-place, override this property, or overwrite the instance variable Data._subplots.


symbols property

List of symbols.


tz_convert property

tz_convert initially passed to Data.download().


tz_localize property

tz_localize initially passed to Data.download().


update method

Data.update(
    **kwargs
)

Update the data using Data.update_symbol().

Args

**kwargs

Passed to Data.update_symbol().

If two symbols require different keyword arguments, pass symbol_dict for each argument.

Note

Returns a new Data instance.


update_symbol method

Data.update_symbol(
    symbol,
    **kwargs
)

Abstract method to update a symbol.


MetaData class

MetaData(
    *args,
    **kwargs
)

Metaclass that exposes a read-only class property StatsBuilderMixin.metrics.

Superclasses

Inherited members


symbol_dict class

symbol_dict(
    *args,
    **kwargs
)

Dict that contains symbols as keys.

Superclasses

  • builtins.dict