base module¶
Base data class.
Class Data allows storing, downloading, updating, and managing data. It stores data as a dictionary of Series/DataFrames keyed by symbol, and makes sure that all pandas objects have the same index and columns by aligning them.
Downloading¶
Data can be downloaded by overriding the Data.download_symbol() class method. Under the hood, Data iterates over all symbols and calls this method for each one.
Let's create a simple data class RandomData that generates prices based on random returns with a provided mean and standard deviation:
>>> import numpy as np
>>> import pandas as pd
>>> import vectorbt as vbt
>>> class RandomData(vbt.Data):
...     @classmethod
...     def download_symbol(cls, symbol, mean=0., stdev=0.1, start_value=100,
...                         start_dt='2021-01-01', end_dt='2021-01-10'):
...         index = pd.date_range(start_dt, end_dt)
...         rand_returns = np.random.normal(mean, stdev, size=len(index))
...         rand_price = start_value + np.cumprod(rand_returns + 1)
...         return pd.Series(rand_price, index=index)
>>> rand_data = RandomData.download(['RANDNX1', 'RANDNX2'])
>>> rand_data.get()
symbol RANDNX1 RANDNX2
2021-01-01 101.042956 100.920462
2021-01-02 100.987327 100.956455
2021-01-03 101.022333 100.955128
2021-01-04 101.084243 100.791793
2021-01-05 101.158619 100.781000
2021-01-06 101.172688 100.786198
2021-01-07 101.311609 100.848192
2021-01-08 101.331841 100.861500
2021-01-09 101.440530 100.944935
2021-01-10 101.585689 100.993223
To provide different keyword arguments for different symbols, we can use symbol_dict:
>>> start_value = vbt.symbol_dict({'RANDNX2': 200})
>>> rand_data = RandomData.download(['RANDNX1', 'RANDNX2'], start_value=start_value)
>>> rand_data.get()
symbol RANDNX1 RANDNX2
2021-01-01 101.083324 200.886078
2021-01-02 101.113405 200.791934
2021-01-03 101.169194 200.852877
2021-01-04 101.164033 200.820111
2021-01-05 101.326248 201.060448
2021-01-06 101.394482 200.876984
2021-01-07 101.494227 200.845519
2021-01-08 101.422012 200.963474
2021-01-09 101.493162 200.790369
2021-01-10 101.606052 200.752296
If two symbols have a different index or different columns, they are automatically aligned based on missing_index and missing_columns respectively (see data in settings):
>>> start_dt = vbt.symbol_dict({'RANDNX2': '2021-01-03'})
>>> end_dt = vbt.symbol_dict({'RANDNX2': '2021-01-07'})
>>> rand_data = RandomData.download(
...     ['RANDNX1', 'RANDNX2'], start_value=start_value,
...     start_dt=start_dt, end_dt=end_dt)
>>> rand_data.get()
symbol RANDNX1 RANDNX2
2021-01-01 101.028054 NaN
2021-01-02 101.032090 NaN
2021-01-03 101.038531 200.936283
2021-01-04 101.068265 200.926764
2021-01-05 100.878492 200.898898
2021-01-06 100.857444 200.922368
2021-01-07 100.933123 200.987094
2021-01-08 100.938034 NaN
2021-01-09 101.044736 NaN
2021-01-10 101.098133 NaN
Updating¶
Updating can be implemented by overriding the Data.update_symbol() method, which takes the same arguments as Data.download_symbol(). In contrast to the download method, the update method is an instance method and can access the data downloaded earlier. It can also access the keyword arguments initially passed to the download method, which are available under Data.download_kwargs. Those arguments can be used as defaults and overridden by arguments passed directly to the update method, using merge_dicts().
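The precedence between download-time defaults and update-time overrides can be sketched with plain dictionaries. This is a minimal illustration that assumes flat (non-nested) keyword arguments, where the merge behaves like dictionary unpacking; merge_dicts() itself may treat nested dictionaries differently, and the values below are made up.

```python
# Keyword arguments remembered from the initial download (illustrative values).
download_kwargs = {'mean': 0.0, 'stdev': 0.1, 'start_value': 100}

# Keyword arguments passed directly to the update call.
update_kwargs = {'stdev': 0.2}

# Later dictionaries win, so update arguments override the download defaults.
merged = {**download_kwargs, **update_kwargs}
print(merged)
```

Arguments that are not overridden, such as mean and start_value here, keep the values used for the original download.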
Let's define an update method that updates the latest data point and adds two new data points. Note that updating data always returns a new Data instance.
>>> from datetime import timedelta
>>> from vectorbt.utils.config import merge_dicts
>>> class RandomData(vbt.Data):
...     @classmethod
...     def download_symbol(cls, symbol, mean=0., stdev=0.1, start_value=100,
...                         start_dt='2021-01-01', end_dt='2021-01-10'):
...         index = pd.date_range(start_dt, end_dt)
...         rand_returns = np.random.normal(mean, stdev, size=len(index))
...         rand_price = start_value + np.cumprod(rand_returns + 1)
...         return pd.Series(rand_price, index=index)
...
...     def update_symbol(self, symbol, **kwargs):
...         download_kwargs = self.select_symbol_kwargs(symbol, self.download_kwargs)
...         download_kwargs['start_dt'] = self.data[symbol].index[-1]
...         download_kwargs['end_dt'] = download_kwargs['start_dt'] + timedelta(days=2)
...         kwargs = merge_dicts(download_kwargs, kwargs)
...         return self.download_symbol(symbol, **kwargs)
>>> rand_data = RandomData.download(['RANDNX1', 'RANDNX2'], end_dt='2021-01-05')
>>> rand_data.get()
symbol RANDNX1 RANDNX2
2021-01-01 100.956601 100.970865
2021-01-02 100.919011 100.987026
2021-01-03 101.062733 100.835376
2021-01-04 100.960535 100.820817
2021-01-05 100.834387 100.866549
>>> rand_data = rand_data.update()
>>> rand_data.get()
symbol RANDNX1 RANDNX2
2021-01-01 100.956601 100.970865
2021-01-02 100.919011 100.987026
2021-01-03 101.062733 100.835376
2021-01-04 100.960535 100.820817
2021-01-05 101.011255 100.887049 < updated from here
2021-01-06 101.004149 100.808410
2021-01-07 101.023673 100.714583
>>> rand_data = rand_data.update()
>>> rand_data.get()
symbol RANDNX1 RANDNX2
2021-01-01 100.956601 100.970865
2021-01-02 100.919011 100.987026
2021-01-03 101.062733 100.835376
2021-01-04 100.960535 100.820817
2021-01-05 101.011255 100.887049
2021-01-06 101.004149 100.808410
2021-01-07 100.883400 100.874922 < updated from here
2021-01-08 101.011738 100.780188
2021-01-09 100.912639 100.934014
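The pattern visible in the outputs above, where the last existing row is overwritten and new rows are appended, can be reproduced with plain pandas. The following is a sketch of one way to combine old and new data and not necessarily how Data.update() works internally; the series values are made up.

```python
import pandas as pd

old = pd.Series([1.0, 2.0, 3.0], index=pd.date_range('2021-01-01', periods=3))
# New data starts at the last existing timestamp and extends two days further.
new = pd.Series([30.0, 4.0, 5.0], index=pd.date_range('2021-01-03', periods=3))

# Keep only the old rows that precede the new data, then append the new rows.
updated = pd.concat([old[old.index < new.index[0]], new])
print(updated)
```

Dropping the overlapping rows before concatenating keeps the index unique, so the value for 2021-01-03 comes from the new data only.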
Merging¶
You can merge symbols from different Data instances either by subclassing Data and defining custom download and update methods, or by manually merging their data dicts into one data dict and passing it to the Data.from_data() class method.
>>> rand_data1 = RandomData.download('RANDNX1', mean=0.2)
>>> rand_data2 = RandomData.download('RANDNX2', start_value=200, start_dt='2021-01-05')
>>> merged_data = vbt.Data.from_data(vbt.merge_dicts(rand_data1.data, rand_data2.data))
>>> merged_data.get()
symbol RANDNX1 RANDNX2
2021-01-01 101.160718 NaN
2021-01-02 101.421020 NaN
2021-01-03 101.959176 NaN
2021-01-04 102.076670 NaN
2021-01-05 102.447234 200.916198
2021-01-06 103.195187 201.033907
2021-01-07 103.595915 200.908229
2021-01-08 104.332550 201.000497
2021-01-09 105.159708 201.019157
2021-01-10 106.729495 200.910210
Indexing¶
Like any other class subclassing Wrapping, a Data instance supports pandas indexing, which forwards the indexing operation to each Series/DataFrame:
>>> rand_data.loc['2021-01-07':'2021-01-09']
<__main__.RandomData at 0x7fdba4e36198>
>>> rand_data.loc['2021-01-07':'2021-01-09'].get()
symbol RANDNX1 RANDNX2
2021-01-07 100.883400 100.874922
2021-01-08 101.011738 100.780188
2021-01-09 100.912639 100.934014
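Forwarding an indexing operation to every symbol can be pictured as applying the same function to each value of the data dictionary. The helper index_each below is hypothetical and only illustrates the idea; the real class also has to rebuild its wrapper and return a new instance.

```python
import pandas as pd

index = pd.date_range('2021-01-01', periods=5)
data = {
    'RANDNX1': pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index),
    'RANDNX2': pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=index),
}

def index_each(data, indexing_func):
    """Apply the same pandas indexing operation to every symbol's object."""
    return {symbol: indexing_func(obj) for symbol, obj in data.items()}

# Label-based slicing on a DatetimeIndex includes both endpoints.
sliced = index_each(data, lambda obj: obj.loc['2021-01-02':'2021-01-04'])
print(sliced['RANDNX2'])
```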
Saving and loading¶
Like any other class subclassing Pickleable, a Data instance can be saved to disk with Pickleable.save() and loaded with Pickleable.load():
>>> rand_data.save('rand_data')
>>> rand_data = RandomData.load('rand_data')
>>> rand_data.get()
symbol RANDNX1 RANDNX2
2021-01-01 100.956601 100.970865
2021-01-02 100.919011 100.987026
2021-01-03 101.062733 100.835376
2021-01-04 100.960535 100.820817
2021-01-05 101.011255 100.887049
2021-01-06 101.004149 100.808410
2021-01-07 100.883400 100.874922
2021-01-08 101.011738 100.780188
2021-01-09 100.912639 100.934014
Stats¶
Hint
See StatsBuilderMixin.stats() and Data.metrics.
>>> rand_data = RandomData.download(['RANDNX1', 'RANDNX2'])
>>> rand_data.stats(column='a')
Start 2021-01-01 00:00:00+00:00
End 2021-01-10 00:00:00+00:00
Period 10 days 00:00:00
Total Symbols 2
Null Counts: RANDNX1 0
Null Counts: RANDNX2 0
dtype: object
StatsBuilderMixin.stats() also supports (re-)grouping:
>>> rand_data.stats(group_by=True)
Start 2021-01-01 00:00:00+00:00
End 2021-01-10 00:00:00+00:00
Period 10 days 00:00:00
Total Symbols 2
Null Counts: RANDNX1 0
Null Counts: RANDNX2 0
Name: group, dtype: object
Plots¶
Hint
See PlotsBuilderMixin.plots() and Data.subplots.
The Data class has a single subplot based on Data.plot():
>>> rand_data.plots(settings=dict(base=100)).show_svg()
Data class¶
Data(
wrapper,
data,
tz_localize,
tz_convert,
missing_index,
missing_columns,
download_kwargs,
**kwargs
)
Class that downloads, updates, and manages data coming from a data source.
Superclasses
- AttrResolver
- Configured
- Documented
- IndexingBase
- PandasIndexer
- Pickleable
- PlotsBuilderMixin
- StatsBuilderMixin
- Wrapping
Inherited members
- AttrResolver.deep_getattr()
- AttrResolver.post_resolve_attr()
- AttrResolver.pre_resolve_attr()
- AttrResolver.resolve_attr()
- Configured.copy()
- Configured.dumps()
- Configured.loads()
- Configured.replace()
- Configured.to_doc()
- Configured.update_config()
- PandasIndexer.xs()
- Pickleable.load()
- Pickleable.save()
- PlotsBuilderMixin.build_subplots_doc()
- PlotsBuilderMixin.override_subplots_doc()
- PlotsBuilderMixin.plots()
- StatsBuilderMixin.build_metrics_doc()
- StatsBuilderMixin.override_metrics_doc()
- StatsBuilderMixin.stats()
- Wrapping.config
- Wrapping.iloc
- Wrapping.indexing_kwargs
- Wrapping.loc
- Wrapping.regroup()
- Wrapping.resolve_self()
- Wrapping.select_one()
- Wrapping.select_one_from_obj()
- Wrapping.self_aliases
- Wrapping.wrapper
- Wrapping.writeable_attrs
align_columns class method¶
Data.align_columns(
data,
missing='raise'
)
Align data to have the same columns.
See Data.align_index() for missing.
align_index class method¶
Data.align_index(
data,
missing='nan'
)
Align data to have the same index.
The argument missing accepts the following values:
- 'nan': set missing data points to NaN
- 'drop': remove missing data points
- 'raise': raise an error
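The three options can be sketched with plain pandas. The function below is an illustrative re-implementation under the assumption that 'nan' uses the union of all indexes and 'drop' uses their intersection; it is not the library's code.

```python
import pandas as pd

def align_index(data, missing='nan'):
    """Reindex every object in `data` to a common index (illustrative sketch)."""
    indexes = [obj.index for obj in data.values()]
    union = indexes[0]
    common = indexes[0]
    for idx in indexes[1:]:
        union = union.union(idx)
        common = common.intersection(idx)
    if len(union) == len(common):
        target = union  # all indexes already match
    elif missing == 'nan':
        target = union  # keep every label, fill gaps with NaN
    elif missing == 'drop':
        target = common  # keep only labels shared by all symbols
    elif missing == 'raise':
        raise ValueError("Symbols have mismatching index")
    else:
        raise ValueError(f"Unknown value for missing: {missing!r}")
    return {symbol: obj.reindex(target) for symbol, obj in data.items()}

data = {
    'RANDNX1': pd.Series([1.0, 2.0, 3.0], index=pd.date_range('2021-01-01', periods=3)),
    'RANDNX2': pd.Series([20.0, 30.0], index=pd.date_range('2021-01-02', periods=2)),
}
with_nan = align_index(data, missing='nan')
dropped = align_index(data, missing='drop')
print(with_nan['RANDNX2'])
print(dropped['RANDNX1'])
```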
concat method¶
Data.concat(
level_name='symbol'
)
Return a dict of Series/DataFrames with symbols as columns, keyed by column name.
data property¶
Data dictionary keyed by symbol.
download class method¶
Data.download(
symbols,
tz_localize=None,
tz_convert=None,
missing_index=None,
missing_columns=None,
wrapper_kwargs=None,
**kwargs
)
Download data using Data.download_symbol().
Args
symbols : hashable or sequence of hashable
- One or multiple symbols.
Note
Tuple is considered as a single symbol (since hashable).
tz_localize : any
- See Data.from_data().
tz_convert : any
- See Data.from_data().
missing_index : str
- See Data.from_data().
missing_columns : str
- See Data.from_data().
wrapper_kwargs : dict
- See Data.from_data().
**kwargs
- Passed to Data.download_symbol().
If two symbols require different keyword arguments, pass symbol_dict for each argument.
download_kwargs property¶
Keyword arguments initially passed to Data.download_symbol().
download_symbol class method¶
Data.download_symbol(
symbol,
**kwargs
)
Abstract method to download a symbol.
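The relationship between the download method and this abstract hook follows a template-method pattern. The toy classes below (BaseData and ConstantData are hypothetical names) sketch that pattern without any of the alignment, timezone, or wrapper logic of the real class.

```python
class BaseData:
    """Toy stand-in for a data class; not the real Data implementation."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def download_symbol(cls, symbol, **kwargs):
        # Subclasses are expected to override this hook.
        raise NotImplementedError

    @classmethod
    def download(cls, symbols, **kwargs):
        # Accept a single symbol or a list of symbols.
        if not isinstance(symbols, list):
            symbols = [symbols]
        # Call the hook once per symbol and collect the results by symbol.
        return cls({s: cls.download_symbol(s, **kwargs) for s in symbols})


class ConstantData(BaseData):
    @classmethod
    def download_symbol(cls, symbol, value=1.0, length=3):
        return [value] * length


const_data = ConstantData.download(['A', 'B'], value=2.0)
print(const_data.data)
```

Because download is a class method that calls cls.download_symbol, a subclass only needs to override the hook to get a working download for any number of symbols.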
from_data class method¶
Data.from_data(
data,
tz_localize=None,
tz_convert=None,
missing_index=None,
missing_columns=None,
wrapper_kwargs=None,
**kwargs
)
Create a new Data instance from (aligned) data.
Args
data : dict
- Dictionary of array-like objects keyed by symbol.
tz_localize : timezone_like
- If the index is tz-naive, convert to a timezone.
See to_timezone().
tz_convert : timezone_like
- Convert the index from one timezone to another.
See to_timezone().
missing_index : str
- See Data.align_index().
missing_columns : str
- See Data.align_columns().
wrapper_kwargs : dict
- Keyword arguments passed to ArrayWrapper.
**kwargs
- Keyword arguments passed to the __init__ method.
For defaults, see data in settings.
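The difference between tz_localize and tz_convert can be illustrated with plain pandas. This sketch assumes the two arguments map onto the pandas index methods of the same names, applied in that order; the timezones chosen here are arbitrary.

```python
from datetime import timedelta, timezone

import pandas as pd

index = pd.date_range('2021-01-01', periods=3)  # tz-naive

# Step 1: attach a timezone to the naive index (wall-clock times are unchanged).
localized = index.tz_localize('UTC') if index.tz is None else index

# Step 2: express the same instants in another timezone (here a fixed UTC+02:00).
converted = localized.tz_convert(timezone(timedelta(hours=2)))
print(converted)
```

Localizing only labels the existing timestamps with a timezone, while converting shifts the displayed wall-clock time and leaves the underlying instants untouched.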
get method¶
Data.get(
column=None,
**kwargs
)
Get column data.
If there is one symbol, returns data for that symbol. If there are multiple symbols, performs concatenation first and returns a DataFrame if a single column was passed, or a tuple of DataFrames if a list of columns was passed.
indexing_func method¶
Data.indexing_func(
pd_indexing_func,
**kwargs
)
Perform indexing on Data.
metrics class variable¶
Metrics supported by Data.
Config({
    "start": {
        "title": "Start",
        "calc_func": "<function Data.<lambda> at 0x11996dda0>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "end": {
        "title": "End",
        "calc_func": "<function Data.<lambda> at 0x11996de40>",
        "agg_func": null,
        "tags": "wrapper"
    },
    "period": {
        "title": "Period",
        "calc_func": "<function Data.<lambda> at 0x11996dee0>",
        "apply_to_timedelta": true,
        "agg_func": null,
        "tags": "wrapper"
    },
    "total_symbols": {
        "title": "Total Symbols",
        "calc_func": "<function Data.<lambda> at 0x11996df80>",
        "agg_func": null,
        "tags": "data"
    },
    "null_counts": {
        "title": "Null Counts",
        "calc_func": "<function Data.<lambda> at 0x11996e020>",
        "tags": "data"
    }
})
Returns Data._metrics, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change metrics, you can either change the config in-place, override this property, or overwrite the instance variable Data._metrics.
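The copy-on-creation behaviour described here can be sketched with a toy class. WithMetrics is a hypothetical name and a plain dict stands in for the Config object; the point is only that an in-place change on one instance leaves the class-level config alone.

```python
from copy import deepcopy


class WithMetrics:
    """Toy class showing per-instance copies of a class-level config."""

    _metrics = {'start': {'title': 'Start'}, 'end': {'title': 'End'}}

    def __init__(self):
        # Each instance gets its own deep copy, shadowing the class attribute.
        self._metrics = deepcopy(type(self)._metrics)

    @property
    def metrics(self):
        return self._metrics


obj = WithMetrics()
obj.metrics['start']['title'] = 'First Timestamp'  # in-place change on the instance
print(obj.metrics['start']['title'])
print(WithMetrics._metrics['start']['title'])  # class-level config is untouched
```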
missing_columns property¶
missing_columns initially passed to Data.download_symbol().
missing_index property¶
missing_index initially passed to Data.download_symbol().
plot method¶
Data.plot(
column=None,
base=None,
**kwargs
)
Plot the data of a column.
Args
column : str
- Name of the column to plot.
base : float
- Rebase all series of a column to a given initial base.
Note
The column should contain prices.
kwargs : dict
- Keyword arguments passed to GenericAccessor.plot().
Usage
>>> import vectorbt as vbt
>>> start = '2021-01-01 UTC' # crypto is in UTC
>>> end = '2021-06-01 UTC'
>>> data = vbt.YFData.download(['BTC-USD', 'ETH-USD', 'ADA-USD'], start=start, end=end)
>>> data.plot(column='Close', base=1)
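Rebasing to an initial base can be illustrated with plain pandas. The sketch below assumes the common definition, where each series is divided by its first value and multiplied by the base; the exact formula used by the library is not shown in this page, and the prices are made up.

```python
import pandas as pd

# Made-up prices for two symbols with very different price levels.
prices = pd.DataFrame({
    'BTC-USD': [30000.0, 33000.0, 36000.0],
    'ETH-USD': [1000.0, 900.0, 1200.0],
}, index=pd.date_range('2021-01-01', periods=3))

def rebase(df, base):
    """Scale each column so that its first value equals `base`."""
    return df / df.iloc[0] * base

rebased = rebase(prices, base=1)
print(rebased)
```

After rebasing, all series start at the same value, which makes symbols with very different price levels comparable on one chart.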
plots_defaults property¶
Defaults for PlotsBuilderMixin.plots().
Merges PlotsBuilderMixin.plots_defaults and data.plots from settings.
select_symbol_kwargs class method¶
Data.select_symbol_kwargs(
symbol,
kwargs
)
Select keyword arguments belonging to symbol.
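How per-symbol arguments might be resolved can be sketched in a few lines. Both names below are local stand-ins defined for this example, and the rule that a symbol missing from a symbol_dict simply gets no value for that argument (so the method default applies) is an assumption based on the download examples earlier on this page.

```python
class symbol_dict(dict):
    """Stand-in marker type: a dict whose keys are symbols."""


def select_symbol_kwargs(symbol, kwargs):
    """Resolve keyword arguments for one symbol (illustrative sketch)."""
    selected = {}
    for key, value in kwargs.items():
        if isinstance(value, symbol_dict):
            # Per-symbol argument: use it only if this symbol has an entry.
            if symbol in value:
                selected[key] = value[symbol]
        else:
            # Plain argument: shared by all symbols.
            selected[key] = value
    return selected


kwargs = {'stdev': 0.1, 'start_value': symbol_dict({'RANDNX2': 200})}
print(select_symbol_kwargs('RANDNX1', kwargs))
print(select_symbol_kwargs('RANDNX2', kwargs))
```

Using a dedicated dict subclass as a marker lets ordinary dictionaries still be passed as regular argument values without being mistaken for per-symbol mappings.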
stats_defaults property¶
Defaults for StatsBuilderMixin.stats().
Merges StatsBuilderMixin.stats_defaults and data.stats from settings.
subplots class variable¶
Subplots supported by Data.
Config({
    "plot": {
        "check_is_not_grouped": true,
        "plot_func": "plot",
        "pass_add_trace_kwargs": true,
        "tags": "data"
    }
})
Returns Data._subplots, which gets (deep) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change subplots, you can either change the config in-place, override this property, or overwrite the instance variable Data._subplots.
symbols property¶
List of symbols.
tz_convert property¶
tz_convert initially passed to Data.download_symbol().
tz_localize property¶
tz_localize initially passed to Data.download_symbol().
update method¶
Data.update(
**kwargs
)
Update the data using Data.update_symbol().
Args
**kwargs
- Passed to Data.update_symbol().
If two symbols require different keyword arguments, pass symbol_dict for each argument.
Note
Returns a new Data instance.
update_symbol method¶
Data.update_symbol(
symbol,
**kwargs
)
Abstract method to update a symbol.
MetaData class¶
MetaData(
*args,
**kwargs
)
Meta class that exposes a read-only class property StatsBuilderMixin.metrics.
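A read-only class property can be obtained by defining a property on the metaclass. The sketch below uses hypothetical names (MetaWithMetrics, Toy) and shows only the general Python mechanism, not the library's implementation.

```python
class MetaWithMetrics(type):
    """Toy metaclass exposing a read-only property on the class itself."""

    @property
    def metrics(cls):
        return cls._metrics


class Toy(metaclass=MetaWithMetrics):
    _metrics = {'start': {'title': 'Start'}}


print(Toy.metrics)  # accessed on the class, no instance needed

try:
    Toy.metrics = {}  # the property defines no setter
except AttributeError as exc:
    print('read-only:', exc)
```

Because the property lives on the metaclass, it is looked up when the attribute is accessed on the class, and assigning to it fails since no setter is defined.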
Superclasses
- MetaPlotsBuilderMixin
- MetaStatsBuilderMixin
- builtins.type
symbol_dict class¶
symbol_dict(
*args,
**kwargs
)
Dict that contains symbols as keys.
Superclasses
- builtins.dict