splitters module

Splitters for cross-validation.

Defines splitter classes similar to (but possibly not compatible with) sklearn.model_selection.BaseCrossValidator.


split_ranges_into_sets function

split_ranges_into_sets(
    start_idxs,
    end_idxs,
    set_lens=(),
    left_to_right=True
)

Generate a range between each pair of indices in start_idxs and end_idxs, and optionally split each range into one or more sets.

Args

start_idxs : array_like
Start indices.
end_idxs : array_like
End indices.
set_lens : list of float

Lengths of sets in each range.

The number of returned sets is the length of set_lens plus one; the extra set stores the remaining elements.

Can be passed per range.

left_to_right : bool or list of bool

Whether to resolve set_lens from left to right.

If True, the last set is variable in length; otherwise, the first set is.

Can be passed per range.

Usage

  • set_lens=(0.5,): 50% in training set, the rest in test set
  • set_lens=(0.5, 0.25): 50% in training set, 25% in validation set, the rest in test set
  • set_lens=(50, 30): 50 in training set, 30 in validation set, the rest in test set
  • set_lens=(50, 30) and left_to_right=False: 30 in test set, 50 in validation set, the rest in training set
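
The way set_lens and left_to_right interact in the usage list above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: split_range is a hypothetical helper, it handles a single range only, and it assumes that a value strictly between 0 and 1 is a fraction of the range length, rounded down.

```python
import math

def split_range(start_idx, end_idx, set_lens=(), left_to_right=True):
    """Split the inclusive range [start_idx, end_idx] into consecutive sets.

    Hypothetical helper for illustration; not the library's implementation.
    """
    range_len = end_idx - start_idx + 1
    # Assumption: a value strictly between 0 and 1 is a fraction, rounded down.
    fixed = [math.floor(s * range_len) if 0 < s < 1 else int(s) for s in set_lens]
    remaining = range_len - sum(fixed)
    if remaining <= 0:
        raise ValueError("range too short for the requested sets")
    # The variable set comes last when resolving left to right, first otherwise.
    lens = fixed + [remaining] if left_to_right else [remaining] + fixed
    sets, offset = [], start_idx
    for n in lens:
        sets.append(range(offset, offset + n))
        offset += n
    return tuple(sets)

train, valid, test = split_range(0, 99, (0.5, 0.25))
print(len(train), len(valid), len(test))  # 50 25 25

train, valid, test = split_range(0, 99, (50, 30), left_to_right=False)
print(len(train), len(valid), len(test))  # 20 50 30
```

In the second call the fixed lengths are anchored to the right, so the training set absorbs whatever is left over.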

BaseSplitter class

BaseSplitter()

Abstract splitter class.

Subclasses

  • ExpandingSplitter
  • RangeSplitter
  • RollingSplitter

split method

BaseSplitter.split(
    X,
    **kwargs
)

ExpandingSplitter class

ExpandingSplitter()

Expanding walk-forward splitter.

Superclasses

  • BaseSplitter

split method

ExpandingSplitter.split(
    X,
    n=None,
    min_len=1,
    **kwargs
)

Similar to RollingSplitter.split(), but expanding.

**kwargs are passed to split_ranges_into_sets().
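
To make "expanding" concrete, here is a rough sketch of the window bounds such a splitter would produce. It assumes every window starts at index 0 and that min_len drops windows shorter than that length; expanding_bounds is a hypothetical helper, and how n selects among these windows is not covered.

```python
def expanding_bounds(total_len, min_len=1):
    """Return inclusive (start, end) pairs for windows that grow from index 0.

    Hypothetical helper for illustration; not the library's implementation.
    """
    # Assumption: windows shorter than min_len are skipped.
    return [(0, end) for end in range(min_len - 1, total_len)]

print(expanding_bounds(6, min_len=3))  # [(0, 2), (0, 3), (0, 4), (0, 5)]
```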


RangeSplitter class

RangeSplitter()

Range splitter.

Superclasses

  • BaseSplitter

split method

RangeSplitter.split(
    X,
    n=None,
    range_len=None,
    min_len=1,
    start_idxs=None,
    end_idxs=None,
    **kwargs
)

Either split into n ranges each range_len long, or split into ranges between start_idxs and end_idxs, and concatenate along the column axis.

At least one of range_len, n, or start_idxs and end_idxs must be set:

  • If range_len is None, the data is split evenly into n ranges.
  • If n is None, returns the maximum number of ranges of length range_len (can be a percentage).
  • If start_idxs and end_idxs are set, splits into ranges between both arrays. Both index arrays should be either NumPy arrays with absolute positions or pandas indexes with labels. The last index should be inclusive. The distance between each start and end index can be different, and smaller ranges are filled with NaNs.

range_len can be a floating number between 0 and 1 to indicate a fraction of the total range.

**kwargs are passed to split_ranges_into_sets().
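
The start_idxs and end_idxs case can be illustrated with a small plain-Python sketch: ranges of unequal length are cut out with inclusive end indices and padded to a common height so they can sit side by side as columns. stack_ranges is a hypothetical helper, and padding at the end of the shorter range is an assumption, not something the text above specifies.

```python
import math

def stack_ranges(values, start_idxs, end_idxs):
    """Cut values into inclusive ranges and return them as equal-length columns.

    Hypothetical helper for illustration; not the library's implementation.
    """
    columns = [list(values[s:e + 1]) for s, e in zip(start_idxs, end_idxs)]
    height = max(len(c) for c in columns)
    # Assumption: shorter ranges are padded with NaN at the end.
    return [[float(v) for v in c] + [math.nan] * (height - len(c)) for c in columns]

cols = stack_ranges([10, 11, 12, 13, 14], start_idxs=[0, 3], end_idxs=[2, 4])
print(cols)  # [[10.0, 11.0, 12.0], [13.0, 14.0, nan]]
```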


RollingSplitter class

RollingSplitter()

Rolling walk-forward splitter.

Superclasses

  • BaseSplitter

split method

RollingSplitter.split(
    X,
    n=None,
    window_len=None,
    min_len=1,
    **kwargs
)

Split by rolling a window.

**kwargs are passed to split_ranges_into_sets().
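
As a rough picture of rolling a window, the sketch below lists every fixed-length window over a sequence and then cuts each window into a training and a test part, which is the job the forwarded **kwargs would configure. Both helpers are hypothetical, the step of one element is an assumption, and the selection of n windows is left out.

```python
def rolling_bounds(total_len, window_len):
    """Return inclusive (start, end) pairs for every window of length window_len.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if not 0 < window_len <= total_len:
        raise ValueError("window_len must be between 1 and total_len")
    # Assumption: the window advances one element at a time.
    return [(s, s + window_len - 1) for s in range(total_len - window_len + 1)]

def train_test(start, end, train_len):
    """Cut one inclusive window into a training part and a test part."""
    cut = start + train_len
    return range(start, cut), range(cut, end + 1)

folds = [train_test(s, e, 3) for s, e in rolling_bounds(6, 4)]
for train, test in folds:
    print(list(train), list(test))
# [0, 1, 2] [3]
# [1, 2, 3] [4]
# [2, 3, 4] [5]
```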


SplitterT class

SplitterT(
    *args,
    **kwargs
)

Base class for protocol classes. Protocol classes are defined as::

class Proto(Protocol):
    def meth(self) -> int:
        ...

Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing), for example::

class C:
    def meth(self) -> int:
        return 0

def func(x: Proto) -> int:
    return x.meth()

func(C())  # Passes static type check

See PEP 544 for details. Protocol classes decorated with @typing_extensions.runtime act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures.

Protocol classes can be generic; they are defined as::

class GenProto(Protocol[T]):
    def meth(self) -> T:
        ...

Superclasses

  • typing_extensions.Protocol

split method

SplitterT.split(
    X,
    **kwargs
)
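
Applied to splitters, structural subtyping means any object with a suitable split method can be used where a splitter is expected, without inheriting from a base class. The sketch below uses typing.Protocol from the standard library (Python 3.8+) rather than typing_extensions; SplitterLike, HalfSplitter and count_folds are hypothetical names introduced only for this example.

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class SplitterLike(Protocol):
    """Hypothetical stand-in for a protocol with a single split method."""

    def split(self, X: Any, **kwargs: Any) -> Any:
        ...

class HalfSplitter:
    """Inherits from nothing, yet matches SplitterLike structurally."""

    def split(self, X, **kwargs):
        mid = len(X) // 2
        yield range(0, mid), range(mid, len(X))

def count_folds(splitter: SplitterLike, X) -> int:
    # Accepts any object that provides a split method.
    return sum(1 for _ in splitter.split(X))

# The runtime check only verifies that a split attribute exists.
print(isinstance(HalfSplitter(), SplitterLike))  # True
print(count_folds(HalfSplitter(), list(range(10))))  # 1
```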