Options Backtesting Framework Guide

Read this page alongside Options Backtesting API, Backtesting Data Model, Backtesting Engine Loop, Backtesting Execution Realism, and the Historical Options Replay Runbook.

Quick definition: an options backtesting framework is the replay system that turns historical market state into signals, contract selection, fills, rejects, trade logs, and robustness reports without using information from the future.

This knowledge base is written for engineers who want to build or audit a simulator, not for readers looking for a trading recommendation. The core goal is simple: a result should explain what the model knew, which option contract it selected, what quote or bar made the fill plausible, and why any skipped trades were rejected.
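As a minimal sketch of that goal, here is one way to shape a single evidence row so that every simulated trade is either a fill with a timestamp and price source or an explicit reject. `TradeEvidence` and its fields are hypothetical illustrations, not part of the cutebacktests API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TradeEvidence:
    """Hypothetical evidence row: either a fill or a reject, never both and never neither."""

    signal_time: datetime              # close of the last completed bar the model saw
    option_ticker: Optional[str]       # selected contract; None if selection rejected
    entry_time: Optional[datetime]     # simulated fill time; None on reject
    entry_price: Optional[float]       # simulated fill price; None on reject
    price_source: Optional[str]        # e.g. "ask" or "bid"; None on reject
    reject_reason: Optional[str] = None

    def __post_init__(self) -> None:
        filled = self.entry_price is not None
        rejected = self.reject_reason is not None
        if filled == rejected:
            raise ValueError("record must be exactly one of: filled or rejected")
        if filled and (self.entry_time is None or self.price_source is None):
            raise ValueError("a fill must record when it happened and which price it used")
        if self.entry_time is not None and self.entry_time <= self.signal_time:
            raise ValueError("entry must come strictly after the signal time")
```

Validating in `__post_init__` makes a half-filled record fail loudly at construction time instead of surfacing later as a silent gap in a report.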

The public reference implementation is cutebacktests. It packages an intraday/options runtime, opening-range profile helpers, provider adapters, persistence, and robustness utilities. cute-intraday-option-strats is a narrower strategy package built on that runtime.

Why this matters

A quick script can produce a PnL curve. A framework has to produce an evidence trail. That difference matters most in options because the tradable instrument is not the underlying signal. The strategy may observe SPY bars, but the simulated trade might be a short-dated call with sparse prints, a wide bid/ask spread, missing quotes near entry, and a contract listing that did not exist in every historical window.
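The listing problem can be made concrete with a minimal point-in-time filter, assuming each contract record carries the date it was first listed and its expiration. `ContractListing` and `point_in_time_chain` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContractListing:
    ticker: str
    listed_on: date      # first date the contract was tradable (assumed available)
    expiration: date


def point_in_time_chain(contracts: list[ContractListing], sim_date: date) -> list[ContractListing]:
    """Keep only contracts that were already listed and not yet expired on sim_date."""
    return [c for c in contracts if c.listed_on <= sim_date <= c.expiration]
```

A contract listed after the simulated date is simply invisible to the selector, which is the behavior a historical replay needs.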

The framework should therefore keep the research question small and auditable. A good run can answer: when did the signal happen, which contracts were listed, why was one contract chosen, what price source created the fill, and what diagnostic would change the promotion decision?

Architecture spine

A credible framework keeps five responsibilities separate:

Layer	Job	Evidence to preserve
Data model	Store point-in-time underlyings, contracts, quotes, trades, bars, and coverage metadata.	Provider request, timestamp window, cache source, and missing-data counters.
Replay engine	Walk the market forward using only information available at each simulated timestamp.	Signal timestamp, entry timestamp, exit timestamp, and skip reason.
Strategy layer	Turn historical state into a setup, direction, and exit policy.	Feature inputs, completed bars, thresholds, and profile parameters.
Instrument expression	Convert an underlying setup into an option contract or structure.	DTE window, strike or delta rank, liquidity filters, selected OCC ticker, and rejects.
Research layer	Run folds, holdouts, diagnostics, and portfolio aggregation without changing simulator rules.	Train/test split, selected profile, failed gates, daily PnL, and artifact manifest.

This separation is the main design rule. A strategy should not know how the option provider paginates. A contract selector should not compute signal features. A research sweep should not rewrite fill semantics mid-run. When those boundaries blur, it becomes hard to tell whether a result came from edge, leakage, or a convenient simulator shortcut.
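To illustrate the pagination boundary, the sketch below keeps cursor handling inside a loader and returns coverage metadata instead of silently truncating. `fetch_page` stands in for any hypothetical provider adapter that returns `(rows, next_cursor)`; it is not a real provider API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Page = tuple[list[dict], Optional[str]]  # (rows, next_cursor); None cursor means "no more pages"


@dataclass
class LoadResult:
    rows: list[dict] = field(default_factory=list)
    pages_fetched: int = 0
    complete: bool = False  # False if we stopped while a cursor was still pending


def load_all_pages(fetch_page: Callable[[Optional[str]], Page], max_pages: int = 100) -> LoadResult:
    """Follow pagination cursors and record whether coverage is complete."""
    result = LoadResult()
    cursor: Optional[str] = None  # None on the first call means "start from the beginning"
    while result.pages_fetched < max_pages:
        rows, cursor = fetch_page(cursor)
        result.rows.extend(rows)
        result.pages_fetched += 1
        if cursor is None:
            result.complete = True
            break
    return result
```

The strategy layer only ever sees `rows`; the research layer can refuse to score a run whose `complete` flag is false.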

Minimum viable framework

The smallest useful options framework needs historical contract discovery by simulated date, not today's chain. It needs underlying bars for signal generation and option quotes or bars for execution. It needs a replay loop that makes signal decisions before entry decisions. It needs a contract selector with explicit DTE, moneyness, spread, open-interest, and volume rules. It also needs a fill model that records the side of the market used and rejects untradable conditions.
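A contract selector with explicit rules can be sketched as a sequence of named filters, where the first filter that empties the candidate set becomes the reject reason. `OptionSnapshot`, `SelectionRules`, `select_contract`, and every threshold below are hypothetical and illustrative, not recommendations or cutebacktests defaults; the chain is assumed to be already point-in-time and limited to one option type.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OptionSnapshot:
    ticker: str
    expiration: date
    strike: float
    bid: float
    ask: float
    open_interest: int
    volume: int


@dataclass(frozen=True)
class SelectionRules:  # illustrative thresholds only
    min_dte: int = 0
    max_dte: int = 2
    max_moneyness: float = 0.01    # |strike / spot - 1|
    max_spread_pct: float = 0.10   # (ask - bid) / mid
    min_open_interest: int = 100
    min_volume: int = 50


def select_contract(chain: list[OptionSnapshot], spot: float, sim_date: date,
                    rules: SelectionRules) -> tuple[Optional[str], Optional[str]]:
    """Return (ticker, None) for the eligible contract nearest spot, or (None, reject_reason)."""
    if not chain:
        return None, "empty_chain"
    stages = [
        ("no_contract_in_dte_window",
         lambda o: rules.min_dte <= (o.expiration - sim_date).days <= rules.max_dte),
        ("no_contract_near_money",
         lambda o: abs(o.strike / spot - 1.0) <= rules.max_moneyness),
        ("spread_too_wide",  # a zero bid is treated as untradable
         lambda o: o.bid > 0 and (o.ask - o.bid) / ((o.ask + o.bid) / 2) <= rules.max_spread_pct),
        ("insufficient_open_interest", lambda o: o.open_interest >= rules.min_open_interest),
        ("insufficient_volume", lambda o: o.volume >= rules.min_volume),
    ]
    candidates = list(chain)
    for reason, keep in stages:
        candidates = [o for o in candidates if keep(o)]
        if not candidates:
            return None, reason
    best = min(candidates, key=lambda o: (abs(o.strike - spot), o.ticker))
    return best.ticker, None
```

Ordering the filters fixes which reason is reported when several would fail, so reject counts stay comparable across runs.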

In cutebacktests, the public shape is intentionally explicit:

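```python
from datetime import datetime

from cutebacktests import (
    IntradayOptionsBacktestConfig,
    IntradayOptionsBacktester,  # used by the run step, which this excerpt does not show
    get_opening_range_profile,
)

# Load a named opening-range profile; its parameters are passed on as strategy kwargs.
profile = get_opening_range_profile("c4_long_only_rr15")

# One month of SPY replay, with the per-trade log returned alongside the results.
config = IntradayOptionsBacktestConfig(
    ticker="SPY",
    start=datetime(2025, 1, 1),
    end=datetime(2025, 1, 31),
    return_trade_log=True,
    **profile.to_intraday_strategy_kwargs(),
)
```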

The concrete profile can change. The framework standards should not. A profile may be opening-range, VWAP mean reversion, dispersion, or a daily forecast overlay, but the replay rules, contract-selection policy, quote checks, and diagnostics should remain comparable.
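One way to keep those standards fixed is to hold simulator rules in a frozen object separate from profile parameters and stamp every run with a fingerprint of the rules. `SimulatorRules` and `sweep` are hypothetical sketches, not cutebacktests configuration classes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SimulatorRules:
    """Hypothetical replay/execution rules shared by every profile in a sweep."""

    entry_delay_bars: int = 1       # enter on the bar after the signal bar
    buy_at: str = "ask"
    sell_at: str = "bid"
    max_quote_age_seconds: int = 60
    max_spread_pct: float = 0.10

    def fingerprint(self) -> str:
        # Stable short hash of the rules so reports can prove which rules they used.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


def sweep(profiles: dict[str, dict], rules: SimulatorRules) -> list[dict]:
    """Vary strategy parameters only; every run records the same rules fingerprint."""
    return [{"profile": name, "params": params, "rules": rules.fingerprint()}
            for name, params in profiles.items()]
```

Because the dataclass is frozen, a sweep cannot quietly mutate fill semantics mid-run; changing them means constructing new rules, which changes the fingerprint.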

Implementation checklist

Check	Why it exists	Common failure
Point-in-time contracts	Prevents stale-contract leakage.	Selecting a contract from today's chain for a past date.
Signal and entry split	Prevents same-bar lookahead.	Using a completed signal bar to fill inside the same bar.
Quote-aware fills	Makes execution assumptions explicit.	Treating midpoint or last price as always executable.
Structured rejects	Turns no-trade cases into evidence.	Collapsing all skips into a generic missing-data state.
Daily portfolio aggregation	Measures actual calendar path risk.	Treating each symbol as a separate fake day.
Artifacts and manifests	Makes the run reproducible.	Publishing a result that cannot be recreated from inputs.
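The daily-aggregation row can be illustrated with a short sketch that sums realized PnL across symbols per calendar date before measuring drawdown. `daily_portfolio_pnl` and `max_drawdown` are hypothetical helpers; trades are assumed to arrive as `(symbol, date, pnl)` tuples.

```python
from collections import defaultdict
from datetime import date


def daily_portfolio_pnl(trades: list[tuple[str, date, float]]) -> dict[date, float]:
    """Sum realized PnL across all symbols into one row per calendar day."""
    by_day: dict[date, float] = defaultdict(float)
    for _symbol, day, pnl in trades:
        by_day[day] += pnl
    return dict(sorted(by_day.items()))


def max_drawdown(daily: dict[date, float]) -> float:
    """Largest peak-to-trough fall of cumulative PnL, starting from zero, as a positive number."""
    peak = cum = worst = 0.0
    for day in sorted(daily):
        cum += daily[day]
        peak = max(peak, cum)
        worst = max(worst, peak - cum)
    return worst
```

When two symbols lose on the same day, the combined row shows the joint loss; treating each symbol as its own day would spread that loss across separate, smaller drawdowns.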

Failure modes to design against

The most common framework failures are not exotic. They are ordinary software boundary failures: the strategy can see one bar too much, the selector pulls a current chain, the fill model quietly changes between profile sweeps, or the report drops rejected trades because they make the summary less attractive. Each of those mistakes can make a weak strategy look more robust than it is.
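The last of those failures can be guarded against by making rejects a first-class part of the summary. The sketch below is hypothetical: records are assumed to be dicts where a fill carries `pnl` and a reject carries `reject_reason`.

```python
from collections import Counter


def summarize(records: list[dict]) -> dict:
    """Summarize fills and rejects together so untradable signals stay visible."""
    rejects = Counter(r["reject_reason"] for r in records if r.get("reject_reason"))
    fills = [r for r in records if not r.get("reject_reason")]
    signals = len(records)
    return {
        "signals": signals,
        "fills": len(fills),
        "fill_rate": len(fills) / signals if signals else 0.0,
        "rejects_by_reason": dict(rejects.most_common()),
        "total_pnl": sum(r["pnl"] for r in fills),
    }
```

A low fill rate next to an attractive PnL is itself a diagnostic: the strategy may only look good on the trades the market happened to allow.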

The fix is to make every boundary explicit. The signal layer emits an intent. The selection layer chooses or rejects a contract. The execution layer prices or rejects the fill. The research layer evaluates the output without changing the simulator. If the result improves, the artifact should show which layer changed.
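The boundary chain above can be sketched as typed values passed between layers, where each layer may only choose or reject and every reject names the layer that produced it. `Intent`, `Reject`, `Fill`, and `run_intent` are hypothetical names, not cutebacktests types.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Union


@dataclass(frozen=True)
class Intent:
    symbol: str
    direction: str          # e.g. "long_call" or "long_put"
    signal_time: datetime


@dataclass(frozen=True)
class Reject:
    layer: str              # which boundary said no: "selection" or "execution"
    reason: str


@dataclass(frozen=True)
class Fill:
    contract: str
    price: float


Selector = Callable[[Intent], Union[str, Reject]]        # returns a contract ticker or a reject
Executor = Callable[[Intent, str], Union[Fill, Reject]]  # prices the chosen contract or rejects


def run_intent(intent: Intent, select: Selector, execute: Executor) -> Union[Fill, Reject]:
    """Pass one intent through selection, then execution; stop at the first reject."""
    chosen = select(intent)
    if isinstance(chosen, Reject):
        return chosen
    return execute(intent, chosen)
```

Because the reject carries its layer, a change in results can be traced to the selector or the fill model rather than to "the backtest".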

Private-to-public naming

Some early internal research code used legacy names tied to its original project history. The public extraction uses behavior-based names instead: IntradayOptionsBacktester, IntradayOptionsBacktestConfig, IntradayStrategyConfig, cutebacktests.backtest.intraday_options, and cutebacktests.strategies.intraday. Public docs should use the public names.

A complete framework links the research loop to Backtesting Data Model, Backtesting Engine Loop, Backtesting Execution Realism, and Backtesting Robustness. That gives every strategy family the same vocabulary for signal generation, fill simulation, walk-forward splits, probability of backtest overfitting (PBO), deflated Sharpe ratio (DSR), drawdown, and promotion gates.
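As one example of that shared vocabulary, a rolling walk-forward split can be generated from a list of trading days so that each test window starts strictly after its train window. `walk_forward_folds` is a hypothetical helper, and the rolling scheme shown is only one option; anchored (expanding) train windows are a common alternative.

```python
from datetime import date


def walk_forward_folds(days: list[date], train_size: int,
                       test_size: int) -> list[tuple[list[date], list[date]]]:
    """Rolling walk-forward folds: train on a window, test on the days right after it."""
    folds = []
    start = 0
    while start + train_size + test_size <= len(days):
        train = days[start:start + train_size]
        test = days[start + train_size:start + train_size + test_size]
        folds.append((train, test))
        start += test_size  # roll forward by one test window so test windows never overlap
    return folds
```

Profile selection happens only on each `train` slice; the `test` slices, concatenated, form the out-of-sample record that robustness diagnostics are computed on.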

Framework implementation notes

A framework becomes trustworthy when it makes bad assumptions noisy. Define the universe loader, data loader, signal function, contract selector, fill model, portfolio accounting, and report writer as separate interfaces. Each interface needs explicit inputs, outputs, timestamps, and reject reasons. If one module returns a plain number where a timestamped record belongs, the framework is hiding evidence.
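For the fill model, "a timestamped record instead of a plain number" might look like the sketch below: buys are priced at the ask, sells at the bid, from the first quote inside a time window, and every outcome records the quote time and reject reason. `Quote`, `FillResult`, `quote_fill`, and the default window and spread limit are hypothetical and illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Quote:
    time: datetime
    bid: float
    ask: float


@dataclass(frozen=True)
class FillResult:
    requested_at: datetime
    quote_time: Optional[datetime]   # quote that priced (or rejected) the fill
    price: Optional[float]
    side_used: Optional[str]         # "ask" for buys, "bid" for sells
    reject_reason: Optional[str]


def quote_fill(side: str, requested_at: datetime, quotes: list[Quote],
               window: timedelta = timedelta(seconds=60),
               max_spread_pct: float = 0.10) -> FillResult:
    """Price a buy at the ask or a sell at the bid from the first quote inside the window."""
    if side not in ("buy", "sell"):
        raise ValueError(f"unknown side: {side!r}")
    in_window = [q for q in quotes if requested_at <= q.time <= requested_at + window]
    if not in_window:
        return FillResult(requested_at, None, None, None, "no_quote_in_window")
    q = min(in_window, key=lambda x: x.time)
    if q.bid <= 0 or q.ask < q.bid:
        return FillResult(requested_at, q.time, None, None, "empty_or_crossed_quote")
    mid = (q.bid + q.ask) / 2
    if (q.ask - q.bid) / mid > max_spread_pct:
        return FillResult(requested_at, q.time, None, None, "spread_too_wide")
    if side == "buy":
        return FillResult(requested_at, q.time, q.ask, "ask", None)
    return FillResult(requested_at, q.time, q.bid, "bid", None)
```

Crossing the spread on both sides is a deliberately conservative choice; a midpoint fill model would need its own evidence that such fills were achievable.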

The recommended reading order is Backtesting Data Model, then Backtesting Engine Loop, then Backtesting Test Plan. Use the test plan to prove that current chains cannot enter historical runs, same-bar fills are blocked, incomplete pagination is visible, and quote windows are required for option fills. Framework design is less about elegance than about making research fraud expensive.
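Two of those test-plan items can be sketched as plain test functions against a hypothetical entry-timing helper. `earliest_entry` assumes bars are labeled by their start time and that a bar's signal exists only once the bar closes; it is an illustration, not the cutebacktests engine.

```python
from datetime import datetime, timedelta
from typing import Optional


def earliest_entry(signal_bar_start: datetime, bar_length: timedelta,
                   quote_times: list[datetime], window: timedelta) -> Optional[datetime]:
    """First quote at or after the signal bar's close and within `window`; None otherwise."""
    signal_known_at = signal_bar_start + bar_length  # the signal exists only once the bar closes
    eligible = [t for t in quote_times if signal_known_at <= t <= signal_known_at + window]
    return min(eligible) if eligible else None


BAR = timedelta(minutes=5)
WINDOW = timedelta(minutes=1)
T0 = datetime(2025, 1, 2, 9, 30)  # start of the signal bar


def test_same_bar_fill_is_blocked():
    # A quote inside the signal bar must never be used, however attractive it is.
    assert earliest_entry(T0, BAR, [T0 + timedelta(minutes=2)], WINDOW) is None


def test_entry_uses_first_quote_after_close():
    quotes = [T0 + timedelta(minutes=4), T0 + timedelta(minutes=5, seconds=10)]
    assert earliest_entry(T0, BAR, quotes, WINDOW) == T0 + timedelta(minutes=5, seconds=10)


def test_fill_requires_quote_in_window():
    # A quote 30 minutes later is too stale to stand in for the intended entry.
    assert earliest_entry(T0, BAR, [T0 + timedelta(minutes=30)], WINDOW) is None
```

Saved as a test module, these functions are collected by pytest through its `test_` naming convention.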
