Quick definition: an options backtesting framework is the replay system that turns historical market state into signals, contract selection, fills, rejects, trade logs, and robustness reports without using information from the future.
This knowledge base is written for engineers who want to build or audit a simulator, not for readers looking for a trading recommendation. The core goal is simple: a result should explain what the model knew, which option contract it selected, what quote or bar made the fill plausible, and why any skipped trades were rejected.
The public reference implementation is cutebacktests. It packages an intraday/options runtime, opening-range profile helpers, provider adapters, persistence, and robustness utilities. cute-intraday-option-strats is a narrower strategy package on top of that runtime.
Why this matters
A quick script can produce a PnL curve. A framework has to produce an evidence trail. That difference matters most in options because the instrument that trades is not the instrument that generates the signal. The strategy may observe SPY bars, but the simulated trade might be a short-dated call with sparse prints, a wide bid/ask spread, missing quotes near entry, and a listing that did not exist in every historical window.
The framework should therefore keep the research question small and auditable. A good run can answer: when did the signal happen, which contracts were listed, why was one contract chosen, what price source created the fill, and what diagnostic would change the promotion decision?
Architecture spine
A credible framework keeps five responsibilities separate:
| Layer | Job | Evidence to preserve |
|---|---|---|
| Data model | Store point-in-time underlyings, contracts, quotes, trades, bars, and coverage metadata. | Provider request, timestamp window, cache source, and missing-data counters. |
| Replay engine | Walk the market forward using only information available at each simulated timestamp. | Signal timestamp, entry timestamp, exit timestamp, and skip reason. |
| Strategy layer | Turn historical state into a setup, direction, and exit policy. | Feature inputs, completed bars, thresholds, and profile parameters. |
| Instrument expression | Convert an underlying setup into an option contract or structure. | DTE window, strike or delta rank, liquidity filters, selected OCC ticker, and rejects. |
| Research layer | Run folds, holdouts, diagnostics, and portfolio aggregation without changing simulator rules. | Train/test split, selected profile, failed gates, daily PnL, and artifact manifest. |
This separation is the main design rule. A strategy should not know how the option provider paginates. A contract selector should not compute signal features. A research sweep should not rewrite fill semantics mid-run. When those boundaries blur, it becomes hard to tell whether a result came from edge, leakage, or a convenient simulator shortcut.
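As one illustration of the replay-engine boundary, the sketch below walks bars forward, hands the strategy only completed bars, and schedules any entry on the following bar. All names here (`Bar`, `EntryIntent`, `replay`, `breakout_signal`) are hypothetical and are not part of cutebacktests; the breakout rule is a placeholder chosen only to make the example runnable.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Bar:
    start: datetime  # bar open time
    open: float
    high: float
    low: float
    close: float


@dataclass(frozen=True)
class EntryIntent:
    signal_bar_start: datetime  # bar whose completion produced the signal
    entry_bar_start: datetime   # next bar: first moment a fill is allowed
    direction: str


SignalFn = Callable[[Sequence[Bar]], Optional[str]]


def replay(bars: Sequence[Bar], signal_fn: SignalFn) -> List[EntryIntent]:
    """Emit entry intents without letting the strategy see future bars."""
    intents: List[EntryIntent] = []
    # Stop one bar early: a signal on the final bar has no later bar to enter on.
    for i in range(len(bars) - 1):
        completed = bars[: i + 1]  # the strategy never sees bars[i + 1:]
        direction = signal_fn(completed)
        if direction is not None:
            intents.append(EntryIntent(bars[i].start, bars[i + 1].start, direction))
    return intents


def breakout_signal(history: Sequence[Bar]) -> Optional[str]:
    """Placeholder rule: go long when the last close exceeds the prior high."""
    if len(history) < 2:
        return None
    return "long" if history[-1].close > history[-2].high else None


sample_bars = [
    Bar(datetime(2025, 1, 2, 9, 30), 100.0, 100.5, 99.8, 100.2),
    Bar(datetime(2025, 1, 2, 9, 31), 100.2, 101.2, 100.1, 101.0),
    Bar(datetime(2025, 1, 2, 9, 32), 101.0, 102.0, 100.9, 101.9),
]
intents = replay(sample_bars, breakout_signal)
```

The replay function knows nothing about contracts or fills; it only records which bar produced the signal and which later bar is eligible for entry, so a downstream layer can be audited separately.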
Minimum viable framework
The smallest useful options framework needs historical contract discovery by simulated date, not today's chain. It needs underlying bars for signal generation and option quotes or bars for execution. It needs a replay loop that makes signal decisions before entry decisions. It needs a contract selector with explicit DTE, moneyness, spread, open-interest, and volume rules. It also needs a fill model that records the side of the market used and rejects untradable conditions.
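The contract-selection requirements above can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the cutebacktests selector: `ContractSnapshot`, `SelectionRules`, `select_contract`, the reject strings, the default thresholds, and the sample tickers are all hypothetical. It filters a point-in-time candidate list by listing date, DTE, quote quality, open interest, and volume, then picks the strike nearest to spot.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ContractSnapshot:
    ticker: str
    expiry: date
    strike: float
    right: str        # "C" or "P"
    listed_on: date   # first date the contract existed
    bid: float
    ask: float
    open_interest: int
    volume: int


@dataclass(frozen=True)
class SelectionRules:
    min_dte: int = 0
    max_dte: int = 7
    max_spread_pct: float = 0.10   # spread as a fraction of midpoint
    min_open_interest: int = 100
    min_volume: int = 10


def reject_reason(c: ContractSnapshot, as_of: date, rules: SelectionRules) -> Optional[str]:
    """Return the first failed rule, or None if the contract is eligible."""
    if c.listed_on > as_of:
        return "not_listed_yet"
    dte = (c.expiry - as_of).days
    if not rules.min_dte <= dte <= rules.max_dte:
        return "dte_out_of_window"
    if c.bid <= 0 or c.ask < c.bid:
        return "no_usable_quote"
    mid = (c.bid + c.ask) / 2
    if (c.ask - c.bid) / mid > rules.max_spread_pct:
        return "spread_too_wide"
    if c.open_interest < rules.min_open_interest:
        return "open_interest_too_low"
    if c.volume < rules.min_volume:
        return "volume_too_low"
    return None


def select_contract(
    as_of: date, spot: float, right: str,
    candidates: Sequence[ContractSnapshot], rules: SelectionRules,
) -> Tuple[Optional[ContractSnapshot], List[Tuple[str, str]]]:
    """Pick the eligible strike nearest to spot; keep every reject as evidence."""
    eligible, rejects = [], []
    for c in candidates:
        if c.right != right:
            continue  # wrong side of the chain, not a reject
        reason = reject_reason(c, as_of, rules)
        if reason is None:
            eligible.append(c)
        else:
            rejects.append((c.ticker, reason))
    if not eligible:
        return None, rejects
    return min(eligible, key=lambda c: (abs(c.strike - spot), c.expiry)), rejects


chain = [
    ContractSnapshot("SPY250117C00600000", date(2025, 1, 17), 600.0, "C", date(2024, 12, 1), 2.00, 2.10, 5000, 800),
    ContractSnapshot("SPY250117C00601000", date(2025, 1, 17), 601.0, "C", date(2025, 1, 16), 1.50, 1.58, 900, 120),
    ContractSnapshot("SPY250221C00600000", date(2025, 2, 21), 600.0, "C", date(2024, 11, 1), 9.00, 9.20, 7000, 400),
    ContractSnapshot("SPY250117C00599000", date(2025, 1, 17), 599.0, "C", date(2024, 12, 1), 1.00, 2.00, 3000, 300),
]
chosen, rejects = select_contract(date(2025, 1, 15), 600.0, "C", chain, SelectionRules())
```

Returning the rejects alongside the chosen contract is the design point: a run with no eligible contract still produces a record of what was considered and why each candidate failed.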
In cutebacktests, the public shape is intentionally explicit:
```python
from datetime import datetime

from cutebacktests import (
    IntradayOptionsBacktestConfig,
    IntradayOptionsBacktester,
    get_opening_range_profile,
)

profile = get_opening_range_profile("c4_long_only_rr15")
config = IntradayOptionsBacktestConfig(
    ticker="SPY",
    start=datetime(2025, 1, 1),
    end=datetime(2025, 1, 31),
    return_trade_log=True,
    **profile.to_intraday_strategy_kwargs(),
)
```
The concrete profile can change. The framework standards should not. A profile may be opening-range, VWAP mean reversion, dispersion, or a daily forecast overlay, but the replay rules, contract-selection policy, quote checks, and diagnostics should remain comparable.
Implementation checklist
| Check | Why it exists | Common failure |
|---|---|---|
| Point-in-time contracts | Prevents stale-contract leakage. | Selecting a contract from today's chain for a past date. |
| Signal and entry split | Prevents same-bar lookahead. | Filling inside the same bar whose completion produced the signal. |
| Quote-aware fills | Makes execution assumptions explicit. | Treating midpoint or last price as always executable. |
| Structured rejects | Turns no-trade cases into evidence. | Collapsing all skips into a generic missing-data state. |
| Daily portfolio aggregation | Measures actual calendar path risk. | Treating each symbol's result as its own day instead of summing across symbols by calendar date. |
| Artifacts and manifests | Makes the run reproducible. | Publishing a result that cannot be recreated from inputs. |
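The daily portfolio aggregation row can be illustrated with a few lines. This is a sketch with made-up trade values and a hypothetical `daily_portfolio_pnl` helper, not a cutebacktests API: trades from different symbols that close on the same calendar date are summed into a single day before any path-risk statistic is computed.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Trade = Tuple[date, str, float]  # (exit date, symbol, realized PnL)


def daily_portfolio_pnl(trades: Iterable[Trade]) -> Dict[date, float]:
    """Sum PnL across symbols per calendar date, returned in date order."""
    totals: Dict[date, float] = defaultdict(float)
    for exit_day, _symbol, pnl in trades:
        totals[exit_day] += pnl
    return dict(sorted(totals.items()))


# Illustrative values only.
sample_trades = [
    (date(2025, 1, 2), "SPY", -100.0),
    (date(2025, 1, 2), "QQQ", -150.0),
    (date(2025, 1, 3), "SPY", 80.0),
    (date(2025, 1, 3), "QQQ", 50.0),
]
daily = daily_portfolio_pnl(sample_trades)
worst_portfolio_day = min(daily.values())
worst_single_trade = min(pnl for _day, _symbol, pnl in sample_trades)
```

In this sample the two losses on the same date combine into a worse portfolio day than either trade shows alone, which is the risk that per-symbol accounting hides.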
Failure modes to design against
The most common framework failures are not exotic. They are ordinary software boundary failures: the strategy can see one bar too much, the selector pulls a current chain, the fill model quietly changes between profile sweeps, or the report drops rejected trades because they make the summary less attractive. Each of those mistakes can make a weak strategy look more robust than it is.
The fix is to make every boundary explicit. The signal layer emits an intent. The selection layer chooses or rejects a contract. The execution layer prices or rejects the fill. The research layer evaluates the output without changing the simulator. If the result improves, the artifact should show which layer changed.
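For the execution boundary, a minimal sketch of "prices or rejects the fill" might look like the following. The names (`Quote`, `FillResult`, `RejectReason`, `simulate_fill`) and the default thresholds are assumptions for illustration, not the cutebacktests fill model. Buys fill at the ask, sells fill at the bid, and every refusal carries a specific reason instead of a generic missing-data flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RejectReason(Enum):
    MISSING_QUOTE = "missing_quote"
    STALE_QUOTE = "stale_quote"
    CROSSED_OR_EMPTY_QUOTE = "crossed_or_empty_quote"
    SPREAD_TOO_WIDE = "spread_too_wide"


@dataclass(frozen=True)
class Quote:
    bid: float
    ask: float
    age_seconds: float  # time between the quote and the simulated entry


@dataclass(frozen=True)
class FillResult:
    price: Optional[float]
    price_source: Optional[str]      # "ask" for buys, "bid" for sells
    reject: Optional[RejectReason]   # None when the fill is accepted


def simulate_fill(
    side: str,
    quote: Optional[Quote],
    max_age_seconds: float = 5.0,    # assumed threshold
    max_spread_pct: float = 0.15,    # assumed threshold, fraction of midpoint
) -> FillResult:
    """Price a marketable order against the quote, or say why it cannot fill."""
    if side not in ("buy", "sell"):
        raise ValueError(f"unknown side: {side!r}")
    if quote is None:
        return FillResult(None, None, RejectReason.MISSING_QUOTE)
    if quote.age_seconds > max_age_seconds:
        return FillResult(None, None, RejectReason.STALE_QUOTE)
    if quote.bid <= 0 or quote.ask < quote.bid:
        return FillResult(None, None, RejectReason.CROSSED_OR_EMPTY_QUOTE)
    mid = (quote.bid + quote.ask) / 2
    if (quote.ask - quote.bid) / mid > max_spread_pct:
        return FillResult(None, None, RejectReason.SPREAD_TOO_WIDE)
    if side == "buy":
        return FillResult(quote.ask, "ask", None)
    return FillResult(quote.bid, "bid", None)


accepted = simulate_fill("buy", Quote(bid=2.00, ask=2.10, age_seconds=1.0))
rejected = simulate_fill("buy", Quote(bid=1.00, ask=2.00, age_seconds=1.0))
```

Because the thresholds are function arguments, a research sweep that changes them has to do so visibly, and the `price_source` field records which side of the market produced each fill.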
Private-to-public naming
Some early internal research code used legacy names tied to its original project history. The public extraction uses behavior-based names instead: IntradayOptionsBacktester, IntradayOptionsBacktestConfig, IntradayStrategyConfig, cutebacktests.backtest.intraday_options, and cutebacktests.strategies.intraday. Public docs should use the public names.