Case Study · April 14, 2026 · 7 min read

Historical Options Backtesting: Data, Fills, and Slippage That Actually Matter

CuteMarkets Team

Research

Repository reference: cutebacktests

Abstract

Historical options backtesting fails most often at the data layer, not at the signal layer. Traders often think they have done "historical options research" when they have downloaded a chain snapshot, picked the nearest strike, and priced the trade with a clean midpoint. That workflow is fast, but it is not enough for serious inference. A useful historical options backtest usually needs historically correct contracts, timestamped quotes, trade prints, and rules for what happens when liquidity is thin.

This repository reached that conclusion from two directions at once. The first was negative: the March 8 audit showed that even contract-selection cache logic could silently distort the tested instrument if the relevant underlying-price bucket was ignored, as described in "Backtesting Framework Issue Summary". The second was constructive: the CuteMarkets earnings research notes in "Earnings Options Plays Around Earnings, Condensed" lay out the practical historical data stack, including contracts with as_of, historical quotes, trades, aggregates, and expirations.

Question

The right question is not "where do I get old option chains?" The right question is: which data objects are required before a historical options backtest deserves to be called causal?

For serious work, the answer is broader than many traders expect. You need contract discovery that respects historical availability. You need quote or trade data near the actual decision window. You need a pricing rule that says what happens when the midpoint is missing or the spread is wide. You also need to know when the option data API stops and the event data begins. CuteMarkets, for example, can supply contracts, chain snapshots, quotes, trades, and aggregates, but an earnings calendar still has to come from outside the options feed.

Method: What Historical Options Backtesting Actually Needs

I think of historical options backtesting as a sequence of four questions.

First, which contracts existed on the day you claim to be studying? This matters more than it sounds. If your backtest accidentally uses a contract that did not exist on the pre-event date, the whole event study is contaminated. The CuteMarkets examples in the repo solve this with the contracts endpoint and historical as_of, which lets the researcher discover historically correct contracts instead of looking backward from today's chain.
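As a rough illustration of this first question, the sketch below picks the strike nearest to the underlying price from a list of contracts that has already been discovered with as_of. The field names `contract_type`, `strike_price`, and `ticker`, and the sample rows, are assumptions about the response shape made for illustration; they are not confirmed CuteMarkets fields.

```python
def nearest_strike_contract(contracts, underlying_price, contract_type="call"):
    """Pick the contract whose strike is closest to the underlying price.

    `contracts` is assumed to come from an as_of query, so every row existed
    on the study date. Field names here are hypothetical.
    """
    candidates = [
        c for c in contracts
        if c.get("contract_type") == contract_type
        and c.get("strike_price") is not None
    ]
    if not candidates:
        return None
    # Ties are broken toward the lower strike so the choice is deterministic.
    return min(
        candidates,
        key=lambda c: (abs(c["strike_price"] - underlying_price), c["strike_price"]),
    )


# Made-up rows for illustration only.
sample_contracts = [
    {"ticker": "X_C_95", "contract_type": "call", "strike_price": 95.0},
    {"ticker": "X_C_100", "contract_type": "call", "strike_price": 100.0},
    {"ticker": "X_C_105", "contract_type": "call", "strike_price": 105.0},
    {"ticker": "X_P_100", "contract_type": "put", "strike_price": 100.0},
]
chosen = nearest_strike_contract(sample_contracts, underlying_price=101.2)
```

The important design point is the input, not the ranking: the candidate list must be the as_of list, never today's chain.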

Second, what was the execution surface near the entry time? A chain snapshot can tell you the broad menu of strikes and expiries, but it often does not answer the tradeability question. For that, you need quotes and trades. The CuteMarkets teaser is direct on this point: quotes answer whether the spread is narrow enough to trust, and trades answer whether the market is actually printing near the side of the spread you think you can access.
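One way to make the "which side of the spread is printing" check concrete is sketched below. It assumes each trade has already been paired with the prevailing bid and ask at its timestamp, and it represents that pairing as a hypothetical `(price, bid, ask)` tuple; the 0.75 threshold is an illustrative placeholder, not a recommended value.

```python
def trade_location(price, bid, ask):
    """Position of a trade inside the spread: 0.0 is the bid, 1.0 is the ask.

    Returns None when the quote is missing, locked, or crossed.
    """
    if bid is None or ask is None or ask <= bid:
        return None
    return (price - bid) / (ask - bid)


def share_near_ask(trades, threshold=0.75):
    """Fraction of usable trades that printed in the top part of the spread.

    `trades` is an iterable of hypothetical (price, bid, ask) tuples.
    """
    locations = [trade_location(p, b, a) for p, b, a in trades]
    usable = [x for x in locations if x is not None]
    if not usable:
        return float("nan")
    return sum(1 for x in usable if x >= threshold) / len(usable)
```

If most prints sit near the ask, a buyer who assumes midpoint fills is probably being optimistic about that contract.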

Third, how are you estimating slippage? The historical data layer should not let you hide this choice. If you assume the midpoint without checking spread width, you are sneaking a fill model into the study without admitting it. In this repo, the broader research process keeps rediscovering that the monetization layer, where a signal is turned into actual option positions and fills, is where many strategies become fragile. The same logic applies here. A good stock-level idea can be destroyed by an unrealistic option fill assumption.
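A minimal sketch of what an explicit fill rule could look like follows. The function name, the `max_rel_spread` cutoff, and the `spread_fraction` parameter are all hypothetical choices for illustration, not the repository's fill model; the point is only that the rule is written down and can reject a trade.

```python
def estimate_buy_fill(bid, ask, max_rel_spread=0.25, spread_fraction=0.5):
    """Assumed buy fill price, or None if the quote is judged untradeable.

    spread_fraction=0.0 fills at the midpoint, 1.0 fills at the ask.
    Both parameters are illustrative placeholders.
    """
    if bid is None or ask is None or bid < 0 or ask <= 0 or ask < bid:
        return None
    mid = (bid + ask) / 2.0
    relative_spread = (ask - bid) / mid
    if relative_spread > max_rel_spread:
        # Spread too wide to trust any fill estimate: skip the trade.
        return None
    half_spread = (ask - bid) / 2.0
    return mid + spread_fraction * half_spread
```

With a 2.00 bid and 2.50 ask, the default settings give 2.375 rather than the 2.25 midpoint, and a 1.00 by 2.00 quote is rejected outright. A sell-side rule would mirror this by subtracting from the midpoint.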

Fourth, are you reconstructing the event window correctly? For earnings-style studies, the data stack has to combine three things that live in different places: an external earnings calendar, the post-event expiry surface, and the historically correct contract and quote path around that event. That is why historical options backtesting is usually a data-integration problem before it becomes a strategy problem.
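The integration step can be sketched as a small join between the external event date and the expiration list. The helper below is hypothetical and assumes both inputs are ISO `YYYY-MM-DD` strings; whether an expiration on the event day itself still captures the event depends on the release time, so that choice is exposed as a flag rather than decided silently.

```python
from datetime import date


def first_expiration_after(event_day, expirations, include_event_day=False):
    """Earliest expiration that falls after the event date.

    `event_day` comes from an external earnings calendar; `expirations`
    comes from the options feed. Both are ISO date strings.
    """
    event = date.fromisoformat(event_day)
    parsed = sorted(date.fromisoformat(e) for e in expirations)
    for expiry in parsed:
        if expiry > event or (include_event_day and expiry == event):
            return expiry.isoformat()
    return None


# Made-up dates for illustration only.
expiry = first_expiration_after(
    "2024-05-02", ["2024-05-10", "2024-04-26", "2024-05-03"]
)
```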

Evidence / Results

The repo already contains a concise API map for this workflow in "Earnings Options Plays Around Earnings, Condensed". The key CuteMarkets endpoints used there are:

| Task | Endpoint |
| --- | --- |
| Expirations | GET /v1/tickers/expirations/{ticker} |
| Current chain snapshot | GET /v1/options/chain/{ticker} |
| Historical contracts | GET /v1/options/contracts?as_of=... |
| Historical trades | GET /v1/options/trades/{options_ticker} |
| Historical quotes | GET /v1/options/quotes/{options_ticker} |
| Aggregates | GET /v1/options/aggs/{ticker}/... |
| Open/close | GET /v1/options/open-close/{ticker}/{date} |

That endpoint map matters because it corresponds to different research questions. Expirations and chain snapshots tell you which post-event structures are even possible. Contracts with as_of tell you whether the instrument existed at the time. Quotes and trades tell you whether the structure was tradeable. Aggregates and open/close data support event-study PnL reconstruction.

The strongest negative evidence in the repo is the March audit's contract-selection repair. The audit recorded that "different entries on the same day could silently reuse the wrong strike" because the contract selection cache ignored the underlying-price bucket used for moneyness ranking. This is exactly the kind of bug that chain-only research will miss. The signal may be the same, the date may be the same, and the instrument under test can still be wrong.
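The failure mode is easy to reproduce in miniature. The sketch below is a hypothetical illustration of the idea, not the repository's actual repair: it builds a cache key that includes a bucketed underlying price, so two entries on the same day at different price levels cannot share a cached strike. The names `price_bucket`, `selection_cache_key`, and the 1.0 bucket width are assumptions.

```python
import math


def price_bucket(underlying_price, bucket_width=1.0):
    """Map an underlying price to an integer bucket index."""
    return math.floor(underlying_price / bucket_width)


def selection_cache_key(underlying, day, expiration, contract_type,
                        underlying_price, bucket_width=1.0):
    """Cache key for contract selection that includes the price bucket.

    Leaving the last element out is the bug described above: every entry
    on the same day would then map to the same cached contract.
    """
    return (
        underlying,
        day,
        expiration,
        contract_type,
        price_bucket(underlying_price, bucket_width),
    )


# Two entries on the same day, several points apart in the underlying.
morning_key = selection_cache_key("XYZ", "2024-03-08", "2024-03-15", "call", 100.2)
afternoon_key = selection_cache_key("XYZ", "2024-03-08", "2024-03-15", "call", 104.9)
```

Here `morning_key` and `afternoon_key` differ only in the bucket, which is exactly the distinction a day-level key would erase.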

Example Code

The CuteMarkets teaser already includes a larger client, but the minimum useful example for historical options backtesting is the combination of historical contract discovery and a simple quote-quality check:

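from datetime import date, datetime, time, timezone
from urllib.parse import quote
from zoneinfo import ZoneInfo

import requests

BASE_URL = "https://api.cutemarkets.com/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
# Assumption: the decision window is the regular US options session,
# 09:30 to 16:00 New York time.
MARKET_TZ = ZoneInfo("America/New_York")


def get(path: str, params: dict | None = None) -> dict:
    """GET a JSON payload from the API, raising on HTTP errors."""
    response = requests.get(
        f"{BASE_URL}{path}",
        headers=HEADERS,
        params=params,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def list_contracts_as_of(underlying: str, as_of: str, expiration_date: str) -> list[dict]:
    """List contracts for one expiration as they existed on the as_of date."""
    # Only the first page (up to `limit` rows) is read here.
    payload = get(
        "/options/contracts/",
        params={
            "underlying_ticker": underlying,
            "as_of": as_of,
            "expiration_date": expiration_date,
            "limit": 1000,
        },
    )
    return payload["results"]


def session_bounds_utc(day: str) -> tuple[str, str]:
    """Return the New York session open and close for `day` as UTC ISO strings."""
    d = date.fromisoformat(day)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # Session times are local to New York, so convert them to UTC before querying.
    start = datetime.combine(d, time(9, 30), tzinfo=MARKET_TZ).astimezone(timezone.utc)
    end = datetime.combine(d, time(16, 0), tzinfo=MARKET_TZ).astimezone(timezone.utc)
    return start.strftime(fmt), end.strftime(fmt)


def get_quotes(option_ticker: str, day: str) -> list[dict]:
    """Fetch quotes for one option contract during the regular session of `day`."""
    # Encode every reserved character so the ticker is safe as a path segment.
    encoded = quote(option_ticker, safe="")
    start, end = session_bounds_utc(day)
    payload = get(
        f"/options/quotes/{encoded}/",
        params={"timestamp.gte": start, "timestamp.lt": end},
    )
    return payload["results"]


def mean_relative_spread(quotes: list[dict]) -> float:
    """Average (ask - bid) / mid over usable quotes; NaN if there are none."""
    spreads = []
    for q in quotes:
        bid = q.get("bid_price")
        ask = q.get("ask_price")
        # Skip missing, non-positive, or crossed quotes.
        if bid is None or ask is None or ask <= 0 or bid < 0 or ask < bid:
            continue
        mid = (bid + ask) / 2.0
        spreads.append((ask - bid) / mid)
    return sum(spreads) / len(spreads) if spreads else float("nan")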

This is not a complete backtester. It is enough to make one important point. Historical options backtesting means finding the contract that existed then and deciding whether the quotes and spreads make the trade believable.

What Worked

What worked in this repo was the growing insistence on data-layer causality. The CuteMarkets examples make that constructive path visible. They show how to estimate an implied move from the post-earnings chain, inspect quotes and trades for execution quality, and reconstruct historical structures with as_of contracts. That is a serious workflow because it tries to match the data object to the research question instead of recycling one snapshot endpoint for everything.

The broader framework audit supports the same conclusion from the opposite side. Once the contract-selection path was corrected and the portfolio metrics were computed properly, the repo could distinguish between strategies that still had life and strategies that had mostly been flattered by the machinery. That is what you want from historical options backtesting. You want the data layer to reduce false positives, even if that means a smaller opportunity set.

What Failed

What failed was the belief that a historical chain snapshot is enough. It is not enough for entry timing, because it does not tell you what the tradeable spread looked like in the actual decision window. It is not enough for contract validity, because it does not tell you what existed on the historical date unless your provider explicitly supports time-correct discovery. It is not enough for slippage, because a midpoint assumption can hide an enormous amount of execution optimism.

Another recurring failure mode is to treat event studies as if the options feed contains the whole event definition. The CuteMarkets note is explicit that the options API provides the market-data surface, not the earnings timestamp itself. If a researcher forgets that boundary, the study can be internally neat and externally wrong. The contract reconstruction may be correct, while the event alignment is not.
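A small sketch of that boundary: given an event date and a release-timing label from the external calendar, work out the last session that is still pre-event. The `"bmo"` (before market open) and `"amc"` (after market close) labels are an assumed calendar convention, and the helper only skips weekends; exchange holidays are ignored, so a real study would need a proper trading calendar.

```python
from datetime import date, timedelta


def previous_weekday(d):
    """Step back to the most recent weekday before `d` (holidays ignored)."""
    d -= timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d


def last_pre_event_session(event_day, timing):
    """Last session date whose close comes before the earnings release.

    `timing` is a hypothetical calendar label: "bmo" or "amc".
    `event_day` is assumed to be a weekday in ISO format.
    """
    d = date.fromisoformat(event_day)
    if timing == "amc":
        # Release after the close: the event day's own session is still pre-event.
        return d.isoformat()
    if timing == "bmo":
        # Release before the open: the prior session is the last pre-event one.
        return previous_weekday(d).isoformat()
    raise ValueError(f"unknown timing label: {timing!r}")
```

Mislabeling a before-open release as after-close shifts the entry by one session, which is enough to put the "pre-event" quote after the announcement.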

The same problem appears in intraday research when the data model is too thin. The repo's March audit is a useful reminder that even a correctly timestamped signal can become misleading if the contract selected is wrong or if combined risk is estimated from the wrong object. Historical options backtesting has to be causal all the way down. Partial realism is still a weak foundation.

Takeaway

Historical options backtesting requires more than old chains. It requires historically correct contracts, credible quote and trade context, an explicit slippage rule, and event alignment that comes from the right source. The CuteMarkets examples in this repo provide a good reference implementation for the data stack, while the March framework audit provides the negative proof of what breaks when the data layer is too casual.

If you want the bigger framework question, "What Is Realistic Options Backtesting? A Practical Guide for Serious Traders" explains why realism has to start at the simulator level. If you want to see what happens after the backtest leaves research mode, "Backtest vs Paper Trading: Why Good Trading Results Break in Live Markets" covers the next failure surface.