Statistical Arbitrage in Crypto: What the Backtests Don't Show

Every statistical arbitrage backtest I've seen from junior quants has the same problem: it assumes the trade executes at the mid. In live markets, you don't trade at the mid. You buy at the ask and sell at the bid, and in crypto that spread can wipe out your edge entirely.
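To make the mid-price problem concrete, here is a minimal sketch comparing the same long round trip filled at the mid versus at the quotes. The function name and the quotes are hypothetical, chosen only for illustration.

```python
def round_trip_pnl_bps(entry_bid, entry_ask, exit_bid, exit_ask, at_mid=False):
    """PnL in basis points for a long round trip: buy at entry, sell at exit."""
    if at_mid:
        # Naive backtest assumption: both fills happen at the mid.
        buy = (entry_bid + entry_ask) / 2
        sell = (exit_bid + exit_ask) / 2
    else:
        # Live assumption: buying lifts the ask, selling hits the bid.
        buy = entry_ask
        sell = exit_bid
    return (sell - buy) / buy * 1e4


# Hypothetical quotes: the mid moves from 100.00 to 100.10, spread stays 0.10 wide.
mid_pnl = round_trip_pnl_bps(99.95, 100.05, 100.05, 100.15, at_mid=True)
live_pnl = round_trip_pnl_bps(99.95, 100.05, 100.05, 100.15, at_mid=False)
print(f"at mid: {mid_pnl:.1f} bps, at quotes: {live_pnl:.1f} bps")
```

With these made-up numbers, a move that looks like roughly 10 bps at the mid is roughly flat once you pay the spread on both sides.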

This is the first of many gaps between backtested and live performance. Let's go through the ones that matter.

Gap 1: Execution Latency

Your backtest assumes instant execution. Your live system has WebSocket round-trip latency, order placement delay, and exchange processing time. For mean-reversion strategies with tight windows, being 100 ms late can mean the divergence has already collapsed by the time your order arrives.

The fix has two parts: infrastructure (co-location where possible, optimized network paths) to cut the delay, and a realistic latency model baked into your simulation to account for what remains.
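One simple way to bake latency into a simulation is to fill against the quote prevailing when the order arrives, not the one that triggered the signal. The sketch below assumes a sorted list of tick timestamps and a single fixed latency figure; the function name and the tick data are hypothetical, and a more careful model would draw latency from a measured distribution.

```python
import bisect


def value_at_arrival(timestamps_ms, values, signal_ms, latency_ms):
    """Return the last observed value at or before signal_ms + latency_ms.

    timestamps_ms must be sorted ascending. Returns None if the order
    would arrive before the first tick.
    """
    arrival = signal_ms + latency_ms
    i = bisect.bisect_right(timestamps_ms, arrival) - 1
    if i < 0:
        return None
    return values[i]


# Hypothetical ticks: a divergence (in bps) that decays over 150 ms.
ticks_ms = [0, 50, 100, 150]
divergence_bps = [30.0, 20.0, 5.0, 0.0]

seen_by_backtest = value_at_arrival(ticks_ms, divergence_bps, signal_ms=0, latency_ms=0)
seen_by_live = value_at_arrival(ticks_ms, divergence_bps, signal_ms=0, latency_ms=100)
print(seen_by_backtest, seen_by_live)
```

In this toy series the zero-latency backtest trades a 30 bps divergence, while an order arriving 100 ms later finds only 5 bps left.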

Gap 2: Slippage and Market Impact

Liquid pairs on major exchanges have tight spreads. Illiquid pairs, where many arb opportunities actually live, have wide spreads and thin books. On those pairs, your order moves the market as you fill.

In a naive backtest, size doesn't matter: every order fills at a single price. In live trading, a 5 BTC order in a thin book walks through several price levels, so the average fill is meaningfully worse than the top of book. Model your own impact or your live Sharpe will always disappoint.
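A first-order impact model is to walk a snapshot of the order book and compute the volume-weighted fill price for your actual size. This is a sketch under simplifying assumptions: the book is static while you fill, there is no hidden liquidity, and the ask levels below are invented for illustration.

```python
def vwap_fill(levels, qty):
    """Average fill price for a market order that walks `levels`.

    levels: list of (price, size) tuples, best price first.
    Raises ValueError if the visible book cannot absorb qty.
    """
    remaining = qty
    cost = 0.0
    for price, size in levels:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("not enough visible depth for this size")
    return cost / qty


# Hypothetical thin ask side: (price in USD, size in BTC).
asks = [(60000.0, 1.0), (60030.0, 1.5), (60090.0, 2.5), (60200.0, 5.0)]

small_fill = vwap_fill(asks, 0.5)
large_fill = vwap_fill(asks, 5.0)
impact_bps = (large_fill - asks[0][0]) / asks[0][0] * 1e4
print(f"0.5 BTC: {small_fill:.2f}, 5 BTC: {large_fill:.2f}, impact: {impact_bps:.1f} bps")
```

On this made-up book, the 5 BTC order fills about 9 bps worse than the best ask, which is the kind of cost a size-blind backtest never sees.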

Gap 3: Fee Structures

Maker vs. taker fees change the economics of every strategy. If your arb requires crossing the spread on both legs, you're paying taker fees twice. Whether the edge survives depends entirely on fee tiers and rebate structures — things that don't exist in a naive backtest.

Run your entire backtest again, but with realistic fees applied per execution side. The strategy landscape usually looks very different.
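As a starting point, you can subtract a per-execution fee from the gross edge of each trade. The fee levels below are placeholders, not any exchange's actual schedule, and the sketch assumes equal notional on every leg so that fees in bps can simply be summed.

```python
# Hypothetical fee schedule in basis points of notional; substitute your own tier.
TAKER_BPS = 5.0
MAKER_BPS = 1.0


def net_edge_bps(gross_edge_bps, legs):
    """Gross edge minus fees, one fee per execution.

    legs: iterable of "maker" or "taker" strings, one entry per execution.
    """
    fees = {"maker": MAKER_BPS, "taker": TAKER_BPS}
    return gross_edge_bps - sum(fees[leg] for leg in legs)


# The same 8 bps gross edge under two execution styles.
both_taker = net_edge_bps(8.0, ["taker", "taker"])
one_maker = net_edge_bps(8.0, ["maker", "taker"])
print(both_taker, one_maker)
```

Under these assumed fees, crossing the spread on both legs turns the 8 bps edge negative, while resting one leg as a maker order leaves a small positive edge. Resting an order brings its own fill risk, which this sketch ignores.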

Gap 4: Simultaneous Execution Across Venues

Cross-exchange arb requires legs to execute at roughly the same time. In practice, one leg may fill while the other is pending, leaving you with directional exposure you didn't intend.

This is a systems problem as much as a quant problem. Your execution engine needs to handle partial fills, leg synchronization, and emergency hedge logic for when one side fails.
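The core of the leg-synchronization logic can be sketched as a small decision function: track what each leg has filled, and if the imbalance persists past a timeout, flatten it. Everything here is illustrative: the names, the 500 ms timeout, and the choice to hedge with an immediate offsetting order are assumptions, and a production engine would also handle cancels, rejects, and venue outages.

```python
from dataclasses import dataclass


@dataclass
class LegState:
    filled_buy: float   # quantity filled so far on the buy venue
    filled_sell: float  # quantity filled so far on the sell venue


def hedge_action(state, elapsed_ms, timeout_ms=500, tolerance=1e-9):
    """Decide what to do about unintended directional exposure.

    Returns (action, quantity) where action is "none", "wait", "buy" or "sell".
    """
    exposure = state.filled_buy - state.filled_sell  # > 0 means net long
    if abs(exposure) <= tolerance:
        return ("none", 0.0)   # legs are balanced
    if elapsed_ms < timeout_ms:
        return ("wait", 0.0)   # give the lagging leg time to fill
    # Timeout hit: flatten the residual with an offsetting order.
    side = "sell" if exposure > 0 else "buy"
    return (side, abs(exposure))


# Buy leg fully filled, sell leg only partially filled.
state = LegState(filled_buy=1.0, filled_sell=0.4)
print(hedge_action(state, elapsed_ms=100))
print(hedge_action(state, elapsed_ms=600))
```

The timeout is the key design choice: too short and you pay to hedge fills that were about to arrive, too long and you carry directional risk through exactly the kind of move that opened the arb.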

What Survives

Strategies that survive live deployment tend to share a few properties: edges measured in tens of basis points (not single digits), execution infrastructure that competes on latency, rigorous fee-adjusted modeling, and position sizing that accounts for realistic liquidity.

The bar is higher than the backtest makes it look. That's not a reason to stop. It's a reason to build the infrastructure seriously from day one.