Experimentation framework

Every policy change faces the tribunal of evidence.

M3 work exists to determine whether Finance Feedback Engine improves trade quality. Tests must pass, but passing tests alone does not prove edge.

Feature-gated · Replay/backtest evidence · Metadata required · Kill on hidden spillover

What a good trade means

Expectancy

Positive after costs

Mean realized return should improve after fees, slippage assumptions, and opportunity cost.
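
As a rough illustration of "positive after costs", the sketch below subtracts each cost term from a trade's gross return and averages the result. The `Trade` fields, the function names, and the convention that every quantity is a fraction of notional are assumptions made for this example, not part of the engine.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trade:
    # All fields are hypothetical and expressed as fractions of notional.
    gross_return: float      # realized return before costs
    fees: float              # exchange and broker fees
    slippage: float          # assumed slippage
    opportunity_cost: float  # assumed return forgone while capital was committed


def net_return(trade: Trade) -> float:
    """Return after subtracting every cost term."""
    return trade.gross_return - trade.fees - trade.slippage - trade.opportunity_cost


def expectancy(trades: list[Trade]) -> float:
    """Mean net return per trade."""
    if not trades:
        raise ValueError("expectancy needs at least one trade")
    return mean(net_return(t) for t in trades)


def expectancy_improved(baseline: list[Trade], candidate: list[Trade]) -> bool:
    """True when the candidate's after-cost expectancy beats the baseline's."""
    return expectancy(candidate) > expectancy(baseline)
```

A candidate that looks better on gross return can still fail this check once fees and slippage are charged, which is the case this criterion is meant to catch.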

Risk

Drawdown stays controlled

Risk-adjusted returns, tail losses, and drawdown behavior must not degrade beyond tolerance.
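
One way to make "must not degrade beyond tolerance" concrete is to compare maximum drawdown and a simple tail-loss estimate between baseline and candidate. This is a minimal sketch: the function names, the 5% tail fraction, and the tolerance defaults are illustrative assumptions, not values taken from the project.

```python
def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough decline of the compounded equity curve (positive fraction)."""
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def tail_loss(returns: list[float], fraction: float = 0.05) -> float:
    """Mean of the worst `fraction` of returns, a basic expected-shortfall estimate."""
    if not returns:
        raise ValueError("tail_loss needs at least one return")
    k = max(1, int(len(returns) * fraction))
    worst = sorted(returns)[:k]
    return sum(worst) / k


def risk_within_tolerance(
    baseline: list[float],
    candidate: list[float],
    drawdown_tolerance: float = 0.01,  # assumed tolerance, not a project value
    tail_tolerance: float = 0.002,     # assumed tolerance, not a project value
) -> bool:
    """True when the candidate's drawdown and tail loss stay within tolerance of the baseline."""
    drawdown_ok = max_drawdown(candidate) <= max_drawdown(baseline) + drawdown_tolerance
    tail_ok = tail_loss(candidate) >= tail_loss(baseline) - tail_tolerance
    return drawdown_ok and tail_ok
```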

Context

Local improvement

The target lane, regime, confidence bucket, volatility bucket, and action family should improve without damaging stronger pockets.
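
The local-improvement check can be sketched as a per-pocket comparison: the target pocket should improve, and no other pocket should lose more than the allowed spillover. The `Pocket` fields mirror the dimensions named above; the input shape (pairs of pocket and net return) and the function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Pocket(NamedTuple):
    lane: str
    regime: str
    confidence_bucket: str
    volatility_bucket: str
    action_family: str


def mean_by_pocket(rows: list[tuple[Pocket, float]]) -> dict[Pocket, float]:
    """Average net return per pocket."""
    sums: dict[Pocket, list[float]] = defaultdict(lambda: [0.0, 0.0])
    for pocket, ret in rows:
        sums[pocket][0] += ret
        sums[pocket][1] += 1.0
    return {p: total / n for p, (total, n) in sums.items()}


def spillover_report(
    baseline: list[tuple[Pocket, float]],
    candidate: list[tuple[Pocket, float]],
    target: Pocket,
    allowed_spillover: float,
) -> tuple[bool, list[Pocket]]:
    """Return (target improved?, non-target pockets degraded beyond the allowance)."""
    base = mean_by_pocket(baseline)
    cand = mean_by_pocket(candidate)
    # Only pockets observed in both runs can be compared.
    deltas = {p: cand[p] - base[p] for p in base if p in cand}
    target_improved = deltas.get(target, 0.0) > 0.0
    damaged = sorted(
        p for p, d in deltas.items() if p != target and d < -allowed_spillover
    )
    return target_improved, damaged
```

A non-empty `damaged` list is the situation this criterion guards against: the target pocket improves while a stronger pocket elsewhere pays for it.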

Auditability

Reconstructable intervention

Artifacts must record gate, experiment id, matched pocket, score before/after, final action, and reason/source.
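
A minimal sketch of such an artifact record follows, with a check that flags missing metadata. The class name, field names, and JSON serialization are assumptions chosen to mirror the list above; the engine's actual artifact schema may differ.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class InterventionRecord:
    # Hypothetical field names mirroring the required metadata.
    gate: str
    experiment_id: str
    matched_pocket: str
    score_before: float
    score_after: float
    final_action: str
    reason: str
    source: str


REQUIRED_FIELDS = tuple(f.name for f in fields(InterventionRecord))


def missing_fields(artifact: dict) -> list[str]:
    """Names of required fields that are absent, None, or empty strings."""
    return [name for name in REQUIRED_FIELDS if artifact.get(name) in (None, "")]


def to_json(record: InterventionRecord) -> str:
    """Serialize a record so the intervention can be reconstructed later."""
    return json.dumps(asdict(record), sort_keys=True)
```

Running `missing_fields` over every artifact produced by a replay gives a quick way to verify that an intervention can be reconstructed before any metric is trusted.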

Lifecycle

Hypothesis packet: target pocket, baseline, expected direction, allowed spillover, and kill criteria.
Feature-gated intervention: narrow, reversible, bounded, and not a stealth global rewrite.
Offline evidence: baseline versus candidate metrics, target distribution, non-target spillover, and risk readouts.
Live shadow or tiny-budget readout: only after offline evidence passes, with metadata verification.
Decision: promote, tune, or kill.
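
The lifecycle above can be sketched as a hypothesis packet plus a decision function. Everything here is an illustrative assumption: the field names, the `kill_below` threshold as a stand-in for kill criteria, and in particular the mapping from readouts to promote, tune, or kill, which the source does not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    TUNE = "tune"
    KILL = "kill"


@dataclass(frozen=True)
class HypothesisPacket:
    target_pocket: str
    baseline: float           # baseline metric in the target pocket
    expected_direction: int   # +1 if the metric should rise, -1 if it should fall
    allowed_spillover: float  # largest tolerated degradation in any non-target pocket
    kill_below: float         # assumed kill criterion: movement at or below this kills


def decide(
    packet: HypothesisPacket,
    target_metric: float,
    worst_non_target_delta: float,
    metadata_verified: bool,
) -> Decision:
    """Map a readout to a decision under the assumed rules described above."""
    # Movement is positive when the metric moved in the expected direction.
    movement = (target_metric - packet.baseline) * packet.expected_direction
    hidden_spillover = worst_non_target_delta < -packet.allowed_spillover
    if hidden_spillover or movement <= packet.kill_below:
        return Decision.KILL
    if not metadata_verified or movement <= 0.0:
        # Inconclusive: the evidence cannot yet support promotion.
        return Decision.TUNE
    return Decision.PROMOTE
```

Writing the packet before the intervention runs fixes the kill criteria in advance, so the decision cannot be rationalized after the results are known.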

Closure rule

No M3 experiment closes as merely implemented. It closes when merged source-of-truth behavior can support a decision about edge, safety, or the next measurement step.