Live benchmark data from CI. Updated after each successful perf run.
| Model | Context | Ratio | Cosine | Compacted (t/s) | Baseline (t/s) | Speedup |
|---|---|---|---|---|---|---|
| Qwen3-8B | 8K | 8x | 0.997 | 9.3 | 5.7 | +63% |
| Qwen3-30B-A3B | 4K | 8x | 0.999 | 20.2 | 13.5 | +50% |
| Qwen3-8B | 4K | 8x | 0.997 | 13.8 | 10.7 | +29% |
| Qwen3-8B | 16K | 8x | 0.999 | 5.3 | 4.8 | +10% |
Logit cosine similarity (select pipeline). Higher is better. Green >= 0.99, yellow >= 0.95.
| Model | 4K/2x | 4K/4x | 4K/8x | 8K/2x | 8K/4x | 8K/8x |
|---|---|---|---|---|---|---|
| Qwen3-8B | 0.999 | 0.999 | 0.997 | 0.999 | 0.998 | 0.997 |
| Qwen3-30B-A3B | 0.999 | 0.999 | 0.999 | - | - | - |
| Qwen3-14B | 0.995 | 0.996 | 0.992 | 0.999 | 0.997 | 0.973 |
| DeepSeek-R1-14B | 0.999 | 0.998 | 0.993 | - | - | - |
No compaction — pure decode speed comparison. Zero regression from compaction code.
64K physical KV cache, 49-57 compaction cycles, 256K effective tokens.
| Platform | CI Workflow | Status |
|---|---|---|
| macOS (Apple Silicon) | modelai-ci | Active |
| macOS (Server smoke) | modelai-server-smoke | Active |
| Windows (MSVC x64) | modelai-ci-windows | Build-only |
| Perf regression | modelai-perf-smoke | Active |
| Upstream sync | modelai-upstream-sync | Weekly |