QA Transparency

Every merge to modelai-main is gated by adversarial review, CI tests, and benchmark verification. This page documents the process.

Adversarial Review Protocol

Every implementation slice goes through a 13-section hostile review before merge. The reviewer's job is to try to break the code, not confirm it looks reasonable.

Mandatory Review Sections

| # | Section | What It Checks |
|---|---------|----------------|
| 0 | Scope Gate | Changes stay within declared scope |
| 1 | Spec Match | Code matches plan exactly (no contract drift) |
| 2 | Contract Boundary | Caller/callee assumptions match |
| 3 | Concrete Traces | 6 mandatory traces with real values |
| 4 | Multi-Variant Model | Tested across architecture variants |
| 5 | State Machine | Before/after/failure/rollback states traced |
| 6 | Precondition Audit | Every assumption enforced or documented |
| 7 | Test Reality Check | Tests actually prove correctness (not just exist) |
| 8 | Disprove-It Pass | Systematic attempt to find the one bug |
| 9 | Skepticism Hierarchy | Most skeptical of indexing, layouts, integer division |
| 10 | Dependency Check | License, CVE, version pinning |
| 11 | Performance Check | No regression on hot paths |
| 12 | Cross-Repo Contract | API contracts match across repositories |

Trace Types (Section 3)

| Trace | Purpose |
|-------|---------|
| Production | Dominant real-world case with real numbers |
| Boundary | Smallest/closest-to-threshold valid input |
| Adversarial | Hostile input designed to break the code |
| Integer Arithmetic | Every division/modulo/stride with substituted values |
| Security | Injection, SSRF, auth bypass, secrets exposure |
| Concurrency | Race conditions, deadlocks, orphaned work |
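
To make the Boundary and Integer Arithmetic traces concrete, here is a minimal sketch of what substituting real values into a division looks like. The helper `blocks_needed` and the block size of 256 are hypothetical and chosen only for illustration; they are not taken from the code base.

```python
def blocks_needed(n_tokens: int, block_size: int) -> int:
    """Ceiling division: fixed-size blocks needed to hold n_tokens (hypothetical helper)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return (n_tokens + block_size - 1) // block_size


def trace(block_size: int, token_counts):
    """Substitute each value and record (input, result) so a reviewer can check by hand."""
    return [(n, blocks_needed(n, block_size)) for n in token_counts]


# Boundary values sit on either side of the threshold where the result changes.
for n, blocks in trace(256, [0, 1, 255, 256, 257]):
    print(f"n_tokens={n:>3}  block_size=256  blocks={blocks}")
```

The point of such a trace is that each division is written out with actual numbers, so an off-by-one at 256 versus 257 is visible instead of assumed away.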

8 Engine Test Tiers

| Tier | Name | What It Tests |
|------|------|---------------|
| 1 | Server Pytests | 22 existing server unit tests, CI-gated |
| 2 | API Snapshots | Response schema validation against frozen snapshots |
| 3 | Perf Regression | Speed, RSS, compaction latency vs baseline |
| 4 | Quality Gate | Cosine floor, KV survival, state round-trip |
| 5 | Windows CI | MSVC x64 build + full test suite |
| 6 | Stress Tests | KV exhaustion, multi-slot concurrent compaction |
| 7 | Contract Tests | ModelAI-specific endpoints, metrics, schema |
| 8 | Live Dashboard | Automated data pipeline to public benchmark page |
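
As an illustration of the Tier 4 "cosine floor" idea, the sketch below assumes it means a minimum cosine similarity between a reference output vector and the output produced after compaction. The function names and the floor of 0.99 are assumptions made for this example, not the project's actual threshold or API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def passes_cosine_floor(reference, candidate, floor=0.99):
    """Gate check: True when similarity is at or above the floor (0.99 is hypothetical)."""
    return cosine_similarity(reference, candidate) >= floor


reference = [0.20, 0.50, 0.30]
print(passes_cosine_floor(reference, [0.21, 0.49, 0.30]))  # near-identical output
print(passes_cosine_floor(reference, [0.90, 0.05, 0.05]))  # badly drifted output
```

A floor expressed this way fails loudly when quality degrades, instead of relying on a reviewer to eyeball output differences.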

7 CI Workflows

| Workflow | Trigger | Purpose |
|----------|---------|---------|
| modelai-ci | Push, PR | Build + 49 main-label tests |
| modelai-server-smoke | Push, PR | Server smoke test + pytests |
| modelai-perf-smoke | Push, PR | Performance regression detection |
| modelai-ci-windows | Push, PR | Windows MSVC build + test |
| modelai-upstream-sync | Saturday 2PM PDT | Weekly upstream merge + build + test |
| modelai-dashboard | After CI success | Aggregate bench data |
| modelai-auto-label | Issues, PRs | Auto-label by path/keyword |

Weekly Upstream Sync and Review Process

Every Saturday at 2PM PDT, the automated sync workflow runs. The process is designed so that no upstream change reaches modelai-main without passing the build and test gate, and changes that touch compaction-related code are additionally held for human review.
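
GitHub Actions evaluates `schedule` cron expressions in UTC, so a "Saturday 2PM PDT" trigger has to be converted. The sketch below shows that conversion with a fixed UTC-7 offset; the helper name is hypothetical, and the actual cron line used by the workflow is not shown in this page.

```python
from datetime import datetime, timedelta, timezone

# PDT is a fixed UTC-7 offset. During standard time (PST, UTC-8) the same
# UTC cron entry fires one hour earlier on the local clock.
PDT = timezone(timedelta(hours=-7), "PDT")


def saturday_cron_utc(hour_local: int, minute_local: int = 0, tz=PDT) -> str:
    """Return a 5-field cron expression in UTC for a Saturday local time."""
    # 2024-06-01 was a Saturday; any Saturday works as the reference date.
    local = datetime(2024, 6, 1, hour_local, minute_local, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    # Python: Monday=0 .. Sunday=6. Cron: Sunday=0 .. Saturday=6.
    cron_day = (utc.weekday() + 1) % 7
    return f"{utc.minute} {utc.hour} * * {cron_day}"


print(saturday_cron_utc(14))  # 0 21 * * 6
```

Because the conversion also recomputes the weekday, a local time late enough to cross midnight UTC is scheduled on the correct UTC day.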

Automated Pipeline (Saturday 2PM PDT)

Step 1: Fetch + Merge
  1. Fetch latest upstream/master
  2. Update upstream-master tracking branch
  3. Reset upstream-sync to modelai-main
  4. Merge upstream into upstream-sync
Step 2: CI Gate
  1. Full cmake build (Metal, tests, examples)
  2. Run all main-label tests (49 tests)
  3. Benchmark regression check against baseline
  4. On success: merge into modelai-main, unless the Review Gate below holds the merge for manual review
  5. On failure: open GitHub Issue with diagnostics
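
The two steps above can be sketched as a short script. This is an illustration under stated assumptions, not the project's workflow: the remote name `upstream`, the `--no-edit` flag, and the helper names are assumptions, and the build, test, and benchmark results are passed in as plain booleans.

```python
import subprocess


def sync_commands(remote="upstream", branch="master"):
    """Git commands for Step 1, in order (remote name is an assumption)."""
    upstream_ref = f"{remote}/{branch}"
    return [
        ["git", "fetch", remote],                                   # 1. fetch latest upstream
        ["git", "branch", "-f", "upstream-master", upstream_ref],   # 2. update tracking branch
        ["git", "checkout", "-B", "upstream-sync", "modelai-main"], # 3. reset staging branch
        ["git", "merge", "--no-edit", "upstream-master"],           # 4. merge upstream into it
    ]


def gate_action(build_ok: bool, tests_ok: bool, bench_ok: bool) -> str:
    """Step 2 outcome: merge only when every check passed, else open an issue."""
    return "merge" if (build_ok and tests_ok and bench_ok) else "open-issue"


def run_sync(repo_dir: str) -> None:
    """Run Step 1 in repo_dir; check=True stops at the first failing command."""
    # Note: `git branch -f` refuses to move a branch that is currently checked
    # out, so this assumes upstream-master is not the active branch.
    for cmd in sync_commands():
        subprocess.run(cmd, cwd=repo_dir, check=True)


print(gate_action(build_ok=True, tests_ok=True, bench_ok=False))  # open-issue
```

Keeping the command list separate from the code that executes it makes the sequence easy to review and test without touching a real repository.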

Review Gate

When the upstream delta touches compaction-related files (KV cache, graph construction, attention paths), the merge is held for manual review before integration. The reviewer checks:

  1. API compatibility — Do upstream KV cache API changes break our compacted prefix integration?
  2. Graph construction — Do attention graph changes require updates to our compacted execution path?
  3. Architecture support — Do new model architectures need compaction support added?
  4. Test coverage — Do new upstream tests exercise compacted paths?
  5. Security patches — CVE fixes are fast-tracked for same-day sync
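
One way to decide whether a delta should be held is to match changed paths against a list of compaction-related patterns. The sketch below assumes the changed-file list comes from something like `git diff --name-only`; the patterns and file names are hypothetical examples, not the project's actual rule set.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for KV cache, graph construction, and attention paths.
COMPACTION_PATTERNS = (
    "src/*kv-cache*",
    "src/*graph*",
    "src/*attn*",
)


def needs_manual_review(changed_files, patterns=COMPACTION_PATTERNS) -> bool:
    """True when any changed path matches a compaction-related pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in patterns
    )


delta = ["docs/build.md", "src/llama-kv-cache.cpp"]  # example paths, for illustration
print(needs_manual_review(delta))  # True, so the merge would be held
```

`fnmatchcase` is used instead of `fnmatch` so the result does not depend on the operating system's case-folding rules.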

Six upstream KV cache changes have been audited and verified compatible: #10873, #12695, #13194, #17450, #12253, #11213.

3-Branch Model

| Branch | Role |
|--------|------|
| modelai-main | Working branch. All development happens here. Protected: requires PR review + 2 status checks. |
| upstream-master | Clean upstream tracking. Force-updated to match upstream/master on every sync. |
| upstream-sync | Staging branch. Reset to modelai-main, then merged with upstream. CI-gated before integration. |

Emergency Hotfix Process

Security patches (CVEs, RPC vulnerabilities) bypass the weekly schedule. They are synced the same day through a manual workflow_dispatch run, followed by expedited review and merge. The RPC RCE patch was synced within hours of upstream disclosure.
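
A manual workflow_dispatch run can be started from the GitHub UI or through the REST API's "create a workflow dispatch event" endpoint. The sketch below only builds the request and does not send it. The owner, repository, workflow file name, and token are placeholders, and the assumption that the sync workflow lives in a file named `modelai-upstream-sync.yml` is not confirmed by this page.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build (but do not send) a workflow_dispatch request for the GitHub REST API."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref}).encode("utf-8")
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# All argument values here are placeholders for illustration.
request = build_dispatch_request(
    owner="example-org",
    repo="example-repo",
    workflow_file="modelai-upstream-sync.yml",
    ref="modelai-main",
    token="<token>",
)
print(request.get_method(), request.full_url)
# To actually trigger the run: urllib.request.urlopen(request)
```

The workflow itself must declare a `workflow_dispatch` trigger for this request to be accepted.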

Bug Tracking

| Metric | Count | Notes |
|--------|-------|-------|
| Total Bugs Fixed | 29 | All documented with root cause + commit SHA |
| Critical/Major | 18 | Every one caught by adversarial review |
| Security Patches | 1 | RPC RCE, synced same day |

Full bug list: BUGS-AND-FIXES.md