Evals
Evals are structured assessments of agent performance, output quality, and baseline integrity across the matic system. Capability evals run during the mandatory Assessment state in an agent's lifecycle; other evals run at work item delivery, after delivery, or during onboarding. All of them produce scored records that feed directly into staffing decisions, capability profile updates, and regression gates. This section explains how evals are defined, executed, scored, and used, and covers the five eval types matic supports: automated, human-scored, outcome-driven, coverage, and regression.
Eval Schema
Eval Schema defines the structure and fields of an eval, including type, scope, trigger conditions, and expected output format.
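As a rough illustration, an eval definition could be modeled as a small immutable record. This is a minimal sketch, not matic's actual schema: the five eval types come from this section, but the field names (eval_id, eval_type, scope, triggers, expected_output) and the validate_output helper are hypothetical stand-ins for "type, scope, trigger conditions, and expected output format".

```python
from dataclasses import dataclass, field
from enum import Enum


class EvalType(Enum):
    # The five eval types listed in this section.
    AUTOMATED = "automated"
    HUMAN_SCORED = "human-scored"
    OUTCOME_DRIVEN = "outcome-driven"
    COVERAGE = "coverage"
    REGRESSION = "regression"


@dataclass(frozen=True)
class EvalDefinition:
    # Hypothetical field names for the four parts of an eval described above.
    eval_id: str
    eval_type: EvalType
    scope: str                                   # e.g. "agent", "work_item", "baseline"
    triggers: tuple[str, ...] = ()               # lifecycle states that fire the eval
    expected_output: dict[str, str] = field(default_factory=dict)  # field -> type name

    def validate_output(self, output: dict) -> list[str]:
        """Check a result against expected_output; an empty list means it conforms."""
        problems = []
        for name, type_name in self.expected_output.items():
            if name not in output:
                problems.append(f"missing field: {name}")
            elif type(output[name]).__name__ != type_name:
                actual = type(output[name]).__name__
                problems.append(f"{name}: expected {type_name}, got {actual}")
        return problems


spec = EvalDefinition(
    eval_id="summary-quality",
    eval_type=EvalType.AUTOMATED,
    scope="work_item",
    triggers=("delivery",),
    expected_output={"score": "float", "notes": "str"},
)
print(spec.validate_output({"score": 0.8}))  # ['missing field: notes']
```

Keeping the definition frozen reflects the idea that an eval's contract should not change between runs; a real schema would likely be declared in configuration rather than code.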
Eval Runners
Eval Runners explains how evals are executed at runtime, including the automated eval pipeline and its integration with the agent lifecycle.
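The runner's core job, selecting the evals that apply to an agent's current lifecycle state and turning each result into a scored record, might look like the sketch below. The names EvalRecord, AutomatedEval, run_evals, the lifecycle state strings, and the 0.7 pass threshold are all hypothetical; matic's real pipeline is not specified here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalRecord:
    # Hypothetical shape of the scored record an eval run produces.
    eval_id: str
    agent_id: str
    score: float
    passed: bool


@dataclass
class AutomatedEval:
    eval_id: str
    triggers: frozenset[str]           # lifecycle states that fire this eval
    check: Callable[[dict], float]     # scorer expected to return a value in [0, 1]
    pass_threshold: float = 0.7        # hypothetical default


def run_evals(agent_id: str, state: str, evals: list[AutomatedEval], context: dict) -> list[EvalRecord]:
    """Run every eval triggered by the agent's current lifecycle state."""
    records = []
    for ev in evals:
        if state not in ev.triggers:
            continue  # this eval is not scheduled for the current state
        score = min(max(ev.check(context), 0.0), 1.0)  # clamp to [0, 1]
        records.append(EvalRecord(ev.eval_id, agent_id, score, score >= ev.pass_threshold))
    return records


evals = [
    AutomatedEval("output-format", frozenset({"assessment"}),
                  lambda ctx: 1.0 if ctx.get("valid_json") else 0.0),
    AutomatedEval("delivery-latency", frozenset({"delivery"}), lambda ctx: 0.5),
]
records = run_evals("agent-7", "assessment", evals, {"valid_json": True})
```

Only output-format fires in the "assessment" state, so records holds a single passing record; calling run_evals with "delivery" would instead run delivery-latency.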
Eval Targets
Eval Targets describes what evals measure: agent capability during Assessment, work item output at delivery, baseline adherence post-delivery, and charter alignment during onboarding.
Scoring Rubrics
Scoring Rubrics covers how eval results are scored, including coverage dimensions such as floor-adherence and ceiling-adherence, regression pass/fail criteria, and human-scored narrative feedback.
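This section does not define floor-adherence or ceiling-adherence precisely. The sketch below assumes one plausible reading: the fraction of observed values that stay at or above a floor, and the fraction that stay at or below a ceiling. It also shows regression scoring as a strict pass/fail over named checks. The function names are hypothetical, and treating an empty check set as an error rather than a vacuous pass is a design choice of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageScore:
    floor_adherence: float    # share of observations at or above the floor
    ceiling_adherence: float  # share of observations at or below the ceiling


def coverage_score(values: list[float], floor: float, ceiling: float) -> CoverageScore:
    """Assumed definition: adherence = fraction of observations within each bound."""
    if not values:
        raise ValueError("coverage needs at least one observation")
    n = len(values)
    above_floor = sum(1 for v in values if v >= floor)
    below_ceiling = sum(1 for v in values if v <= ceiling)
    return CoverageScore(above_floor / n, below_ceiling / n)


def regression_verdict(check_results: dict[str, bool]) -> bool:
    """Binary pass/fail: any failing check fails the whole regression eval."""
    if not check_results:
        raise ValueError("a regression eval with no checks cannot pass")
    return all(check_results.values())


print(coverage_score([0.2, 0.5, 0.9, 1.2], floor=0.3, ceiling=1.5))
# CoverageScore(floor_adherence=0.75, ceiling_adherence=1.0)
```

Human-scored narrative feedback has no numeric form in this reading, so it would be stored alongside these scores rather than computed.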
Probe Evals
Probe Evals documents lightweight, side-effect-free evaluation of agent probes that relies only on pure data reads and pattern-matching heuristics, with no GenAI invocation required.
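A probe eval in this style can be a pure function over data the probe has already captured: no I/O, no model calls, and no mutation of its inputs, so it is cheap and safe to rerun. The sketch below uses Python's standard re module; the heuristic names and regex patterns are hypothetical examples, not matic's actual rules.

```python
import re
from typing import Mapping

# Hypothetical heuristics: each name maps to a regex that flags a pattern of interest.
DEFAULT_HEURISTICS: Mapping[str, str] = {
    "has_todo": r"\bTODO\b",
    "leaks_secret": r"(?i)api[_-]?key\s*[:=]",
    "cites_baseline": r"\bbaseline[-_ ]\w+",
}


def probe_eval(probe_output: str, heuristics: Mapping[str, str] = DEFAULT_HEURISTICS) -> dict[str, bool]:
    """Pure function: read already-captured probe data and report which patterns match.

    No I/O, no GenAI calls, and no mutation of inputs, so repeated runs
    always return the same result for the same input.
    """
    return {name: re.search(pattern, probe_output) is not None
            for name, pattern in heuristics.items()}


flags = probe_eval("TODO: wire up api_key = ...")
```

Here flags marks has_todo and leaks_secret as True and cites_baseline as False. Because the heuristics are just data, new checks can be added without touching the evaluation logic.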
Baseline Management
Baseline Management explains how baselines are recorded, versioned, and referenced by work items through baselines_at_risk to protect existing desired state from regressions.
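One way to picture recording, versioning, and referencing is an append-only store plus a work item that lists the baselines it might affect. Only the baselines_at_risk name comes from this section; BaselineStore, BaselineVersion, WorkItem, and pin_baselines are hypothetical, and pinning the version that is current when work starts is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BaselineVersion:
    baseline_id: str
    version: int
    desired_state: dict  # the protected desired state at this version


class BaselineStore:
    """Append-only store: recording a baseline again creates a new version."""

    def __init__(self) -> None:
        self._versions: dict[str, list[BaselineVersion]] = {}

    def record(self, baseline_id: str, desired_state: dict) -> BaselineVersion:
        history = self._versions.setdefault(baseline_id, [])
        entry = BaselineVersion(baseline_id, len(history) + 1, dict(desired_state))
        history.append(entry)
        return entry

    def latest(self, baseline_id: str) -> BaselineVersion:
        return self._versions[baseline_id][-1]  # KeyError if never recorded


@dataclass
class WorkItem:
    work_item_id: str
    # Baseline IDs this work item could regress (field name taken from the docs).
    baselines_at_risk: list[str] = field(default_factory=list)

    def pin_baselines(self, store: BaselineStore) -> dict[str, int]:
        """Resolve each at-risk baseline to the version current when work starts."""
        return {b: store.latest(b).version for b in self.baselines_at_risk}


store = BaselineStore()
store.record("login-flow", {"status": 200})
store.record("login-flow", {"status": 200, "max_latency_ms": 300})
item = WorkItem("WI-1", ["login-flow"])
pinned = item.pin_baselines(store)
```

Pinning records exactly which version of the desired state a regression check should compare against, even if the baseline is re-recorded while the work item is in flight.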
Regression Checks
Regression Checks covers the automated test suites run against touched baselines post-delivery, where a baseline violation is treated as a hard delivery block rather than a retryable error.
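The distinction between a hard delivery block and a retryable error can be made explicit with separate exception types. In the sketch below, infrastructure hiccups are retried a bounded number of times, while a failing suite raises immediately and is never retried. The names RetryableError, BaselineViolation, regression_gate, and the max_retries default are hypothetical; how matic actually invokes its suites is not described here.

```python
from typing import Callable, Iterable


class RetryableError(Exception):
    """Transient failure (e.g. a flaky test runner); safe to retry."""


class BaselineViolation(Exception):
    """A touched baseline failed its regression suite; delivery is blocked."""


def regression_gate(
    work_item_id: str,
    touched_baselines: Iterable[str],
    run_suite: Callable[[str], bool],
    max_retries: int = 2,
) -> list[str]:
    """Run the suite for each touched baseline and return the IDs that passed.

    RetryableError is retried up to max_retries times; a failing suite raises
    BaselineViolation immediately and is never retried.
    """
    passed_ids = []
    for baseline_id in touched_baselines:
        attempts = 0
        while True:
            try:
                passed = run_suite(baseline_id)
                break
            except RetryableError:
                attempts += 1
                if attempts > max_retries:
                    raise  # infrastructure kept failing; surface the last error
        if not passed:
            # Hard block: stop here, skip remaining baselines, no retry.
            raise BaselineViolation(f"{work_item_id}: baseline {baseline_id} regressed")
        passed_ids.append(baseline_id)
    return passed_ids
```

Keeping the two failure modes in different types means callers cannot accidentally wrap a baseline violation in generic retry logic.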