AI Evaluation

AI evaluation verifies that AI-assisted features meet quality, safety, reliability, and regression expectations before they are promoted across environments.

Evaluation Targets

Evaluate AI features that include:

chat or assistant responses;
retrieval-augmented generation (RAG);
tool/function calling;
vector search and ingestion;
agent workflows;
summarization and extraction;
domain-specific recommendations.

ConnectSoft Guidance

Keep evaluation scenarios under version control alongside the tests that run them.
Use deterministic fixtures where possible.
Capture prompt, input, retrieved context, output, scoring result, and model/provider metadata.
Separate local smoke evaluation from CI quality gates.
Avoid logging secrets, private data, or full sensitive prompts into shared reports.
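
The capture and redaction guidance above can be sketched as a small record type. This is a minimal illustration, not a ConnectSoft API: `EvaluationRecord`, `redact`, `fingerprint`, and `SECRET_PATTERN` are hypothetical names, and the regular expression is a toy detector that only catches simple `key=value` secrets. The record keeps every captured field, while the shared report stores a short hash of the prompt instead of its full text.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple detector for key=value style secrets.
# A real project would plug in its own secret and PII scanners here.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\b\s*[:=]\s*\S+")


def redact(text: str) -> str:
    """Replace the value of any matched secret with a placeholder."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def fingerprint(text: str) -> str:
    """Short SHA-256 prefix so reports can reference a prompt without storing it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass
class EvaluationRecord:
    """One evaluated scenario: everything needed to reproduce and score it."""
    scenario_id: str
    prompt: str
    input_text: str
    retrieved_context: list[str]
    output: str
    score: float
    model: str
    provider: str

    def to_shared_report(self) -> dict:
        """Return a view that is safer to publish in shared reports."""
        return {
            "scenario_id": self.scenario_id,
            "prompt_fingerprint": fingerprint(self.prompt),  # hash, not the prompt
            "input": redact(self.input_text),
            "retrieved_context": [redact(c) for c in self.retrieved_context],
            "output": redact(self.output),
            "score": self.score,
            "model": self.model,
            "provider": self.provider,
        }
```

The full record can stay in local or access-controlled storage for debugging, while only the output of `to_shared_report()` is written to reports that other teams can read.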

Template Responsibilities

BaseTemplate should document concrete options, test projects, report locations, and registration methods. Layer 3 templates should document domain-specific evaluation datasets, thresholds, and excluded scenarios.
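
As one way to picture the thresholds and excluded scenarios a Layer 3 template might document, the sketch below applies a quality gate to per-scenario scores. The names `GateConfig` and `evaluate_gate`, and the two threshold fields, are assumptions made for illustration; they are not part of BaseTemplate or any ConnectSoft package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    """Hypothetical gate settings a domain template could document."""
    min_mean_score: float        # required average across considered scenarios
    min_scenario_score: float    # required score for each considered scenario
    excluded_scenarios: frozenset[str] = frozenset()  # documented exclusions


def evaluate_gate(scores: dict[str, float], config: GateConfig) -> tuple[bool, list[str]]:
    """Return (passed, failure messages) for a mapping of scenario id to score."""
    considered = {
        name: score
        for name, score in scores.items()
        if name not in config.excluded_scenarios
    }
    if not considered:
        # An empty run should never pass silently.
        return False, ["no scenarios evaluated"]

    failures = [
        f"{name}: {score:.2f} < {config.min_scenario_score:.2f}"
        for name, score in sorted(considered.items())
        if score < config.min_scenario_score
    ]

    mean = sum(considered.values()) / len(considered)
    if mean < config.min_mean_score:
        failures.append(f"mean: {mean:.2f} < {config.min_mean_score:.2f}")

    return (not failures), failures
```

Under these assumptions, a local smoke run could use a lenient `GateConfig` over a handful of scenarios, while the CI gate uses the documented thresholds over the full dataset and fails the build when `evaluate_gate` returns `False`.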