AI Evaluation
AI evaluation verifies that AI-assisted features meet quality, safety, reliability, and regression expectations before they are promoted across environments.
Evaluation Targets
Evaluate AI features that include:
- chat or assistant responses;
- retrieval-augmented generation (RAG);
- tool/function calling;
- vector search and ingestion;
- agent workflows;
- summarization and extraction;
- domain-specific recommendations.
ConnectSoft Guidance
- Version evaluation scenarios together with the tests that run them.
- Use deterministic fixtures where possible.
- Capture prompt, input, retrieved context, output, scoring result, and model/provider metadata.
- Separate local smoke evaluation from CI quality gates.
- Avoid logging secrets, private data, or full sensitive prompts into shared reports.
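As an illustration of the capture and redaction guidance above, the following is a minimal sketch of an evaluation record that redacts obvious secrets before being written to a shared report. Everything in it is an assumption made for illustration: the `EvaluationRecord` class, its field names, the `redact` helper, and the regular expression are hypothetical and are not part of any ConnectSoft template. A real project would likely need a stricter redaction policy than one pattern.

```python
import json
import re
from dataclasses import asdict, dataclass

# Hypothetical pattern for key/value style secrets; real projects need a broader policy.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[:=]\s*\S+")


def redact(text: str) -> str:
    """Replace the value of any matched secret with a placeholder."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


@dataclass
class EvaluationRecord:
    """One evaluated scenario, holding the fields the guidance lists."""

    scenario_id: str
    scenario_version: str  # ties the record to a versioned scenario
    prompt: str
    user_input: str
    retrieved_context: list[str]
    output: str
    score: float
    model: str
    provider: str

    def to_report_json(self) -> str:
        """Serialize for a shared report, redacting free-text fields first."""
        data = asdict(self)
        for key in ("prompt", "user_input", "output"):
            data[key] = redact(data[key])
        data["retrieved_context"] = [redact(c) for c in data["retrieved_context"]]
        # Sorted keys keep reports stable and easy to diff between runs.
        return json.dumps(data, sort_keys=True)
```

Redacting at serialization time, rather than at capture time, keeps the full record available to the local test run while limiting what reaches a shared artifact. Whether that trade-off is acceptable depends on where the in-memory records can end up.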
Template Responsibilities
BaseTemplate should document concrete options, test projects, report locations, and registration methods. Layer 3 templates should document domain-specific evaluation datasets, thresholds, and excluded scenarios.
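To make the idea of documented thresholds and excluded scenarios concrete, here is a minimal sketch of a quality gate that a CI step could run over scored results. The `GateConfig` class, the `evaluate_gate` function, the metric name, and the threshold values are all hypothetical; the actual options and registration methods are whatever the templates document.

```python
from dataclasses import dataclass


@dataclass
class GateConfig:
    """Hypothetical gate settings a domain template might document."""

    thresholds: dict[str, float]  # metric name -> minimum mean score
    excluded_scenarios: frozenset[str]  # scenario ids skipped by the gate


def evaluate_gate(
    results: list[tuple[str, str, float]], config: GateConfig
) -> dict[str, float]:
    """Return the metrics whose mean score falls below its threshold.

    `results` holds (scenario_id, metric, score) tuples. An empty return
    value means the gate passes.
    """
    scores_by_metric: dict[str, list[float]] = {}
    for scenario_id, metric, score in results:
        if scenario_id in config.excluded_scenarios:
            continue
        scores_by_metric.setdefault(metric, []).append(score)

    failures: dict[str, float] = {}
    for metric, minimum in config.thresholds.items():
        scores = scores_by_metric.get(metric, [])
        # A metric with no scores is treated as 0.0 so missing data fails the gate.
        mean = sum(scores) / len(scores) if scores else 0.0
        if mean < minimum:
            failures[metric] = mean
    return failures


if __name__ == "__main__":
    config = GateConfig(
        thresholds={"groundedness": 0.7},
        excluded_scenarios=frozenset({"known-flaky"}),
    )
    results = [
        ("s1", "groundedness", 1.0),
        ("s2", "groundedness", 0.5),
        ("known-flaky", "groundedness", 0.0),
    ]
    failures = evaluate_gate(results, config)
    # Exit non-zero so a CI step fails when any threshold is missed.
    raise SystemExit(1 if failures else 0)
```

A local smoke run could call the same function with looser thresholds or a smaller result set, which keeps the smoke check and the CI gate separate while sharing one scoring path.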