
AI Evaluation

AI evaluation verifies that AI-assisted features meet quality, safety, reliability, and regression expectations before they are promoted across environments.

Evaluation Targets

Evaluate AI features that include:

  • chat or assistant responses;
  • retrieval-augmented generation (RAG);
  • tool/function calling;
  • vector search and ingestion;
  • agent workflows;
  • summarization and extraction;
  • domain-specific recommendations.

ConnectSoft Guidance

  • Keep evaluation scenarios under version control alongside the tests that run them.
  • Use deterministic fixtures where possible.
  • Capture prompt, input, retrieved context, output, scoring result, and model/provider metadata.
  • Separate local smoke evaluation from CI quality gates.
  • Do not write secrets, private data, or full sensitive prompts to shared reports.
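
As an illustration of the capture and redaction guidance above, the following is a minimal sketch rather than a ConnectSoft API. The names `EvaluationRecord`, `redact`, `to_shared_report`, and the `SECRET_PATTERN` regular expression are all hypothetical, and the pattern only catches simple `key=value` style secrets; a real project would likely need a stricter redaction policy.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical pattern: matches simple "api_key=...", "token: ...", "password=..." pairs.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[:=]\s*\S+")


@dataclass(frozen=True)
class EvaluationRecord:
    """One evaluated scenario with the fields the guidance asks to capture."""
    scenario_id: str
    prompt: str
    input_text: str
    retrieved_context: List[str]
    output: str
    score: float
    model: str
    provider: str


def redact(text: str) -> str:
    """Replace the value of any matched secret-like pair with a placeholder."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def to_shared_report(record: EvaluationRecord) -> str:
    """Serialize a record for a shared report, redacting free-text fields first."""
    data = asdict(record)
    for key in ("prompt", "input_text", "output"):
        data[key] = redact(data[key])
    data["retrieved_context"] = [redact(chunk) for chunk in data["retrieved_context"]]
    # Sorted keys keep the report stable across runs, which makes diffs easier to read.
    return json.dumps(data, sort_keys=True)
```

Keeping redaction in the serialization step means the unredacted record can still be inspected locally, while anything written to a shared location passes through `to_shared_report` first.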

Template Responsibilities

BaseTemplate should document concrete options, test projects, report locations, and registration methods. Layer 3 templates should document domain-specific evaluation datasets, thresholds, and excluded scenarios.
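
One way a Layer 3 template could express thresholds and excluded scenarios is sketched below. This is an assumption about shape, not an existing ConnectSoft contract: `GateConfig`, `evaluate_gate`, and the metric name `groundedness` are hypothetical, and averaging scores per metric is just one possible aggregation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class GateConfig:
    """Hypothetical quality-gate settings for one domain."""
    thresholds: Dict[str, float]        # metric name -> minimum acceptable mean score
    excluded_scenarios: FrozenSet[str]  # scenario ids that the gate ignores


def evaluate_gate(scores: Dict[str, Dict[str, float]], config: GateConfig) -> List[str]:
    """Return failure messages; an empty list means the gate passes.

    `scores` maps scenario id -> {metric name: score}.
    """
    included = {
        scenario_id: metrics
        for scenario_id, metrics in scores.items()
        if scenario_id not in config.excluded_scenarios
    }
    failures = []
    for metric, minimum in sorted(config.thresholds.items()):
        values = [metrics[metric] for metrics in included.values() if metric in metrics]
        if not values:
            # A threshold with no data is treated as a failure, not a silent pass.
            failures.append(f"{metric}: no scored scenarios")
            continue
        mean = sum(values) / len(values)
        if mean < minimum:
            failures.append(f"{metric}: mean {mean:.2f} below threshold {minimum:.2f}")
    return failures
```

A CI quality gate could fail the build when `evaluate_gate` returns any messages, while a local smoke run might only print them.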