Microsoft.Extensions.AI - AI Extensions

Purpose & Overview

Microsoft.Extensions.AI provides a unified, provider-agnostic abstraction layer for integrating AI capabilities into .NET applications. This enables seamless integration with multiple AI providers (OpenAI, Azure OpenAI, Azure AI Inference, Ollama) through a single, consistent interface while supporting chat completions, embeddings, function calling, and tool invocation.

Why AI Extensions?

AI Extensions offer several key benefits:

  • Provider Agnostic: Switch between OpenAI, Azure OpenAI, Azure AI Inference, and Ollama without code changes
  • Unified Interface: Single IChatClient interface for all AI providers
  • Function Calling: Built-in support for AI tool invocation and function calling
  • Observability: Integrated OpenTelemetry and logging support
  • Caching: Distributed caching support for improved performance and cost reduction
  • Type Safety: Strongly-typed abstractions prevent runtime errors
  • Dependency Injection: Seamless integration with ASP.NET Core DI
  • Testing: Easy to mock and test AI interactions

Microsoft.Extensions.AI Philosophy

Microsoft.Extensions.AI provides a unified abstraction layer for AI services, similar to how IDbConnection abstracts database providers. This enables developers to write AI-powered code once and switch providers as needed, while maintaining consistent patterns across the application.

Architecture Overview

AI Extensions Position in Clean Architecture

API Layer (REST/gRPC/GraphQL)
Application Layer (DomainModel)
    ├── Processors (Commands/Writes)
    └── Retrievers (Queries/Reads)
    ↓ (AI Invocation)
Microsoft.Extensions.AI
    ├── IChatClient (Chat Completions)
    ├── IEmbeddingGenerator (Embeddings)
    └── AIFunction (Tool Invocation)
AI Providers
    ├── OpenAI
    ├── Azure OpenAI
    ├── Azure AI Inference
    └── Ollama

Microsoft.Extensions.AI Integration

Core Abstractions

Microsoft.Extensions.AI provides several key abstractions:

IChatClient: Unified interface for chat completions across all providers

public interface IChatClient : IDisposable
{
    Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    // Simplified: the full interface also exposes GetService for
    // retrieving provider-specific services from the pipeline.
}

IEmbeddingGenerator: Unified interface for text embeddings

public interface IEmbeddingGenerator<in TInput, TEmbedding> : IDisposable
    where TEmbedding : Embedding
{
    // The core method is batch-oriented; single-value helpers such as
    // GenerateEmbeddingAsync are provided as extension methods.
    Task<GeneratedEmbeddings<TEmbedding>> GenerateAsync(
        IEnumerable<TInput> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default);
}

AIFunction: Represents a callable function that AI models can invoke

// Simplified view: AIFunction is an abstract class deriving from AITool
public abstract class AIFunction : AITool
{
    public string Name { get; }        // inherited from AITool
    public string Description { get; } // inherited from AITool
    public JsonElement JsonSchema { get; }

    public ValueTask<object?> InvokeAsync(
        AIFunctionArguments? arguments = null,
        CancellationToken cancellationToken = default);
}

Service Registration

AI providers are registered using extension methods on IServiceCollection. The exact registration pattern depends on your application's structure, but typically involves:

  1. Creating provider-specific clients (e.g., ChatClient, EmbeddingClient)
  2. Converting them to IChatClient or IEmbeddingGenerator using extension methods (e.g., .AsIChatClient())
  3. Registering them as keyed services using AddKeyedChatClient() or AddKeyedEmbeddingGenerator()
  4. Configuring middleware (OpenTelemetry, Logging, Caching)
  5. Enabling function invocation with UseFunctionInvocation()

For specific implementation examples, refer to your application's registration code or the Microsoft.Extensions.AI documentation.
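As a rough illustration only (the "openAI" key, model ID, and configuration lookup are assumptions, not the template's actual code), the pattern above might look like this in Program.cs:

// Hypothetical sketch of steps 1-5 above; adjust names to your application
var apiKey = builder.Configuration["MicrosoftExtensionsAI:OpenAI:ApiKey"]!;

// Steps 1-2: create the provider-specific client and adapt it to IChatClient
IChatClient openAIChatClient =
    new OpenAI.Chat.ChatClient("gpt-4o-mini", apiKey).AsIChatClient();

// Steps 3-5: register as a keyed service and configure the middleware pipeline
builder.Services
    .AddKeyedChatClient("openAI", openAIChatClient)
    .UseOpenTelemetry()        // step 4: observability
    .UseLogging()              // step 4: structured logging
    .UseDistributedCache()     // step 4: response caching (requires IDistributedCache)
    .UseFunctionInvocation();  // step 5: enable tool calling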

Supported AI Providers

Microsoft.Extensions.AI supports multiple AI providers through a unified interface:

  • OpenAI: Commercial AI models via OpenAI API
  • Azure OpenAI: OpenAI models hosted on Azure
  • Azure AI Inference: Access to Azure AI Model Catalog
  • Ollama: Open-source, self-hosted models

Each provider implements the same IChatClient and IEmbeddingGenerator interfaces, allowing you to switch providers without changing your application code.

For provider-specific setup and configuration details, refer to the official Microsoft.Extensions.AI documentation.

Using AI Extensions

Chat Completions

Basic Usage:

public class MyService
{
    private readonly IChatClient _chatClient;

    public MyService([FromKeyedServices("openAI")] IChatClient chatClient)
    {
        _chatClient = chatClient;
    }

    public async Task<string> GetChatResponseAsync(string userMessage)
    {
        var messages = new List<ChatMessage>
        {
            new ChatMessage(ChatRole.System, "You are a helpful assistant."),
            new ChatMessage(ChatRole.User, userMessage)
        };

        var response = await _chatClient.GetResponseAsync(messages);
        return response.Text ?? string.Empty;
    }
}

Streaming Responses:

public async IAsyncEnumerable<string> GetStreamingChatResponseAsync(string userMessage)
{
    var messages = new List<ChatMessage>
    {
        new ChatMessage(ChatRole.User, userMessage)
    };

    await foreach (var update in _chatClient.GetStreamingResponseAsync(messages))
    {
        yield return update.Text;
    }
}

Keyed Service Resolution:

// Resolve specific provider
var openAIClient = services.GetRequiredKeyedService<IChatClient>("openAI");
var azureOpenAIClient = services.GetRequiredKeyedService<IChatClient>("azureOpenAI");
var azureAIInferenceClient = services.GetRequiredKeyedService<IChatClient>("azureAIInference");
var ollamaClient = services.GetRequiredKeyedService<IChatClient>("ollama");

Embeddings

Embeddings convert text into numerical vectors that capture semantic meaning, enabling semantic search, similarity matching, and clustering operations.

Text Embeddings:

public class EmbeddingService
{
    private readonly IEmbeddingGenerator<string, Embedding<float>> _embeddingGenerator;

    public EmbeddingService(
        [FromKeyedServices("openAI")] IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator)
    {
        _embeddingGenerator = embeddingGenerator;
    }

    public async Task<Embedding<float>> GenerateEmbeddingAsync(string text)
    {
        return await _embeddingGenerator.GenerateEmbeddingAsync(text);
    }
}

Batch Embedding Generation:

public async Task<List<Embedding<float>>> GenerateEmbeddingsAsync(IEnumerable<string> texts)
{
    // The core GenerateAsync method accepts a whole batch, avoiding one
    // round-trip per item (providers may still chunk large batches internally)
    var embeddings = await _embeddingGenerator.GenerateAsync(texts);
    return embeddings.ToList();
}

Semantic Similarity Calculation:

public float CalculateSimilarity(Embedding<float> embedding1, Embedding<float> embedding2)
{
    // Cosine similarity; Embedding<float>.Vector is a ReadOnlyMemory<float>,
    // so work against its Span
    var v1 = embedding1.Vector.Span;
    var v2 = embedding2.Vector.Span;

    var dotProduct = 0f;
    var magnitude1 = 0f;
    var magnitude2 = 0f;

    for (int i = 0; i < v1.Length; i++)
    {
        dotProduct += v1[i] * v2[i];
        magnitude1 += v1[i] * v1[i];
        magnitude2 += v2[i] * v2[i];
    }

    return dotProduct / (MathF.Sqrt(magnitude1) * MathF.Sqrt(magnitude2));
}
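For instance, two semantically related phrases should score close to 1.0 (a small usage sketch; the texts are arbitrary):

var e1 = await _embeddingGenerator.GenerateEmbeddingAsync("How do I reset my password?");
var e2 = await _embeddingGenerator.GenerateEmbeddingAsync("Instructions for password reset");

// Related texts typically score high; unrelated texts score much lower
var score = CalculateSimilarity(e1, e2);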

Vector Database Integration:

// Example: Store embeddings in vector database for semantic search
public async Task StoreEmbeddingAsync(string documentId, string text)
{
    var embedding = await _embeddingGenerator.GenerateEmbeddingAsync(text);

    // Store in vector database (e.g., Azure AI Search, Qdrant, Pinecone)
    await _vectorStore.UpsertAsync(documentId, embedding.Vector, new DocumentMetadata
    {
        Text = text,
        DocumentId = documentId,
        CreatedAt = DateTime.UtcNow
    });
}

public async Task<List<string>> SearchSimilarDocumentsAsync(string query, int topK = 5)
{
    var queryEmbedding = await _embeddingGenerator.GenerateEmbeddingAsync(query);

    // Search vector database for similar embeddings
    var results = await _vectorStore.SearchAsync(
        queryEmbedding.Vector, 
        topK: topK, 
        threshold: 0.7f);

    return results.Select(r => r.DocumentId).ToList();
}

Use Cases:

  • Semantic Search: Find similar documents based on meaning rather than keywords
  • Recommendations: Recommend similar items based on content similarity
  • Clustering: Group similar items together for analysis
  • Classification: Classify text based on embedding similarity to known categories
  • RAG (Retrieval-Augmented Generation): Retrieve relevant context for AI prompts

Vector Store

Vector stores provide persistent storage and retrieval of embeddings, enabling semantic search and RAG (Retrieval-Augmented Generation) capabilities. The template integrates with Microsoft.Extensions.VectorData to provide a unified interface for vector operations.

Official Microsoft Documentation:

  • Vector Store Overview - Getting started with vector stores in Semantic Kernel
  • Microsoft.Extensions.VectorData - Vector data abstractions API reference
  • Vector Search in Azure AI Search - Azure AI Search vector capabilities
  • Understanding Vector Databases - Vector database concepts and Azure solutions

Configuration:

{
  "MicrosoftExtensionsAI": {
    "VectorStore": {
      "Enabled": true,
      "ProviderType": "InMemory",
      "CollectionName": "default",
      "EmbeddingGeneratorKey": "openAI"
    }
  }
}

Supported Provider Types:

  • InMemory: In-memory vector store (for development/testing only) - InMemory Vector Store Connector
  • AzureAISearch: Azure AI Search vector store - Azure AI Search Connector
  • SqlServer: SQL Server vector store (with vector support) - SQL Server Connector
    Prerequisites:
      • SQL Server 2025 Preview or later (required for the VECTOR data type)
      • Azure SQL Database with vector support enabled
      • The VECTOR data type is not available in SQL Server 2022 or earlier versions
      • Docker image: mcr.microsoft.com/mssql/server:2025-latest or mcr.microsoft.com/mssql/server:2025-preview-ubuntu-24.04
  • PgVector: PostgreSQL vector store (with pgvector extension) - PostgreSQL Connector
  • Qdrant: Qdrant vector database - Qdrant Connector
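Switching vector store providers is a configuration change. For example, pointing the same section at Azure AI Search might look like this (values are illustrative):

{
  "MicrosoftExtensionsAI": {
    "VectorStore": {
      "Enabled": true,
      "ProviderType": "AzureAISearch",
      "CollectionName": "knowledge-base",
      "EmbeddingGeneratorKey": "azureOpenAI"
    }
  }
}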

Key Features:

  • ✅ Unified VectorStore interface for all providers
  • ✅ Integration with embedding generators via keyed services
  • ✅ Health checks for vector store availability
  • ✅ Support for multiple embedding generator providers
  • ✅ Collection-based organization of vector data
  • ✅ Vector search with similarity scoring
  • ✅ Support for metadata filtering and retrieval

Use Cases:

  • Semantic Search: Find similar documents based on meaning rather than keywords
  • RAG (Retrieval-Augmented Generation): Retrieve relevant context for AI prompts
  • Duplicate Detection: Identify duplicate or near-duplicate content
  • Anomaly Detection: Find content that doesn't match expected patterns
  • Recommendations: Recommend similar items based on content similarity
  • Clustering: Group similar items together for analysis

Embedding Models:

  • OpenAI: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002
  • Azure OpenAI: Same models as OpenAI, deployed in Azure
  • Ollama: Various open-source embedding models (e.g., nomic-embed-text)

Embedding Pipeline and Retrieval Service

The template provides high-level services for embedding generation and semantic search operations: IEmbeddingPipeline and IRetrievalService. These services abstract away the complexity of managing embeddings and provide batch processing, metrics, and observability out of the box.

Embedding Pipeline

The IEmbeddingPipeline service provides batch ingestion and incremental updates for content items:

public interface IEmbeddingPipeline
{
    Task GenerateAsync(IEnumerable<ContentItem> items, string collectionName, CancellationToken cancellationToken = default);
    Task UpsertAsync(ContentItem item, string collectionName, CancellationToken cancellationToken = default);
}

Usage Example:

// Batch ingestion
var items = new List<ContentItem>
{
    new ContentItem { Id = "1", Text = "microservices architecture", Metadata = new Dictionary<string, string> { { "tenantId", "tenant1" } } },
    new ContentItem { Id = "2", Text = "domain-driven design", Metadata = new Dictionary<string, string> { { "tenantId", "tenant1" } } },
};

await embeddingPipeline.GenerateAsync(items, "knowledge-base").ConfigureAwait(false);

// Incremental upsert
var newItem = new ContentItem 
{ 
    Id = "3", 
    Text = "event-driven architecture",
    Metadata = new Dictionary<string, string> { { "tenantId", "tenant1" }, { "tags", "architecture,patterns" } }
};
await embeddingPipeline.UpsertAsync(newItem, "knowledge-base").ConfigureAwait(false);

Features:

  • Batch Processing: Processes items in configurable batches (default: 64 items)
  • Automatic Embedding Generation: Generates embeddings using the configured embedding generator
  • Vector Store Integration: Automatically stores embeddings in the configured vector store
  • Metadata Support: Supports optional metadata (tenantId, locale, tags) for filtering and personalization
  • Metrics: Records OpenTelemetry metrics for generation duration, item counts, and failures
  • Tracing: Emits OpenTelemetry traces for observability

Retrieval Service

The IRetrievalService provides semantic search capabilities:

public interface IRetrievalService
{
    Task<IReadOnlyList<RetrievalResult>> SearchAsync(
        string query,
        int k,
        string collectionName,
        RetrievalOptions? options = null,
        CancellationToken cancellationToken = default);
}

Usage Example:

// Basic search
var results = await retrievalService.SearchAsync("bounded contexts", 5, "knowledge-base").ConfigureAwait(false);

// Search with filtering
var options = new RetrievalOptions 
{ 
    TenantId = "tenant1",
    Tags = new[] { "architecture", "patterns" }
};
var filteredResults = await retrievalService.SearchAsync("microservices", 10, "knowledge-base", options).ConfigureAwait(false);

foreach (var result in filteredResults)
{
    Console.WriteLine($"ID: {result.Id}, Text: {result.Text}, Score: {result.Score}");
    if (result.Metadata != null)
    {
        Console.WriteLine($"Tenant: {result.Metadata.GetValueOrDefault("tenantId")}");
    }
}

Features:

  • Semantic Search: Finds semantically similar content using vector similarity
  • Filtering: Supports filtering by tenantId and tags
  • Ranked Results: Returns results ordered by similarity score (highest first)
  • Metadata Preservation: Includes original metadata in search results
  • Metrics: Records search duration, result counts, and failures
  • Tracing: Emits OpenTelemetry traces for search operations

Metrics

The template provides comprehensive OpenTelemetry metrics for embedding operations through the EmbeddingsMetrics class:

Meter Name: connectsoft.microservicetemplate.ai.embeddings

Counters:

  • connectsoft.microservicetemplate.ai.embeddings.generate.total - Total embeddings generated
  • connectsoft.microservicetemplate.ai.embeddings.upsert.total - Total embeddings upserted
  • connectsoft.microservicetemplate.ai.embeddings.generate.failed - Failed embedding generations
  • connectsoft.microservicetemplate.ai.embeddings.upsert.failed - Failed embedding upserts
  • connectsoft.microservicetemplate.ai.embeddings.search.total - Total searches performed
  • connectsoft.microservicetemplate.ai.embeddings.search.failed - Failed searches

Histograms:

  • connectsoft.microservicetemplate.ai.embeddings.generate.duration - Embedding generation duration (seconds)
  • connectsoft.microservicetemplate.ai.embeddings.upsert.duration - Embedding upsert duration (seconds)
  • connectsoft.microservicetemplate.ai.embeddings.search.duration - Search duration (seconds)
  • connectsoft.microservicetemplate.ai.embeddings.vector.dimensions - Vector dimensions histogram

Tags:

  • module: "ConnectSoft.MicroserviceTemplate"
  • component: "Embeddings"
  • collection_name: Optional collection name
  • tenant_id: Optional tenant identifier
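To export these metrics, subscribe to the meter like any other OpenTelemetry meter (a sketch; the exporter choice is an assumption):

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        // Subscribe to the embeddings meter listed above
        .AddMeter("connectsoft.microservicetemplate.ai.embeddings")
        .AddOtlpExporter()); // or any other configured exporter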

Tuning

Batch Size: The default batch size is 64 items. For large-scale ingestion, adjust it based on:

  • Embedding provider rate limits
  • Available memory
  • Network latency

Vector Dimensions: The default vector dimension is 1536 (matching OpenAI text-embedding-3-small and text-embedding-ada-002). Ensure your embedding model's dimensions match:

  • OpenAI: 1536 (text-embedding-3-small, text-embedding-ada-002)
  • Azure OpenAI: Varies by deployment (typically 1536)
  • Ollama: Varies by model (check model documentation)

Collection Naming: Use descriptive collection names to organize different content types:

  • knowledge-base - General knowledge articles
  • faq - Frequently asked questions
  • documentation - Technical documentation
  • user-content - User-generated content

Personalization

The embedding pipeline and retrieval service support personalization through metadata:

Tenant Isolation:

var item = new ContentItem 
{ 
    Id = "1", 
    Text = "tenant-specific content",
    Metadata = new Dictionary<string, string> { { "tenantId", "tenant1" } }
};

// Search only within tenant's content
var options = new RetrievalOptions { TenantId = "tenant1" };
var results = await retrievalService.SearchAsync("query", 10, "collection", options).ConfigureAwait(false);

Tag-Based Filtering:

var item = new ContentItem 
{ 
    Id = "1", 
    Text = "architecture patterns",
    Metadata = new Dictionary<string, string> { { "tags", "architecture,patterns,ddd" } }
};

// Search with tag filter
var options = new RetrievalOptions { Tags = new[] { "architecture", "ddd" } };
var results = await retrievalService.SearchAsync("query", 10, "collection", options).ConfigureAwait(false);

Locale Support:

var item = new ContentItem 
{ 
    Id = "1", 
    Text = "English content",
    Metadata = new Dictionary<string, string> { { "locale", "en-US" } }
};

Troubleshooting Embedding Pipeline and Retrieval

Mismatched Dimensions:

  • Error: "Vector dimensions do not match"
  • Solution: Ensure your embedding model produces vectors with the expected dimensions. Check that the [VectorStoreVector] attribute's dimension parameter matches your model.

Empty Search Results:

  • Cause: No similar content in the vector store, or filters are too restrictive
  • Solution:
      • Verify items were successfully ingested
      • Check the collection name matches
      • Relax filter criteria
      • Verify the embedding model is appropriate for your content type

High Latency:

  • Cause: Large batch sizes, network issues, or embedding provider rate limits
  • Solution:
      • Reduce batch size
      • Implement retry logic with exponential backoff
      • Use caching for frequently accessed embeddings
      • Consider using a faster embedding model

Authentication Errors:

  • Error: "401 Unauthorized" or "403 Forbidden"
  • Solution:
      • Verify API keys are correct and not expired
      • Check endpoint URLs are correct
      • Ensure the service principal has the necessary permissions (for Azure services)

Collection Not Found:

  • Error: "Collection does not exist"
  • Solution: The pipeline automatically creates collections, but verify:
      • The vector store is properly configured
      • The collection name is valid
      • The vector store provider supports collection creation

AI Function Invocation (Tool Calling)

AI function invocation allows AI models to call external functions (tools) during conversation, enabling them to perform actions, retrieve data, or interact with your application.

Understanding AIFunction

AIFunction represents a callable function that AI models can invoke. It contains:

  • Name: The function name exposed to the AI model
  • Description: Human-readable description of what the function does
  • Parameters: List of function parameters with their types and descriptions
  • Function: The actual delegate that executes when the AI calls the function

Creating AI Functions

From Static Methods:

The most common pattern is creating functions from static methods using AIFunctionFactory.Create():

public static class MathTools
{
    [Description("Calculate the square of a number.")]
    [return: Description("The squared result.")]
    public static int Square(
        [Description("The number to square.")] int number)
    {
        return number * number;
    }

    public static AIFunction CreateSquareFunction()
    {
        var method = typeof(MathTools).GetMethod(nameof(Square))!;
        return AIFunctionFactory.Create(
            method: method,
            target: null, // null for static methods
            options: new AIFunctionFactoryOptions
            {
                Name = "square",
                Description = null, // Uses [Description] attribute from method
            });
    }
}

From Instance Methods:

You can also create functions from instance methods by providing the target object:

public class CalculatorService
{
    [Description("Add two numbers together.")]
    [return: Description("The sum of the two numbers.")]
    public int Add(
        [Description("First number.")] int a,
        [Description("Second number.")] int b)
    {
        return a + b;
    }

    public AIFunction CreateAddFunction()
    {
        var method = typeof(CalculatorService).GetMethod(nameof(Add))!;
        return AIFunctionFactory.Create(
            method: method,
            target: this, // Provide instance for instance methods
            options: new AIFunctionFactoryOptions
            {
                Name = "add",
                Description = null,
            });
    }
}

Using Description Attributes:

The [Description] attribute is used to provide metadata for the function and its parameters:

  • Method-level [Description]: Describes what the function does
  • Parameter [Description]: Describes each parameter
  • Return [Description]: Describes the return value

These descriptions are automatically extracted and included in the function schema sent to the AI model, helping it understand when and how to use the function.

Using Dependency Injection in Functions

Functions can access services from the dependency injection container through AIFunctionArguments:

public class WeatherService
{
    private readonly IHttpClientFactory _httpClientFactory;

    public WeatherService(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    [Description("Get current weather for a location.")]
    [return: Description("The current weather information.")]
    public async Task<string> GetWeatherAsync(
        AIFunctionArguments args,
        [Description("The city name.")] string city)
    {
        // Access services from the DI container (AIFunctionArguments exposes
        // the IServiceProvider via its Services property)
        var httpClientFactory = args.Services!.GetRequiredService<IHttpClientFactory>();

        // Use the service to fetch weather data
        var client = httpClientFactory.CreateClient();
        var response = await client.GetStringAsync($"https://api.weather.com/{city}");
        return response;
    }

    public AIFunction CreateGetWeatherFunction()
    {
        var method = typeof(WeatherService).GetMethod(nameof(GetWeatherAsync))!;
        return AIFunctionFactory.Create(
            method: method,
            target: this,
            options: new AIFunctionFactoryOptions
            {
                Name = "get_weather",
            });
    }
}

Note: The AIFunctionArguments parameter is populated automatically at invocation time and exposes the DI container through its Services property.

Parameter Types

AI functions support various parameter types:

  • Primitive types: int, string, bool, double, float, etc.
  • Nullable types: int?, string?, etc.
  • Arrays and collections: string[], List<int>, etc.
  • Complex objects: Custom classes (serialized as JSON)
  • Enums: Enum types are supported

Example with Complex Types:

public class UserInfo
{
    public string Name { get; set; }
    public int Age { get; set; }
}

[Description("Create a user profile.")]
[return: Description("The created user ID.")]
public string CreateUser(
    [Description("User information.")] UserInfo userInfo)
{
    // Create user logic
    return Guid.NewGuid().ToString();
}
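Enums work the same way: the model sees the allowed values in the generated schema. A minimal sketch (the Priority type and SetPriority function are hypothetical):

public enum Priority { Low, Medium, High }

[Description("Set the priority of a task.")]
[return: Description("A confirmation message.")]
public string SetPriority(
    [Description("The task identifier.")] string taskId,
    [Description("The new priority level.")] Priority priority)
{
    // The AI model supplies one of the enum's named values
    return $"Task {taskId} set to priority {priority}.";
}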

Using Functions in Chat Requests

Functions are passed to the AI model via ChatOptions.Tools:

public async Task<string> UseAIToolAsync(IChatClient chatClient)
{
    // Create the function
    var squareFunction = MathTools.CreateSquareFunction();

    // Add to tools list
    var tools = new List<AITool> { squareFunction };

    var options = new ChatOptions
    {
        Tools = tools,
        ToolMode = ChatToolMode.RequireAny, // or ChatToolMode.Auto
    };

    var messages = new List<ChatMessage>
    {
        new ChatMessage(ChatRole.System, "You can use tools to perform calculations."),
        new ChatMessage(ChatRole.User, "What is 5 squared?")
    };

    var response = await chatClient.GetResponseAsync(messages, options);
    return response.Text ?? string.Empty;
}

Tool Mode Options

ChatToolMode controls how the AI model uses tools:

  • Auto: The model decides whether to use tools based on the conversation; this is also the effective behavior when ToolMode is left unset
  • RequireAny: The model must use at least one tool
  • None: The model must not use any tools, even if they are provided
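For example (assuming a tools list as in the earlier example):

// Let the model decide whether to call a tool (also the effective
// behavior when ToolMode is left unset)
var autoOptions = new ChatOptions { Tools = tools, ToolMode = ChatToolMode.Auto };

// Force the model to call at least one of the supplied tools
var requireOptions = new ChatOptions { Tools = tools, ToolMode = ChatToolMode.RequireAny };

// Supply tools but forbid their use for this request
var noneOptions = new ChatOptions { Tools = tools, ToolMode = ChatToolMode.None };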

Function Invocation Flow

  1. Define Function: Create a method with appropriate [Description] attributes
  2. Create AIFunction: Use AIFunctionFactory.Create() to create an AIFunction instance
  3. Add to ChatOptions: Include the function in ChatOptions.Tools
  4. AI Decision: The AI model analyzes the conversation and decides if/when to call the function
  5. Function Execution: When called, the function executes with the provided parameters
  6. Result Return: The function result is returned to the AI model
  7. Final Response: The AI model incorporates the function result into its response

Enabling Function Invocation

Function invocation must be enabled when registering the chat client:

var chatClientBuilder = services.AddKeyedChatClient("openAI", chatClient);
chatClientBuilder.UseFunctionInvocation(); // Enable function calling

Best Practices

  1. Clear Descriptions: Provide clear, concise descriptions for functions and parameters
  2. Error Handling: Implement proper error handling in function implementations
  3. Validation: Validate function parameters before execution
  4. Idempotency: Design functions to be idempotent when possible
  5. Security: Validate and sanitize inputs, especially when functions interact with external systems
  6. Performance: Keep function execution fast; consider async operations for I/O-bound tasks
  7. Logging: Log function invocations for debugging and monitoring

Common Patterns

Multiple Functions:

// CreateAddFunction and CreateGetWeatherFunction are instance methods
// (see CalculatorService and WeatherService above), so instances are required
var tools = new List<AITool>
{
    MathTools.CreateSquareFunction(),
    calculatorService.CreateAddFunction(),
    weatherService.CreateGetWeatherFunction()
};

var options = new ChatOptions
{
    Tools = tools,
    ToolMode = ChatToolMode.Auto,
};

Conditional Function Usage:

// Only include weather function if user asks about weather
var tools = new List<AITool>();
if (userMessage.Contains("weather", StringComparison.OrdinalIgnoreCase))
{
    tools.Add(WeatherService.CreateGetWeatherFunction());
}

var options = new ChatOptions
{
    Tools = tools,
    ToolMode = tools.Count > 0 ? ChatToolMode.Auto : ChatToolMode.None,
};

Troubleshooting Function Invocation

Function Not Called:

  • Verify UseFunctionInvocation() is called during registration
  • Check that ToolMode is set appropriately
  • Ensure function descriptions are clear and relevant
  • Verify the function is included in ChatOptions.Tools

Function Execution Errors:

  • Check that function parameter types match the expected schema
  • Verify the function implementation handles edge cases
  • Ensure services are registered in the DI container (for functions using DI)
  • Review function logs for detailed error information

Parameter Parsing Issues:

  • Ensure parameter types are supported (primitives, simple objects)
  • Check that [Description] attributes are properly applied
  • Verify parameter names match between the function signature and the schema

Configuration

Configuration for Microsoft.Extensions.AI providers is typically done through appsettings.json or environment variables. The exact configuration structure depends on the provider and your application's setup.

For provider-specific configuration details, refer to:

  • Microsoft.Extensions.AI Documentation
  • Provider-specific documentation (OpenAI, Azure OpenAI, etc.)

Common configuration patterns include:

  • API keys and endpoints
  • Model selection
  • Service identifiers for keyed service registration
  • Optional settings such as organization IDs and deployment names
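A typical shape (key names beyond MicrosoftExtensionsAI:OpenAI:ApiKey are illustrative assumptions; check your template's options classes):

{
  "MicrosoftExtensionsAI": {
    "OpenAI": {
      "ApiKey": "<set-via-user-secrets-or-environment>",
      "ChatModel": "gpt-4o-mini",
      "EmbeddingModel": "text-embedding-3-small"
    },
    "AzureOpenAI": {
      "Endpoint": "https://<resource>.openai.azure.com/",
      "DeploymentName": "gpt-4o"
    }
  }
}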

Middleware Integration

All AI providers support integrated middleware for observability, logging, and caching. For comprehensive documentation on AI observability, see AI Observability.

OpenTelemetry

Automatic Instrumentation:

#if OpenTelemetry
    openAIChatClientBuilder.UseOpenTelemetry();
    openAIEmbeddingGeneratorClientBuilder.UseOpenTelemetry();
#endif

Meters:

// OpenTelemetryExtensions.cs
#if UseMicrosoftExtensionsAI
    .AddMeter("Experimental.Microsoft.Extensions.AI*")
#endif

Traces:

#if UseMicrosoftExtensionsAI
    .AddSource("Experimental.Microsoft.Extensions.AI*")
#endif

What's Instrumented:

  • Chat completion requests and responses
  • Embedding generation requests
  • Tool invocations
  • Token usage and costs
  • Latency and errors

For detailed information on AI observability with OpenTelemetry, see AI Observability - OpenTelemetry Integration.

Logging

Structured Logging:

openAIChatClientBuilder.UseLogging();
openAIEmbeddingGeneratorClientBuilder.UseLogging();

Log Categories:

  • Microsoft.Extensions.AI.Chat
  • Microsoft.Extensions.AI.Embeddings
  • Microsoft.Extensions.AI.Tools

Logged Information:

  • Request/response details
  • Token usage
  • Model information
  • Error details

For detailed information on AI logging, see AI Observability - Structured Logging.

Distributed Caching

Caching Integration:

#if (DistributedCacheInMemory || DistributedCacheRedis)
    openAIChatClientBuilder.UseDistributedCache();
    openAIEmbeddingGeneratorClientBuilder.UseDistributedCache();
#endif

Benefits:

  • Cost Reduction: Cache responses to avoid duplicate API calls
  • Performance: Faster responses for cached requests
  • Rate Limiting: Reduce API rate limit issues

Cache Keys:

  • Chat completions: Based on messages and options
  • Embeddings: Based on input text and options
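UseDistributedCache() resolves the application's registered IDistributedCache, so a backing store must be registered; for example (Redis shown as one option):

// In-memory (development)
builder.Services.AddDistributedMemoryCache();

// Or Redis (production) - requires Microsoft.Extensions.Caching.StackExchangeRedis
builder.Services.AddStackExchangeRedisCache(options =>
    options.Configuration = "localhost:6379");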

For detailed information on cache observability, see AI Observability - Distributed Caching.

AI Evaluation

Microsoft.Extensions.AI includes evaluation capabilities through Microsoft.Extensions.AI.Evaluation packages, enabling quality assessment, NLP evaluation, and reporting for AI responses.

Reference Documentation:

  • Microsoft.Extensions.AI.Evaluation libraries
  • dotnet/ai-samples evaluation API examples

Evaluation Overview

AI evaluation helps assess:

  • Response Quality: Accuracy, relevance, and coherence of AI responses
  • NLP Metrics: Language quality, sentiment, and linguistic characteristics
  • Performance: Latency, token usage, and cost metrics
  • Compliance: Adherence to safety guidelines and content policies

Evaluation Packages

Available packages:

  • Microsoft.Extensions.AI.Evaluation: Core evaluation abstractions
  • Microsoft.Extensions.AI.Evaluation.Quality: Quality metrics and evaluators
  • Microsoft.Extensions.AI.Evaluation.NLP: Natural language processing evaluators
  • Microsoft.Extensions.AI.Evaluation.Reporting: Evaluation reporting and aggregation

Configuration

Evaluation options are embedded within MicrosoftExtensionsAIOptions under the AIEvaluation property. Configuration is managed through the MicrosoftExtensionsAI section in appsettings.json.

Configuration Example:

{
  "MicrosoftExtensionsAI": {
    "AIEvaluation": {
      "Enabled": false,  // Master gate flag - set to true to enable evaluation
      "EnableNlp": true,  // Enable NLP evaluation (BLEU, GLEU, F1)
      "EnableQuality": true,  // Enable quality evaluation (relevance, groundedness, etc.)
      "EnableSafety": false,  // Enable safety evaluation (requires Azure AI Foundry)
      "MinRelevance": 0.75,  // Minimum relevance score threshold (0-1)
      "MinGroundedness": 0.70,  // Minimum groundedness score threshold (0-1)
      "SampleRatio": 0.25,  // Sample ratio for evaluation (0-1)
      "MaxCostUsdPerRun": 0.50,  // Maximum cost in USD per evaluation run (0-1000)
      "SuiteName": "default"  // Evaluation suite name
    }
  }
}

Environment Toggle:

To enable evaluation in a specific environment, set AIEvaluation.Enabled to true in the environment-specific appsettings.{Environment}.json file:

{
  "MicrosoftExtensionsAI": {
    "AIEvaluation": {
      "Enabled": true  // Enable evaluation for this environment
    }
  }
}

Template Configuration:

Evaluation is controlled by the UseMicrosoftExtensionsAIEvaluation template parameter. When set to false, evaluation-related files are excluded from the generated template.

Evaluator Registration

When evaluation is enabled, evaluators are automatically registered in the dependency injection container as IEvaluator singletons. The registration happens in MicrosoftExtensionsAIExtensions.SetupAIEvaluationIntegration():

NLP Evaluators (no LLM required):

  • BLEUEvaluator - Bilingual evaluation understudy algorithm for text similarity
  • GLEUEvaluator - Google BLEU algorithm optimized for sentence-level evaluation
  • F1Evaluator - F1 scoring algorithm (ratio of shared words between generated and reference text)

Quality Evaluators (require ChatConfiguration):

  • RelevanceEvaluator - Measures how relevant a response is to a query
  • CoherenceEvaluator - Measures logical and orderly presentation of ideas
  • GroundednessEvaluator - Measures how well a generated response aligns with the given context
  • CompletenessEvaluator - Measures comprehensiveness and accuracy
  • FluencyEvaluator - Measures grammatical accuracy, vocabulary range, and readability

ChatConfiguration:

Quality evaluators require a ChatConfiguration instance, which is automatically created from an available IChatClient. The template tries to resolve a chat client in this order of preference:

  1. Azure OpenAI ("azureOpenAI")
  2. OpenAI ("openAI")
  3. Azure AI Inference ("azureAIInference")
  4. Ollama ("ollama")

The ChatConfiguration is registered as a singleton in the DI container and is automatically injected when needed.
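Conceptually, the registration performed by SetupAIEvaluationIntegration() resembles the following sketch (the actual template code may differ):

// NLP evaluators: no LLM needed
services.AddSingleton<IEvaluator, BLEUEvaluator>();
services.AddSingleton<IEvaluator, GLEUEvaluator>();
services.AddSingleton<IEvaluator, F1Evaluator>();

// Quality evaluators: scored by an LLM via ChatConfiguration
services.AddSingleton<IEvaluator, RelevanceEvaluator>();
services.AddSingleton<IEvaluator, CoherenceEvaluator>();
services.AddSingleton<IEvaluator, GroundednessEvaluator>();
services.AddSingleton<IEvaluator, CompletenessEvaluator>();
services.AddSingleton<IEvaluator, FluencyEvaluator>();

// ChatConfiguration wraps the first available IChatClient
// (azureOpenAI → openAI → azureAIInference → ollama)
services.AddSingleton(sp => new ChatConfiguration(
    sp.GetRequiredKeyedService<IChatClient>("azureOpenAI")));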

Quality Evaluation

Response Quality Assessment:

using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.DependencyInjection;

public class AIQualityEvaluator
{
    private readonly IServiceProvider _serviceProvider;

    public AIQualityEvaluator(IServiceProvider serviceProvider)
    {
        _serviceProvider = serviceProvider;
    }

    public async Task<EvaluationResult> EvaluateResponseAsync(
        string query, 
        string generated)
    {
        // Get all registered evaluators (includes both NLP and Quality evaluators)
        var evaluators = _serviceProvider.GetServices<IEvaluator>().ToList();
        var chatConfiguration = _serviceProvider.GetRequiredService<ChatConfiguration>();

        // Evaluate with all registered evaluators
        var evaluationTasks = evaluators.Select(evaluator =>
            evaluator.EvaluateAsync(query, generated, chatConfiguration)).ToList();
        var results = await Task.WhenAll(evaluationTasks.Select(t => t.AsTask()));

        // Combine all metrics from all evaluators
        var allMetrics = results.SelectMany(r => r.Metrics)
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase);

        return new EvaluationResult(allMetrics);
    }
}

Quality Metrics (from Microsoft.Extensions.AI.Evaluation.Quality):

  • Relevance: How relevant a response is to a query
  • Groundedness: How well a generated response aligns with the given context
  • Completeness: How comprehensive and accurate a response is
  • Coherence: The logical and orderly presentation of ideas
  • Fluency: Grammatical accuracy, vocabulary range, sentence complexity, and overall readability
  • Equivalence: The similarity between the generated text and its ground truth with respect to a query

Reference: Quality Evaluators

NLP Evaluation

Natural Language Processing Metrics (no LLM required):

NLP evaluators use traditional NLP techniques such as text tokenization and n-gram analysis. They do not require an LLM and can be used independently of quality evaluators.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.DependencyInjection;

public class NLPEvaluator
{
    private readonly IServiceProvider _serviceProvider;

    public NLPEvaluator(IServiceProvider serviceProvider)
    {
        _serviceProvider = serviceProvider;
    }

    public async Task<EvaluationResult> EvaluateNLPAsync(
        string generated, 
        string reference)
    {
        // Get only NLP evaluators (filter by type if needed, or get all and filter results)
        var evaluators = _serviceProvider.GetServices<IEvaluator>().ToList();
        var chatConfiguration = _serviceProvider.GetRequiredService<ChatConfiguration>();

        // NLP evaluators don't require ChatConfiguration, but it's passed for consistency
        // They will ignore it since they don't use an LLM
        var evaluationTasks = evaluators.Select(evaluator =>
            evaluator.EvaluateAsync(string.Empty, generated, chatConfiguration)).ToList();
        var results = await Task.WhenAll(evaluationTasks.Select(t => t.AsTask()));

        // Combine metrics (NLP evaluators will produce BLEU, GLEU, F1 metrics)
        var allMetrics = results.SelectMany(r => r.Metrics)
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase);

        return new EvaluationResult(allMetrics);
    }
}

Note: NLP evaluators compare generated text against reference text. The query parameter is not used by NLP evaluators, but it's included in the signature for consistency with the IEvaluator interface.

NLP Metrics (from Microsoft.Extensions.AI.Evaluation.NLP):

  • BLEU: Bilingual evaluation understudy algorithm for text similarity
  • GLEU: Google BLEU algorithm optimized for sentence-level evaluation
  • F1: F1 scoring algorithm (ratio of shared words between generated and reference text)

These evaluators use traditional NLP techniques such as text tokenization and n-gram analysis - they do not require an LLM.

Reference: NLP Evaluators

Evaluation Reporting

Reporting Support:

The template includes reporting support via Microsoft.Extensions.AI.Evaluation.Reporting, which enables:

  • Response Caching: Responses from AI models are persisted in a cache for faster execution and lower cost
  • Result Storage: Evaluation results are stored for analysis and trending
  • Report Generation: Generate reports from stored evaluation data

Using the dotnet-aieval Tool:

The dotnet aieval tool (from Microsoft.Extensions.AI.Evaluation.Console package) can be used to generate reports:

# Install the tool globally
dotnet tool install --global dotnet-aieval

# Generate a report
dotnet aieval report --out ./ai-eval-report
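In code, the reporting workflow typically revolves around a ReportingConfiguration and per-scenario runs. The following is a rough sketch based on the Microsoft.Extensions.AI.Evaluation.Reporting samples, not the template's code; verify exact signatures against the reporting docs:

// Rough sketch (disk-based storage assumed); evaluators and chatConfiguration
// come from DI as in the evaluation examples above
var reportingConfiguration = DiskBasedReportingConfiguration.Create(
    storageRootPath: "./eval-results",
    evaluators: evaluators,
    chatConfiguration: chatConfiguration,
    executionName: "nightly-run");

await using ScenarioRun scenarioRun =
    await reportingConfiguration.CreateScenarioRunAsync("faq-quality");

// Results are persisted under storageRootPath, where `dotnet aieval report` can find them
var result = await scenarioRun.EvaluateAsync(userQuery, assistantResponse);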

CI/CD Integration:

Reports can be generated in Azure DevOps pipelines and published as artifacts. See the azure-pipelines.yml configuration for evaluation report generation steps.

Reference: Evaluation Reporting

Using Evaluation in Testing (MSTest)

BDD Evaluation Tests with Reqnroll:

The template includes acceptance tests for AI evaluation using Reqnroll (SpecFlow-style BDD) and MSTest. The tests are located in:

  • Feature File: ConnectSoft.MicroserviceTemplate.AcceptanceTests/AIExtensionsFeatures/AI Evaluation Feature.feature
  • Step Definitions: ConnectSoft.MicroserviceTemplate.AcceptanceTests/AIExtensionsFeatures/Steps/AIEvaluationFeatureStepDefinitions.cs

Implementation Example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Reqnroll;

[Binding]
public sealed class AIEvaluationFeatureStepDefinitions
{
    private IChatClient? chatClient;
    private string assistantResponse = string.Empty;
    private EvaluationResult? evaluationResult;
    private string? userQuery;

    [When("I evaluate the response for quality metrics")]
    public async Task WhenIEvaluateTheResponseForQualityMetrics()
    {
        var services = BeforeAfterTestRunHooks.ServerInstance?.Services;
        Assert.IsNotNull(services, "Services are not available.");

        var evaluators = services.GetServices<IEvaluator>().ToList();
        Assert.IsNotEmpty(evaluators, "No IEvaluator instances are registered.");

        var chatConfiguration = services.GetRequiredService<ChatConfiguration>();
        Assert.IsNotNull(chatConfiguration, "ChatConfiguration is not registered.");

        Assert.IsNotNull(this.userQuery, "User query should be set.");
        Assert.IsNotNull(this.assistantResponse, "Assistant response should be set.");

        // Use SDK's built-in EvaluatorExtensions.EvaluateAsync to evaluate with all registered evaluators
        var evaluationTasks = evaluators.Select(evaluator =>
            evaluator.EvaluateAsync(this.userQuery, this.assistantResponse, chatConfiguration)).ToList();

        var results = await Task.WhenAll(evaluationTasks.Select(t => t.AsTask())).ConfigureAwait(false);

        // Combine all metrics from all evaluators into a single result
        var allMetrics = results.SelectMany(r => r.Metrics)
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase);
        this.evaluationResult = new EvaluationResult(allMetrics);

        Assert.IsNotNull(this.evaluationResult, "Evaluation result should not be null.");
    }

    [Then("the AI evaluation score for {string} is at least {double}")]
    public void ThenTheAIEvaluationScoreForIsAtLeast(string metricName, double minScore)
    {
        Assert.IsNotNull(this.evaluationResult, "Evaluation result should not be null.");
        Assert.IsNotNull(this.evaluationResult.Metrics, "Evaluation metrics should not be null.");

        // Use SDK's built-in TryGet method to retrieve the numeric metric by name
        var metricFound = this.evaluationResult.TryGet<NumericMetric>(metricName, out var metric);
        Assert.IsTrue(metricFound, $"Metric '{metricName}' not found in evaluation results. Available metrics: {string.Join(", ", this.evaluationResult.Metrics.Keys)}");

        Assert.IsNotNull(metric, $"Metric '{metricName}' should not be null.");
        Assert.IsTrue(metric.Value.HasValue, $"Metric '{metricName}' should have a value.");

        var score = metric.Value!.Value; // Extract non-nullable double value
        Assert.IsGreaterThanOrEqualTo(
            minScore,
            score,
            $"Expected {metricName} score to be at least {minScore}, but got {score}. Response: '{this.assistantResponse}'");
    }
}

Test Scenarios:

The feature file includes scenarios for:

  • OpenAI provider evaluation (relevance and groundedness thresholds)
  • Azure OpenAI provider evaluation (relevance and groundedness thresholds)

Tests verify that AI responses meet minimum quality thresholds (0.50 for basic scenarios).

Unit Tests:

Unit tests for evaluation options validation are included in AIEvaluationOptionsTests.cs:

[TestMethod]
public void AIEvaluationOptionsShouldHaveCorrectDefaultValues()
{
    var options = new MicrosoftExtensionsAIOptions.AIEvaluationOptions();

    Assert.IsFalse(options.Enabled);
    Assert.IsTrue(options.EnableNlp);
    Assert.IsTrue(options.EnableQuality);
    Assert.AreEqual(0.75, options.MinRelevance);
}

Continuous Evaluation:

using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

// Evaluate AI responses in production
public class ProductionEvaluator
{
    private readonly IServiceProvider _serviceProvider;
    private readonly ILogger<ProductionEvaluator> _logger;

    public ProductionEvaluator(
        IServiceProvider serviceProvider,
        ILogger<ProductionEvaluator> logger)
    {
        _serviceProvider = serviceProvider;
        _logger = logger;
    }

    public async Task EvaluateAndLogAsync(string query, string response)
    {
        var evaluators = _serviceProvider.GetServices<IEvaluator>().ToList();
        var chatConfiguration = _serviceProvider.GetRequiredService<ChatConfiguration>();

        // Evaluate with all registered evaluators
        var evaluationTasks = evaluators.Select(evaluator =>
            evaluator.EvaluateAsync(query, response, chatConfiguration)).ToList();
        var results = await Task.WhenAll(evaluationTasks.Select(t => t.AsTask()));

        // Combine all metrics
        var allMetrics = results.SelectMany(r => r.Metrics)
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase);
        var evaluationResult = new EvaluationResult(allMetrics);

        _logger.LogInformation(
            "AI response evaluated. Metrics: {@Metrics}",
            evaluationResult.Metrics);

        // Alert if quality drops below threshold
        if (evaluationResult.TryGet<NumericMetric>("relevance", out var relevanceMetric) 
            && relevanceMetric.Value.HasValue 
            && relevanceMetric.Value.Value < 0.7)
        {
            _logger.LogWarning(
                "AI response quality below threshold. Relevance: {Relevance}",
                relevanceMetric.Value.Value);
        }
    }
}

Testing

BDD Testing with Reqnroll

Chat Completions Test:

// AIChatCompletionsFeatureStepDefinitions.cs
[Given("AI provider configured to use model chat completions {string}")]
public void GivenAIProviderConfiguredToUseModelChatCompletions(string aiProvider)
{
    this.chatClient = BeforeAfterTestRunHooks.ServerInstance?
        .Services.GetRequiredKeyedService<IChatClient>(aiProvider);

    Assert.IsNotNull(this.chatClient, "IChatClient is not resolved.");
}

[When("I send a chat request with:")]
public async Task WhenISendAChatRequestWith(DataTable table)
{
    var chatHistory = new List<ChatMessage>();
    foreach (var row in table.Rows)
    {
        var role = row["role"].Trim().ToLowerInvariant();
        var content = row["content"];
        chatHistory.Add(role switch
        {
            "system" => new ChatMessage(ChatRole.System, content),
            "user" => new ChatMessage(ChatRole.User, content),
            "assistant" => new ChatMessage(ChatRole.Assistant, content),
            _ => new ChatMessage(ChatRole.User, content)
        });
    }

    await foreach (var update in this.chatClient!.GetStreamingResponseAsync(chatHistory))
    {
        this.assistantResponse += update.Text;
    }
}

Tool Invocation Test:

// Example BDD test for tool invocation
[When("I ask AI to calculate the square of {int}")]
public async Task WhenIAskAIToCalculateTheSquareOf(int number)
{
    var tool = MathTools.CreateSquareFunction();
    var tools = new List<AITool> { tool };
    var options = new ChatOptions
    {
        Tools = tools,
        ToolMode = ChatToolMode.RequireAny,
    };

    var system = new ChatMessage(
        ChatRole.System,
        "You can use tools to perform calculations.");

    var user = new ChatMessage(
        ChatRole.User,
        $"What is {number} squared?");

    var response = await this.chatClient!
        .GetResponseAsync([system, user], options);

    this.assistantResponse = response.Text ?? string.Empty;
}

Unit Testing

Mocking IChatClient:

var mockChatClient = new Mock<IChatClient>();
mockChatClient
    .Setup(x => x.GetResponseAsync(
        It.IsAny<IEnumerable<ChatMessage>>(),
        It.IsAny<ChatOptions?>(),
        It.IsAny<CancellationToken>()))
    .ReturnsAsync(new ChatResponse(new ChatMessage(ChatRole.Assistant, "Mocked response"))
    {
        FinishReason = ChatFinishReason.Stop
    });

var service = new MyService(mockChatClient.Object);
var result = await service.GetChatResponseAsync("test");

Best Practices

Do's

  1. Use Keyed Services for Multiple Providers

    // ✅ GOOD - Explicit provider selection
    public MyService([FromKeyedServices("openAI")] IChatClient chatClient)
    {
        _chatClient = chatClient;
    }
    

  2. Use Streaming for Long Responses

    // ✅ GOOD - Streaming for better UX
    await foreach (var update in _chatClient.GetStreamingResponseAsync(messages))
    {
        await response.WriteAsync(update.Text);
    }
    

  3. Cache Expensive Operations

    // ✅ GOOD - Cache embeddings (_cache is any typed cache abstraction,
    // e.g. HybridCache or a wrapper over IDistributedCache)
    var cacheKey = $"embedding:{text}";
    var cached = await _cache.GetAsync<Embedding<float>>(cacheKey);
    if (cached != null) return cached;

    var embedding = await _embeddingGenerator.GenerateEmbeddingAsync(text);
    await _cache.SetAsync(cacheKey, embedding);
    

  4. Use System Messages for Context

    // ✅ GOOD - Provide context via system message
    var messages = new List<ChatMessage>
    {
        new ChatMessage(ChatRole.System, "You are a helpful assistant for a microservice platform."),
        new ChatMessage(ChatRole.User, userMessage)
    };
    

  5. Handle Errors Gracefully

    // ✅ GOOD - Error handling
    try
    {
        var response = await _chatClient.GetResponseAsync(messages);
        return response.Text ?? string.Empty;
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "AI chat completion failed");
        return "I'm sorry, I encountered an error. Please try again.";
    }
    

Don'ts

  1. Don't Expose API Keys

    // ❌ BAD - Hardcoded API key
    var apiKey = "sk-proj-...";
    
    // ✅ GOOD - Use configuration
    var apiKey = configuration["MicrosoftExtensionsAI:OpenAI:ApiKey"];
    

  2. Don't Ignore Rate Limits

    // ❌ BAD - No rate limiting
    foreach (var item in items)
    {
        await _chatClient.GetResponseAsync(...); // May hit rate limits
    }
    
    // ✅ GOOD - Rate limiting
    var semaphore = new SemaphoreSlim(10); // Max 10 concurrent
    await semaphore.WaitAsync();
    try
    {
        await _chatClient.GetResponseAsync(...);
    }
    finally
    {
        semaphore.Release();
    }
    

  3. Don't Cache Everything

    // ❌ BAD - Cache user-specific responses
    var cacheKey = $"response:{userId}"; // User-specific, shouldn't cache
    
    // ✅ GOOD - Cache general knowledge
    var cacheKey = $"embedding:{text}"; // General knowledge, can cache
    

  4. Don't Block on AI Calls

    // ❌ BAD - Blocking call
    var response = _chatClient.GetResponseAsync(messages).Result;
    
    // ✅ GOOD - Async/await
    var response = await _chatClient.GetResponseAsync(messages);
    

  5. Don't Buffer Long Responses

    // ✅ GOOD - Streaming improves UX
    await foreach (var update in _chatClient.GetStreamingResponseAsync(messages))
    {
        await response.WriteAsync(update.Text);
    }

    // ❌ BAD - Blocking wait for the complete response before writing anything
    var fullResponse = await _chatClient.GetResponseAsync(messages);
    await response.WriteAsync(fullResponse.Text);
    

  6. Don't Skip Response Evaluation

    // ✅ GOOD - Evaluate response quality
    var aiResponse = await _chatClient.GetResponseAsync(messages);
    
    // Get evaluators and ChatConfiguration from DI
    var evaluators = _serviceProvider.GetServices<IEvaluator>().ToList();
    var chatConfiguration = _serviceProvider.GetRequiredService<ChatConfiguration>();
    
    // Evaluate with all registered evaluators
    var evaluationTasks = evaluators.Select(evaluator =>
        evaluator.EvaluateAsync(userMessage, aiResponse.Text ?? string.Empty, chatConfiguration)).ToList();
    var results = await Task.WhenAll(evaluationTasks.Select(t => t.AsTask()));
    
    // Combine metrics
    var allMetrics = results.SelectMany(r => r.Metrics)
        .ToDictionary(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase);
    var evaluationResult = new EvaluationResult(allMetrics);
    
    // Check relevance score
    if (evaluationResult.TryGet<NumericMetric>("relevance", out var relevanceMetric) 
        && relevanceMetric.Value.HasValue 
        && relevanceMetric.Value.Value < 0.7)
    {
        _logger.LogWarning("Low quality AI response detected. Relevance: {Relevance}", relevanceMetric.Value.Value);
    }
    

  7. Don't Ignore Timeouts and Cancellation

    // ✅ GOOD - Timeout and cancellation support
    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
    try
    {
        var response = await _chatClient.GetResponseAsync(messages, cancellationToken: cts.Token);
    }
    catch (OperationCanceledException)
    {
        _logger.LogWarning("AI request timed out");
    }
    

  8. Don't Ignore Token Usage

    // ✅ GOOD - Track token usage for cost management
    // (Usage may be null when the provider doesn't report it)
    var response = await _chatClient.GetResponseAsync(messages);
    var tokenUsage = response.Usage;
    _logger.LogInformation(
        "Token usage - Input: {InputTokens}, Output: {OutputTokens}, Total: {TotalTokens}",
        tokenUsage?.InputTokenCount,
        tokenUsage?.OutputTokenCount,
        tokenUsage?.TotalTokenCount);
    

  9. Don't Use One Model for Everything

    // ✅ GOOD - Use smaller models for simple tasks
    var simpleModel = await _chatClient.GetResponseAsync(messages); // Uses configured model

    // ✅ GOOD - Override with a larger model for complex tasks
    var complexModel = await _chatClient.GetResponseAsync(
        messages,
        new ChatOptions { ModelId = "gpt-4o" });
    

  10. Don't Skip Retry Logic

    // ✅ GOOD - Retry with exponential backoff (Polly shown here;
    // RateLimitExceededException is a placeholder for your provider SDK's 429 exception)
    var retryPolicy = Policy
        .Handle<HttpRequestException>()
        .Or<RateLimitExceededException>()
        .WaitAndRetryAsync(
            retryCount: 3,
            sleepDurationProvider: retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)),
            onRetry: (outcome, timespan, retryCount, context) =>
            {
                _logger.LogWarning($"Retry {retryCount} after {timespan}");
            });
    
    var response = await retryPolicy.ExecuteAsync(async () =>
        await _chatClient.GetResponseAsync(messages));
    

Integration with Other Patterns

AI Extensions + Domain Model

Use Case: AI-powered domain operations

public class AIEnhancedProcessor
{
    private readonly IChatClient _chatClient;
    private readonly IRepository _repository;

    public async Task<DomainEntity> CreateAsync(CreateInput input)
    {
        // Use AI to enhance data
        var enhanced = await _chatClient.GetResponseAsync([
            new ChatMessage(ChatRole.System, "Enhance user input with AI insights."),
            new ChatMessage(ChatRole.User, input.SomeValue)
        ]);

        var entity = new DomainEntity
        {
            SomeValue = enhanced.Text ?? input.SomeValue
        };

        await _repository.AddAsync(entity);
        return entity;
    }
}

AI Extensions + Event Sourcing

Use Case: AI-generated events

public async Task ProcessEventAsync(DomainEvent domainEvent)
{
    // Use AI to analyze event
    var analysis = await _chatClient.GetResponseAsync([
        new ChatMessage(ChatRole.System, "Analyze domain events for patterns."),
        new ChatMessage(ChatRole.User, JsonSerializer.Serialize(domainEvent))
    ]);

    // Generate new event based on AI analysis
    var aiEvent = new AIAnalysisEvent
    {
        OriginalEvent = domainEvent,
        Analysis = analysis.Text
    };

    await _eventBus.PublishAsync(aiEvent);
}

AI Extensions + Actor Model

Use Case: Stateful AI agents

public class AIAgentGrain : Grain, IAIAgentGrain
{
    private readonly IChatClient _chatClient;
    private IPersistentState<AgentState> _state;

    public async Task<string> ProcessMessageAsync(string message)
    {
        // Maintain conversation context
        var messages = new List<ChatMessage>
        {
            new ChatMessage(ChatRole.System, "You are a helpful agent."),
        };

        // Add conversation history
        foreach (var history in _state.State.ConversationHistory)
        {
            messages.Add(history);
        }

        messages.Add(new ChatMessage(ChatRole.User, message));

        var response = await _chatClient.GetResponseAsync(messages);

        // Update state
        _state.State.ConversationHistory.Add(new ChatMessage(ChatRole.User, message));
        _state.State.ConversationHistory.Add(new ChatMessage(ChatRole.Assistant, response.Text ?? string.Empty));
        await _state.WriteStateAsync();

        return response.Text ?? string.Empty;
    }
}

Troubleshooting

Issue: Chat Client Not Resolved

Symptom: InvalidOperationException: No service for type 'IChatClient' has been registered.

Solutions:

  1. Verify the provider is configured in appsettings.json or configuration
  2. Use keyed service resolution: GetRequiredKeyedService<IChatClient>("openAI"), "azureOpenAI", "azureAIInference", or "ollama"
  3. Check that chat client registration code is executed during application startup
  4. For Azure AI Inference, verify the model ID and endpoint are correct in configuration
  5. Ensure the provider's NuGet package is installed

Issue: API Key Invalid

Symptom: UnauthorizedAccessException or 401 errors.

Solutions:

  1. Verify the API key in configuration
  2. Check API key permissions and expiration
  3. Ensure endpoint URLs are correct
  4. For Azure OpenAI, verify the deployment name matches
  5. For Azure AI Inference, verify the model ID and endpoint
  6. Check environment variables are set correctly (if used)
  7. Verify the API key has not been revoked

Issue: Rate Limiting

Symptom: RateLimitExceededException or 429 errors.

Solutions:

  1. Implement rate limiting with SemaphoreSlim
  2. Use distributed caching to reduce API calls
  3. Implement an exponential backoff retry policy
  4. Consider using multiple API keys for load distribution
  5. Monitor rate limit headers in responses
  6. Use streaming responses for better rate limit management
  7. Implement request queuing for high-volume scenarios

Issue: Tool Invocation Not Working

Symptom: AI model doesn't call tools.

Solutions:

  1. Verify ToolMode is set correctly (RequireAny or Auto)
  2. Ensure tool descriptions are clear and accurate
  3. Check that tool parameters match the expected schema
  4. Verify UseFunctionInvocation() is called during registration
  5. Ensure tool names don't conflict with reserved keywords
  6. Check that system prompts encourage tool usage
  7. Verify tool function signatures match the expected format

Issue: Embeddings Generation Fails

Symptom: Embedding generation throws exceptions or returns empty results.

Solutions:

  1. Verify the embedding model is available for the provider
  2. Check the input text is not empty or too long (model limits)
  3. Ensure the embedding generator is registered correctly
  4. Verify the API key has embedding generation permissions
  5. Check token limits for embedding models
  6. For batch operations, implement batching with proper limits

Issue: Streaming Responses Not Working

Symptom: Streaming responses don't stream or return all at once.

Solutions:

  1. Verify streaming is enabled in the provider configuration
  2. Use GetStreamingResponseAsync() instead of GetResponseAsync()
  3. Check that middleware supports streaming (ASP.NET Core)
  4. Ensure the client supports Server-Sent Events (SSE) or similar
  5. Verify the network/proxy doesn't buffer streaming responses
  6. Check for async enumeration issues in consuming code

Issue: High Latency

Symptom: AI responses take too long to return.

Solutions:

  1. Use streaming responses for better perceived performance
  2. Implement caching for repeated requests
  3. Use smaller/faster models when appropriate
  4. Optimize prompt length and complexity
  5. Consider using Ollama for local inference (lower latency)
  6. Implement request timeouts and cancellation tokens
  7. Monitor OpenTelemetry traces to identify bottlenecks
  8. Use connection pooling for HTTP clients

Issue: Evaluation Not Working

Symptom: Evaluation results are not generated or are incorrect.

Solutions:

  1. Verify the evaluation packages are installed (Microsoft.Extensions.AI.Evaluation, Microsoft.Extensions.AI.Evaluation.Quality, Microsoft.Extensions.AI.Evaluation.NLP)
  2. Check that AIEvaluation.Enabled is set to true in configuration
  3. Verify that IEvaluator instances are registered in DI (check services.GetServices<IEvaluator>())
  4. Ensure ChatConfiguration is registered (required for quality evaluators)
  5. Verify that at least one IChatClient is available (required for ChatConfiguration)
  6. Check that evaluators are called with correct parameters: evaluator.EvaluateAsync(query, generated, chatConfiguration)
  7. Ensure the evaluation request format is correct (query and generated text should not be empty for quality evaluators)
  8. Verify the metrics are supported by the evaluator (check available metrics in EvaluationResult.Metrics)
  9. Check evaluation logs for errors
  10. For quality evaluators, ensure the query and generated text are provided
  11. For NLP evaluators, ensure reference text is available if needed

Issue: Cost Management

Symptom: Unexpected high costs from AI API usage.

Solutions:

  1. Implement caching for repeated requests
  2. Monitor token usage via OpenTelemetry metrics
  3. Use smaller models when appropriate
  4. Implement request rate limiting
  5. Set up cost alerts in your cloud provider
  6. Use Ollama for local inference (no per-request cost)
  7. Optimize prompts to reduce token usage
  8. Implement request deduplication

Issue: Azure AI Inference Not Available

Symptom: Azure AI Inference provider not working.

Solutions:

  1. Check that the Microsoft.Extensions.AI.AzureAIInference package is installed
  2. Ensure the endpoint and model ID are correct in configuration
  3. Check the Azure AI Model Catalog for available models
  4. Verify the API key is configured correctly
  5. Ensure the Azure.Core package is installed
  6. Check that the keyed service can be resolved: GetRequiredKeyedService<IChatClient>("azureAIInference")
  7. Verify the registration code is properly implemented

Summary

Microsoft.Extensions.AI provides:

  • Unified Abstraction: Single interface for multiple AI providers
  • Provider Agnostic: Switch between OpenAI, Azure OpenAI, Azure AI Inference, and Ollama
  • Function Calling: Built-in support for AI tool invocation
  • Embeddings: Text embedding generation for semantic search and similarity
  • Evaluation: Quality and NLP evaluation for AI responses
  • Observability: OpenTelemetry and logging integration
  • Caching: Distributed caching for performance and cost reduction
  • Streaming: Real-time streaming responses for better UX
  • Type Safety: Strongly-typed abstractions
  • Dependency Injection: Seamless integration with .NET DI

By following these patterns, you can:

  • Build AI-Powered Applications: Integrate AI capabilities into .NET applications
  • Switch Providers Easily: Change AI providers without code changes
  • Reduce Costs: Use caching, rate limiting, and model selection effectively
  • Monitor Performance: Leverage OpenTelemetry for observability
  • Evaluate Quality: Assess AI response quality and compliance
  • Implement Semantic Search: Use embeddings for intelligent search capabilities
  • Handle Errors Gracefully: Implement retry logic and error handling
  • Optimize for Performance: Use streaming, caching, and appropriate models

Microsoft.Extensions.AI ensures that AI capabilities are integrated in a clean, maintainable, and testable way, enabling developers to build intelligent applications that leverage the power of modern AI models.

References