Latency Telemetry in ConnectSoft Microservice Template¶
Purpose & Overview¶
Latency Telemetry in the ConnectSoft Microservice Template provides comprehensive measurement and monitoring of request and operation latencies across all layers of the microservice. Latency telemetry enables teams to understand performance characteristics, identify bottlenecks, track service-level objectives (SLOs), and optimize application performance.
Latency telemetry provides:
- Automatic Instrumentation: Automatic latency tracking for HTTP requests, database operations, and dependencies
- Manual Measurements: Custom latency tracking for business operations and critical paths
- Distributed Tracing: End-to-end latency visibility across service boundaries
- Performance Metrics: Histogram-based latency metrics with percentile analysis
- Real-Time Monitoring: Live latency metrics for immediate performance insights
- SLO Tracking: Service-level objective monitoring based on latency thresholds
Latency Telemetry Philosophy
Latency is a critical performance indicator that directly impacts user experience and system reliability. The ConnectSoft template provides automatic latency tracking for all major operations while enabling custom latency measurements for business-critical operations. Latency telemetry is collected automatically through OpenTelemetry and Application Insights, providing comprehensive visibility without manual instrumentation overhead.
Architecture Overview¶
Latency Telemetry Stack¶
Application Operations
↓
Automatic Instrumentation
├── HTTP Requests (ASP.NET Core)
├── Database Operations (NHibernate, MongoDB)
├── HTTP Client Calls
├── gRPC Calls
├── Message Bus (MassTransit, NServiceBus)
├── Cache Operations (Redis)
└── SignalR Connections
↓
Latency Measurement
├── Duration Tracking (OpenTelemetry Activities)
├── Histogram Metrics (OpenTelemetry Metrics)
├── Request Telemetry (Application Insights)
└── Custom Timing (Stopwatch)
↓
Observability Backends
├── OpenTelemetry Collector
├── Application Insights
├── Prometheus
└── Seq
Latency Measurement Points¶
| Measurement Point | Instrumentation | Latency Captured |
|---|---|---|
| HTTP Requests | ASP.NET Core | End-to-end request processing time |
| Database Queries | SQL Client, MongoDB | Query execution time |
| HTTP Client Calls | HttpClient | Outgoing HTTP request duration |
| gRPC Calls | gRPC Client/Server | gRPC method execution time |
| Message Publishing | MassTransit, NServiceBus | Message send/receive latency |
| Cache Operations | Redis | Cache get/set latency |
| SignalR | SignalR Hub | Message send/receive latency |
| Business Operations | Custom Activities | Custom operation duration |
Automatic Latency Tracking¶
HTTP Request Latency¶
ASP.NET Core Automatic Instrumentation:
HTTP request latency is automatically tracked by OpenTelemetry:
// OpenTelemetryExtensions.cs
tracingBuilder
.AddAspNetCoreInstrumentation(options =>
{
options.EnrichWithHttpRequest = EnrichWithHttpRequest;
options.EnrichWithHttpResponse = EnrichWithHttpResponse;
options.RecordException = true;
});
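The EnrichWithHttpRequest and EnrichWithHttpResponse callbacks are defined elsewhere in the template; a minimal sketch of what such enrichers can look like (the tag names here are illustrative, not necessarily the template's actual choices):
// Sketch - the signatures match the ASP.NET Core instrumentation options
private static void EnrichWithHttpRequest(Activity activity, HttpRequest request)
{
    activity.SetTag("http.request.protocol", request.Protocol);
    activity.SetTag("http.request.content_length", request.ContentLength);
}
private static void EnrichWithHttpResponse(Activity activity, HttpResponse response)
{
    activity.SetTag("http.response.content_length", response.ContentLength);
}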
Captured Latency Metrics:
- Total Request Duration: Time from request received to response sent
- Middleware Pipeline Time: Time spent in middleware
- Controller Action Time: Time spent in controller action
- Response Time: Time to generate and send response
Example Trace:
Trace: HTTP Request
├── Span: GET /api/aggregates
│ Duration: 123.45ms
│ Status: 200 OK
│ ├── Span: Controller Action
│ │ Duration: 45.23ms
│ │ ├── Span: Database Query
│ │ │ Duration: 32.10ms
│ │ └── Span: Response Serialization
│ │ Duration: 13.13ms
Database Operation Latency¶
SQL Client Instrumentation:
// OpenTelemetryExtensions.cs
#if UseNHibernate
tracingBuilder
.AddSqlClientInstrumentation(options =>
{
options.SetDbStatementForText = true;
options.RecordException = true;
options.Enrich = (activity, eventName, rawObject) =>
{
if (eventName.Equals("OnCustom", StringComparison.Ordinal) && rawObject is SqlCommand msCmd)
{
activity.SetTag("db.commandTimeout", msCmd.CommandTimeout);
}
};
});
#endif
Captured Metrics:
- Query Execution Time: Time to execute SQL query
- Connection Time: Time to establish database connection
- Transaction Time: Time for transaction commit/rollback
Example Span:
Span: Database Query
├── Operation: SELECT
├── Duration: 32.10ms
├── Database: MyDatabase
├── Command Timeout: 30s
└── Status: Success
HTTP Client Latency¶
HttpClient Instrumentation:
// OpenTelemetryExtensions.cs
tracingBuilder
.AddHttpClientInstrumentation(options =>
{
options.FilterHttpRequestMessage = message =>
message is not null &&
message.RequestUri is not null &&
!message.RequestUri.Host.Contains("visualstudio", StringComparison.Ordinal) &&
!message.RequestUri.Host.Contains("applicationinsights", StringComparison.Ordinal);
});
Captured Metrics:
- Request Duration: Time from request sent to response received
- Connection Time: Time to establish connection
- DNS Resolution Time: Time to resolve hostname
- TLS Handshake Time: Time for SSL/TLS negotiation
gRPC Call Latency¶
gRPC Instrumentation:
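A minimal registration sketch, assuming the OpenTelemetry.Instrumentation.GrpcNetClient package is referenced (server-side gRPC latency is already covered by the ASP.NET Core instrumentation above):
// OpenTelemetryExtensions.cs (sketch)
tracingBuilder
    .AddGrpcClientInstrumentation(options =>
    {
        // Collapse the underlying HttpClient span so each call yields a single gRPC span
        options.SuppressDownstreamInstrumentation = true;
    });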
Captured Metrics:
- gRPC Method Duration: Time for gRPC method execution
- Network Latency: Time for network round-trip
- Serialization Time: Time to serialize/deserialize messages
Message Bus Latency¶
MassTransit Instrumentation:
// OpenTelemetryExtensions.cs
#if UseMassTransit
tracingBuilder
.AddSource(MassTransit.Logging.DiagnosticHeaders.DefaultListenerName);
#endif
Captured Metrics:
- Message Send Latency: Time to send message to broker
- Message Receive Latency: Time from message available to handler start
- Handler Execution Time: Time for message handler execution
- End-to-End Latency: Time from send to complete processing
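Receive latency can also be derived directly in a consumer from the message's sent timestamp; a sketch assuming MassTransit's ConsumeContext.SentTime, with a hypothetical AggregateCreated message and handler:
public class AggregateCreatedConsumer : IConsumer<AggregateCreated>
{
    private readonly Histogram<double> receiveLatency;
    public AggregateCreatedConsumer(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("ConnectSoft.MicroserviceTemplate.Messaging");
        this.receiveLatency = meter.CreateHistogram<double>(
            "message.receive.latency",
            unit: "ms",
            description: "Time from message sent to handler start");
    }
    public async Task Consume(ConsumeContext<AggregateCreated> context)
    {
        // SentTime is stamped when the message is sent to the broker
        if (context.SentTime is DateTime sentTime)
        {
            this.receiveLatency.Record((DateTime.UtcNow - sentTime).TotalMilliseconds);
        }
        await ProcessMessageAsync(context.Message); // hypothetical handler logic
    }
}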
Cache Operation Latency¶
Redis Instrumentation:
// OpenTelemetryExtensions.cs
#if DistributedCacheRedis
tracingBuilder
.AddRedisInstrumentation();
#endif
Captured Metrics:
- Cache Get Latency: Time to retrieve from cache
- Cache Set Latency: Time to store in cache
- Cache Delete Latency: Time to remove from cache
- Network Round-Trip: Time for Redis network communication
Manual Latency Measurement¶
Using Stopwatch¶
Basic Timing:
public class AggregateService
{
    public async Task<Aggregate> CreateAggregateAsync(CreateInput input)
    {
        var stopwatch = Stopwatch.StartNew();

        // Business logic
        var aggregate = await ProcessCreateAsync(input);

        // Stop before reading the elapsed time
        stopwatch.Stop();

        // Log latency
        _logger.LogInformation(
            "CreateAggregate completed in {Duration}ms",
            stopwatch.ElapsedMilliseconds);

        return aggregate;
    }
}
Using Statement:
public async Task<Aggregate> CreateAggregateAsync(CreateInput input)
{
using var _ = _logger.BeginScope(new Dictionary<string, object>
{
["Operation"] = "CreateAggregate"
});
var stopwatch = Stopwatch.StartNew();
try
{
var aggregate = await ProcessCreateAsync(input);
return aggregate;
}
finally
{
_logger.LogInformation(
"CreateAggregate completed in {Duration}ms",
stopwatch.ElapsedMilliseconds);
}
}
Using OpenTelemetry Activities¶
Custom Activity for Latency Tracking:
using System.Diagnostics;
public class AggregateService
{
private static readonly ActivitySource ActivitySource = new("ConnectSoft.MicroserviceTemplate");
public async Task<Aggregate> CreateAggregateAsync(CreateInput input)
{
using var activity = ActivitySource.StartActivity("CreateAggregate");
activity?.SetTag("aggregate.id", input.Id.ToString());
activity?.SetTag("aggregate.name", input.Name);
try
{
var aggregate = await ProcessCreateAsync(input);
activity?.SetTag("aggregate.created.id", aggregate.Id.ToString());
activity?.SetStatus(ActivityStatusCode.Ok);
return aggregate;
}
catch (Exception ex)
{
activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
activity?.RecordException(ex);
throw;
}
}
}
Activity Timing:
- Activities automatically track duration
- Duration is included in span/activity telemetry
- Can be queried in observability backends
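Durations can also be observed in-process with an ActivityListener; a minimal sketch that logs the duration of every activity from the template's source (illustrative, not part of the template):
using System.Diagnostics;
var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name == "ConnectSoft.MicroserviceTemplate",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
    // Duration is finalized when the activity stops
    ActivityStopped = activity =>
        Console.WriteLine($"{activity.OperationName}: {activity.Duration.TotalMilliseconds:F2}ms"),
};
ActivitySource.AddActivityListener(listener);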
Using OpenTelemetry Metrics (Histogram)¶
Custom Latency Histogram:
using System.Diagnostics.Metrics;
public class AggregateServiceMetrics
{
private readonly Histogram<double> createDuration;
private readonly Histogram<double> createInputSize;
public AggregateServiceMetrics(IMeterFactory meterFactory)
{
var meter = meterFactory.Create("ConnectSoft.MicroserviceTemplate.AggregateService");
this.createDuration = meter.CreateHistogram<double>(
"aggregate.create.duration",
unit: "ms",
description: "Duration of aggregate creation operations");
this.createInputSize = meter.CreateHistogram<double>(
"aggregate.create.input.size",
unit: "bytes",
description: "Size of aggregate creation input");
}
public void RecordCreateDuration(double durationMs, string? objectId = null)
{
var tags = new TagList();
if (objectId != null)
{
tags.Add("object.id", objectId);
}
this.createDuration.Record(durationMs, tags);
}
public void RecordCreateInputSize(long sizeBytes, string? objectId = null)
{
var tags = new TagList();
if (objectId != null)
{
tags.Add("object.id", objectId);
}
this.createInputSize.Record(sizeBytes, tags);
}
}
// Usage
public class AggregateService
{
private readonly AggregateServiceMetrics metrics;
public async Task<Aggregate> CreateAggregateAsync(CreateInput input)
{
var stopwatch = Stopwatch.StartNew();
try
{
var aggregate = await ProcessCreateAsync(input);
// Record latency metric
this.metrics.RecordCreateDuration(
stopwatch.Elapsed.TotalMilliseconds,
aggregate.Id.ToString());
return aggregate;
}
catch
{
// Record latency even on failure
this.metrics.RecordCreateDuration(
stopwatch.Elapsed.TotalMilliseconds,
input.Id?.ToString());
throw;
}
}
}
Histogram Benefits:
- Percentiles: Calculate p50, p95, p99 latencies
- Aggregation: Automatic aggregation across time periods
- Distribution: Understand latency distribution
- SLO Tracking: Monitor against latency thresholds
Using Application Insights TelemetryClient¶
Custom Dependency Telemetry:
public class AggregateService
{
private readonly TelemetryClient telemetryClient;
public async Task<Aggregate> CreateAggregateAsync(CreateInput input)
{
using var operation = telemetryClient.StartOperation<DependencyTelemetry>("CreateAggregate");
operation.Telemetry.Type = "CustomOperation";
operation.Telemetry.Data = $"Creating aggregate: {input.Name}";
try
{
var aggregate = await ProcessCreateAsync(input);
operation.Telemetry.Success = true;
operation.Telemetry.Properties["AggregateId"] = aggregate.Id.ToString();
return aggregate;
}
catch (Exception ex)
{
operation.Telemetry.Success = false;
telemetryClient.TrackException(ex);
throw;
}
}
}
Custom Metric Tracking:
public async Task<Aggregate> CreateAggregateAsync(CreateInput input)
{
var stopwatch = Stopwatch.StartNew();
try
{
var aggregate = await ProcessCreateAsync(input);
// Track latency metric
telemetryClient.TrackMetric(
"Aggregate.Create.Duration",
stopwatch.Elapsed.TotalMilliseconds,
new Dictionary<string, string>
{
["AggregateId"] = aggregate.Id.ToString(),
["Operation"] = "Create"
});
return aggregate;
}
catch
{
// Track latency even on failure
telemetryClient.TrackMetric(
"Aggregate.Create.Duration",
stopwatch.Elapsed.TotalMilliseconds,
new Dictionary<string, string>
{
["Operation"] = "Create",
["Success"] = "false"
});
throw;
}
}
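Because the metric call is duplicated across the success and failure paths, a finally block with an outcome flag is a common simplification; a sketch:
public async Task<Aggregate> CreateAggregateAsync(CreateInput input)
{
    var stopwatch = Stopwatch.StartNew();
    var success = false;
    try
    {
        var aggregate = await ProcessCreateAsync(input);
        success = true;
        return aggregate;
    }
    finally
    {
        // Recorded on both success and failure
        telemetryClient.TrackMetric(
            "Aggregate.Create.Duration",
            stopwatch.Elapsed.TotalMilliseconds,
            new Dictionary<string, string>
            {
                ["Operation"] = "Create",
                ["Success"] = success.ToString()
            });
    }
}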
Latency Metrics Types¶
Histogram Metrics¶
Purpose: Track distribution of latency values
private readonly Histogram<double> requestDuration;
public RequestMetrics(IMeterFactory meterFactory)
{
var meter = meterFactory.Create("ConnectSoft.MicroserviceTemplate");
this.requestDuration = meter.CreateHistogram<double>(
"http.request.duration",
unit: "ms",
description: "HTTP request duration");
}
public void RecordRequestDuration(double durationMs, string method, string path, int statusCode)
{
this.requestDuration.Record(
durationMs,
new TagList
{
{ "http.method", method },
{ "http.route", path },
{ "http.status_code", statusCode.ToString() }
});
}
Benefits:
- Calculate percentiles (p50, p95, p99, p999)
- Understand latency distribution
- Track SLO compliance
- Identify outliers
Counter Metrics¶
Purpose: Count operations within latency buckets
private readonly Counter<long> fastRequests;
private readonly Counter<long> slowRequests;
public RequestMetrics(IMeterFactory meterFactory)
{
var meter = meterFactory.Create("ConnectSoft.MicroserviceTemplate");
this.fastRequests = meter.CreateCounter<long>("http.request.fast");
this.slowRequests = meter.CreateCounter<long>("http.request.slow");
}
public void RecordRequest(double durationMs, string method, string path)
{
    // Only the extremes are counted; requests between 100ms and 1000ms fall into neither bucket
    if (durationMs < 100)
    {
        this.fastRequests.Add(1, new("http.method", method), new("http.route", path));
    }
    else if (durationMs > 1000)
    {
        this.slowRequests.Add(1, new("http.method", method), new("http.route", path));
    }
}
Gauge Metrics¶
Purpose: Track current latency state
private readonly ObservableGauge<double> currentLatency;

public RequestMetrics(IMeterFactory meterFactory)
{
    var meter = meterFactory.Create("ConnectSoft.MicroserviceTemplate");
    this.currentLatency = meter.CreateObservableGauge<double>(
        "http.request.current_latency",
        observeValue: ObserveCurrentLatency,
        unit: "ms",
        description: "Current request latency");
}

private Measurement<double> ObserveCurrentLatency()
{
    // Calculate current latency (e.g., from a sliding window; see the sketch below)
    var current = CalculateCurrentLatency();
    return new Measurement<double>(current);
}
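CalculateCurrentLatency is left abstract above; a minimal sliding-window implementation sketch (illustrative, not part of the template):
// Requires System.Collections.Concurrent and System.Linq
private readonly ConcurrentQueue<(DateTime Timestamp, double DurationMs)> recentRequests = new();

public void RecordForGauge(double durationMs) =>
    this.recentRequests.Enqueue((DateTime.UtcNow, durationMs));

private double CalculateCurrentLatency()
{
    var cutoff = DateTime.UtcNow.AddSeconds(-30);
    // Drop samples that have fallen out of the 30-second window
    while (this.recentRequests.TryPeek(out var oldest) && oldest.Timestamp < cutoff)
    {
        this.recentRequests.TryDequeue(out _);
    }
    var samples = this.recentRequests.ToArray();
    return samples.Length == 0 ? 0 : samples.Average(s => s.DurationMs);
}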
Latency Analysis¶
Percentiles¶
Common Percentiles:
- p50 (Median): 50% of requests faster than this value
- p95: 95% of requests faster than this value
- p99: 99% of requests faster than this value
- p999: 99.9% of requests faster than this value
Example:
Request Duration Percentiles:
├── p50: 45ms (median)
├── p95: 120ms (95% of requests)
├── p99: 250ms (99% of requests)
└── p999: 500ms (99.9% of requests)
Querying Percentiles (Prometheus):
# p95 latency
histogram_quantile(0.95, rate(http_request_duration_bucket[5m]))
# p99 latency
histogram_quantile(0.99, rate(http_request_duration_bucket[5m]))
Average vs. Median¶
Average (Mean) Latency:
- Arithmetic mean of all latency values
- Can be heavily skewed by outliers
Median Latency (p50):
- Middle value of the latency distribution
- Better for understanding typical performance
Example Query (Application Insights):
requests
| summarize avg(duration), percentile(duration, 50), percentile(duration, 95), percentile(duration, 99) by name
Latency Distribution¶
Histogram Buckets:
private readonly Histogram<double> requestDuration;
public RequestMetrics(IMeterFactory meterFactory)
{
var meter = meterFactory.Create("ConnectSoft.MicroserviceTemplate");
this.requestDuration = meter.CreateHistogram<double>(
"http.request.duration",
unit: "ms",
description: "HTTP request duration");
// The histogram uses the SDK's default bucket boundaries
// Custom bucket boundaries can be configured via a view (see the sketch below)
}
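Custom bucket boundaries are configured on the meter provider through a view; a sketch assuming access to the MeterProviderBuilder (the boundaries are illustrative):
// OpenTelemetryExtensions.cs (sketch)
meterProviderBuilder.AddView(
    instrumentName: "http.request.duration",
    new ExplicitBucketHistogramConfiguration
    {
        // Boundaries in milliseconds, matching the histogram's unit
        Boundaries = new double[] { 5, 10, 25, 50, 100, 250, 500, 1000, 2500 }
    });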
Distribution Analysis:
- Understand latency spread
- Identify multiple latency modes
- Detect performance regressions
Service-Level Objectives (SLOs)¶
Latency SLO Definition¶
Example SLO:
- Target: 95% of requests complete in < 200ms
- Measurement: p95 latency < 200ms
- Window: 30-day rolling window
- Alerting: Fire an alert if the SLO is violated for 5 minutes
SLO Tracking¶
Custom Metrics for SLO:
public class SloMetrics
{
private readonly Histogram<double> requestDuration;
private readonly Counter<long> totalRequests;
private readonly Counter<long> slowRequests;
public SloMetrics(IMeterFactory meterFactory)
{
var meter = meterFactory.Create("ConnectSoft.MicroserviceTemplate.SLO");
this.requestDuration = meter.CreateHistogram<double>(
"http.request.duration",
unit: "ms");
this.totalRequests = meter.CreateCounter<long>("http.request.total");
this.slowRequests = meter.CreateCounter<long>("http.request.slow");
}
public void RecordRequest(double durationMs, string endpoint)
{
const double sloThresholdMs = 200;
this.requestDuration.Record(durationMs, new("endpoint", endpoint));
this.totalRequests.Add(1, new("endpoint", endpoint));
if (durationMs > sloThresholdMs)
{
this.slowRequests.Add(1, new("endpoint", endpoint));
}
}
}
SLO Calculation (Prometheus):
# SLO compliance: (total - slow) / total
1 - (
sum(rate(http_request_slow_total[5m]))
/
sum(rate(http_request_total[5m]))
)
SLO Alerts¶
Alert Configuration (Prometheus):
groups:
  - name: latency_slo
    rules:
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95, rate(http_request_duration_bucket[5m])) > 200
        for: 5m
        annotations:
          summary: "p95 latency exceeds SLO threshold"
          description: "p95 latency is {{ $value }}ms, exceeding the 200ms threshold"
Performance Optimization¶
Identifying Bottlenecks¶
Latency Breakdown:
Total Request Duration: 500ms
├── Middleware: 5ms (1%)
├── Controller: 10ms (2%)
├── Application Service: 50ms (10%)
├── Database Query: 400ms (80%) ← Bottleneck
├── Response Serialization: 30ms (6%)
└── Network: 5ms (1%)
Analysis Steps:
1. Identify slowest operations
2. Analyze latency distribution
3. Compare against baselines
4. Profile code paths
5. Optimize hot paths
Latency Profiling¶
Using Activity Tags:
using var activity = ActivitySource.StartActivity("ProcessOrder");
var stopwatch = Stopwatch.StartNew();
// Step 1: Validate
var validateStopwatch = Stopwatch.StartNew();
await ValidateOrderAsync(order);
activity?.SetTag("step.validate.duration_ms", validateStopwatch.ElapsedMilliseconds);
// Step 2: Process Payment
var paymentStopwatch = Stopwatch.StartNew();
await ProcessPaymentAsync(order);
activity?.SetTag("step.payment.duration_ms", paymentStopwatch.ElapsedMilliseconds);
// Step 3: Create Order
var createStopwatch = Stopwatch.StartNew();
await CreateOrderAsync(order);
activity?.SetTag("step.create.duration_ms", createStopwatch.ElapsedMilliseconds);
activity?.SetTag("total.duration_ms", stopwatch.ElapsedMilliseconds);
Database Query Optimization¶
Query Latency Tracking:
// Track slow queries
private void TrackSlowQuery(string query, double durationMs)
{
if (durationMs > 1000) // 1 second threshold
{
_logger.LogWarning(
"Slow query detected: {Query} took {Duration}ms",
query,
durationMs);
_telemetryClient.TrackMetric(
"database.query.slow",
durationMs,
new Dictionary<string, string>
{
["Query"] = query,
["Threshold"] = "1000ms"
});
}
}
Best Practices¶
Do's¶
- Track Latency for Critical Operations
- Use Histograms for Latency Metrics
- Include Context in Latency Metrics
- Track Latency Even on Failures (see the sketch after this list)
- Set Appropriate SLO Targets
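A sketch that combines several of these practices - a histogram with contextual tags, recorded in a finally block so failures are measured too (the orderDuration field and ProcessInternalAsync are illustrative):
public async Task<Order> ProcessOrderAsync(Order order)
{
    var stopwatch = Stopwatch.StartNew();
    var outcome = "success";
    try
    {
        return await ProcessInternalAsync(order);
    }
    catch
    {
        outcome = "failure";
        throw;
    }
    finally
    {
        this.orderDuration.Record(
            stopwatch.Elapsed.TotalMilliseconds,
            new TagList
            {
                { "operation", "ProcessOrder" },
                { "outcome", outcome }
            });
    }
}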
Don'ts¶
- Don't Track Every Operation
// ❌ BAD - Too granular, performance impact
foreach (var item in items)
{
    var sw = Stopwatch.StartNew();
    ProcessItem(item);
    TrackLatency(sw.ElapsedMilliseconds); // Too fine-grained
}

// ✅ GOOD - Track at appropriate level
var sw = Stopwatch.StartNew();
foreach (var item in items)
{
    ProcessItem(item);
}
TrackLatency(sw.ElapsedMilliseconds);
- Don't Use Simple Averages
- Don't Ignore Outliers
- Don't Block on Latency Tracking (see the sketch below)
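For the last point, the anti-pattern is forcing a synchronous telemetry flush on the request path; a sketch of the contrast (assuming Application Insights' TelemetryClient):
// ❌ BAD - blocks the request on telemetry delivery
telemetryClient.TrackMetric("Operation.Duration", durationMs);
telemetryClient.Flush(); // synchronous flush on the hot path

// ✅ GOOD - record in memory; the SDK batches and exports in the background
this.operationDuration.Record(durationMs);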
Integration with Observability¶
OpenTelemetry Export¶
OTLP Export:
// OpenTelemetryExtensions.cs
tracingBuilder.AddOtlpExporter(options =>
{
options.Protocol = OtlpExportProtocol.Grpc;
options.Endpoint = new Uri("https://otel-collector:4317");
});
Latency in Traces:
- Activity duration automatically included
- Span timing in distributed traces
- Queryable in Jaeger, Zipkin, etc.
Application Insights¶
Request Telemetry:
Application Insights automatically tracks request latency:
- Duration included in request telemetry
- Queryable via Kusto queries
- Visualizable in dashboards
Custom Metrics:
telemetryClient.TrackMetric(
"Operation.Duration",
durationMs,
new Dictionary<string, string>
{
["Operation"] = "CreateAggregate",
["Environment"] = "Production"
});
Prometheus¶
Prometheus Histogram:
# Histogram metrics are exported to Prometheus automatically and are queryable via PromQL
histogram_quantile(0.95, rate(http_request_duration_bucket[5m]))
Custom Prometheus Metrics:
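A sketch assuming the OpenTelemetry.Exporter.Prometheus.AspNetCore package: register custom meters on the meter provider and expose a scrape endpoint:
// Program.cs (sketch)
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("ConnectSoft.MicroserviceTemplate.AggregateService")
        .AddPrometheusExporter());

var app = builder.Build();
app.MapPrometheusScrapingEndpoint(); // exposes /metrics for Prometheus scraping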
Troubleshooting¶
Issue: High Latency Spikes¶
Symptoms: Sudden latency increases.
Solutions:
1. Check for slow database queries
2. Review external dependency latency
3. Check for resource contention (CPU, memory)
4. Analyze garbage collection patterns
5. Review network latency
Issue: Inconsistent Latency¶
Symptoms: High variance in latency measurements.
Solutions:
1. Check for cold start effects
2. Review caching effectiveness
3. Analyze connection pool exhaustion
4. Check for thread pool starvation
5. Review load balancing
Issue: Missing Latency Data¶
Symptoms: Latency metrics not appearing.
Solutions:
1. Verify instrumentation is enabled
2. Check OpenTelemetry configuration
3. Verify exporters are configured
4. Check network connectivity to observability backends
5. Review sampling configuration
Summary¶
Latency Telemetry in the ConnectSoft Microservice Template provides:
- ✅ Automatic Instrumentation: Automatic latency tracking for all major operations
- ✅ Manual Measurements: Custom latency tracking for business operations
- ✅ Distributed Tracing: End-to-end latency visibility across services
- ✅ Histogram Metrics: Percentile-based latency analysis
- ✅ SLO Tracking: Service-level objective monitoring
- ✅ Performance Optimization: Bottleneck identification and optimization
- ✅ Multiple Backends: OpenTelemetry, Application Insights, Prometheus support
- ✅ Best Practices: Comprehensive latency measurement patterns
By following these patterns, teams can:
- Monitor Performance: Track latency across all operations
- Identify Bottlenecks: Find and optimize slow operations
- Maintain SLOs: Ensure service-level objectives are met
- Optimize Code: Make data-driven performance improvements
- Debug Issues: Understand latency patterns in production
- Plan Capacity: Use latency metrics for capacity planning
Latency telemetry is a critical component of observability, enabling teams to understand, monitor, and optimize application performance to deliver excellent user experiences.