# OpenTelemetry Collector

## Overview
The OpenTelemetry Collector is a vendor-agnostic service that receives, processes, and exports telemetry data. It acts as a central hub for observability, decoupling applications from specific observability backends.
## Architecture

### Components
- Receivers: Receive telemetry data from various sources
    - OTLP (gRPC/HTTP)
    - Prometheus
    - Jaeger
    - Zipkin
- Processors: Process and transform telemetry data
    - Batch: Batch data for efficiency
    - Memory Limiter: Prevent memory exhaustion
    - Resource: Add/modify resource attributes
    - Attributes: Modify span/metric/log attributes
    - Sampling: Reduce data volume
- Exporters: Export data to backends
    - Grafana LGTM Stack (Loki, Tempo, Mimir)
    - Prometheus
    - Jaeger
    - Elasticsearch
    - Azure Monitor
    - And many more...
- Extensions: Provide additional functionality
    - Health Check: Health monitoring endpoint
    - zPages: Debugging interface
    - pprof: Performance profiling
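Extensions are configured at the top level and enabled under `service.extensions`. A minimal sketch enabling the three extensions listed above, using their documented default ports:

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133   # HTTP health/liveness endpoint
  zpages:
    endpoint: 0.0.0.0:55679   # in-process debugging pages
  pprof:
    endpoint: 0.0.0.0:1777    # Go pprof profiling endpoint

service:
  extensions: [health_check, zpages, pprof]
```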
### Data Flow
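Telemetry flows through the Collector in pipelines: one or more receivers accept the data, the configured processors run in order, and the result is fanned out to one or more exporters, which deliver it to the backends. Pipelines are defined per signal (traces, metrics, logs) in the `service` section of the configuration shown below.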
## Benefits
- Decoupling: Applications don't need to know about specific backends
- Centralized Processing: Process data once, export to multiple backends
- Flexibility: Easy to add/remove backends without code changes
- Consistency: Standardized telemetry format across services
- Performance: Batch processing and sampling reduce overhead
## Configuration

### Basic Configuration
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  debug:
    verbosity: detailed
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]          # the prometheus exporter handles metrics only
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```
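With this file saved as `config.yaml`, the Collector can typically be started with `otelcol --config=config.yaml` (the binary name and default config path vary by distribution, e.g. `otelcol-contrib` for the contrib build).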
## Deployment Patterns

### Sidecar Pattern

Deploy the collector as a sidecar container alongside each application instance.
Pros:
- Isolation
- Per-service configuration
- No network latency
Cons:
- Resource overhead
- More complex deployment
### Gateway Pattern

Deploy the collector as a centralized gateway service.
Pros:
- Resource efficient
- Centralized configuration
- Easier management
Cons:
- Single point of failure (mitigate with HA)
- Network latency
### Agent + Gateway Pattern

Deploy lightweight collector agents on each host and a gateway service in the cluster; the sketch after the pros and cons below illustrates the agent side.
Pros:
- Best of both worlds
- Scalable
- Flexible
Cons:
- More complex architecture
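As an illustration of the agent side of this pattern (a minimal sketch; the gateway hostname `otel-gateway` is a placeholder):

```yaml
# Agent-side configuration: receive telemetry locally and forward it to the gateway over OTLP.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  otlp:
    endpoint: otel-gateway:4317   # placeholder gateway address
    tls:
      insecure: true              # use proper TLS in production

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The gateway instances then run the heavier processing (sampling, attribute manipulation) and the exporters to the actual backends.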
## Best Practices

### Performance
- Batch Processing: Always use batch processor
- Memory Limits: Configure the memory limiter processor (see the sketch after this list)
- Sampling: Use sampling for high-volume traces
- Resource Allocation: Allocate sufficient CPU/memory
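A sketch of the memory limiter and sampling settings mentioned above, with illustrative values rather than recommendations:

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024           # hard memory limit (illustrative)
    spike_limit_mib: 256
  probabilistic_sampler:
    sampling_percentage: 10   # keep roughly 10% of traces
  batch:
    timeout: 1s
    send_batch_size: 1024
```

The memory limiter is conventionally placed first in the processor chain so it can apply back-pressure before other processors run.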
### Reliability
- High Availability: Deploy multiple collector instances
- Queuing: Enable exporter queues for resilience (see the sketch after this list)
- Retries: Configure retry policies
- Health Checks: Monitor collector health
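Retries and queuing are configured per exporter through the standard exporter settings; a sketch for an OTLP exporter, with illustrative values and a placeholder endpoint:

```yaml
exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder backend address
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
```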
### Security
- TLS: Use TLS for all connections (see the sketch after this list)
- Authentication: Implement authentication for backends
- Network Policies: Restrict network access
- Secrets Management: Use secrets for sensitive data
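A sketch of enabling TLS on the OTLP gRPC receiver; the certificate paths are placeholders:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /etc/otelcol/certs/server.crt   # placeholder path
          key_file: /etc/otelcol/certs/server.key    # placeholder path
```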
## Monitoring

### Health Endpoint
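With the `health_check` extension enabled (see the extensions sketch in the Components section), the Collector serves an HTTP health endpoint, by default at http://localhost:13133, which can back Kubernetes liveness and readiness probes.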
### Metrics Endpoint
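The Collector also exposes its own internal metrics in Prometheus format, by default on port 8888. In the classic configuration style this is set under `service.telemetry.metrics` (a sketch; newer releases may configure internal telemetry differently):

```yaml
service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
```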
### zPages
Access the debugging interface at http://localhost:55679 (for example, the trace pages under /debug/tracez).
## Troubleshooting

### Common Issues
- High Memory Usage: Reduce batch sizes, enable sampling
- Data Loss: Enable exporter queues, check backend connectivity
- Slow Performance: Optimize processors, increase resources
- Configuration Errors: Validate config with `--dry-run`
### Debugging
- Enable debug logging: `LOG_LEVEL=debug`
- Use the debug exporter for local development
- Check zPages for internal state
- Monitor metrics endpoint
## Examples
- Grafana LGTM Stack Example: Complete setup with Loki, Tempo, and Mimir
- Custom Processors Example: Custom processor configuration
- Multiple Backends Example: Exporting to multiple backends