OpenTelemetry Collector

Overview

The OpenTelemetry Collector is a vendor-agnostic service that receives, processes, and exports telemetry data. It acts as a central hub for observability, decoupling applications from specific observability backends.

Architecture

Components

  1. Receivers: Receive telemetry data from various sources

    • OTLP (gRPC/HTTP)
    • Prometheus
    • Jaeger
    • Zipkin
  2. Processors: Process and transform telemetry data

    • Batch: Batch data for efficiency
    • Memory Limiter: Prevent memory exhaustion
    • Resource: Add/modify resource attributes
    • Attributes: Modify span/metric/log attributes
    • Sampling: Reduce data volume
  3. Exporters: Export data to backends

    • Grafana LGTM Stack (Loki, Tempo, Mimir)
    • Prometheus
    • Jaeger
    • Elasticsearch
    • Azure Monitor
    • And many more...
  4. Extensions: Provide additional functionality (a configuration sketch follows this list)

    • Health Check: Health monitoring endpoint
    • zPages: Debugging interface
    • pprof: Performance profiling
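
Extensions are configured in their own top-level block and then enabled under service.extensions. A minimal sketch, assuming the default ports that ship with the Collector:

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777

service:
  extensions: [health_check, zpages, pprof]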

Data Flow

Application → OTLP Receiver → Processors → Exporters → Backends

Benefits

  1. Decoupling: Applications don't need to know about specific backends
  2. Centralized Processing: Process data once, export to multiple backends
  3. Flexibility: Easy to add/remove backends without code changes
  4. Consistency: Standardized telemetry format across services
  5. Performance: Batch processing and sampling reduce overhead

Configuration

Basic Configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  debug:
    verbosity: detailed
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

Deployment Patterns

Sidecar Pattern

Deploy collector as a sidecar container alongside each application instance.

Pros:

  • Isolation
  • Per-service configuration
  • Negligible network latency (localhost only)

Cons:

  • Resource overhead
  • More complex deployment
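
A minimal Kubernetes sketch of the sidecar pattern, assuming a ConfigMap named otel-sidecar-config holds the Collector configuration (names and images are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:latest               # application sends OTLP to localhost:4317
    - name: otel-collector
      image: otel/opentelemetry-collector:latest
      args: ["--config=/etc/otelcol/config.yaml"]
      volumeMounts:
        - name: otel-config
          mountPath: /etc/otelcol
  volumes:
    - name: otel-config
      configMap:
        name: otel-sidecar-config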

Gateway Pattern

Deploy collector as a centralized gateway service.

Pros:

  • Resource efficient
  • Centralized configuration
  • Easier management

Cons:

  • Single point of failure (mitigate with HA)
  • Network latency
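
One way to mitigate the single point of failure is to run several gateway replicas behind one service address. A rough Kubernetes sketch (resource names are illustrative; the Collector config would be mounted from a ConfigMap as in the sidecar example):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-gateway
spec:
  replicas: 3                            # multiple instances for high availability
  selector:
    matchLabels:
      app: otel-gateway
  template:
    metadata:
      labels:
        app: otel-gateway
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
          args: ["--config=/etc/otelcol/config.yaml"]
          ports:
            - containerPort: 4317        # OTLP gRPC
            - containerPort: 4318        # OTLP HTTP
---
apiVersion: v1
kind: Service
metadata:
  name: otel-gateway
spec:
  selector:
    app: otel-gateway
  ports:
    - name: otlp-grpc
      port: 4317
    - name: otlp-http
      port: 4318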

Agent + Gateway Pattern

Deploy lightweight agents on each host and a gateway in the cluster.

Pros:

  • Best of both worlds
  • Scalable
  • Flexible

Cons:

  • More complex architecture
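
In this pattern each agent receives telemetry locally and forwards it over OTLP to the gateway. A minimal agent-side sketch, assuming the gateway is reachable at otel-gateway:4317 (hostname illustrative):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  otlp:
    endpoint: otel-gateway:4317
    tls:
      insecure: true                     # use real certificates outside local testing

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]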

Best Practices

Performance

  1. Batch Processing: Always use batch processor
  2. Memory Limits: Configure memory limiter
  3. Sampling: Use sampling for high-volume traces (a combined processor sketch follows this list)
  4. Resource Allocation: Allocate sufficient CPU/memory
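
A sketch combining the memory limiter, sampling, and batching; the probabilistic sampler ships in the contrib distribution, and the limits and percentage below are placeholders to tune for your workload:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_mib: 256
  batch:
    timeout: 1s
    send_batch_size: 1024
  probabilistic_sampler:
    sampling_percentage: 10              # keep roughly 10% of traces

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, probabilistic_sampler, batch]   # memory_limiter first, batch last
      exporters: [otlp]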

Reliability

  1. High Availability: Deploy multiple collector instances
  2. Queuing: Enable exporter queues for resilience
  3. Retries: Configure retry policies (a queue and retry sketch follows this list)
  4. Health Checks: Monitor collector health
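
Exporters built on the Collector's common exporter helper (including the OTLP exporter) support a sending queue and retry settings. A sketch with placeholder values:

exporters:
  otlp:
    endpoint: backend.example.com:4317
    sending_queue:
      enabled: true
      queue_size: 5000                   # number of batches buffered in memory
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s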

Security

  1. TLS: Use TLS for all connections (a receiver TLS sketch follows this list)
  2. Authentication: Implement authentication for backends
  3. Network Policies: Restrict network access
  4. Secrets Management: Use secrets for sensitive data
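
A sketch of TLS on the OTLP receiver, assuming certificates have already been provisioned at the paths shown (paths are illustrative):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /etc/otelcol/certs/server.crt
          key_file: /etc/otelcol/certs/server.key
          client_ca_file: /etc/otelcol/certs/ca.crt   # require client certificates (mTLS)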

Monitoring

Health Endpoint

curl http://localhost:13133/

Metrics Endpoint

curl http://localhost:8888/metrics

zPages

Access the debugging interface at http://localhost:55679 (for example, /debug/tracez shows recent trace activity).

Troubleshooting

Common Issues

  1. High Memory Usage: Reduce batch sizes, enable sampling
  2. Data Loss: Enable exporter queues, check backend connectivity
  3. Slow Performance: Optimize processors, increase resources
  4. Configuration Errors: Validate the configuration before rollout (recent Collector releases include an otelcol validate subcommand)

Debugging

  1. Enable debug logging: set service::telemetry::logs::level to debug in the config (see the sketch after this list)
  2. Use debug exporter for local development
  3. Check zPages for internal state
  4. Monitor metrics endpoint
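
The Collector's own log level is set under service.telemetry; a minimal sketch:

service:
  telemetry:
    logs:
      level: debug                       # default is info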

Examples

Further Reading