Grafana LGTM Stack Setup Example

This example demonstrates how to set up and configure the Grafana LGTM Stack (Loki, Grafana, Tempo, Mimir) with OpenTelemetry Collector for a cost-effective, comprehensive observability solution.

Overview

The Grafana LGTM Stack provides a complete observability platform:

  • Loki: Log aggregation system (similar to Prometheus but for logs)
  • Grafana: Visualization and dashboarding platform
  • Tempo: Distributed tracing backend
  • Mimir: Long-term metrics storage (Prometheus-compatible)

This stack is designed to be:

  • Cost-effective: Open-source, efficient storage
  • Scalable: Handles high-volume telemetry
  • Integrated: All components work seamlessly together
  • OpenTelemetry-native: Ingests OTLP traces, metrics, and logs through the OpenTelemetry Collector

Prerequisites

  • Docker Desktop installed and running
  • At least 8GB of available RAM (the LGTM stack is resource-intensive)
  • Ports available: 3000 (Grafana), 3100 (Loki), 3200 (Tempo), 9009 (Mimir), 4317/4318 (OTLP, published by the collector)
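
You can verify that these ports are free before starting the stack; a quick sketch for a Linux/macOS shell:

# Check that nothing is already listening on the required ports
for port in 3000 3100 3200 9009 4317 4318; do
  if lsof -i ":$port" >/dev/null 2>&1; then
    echo "Port $port is already in use"
  else
    echo "Port $port is free"
  fi
done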

Step 1: Enable LGTM Stack Configuration

Update your application configuration (e.g., appsettings.json):

{
  "ObservabilityStack": {
    "Mode": "OtelCollector",
    "CollectorEndpoint": "http://otel-collector:4317"
  },
  "ObservabilityBackend": {
    "EnabledBackends": ["GrafanaLGTM"],
    "GrafanaLGTM": {
      "LokiEndpoint": "http://loki:3100",
      "TempoEndpoint": "http://tempo:3200",
      "MimirEndpoint": "http://mimir:9009",
      "GrafanaEndpoint": "http://grafana:3000"
    }
  }
}
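
If your application relies on the standard OpenTelemetry SDK environment variables instead of (or in addition to) an application-specific configuration section, the equivalent settings are shown below; the service name MyApplication is just a placeholder:

# Standard OpenTelemetry SDK environment variables (alternative to app-specific config)
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_SERVICE_NAME=MyApplication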

Step 2: Start the LGTM Stack

# Navigate to Docker Compose directory
cd <your-docker-compose-directory>

# Start LGTM stack and collector
docker-compose up -d loki grafana tempo mimir otel-collector
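
Once the containers are up, a quick sanity check:

# List container status and tail the collector logs for export errors
docker-compose ps
docker-compose logs -f otel-collector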

Docker Compose Configuration

Example docker-compose.yml for the LGTM stack (it mounts the collector configuration from Step 3 and the backend configuration files created in Steps 5-7):

version: '3.8'

services:
  # Loki - Log aggregation
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/loki-config.yaml
    volumes:
      - ./loki-config.yaml:/etc/loki/loki-config.yaml
      - loki-data:/loki

  # Grafana - Visualization
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki
      - tempo
      - mimir

  # Tempo - Distributed tracing
  tempo:
    image: grafana/tempo:latest
    ports:
      - "3200:3200"
      # OTLP ports 4317/4318 are not published on the host: the collector reaches
      # Tempo at tempo:4317 over the Compose network, and publishing them here
      # would clash with the collector's host ports.
    command: ["-config.file=/etc/tempo/tempo-config.yaml"]
    volumes:
      - ./tempo-config.yaml:/etc/tempo/tempo-config.yaml
      - tempo-data:/var/tempo

  # Mimir - Metrics storage
  mimir:
    image: grafana/mimir:latest
    ports:
      - "9009:9009"
    command: ["-config.file=/etc/mimir/mimir-config.yaml"]
    volumes:
      - ./mimir-config.yaml:/etc/mimir/mimir-config.yaml
      - mimir-data:/data

  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP
    volumes:
      - ./otel-collector-config-lgtm.yaml:/etc/otelcol-contrib/config.yaml
    depends_on:
      - loki
      - tempo
      - mimir

volumes:
  loki-data:
  grafana-data:
  tempo-data:
  mimir-data:

Step 3: Configure OpenTelemetry Collector

Create otel-collector-config-lgtm.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    limit_mib: 400
    spike_limit_mib: 100
    check_interval: 5s
  resource:
    attributes:
      - key: service.name
        value: MyApplication
        action: upsert
      - key: deployment.environment
        value: ${env:ENVIRONMENT:-production}
        action: upsert

exporters:
  # Loki for logs (the endpoint must point at Loki's push API)
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    # Note: the labels block below applies to older collector-contrib releases;
    # newer releases derive labels from loki.resource.labels / loki.attribute.labels hints instead.
    labels:
      resource:
        service.name: "service_name"
        deployment.environment: "deployment_environment"
      attributes:
        http.method: "http_method"
        http.status_code: "http_status_code"

  # Tempo for traces (OTLP gRPC receiver, not Tempo's HTTP port)
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

  # Prometheus Remote Write for Mimir (metrics)
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
    external_labels:
      cluster: production
      environment: ${env:ENVIRONMENT:-production}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/tempo]

    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite]

    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [loki]
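
Recent collector releases include a validate subcommand, so you can check the configuration before (re)starting the stack; this assumes the file sits in the current directory:

docker run --rm \
  -v "$(pwd)/otel-collector-config-lgtm.yaml:/etc/otelcol-contrib/config.yaml" \
  otel/opentelemetry-collector-contrib:latest \
  validate --config=/etc/otelcol-contrib/config.yaml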

Step 4: Configure Grafana Data Sources

Add Loki Data Source

  1. Open http://localhost:3000 in your browser
  2. Go to Configuration > Data Sources
  3. Click Add data source
  4. Select Loki
  5. Set URL to http://loki:3100
  6. Click Save & Test

Add Tempo Data Source

  1. Click Add data source
  2. Select Tempo
  3. Set URL to http://tempo:3200
  4. Under Trace to logs, select the Loki data source
  5. Click Save & Test

Add Mimir Data Source

  1. Click Add data source
  2. Select Prometheus
  3. Set URL to http://mimir:9009/prometheus
  4. Click Save & Test
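
Instead of clicking through the UI, the same three data sources can be provisioned from a file mounted into the Grafana container under /etc/grafana/provisioning/datasources/ (the file name lgtm-datasources.yaml is arbitrary):

# lgtm-datasources.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir:9009/prometheus
    isDefault: true

To use it, add a volume entry such as ./lgtm-datasources.yaml:/etc/grafana/provisioning/datasources/lgtm-datasources.yaml to the grafana service in docker-compose.yml.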

Step 5: Configure Loki

Create loki-config.yaml:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/analytics/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
#analytics:
#  reporting_enabled: false

Step 6: Configure Tempo

Create tempo-config.yaml:

server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

ingester:
  max_block_duration: 5m

compactor:
  compaction:
    block_retention: 1h
    compacted_block_retention: 1h

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    pool:
      max_workers: 100
      queue_depth: 10000

overrides:
  defaults:
    ingestion:
      burst_size_bytes: 16000000   # ~16 MB
      rate_limit_bytes: 16000000

Step 7: Configure Mimir

Create mimir-config.yaml:

# Monolithic Mimir with local filesystem storage (suitable for demos, not production)
target: all
multitenancy_enabled: false

server:
  http_listen_port: 9009
  grpc_listen_port: 9095

blocks_storage:
  backend: filesystem
  filesystem:
    dir: /data/blocks
  tsdb:
    dir: /data/tsdb
  bucket_store:
    sync_dir: /data/tsdb-sync

ruler_storage:
  backend: filesystem
  filesystem:
    dir: /data/rules

distributor:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist

ingester:
  ring:
    instance_addr: 127.0.0.1
    replication_factor: 1
    kvstore:
      store: memberlist

compactor:
  data_dir: /data/compactor
  sharding_ring:
    kvstore:
      store: memberlist

store_gateway:
  sharding_ring:
    replication_factor: 1

limits:
  ingestion_rate: 10000
  ingestion_burst_size: 20000
  max_global_series_per_user: 100000
  max_global_series_per_metric: 20000

Step 8: Verify Setup

Check Loki

curl http://localhost:3100/ready

Query logs:

# start/end must be Unix epoch or RFC3339 timestamps (GNU date shown; on macOS use: date -v-1H +%s)
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={service_name="MyApplication"}' \
  --data-urlencode "start=$(date -d '1 hour ago' +%s)000000000" \
  --data-urlencode "end=$(date +%s)000000000"

Check Tempo

curl http://localhost:3200/ready

Check Mimir

curl http://localhost:9009/ready

Query metrics:

curl "http://localhost:9009/prometheus/api/v1/query?query=up"

Check Grafana

  1. Open http://localhost:3000
  2. Verify all data sources are connected
  3. Navigate to Explore to query data

Step 9: Create Grafana Dashboards

Logs Dashboard

  1. Go to Dashboards > New Dashboard
  2. Add panel with Loki query: {service_name="MyApplication"}
  3. Visualize logs over time
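
A few example LogQL queries for this panel; the label names assume the Loki exporter label mapping shown in Step 3:

# All logs for the service containing "error"
{service_name="MyApplication"} |= "error"

# Log volume per HTTP status code over 5-minute windows
sum by (http_status_code) (count_over_time({service_name="MyApplication"}[5m]))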

Traces Dashboard

  1. Add panel with Tempo query
  2. View trace timeline and spans
  3. Link traces to logs using trace ID

Metrics Dashboard

  1. Add panel with Prometheus query (Mimir): rate(http_server_request_duration_seconds_count[5m])
  2. Create graphs for key metrics
  3. Set up alerts
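
Example PromQL queries for this dashboard; the metric names follow the OpenTelemetry HTTP semantic conventions and may differ depending on your instrumentation:

# Request rate per route
sum by (http_route) (rate(http_server_request_duration_seconds_count[5m]))

# Approximate 95th percentile request latency
histogram_quantile(0.95, sum by (le) (rate(http_server_request_duration_seconds_bucket[5m])))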

Correlated Observability

Grafana's Explore view allows you to:

  • Start from a trace, jump to related logs
  • Start from a log, jump to related traces
  • View metrics for the same time period
  • Correlate issues across all three pillars
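
Trace-to-logs and logs-to-traces links can also be provisioned by extending the data source file from Step 4. A sketch of the relevant jsonData sections, assuming the data source UIDs loki and tempo and JSON-formatted log lines that contain a trace_id field:

datasources:
  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki      # jump from a span to matching Loki logs
        filterByTraceID: true
  - name: Loki
    type: loki
    uid: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: '"trace_id":"(\w+)"'   # assumes JSON log lines with a trace_id field
          datasourceUid: tempo                 # jump from a log line to the trace in Tempo
          url: '$${__value.raw}'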

Step 10: Generate Test Data

Start your application and generate telemetry:

# Make API calls to generate traces, metrics, and logs
for i in {1..20}; do
  curl http://localhost:8081/api/health
  sleep 0.5
done

Configuration Examples

Resource Attributes

Add custom resource attributes for better filtering:

processors:
  resource:
    attributes:
      - key: service.name
        value: MyApplication
        action: upsert
      - key: deployment.environment
        value: production
        action: upsert
      - key: service.version
        value: 1.0.0
        action: upsert
      - key: team.name
        value: platform-team
        action: upsert

Log Labeling

Configure Loki labels for efficient querying:

exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    labels:
      resource:
        service.name: "service_name"
        deployment.environment: "deployment_environment"
        service.version: "service_version"
      attributes:
        http.method: "http_method"
        http.status_code: "http_status_code"
        http.route: "http_route"

Trace Sampling

Reduce trace volume with sampling:

processors:
  probabilistic_sampler:
    sampling_percentage: 10.0
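
The probabilistic sampler drops traces uniformly. If you would rather keep all error traces and only sample the rest, the tail_sampling processor in the contrib collector is a common alternative; a minimal sketch:

processors:
  tail_sampling:
    decision_wait: 10s          # wait for late spans before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

Whichever sampler you choose, remember to add it to the traces pipeline's processors list.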

Metrics Aggregation

Configure Mimir for long-term storage:

exporters:
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
    external_labels:
      cluster: production
      environment: production
      region: us-east-1

Performance Optimization

Loki Optimization

  • Use label-based indexing (not full-text search)
  • Limit label cardinality
  • Configure retention policies
  • Use chunk compression

Tempo Optimization

  • Enable trace compression
  • Configure block retention
  • Use object storage for long-term retention
  • Enable trace sampling

Mimir Optimization

  • Configure ingestion limits
  • Use sharding for scale
  • Enable compression
  • Configure retention policies

Cost Considerations

The LGTM stack is designed to be cost-effective:

  • Efficient Storage: Compressed storage formats
  • No Vendor Lock-in: Open-source, self-hosted
  • Scalable: Handles high volume efficiently
  • Resource Efficient: Lower resource requirements than the ELK stack

Storage Optimization

  • Configure retention policies
  • Use compression
  • Archive old data to object storage
  • Enable downsampling for long-term metrics
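
As a concrete example, Loki retention is driven by the compactor plus a limit; a sketch (the exact options vary slightly between Loki versions):

# loki-config.yaml (excerpt)
compactor:
  retention_enabled: true
  delete_request_store: filesystem   # required by newer Loki versions when retention is enabled

limits_config:
  retention_period: 744h             # keep logs for 31 days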

Troubleshooting

No Logs in Loki

Problem: Logs not appearing in Loki

Solution:

  1. Verify collector is exporting to Loki: Check collector logs
  2. Check Loki is running: curl http://localhost:3100/ready
  3. Verify label configuration matches query
  4. Check time range in Grafana
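
To confirm whether logs are reaching the collector at all, you can temporarily add the debug exporter (called logging in older collector releases) to the pipeline:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [loki, debug]   # exported batches are also printed to the collector's stdout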

No Traces in Tempo

Problem: Traces not appearing in Tempo

Solution:

  1. Verify collector is exporting to Tempo
  2. Check Tempo is running: curl http://localhost:3200/ready
  3. Verify OTLP endpoint configuration
  4. Check trace sampling percentage

No Metrics in Mimir

Problem: Metrics not appearing in Mimir

Solution:

  1. Verify Prometheus Remote Write endpoint
  2. Check Mimir is running: curl http://localhost:9009/ready
  3. Verify metric format compatibility
  4. Check ingestion limits

High Resource Usage

Problem: Stack using too much memory/CPU

Solution:

  • Reduce retention periods
  • Enable sampling
  • Optimize label cardinality
  • Increase resource limits
  • Use object storage for long-term data
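
Memory caps can be set per service in docker-compose.yml; depending on your Compose version either deploy.resources.limits (shown below) or the older mem_limit key applies, and the values here are purely illustrative:

# docker-compose.yml (excerpt) - illustrative memory caps
services:
  loki:
    deploy:
      resources:
        limits:
          memory: 1G
  mimir:
    deploy:
      resources:
        limits:
          memory: 2G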

Use Cases

Comprehensive Observability

  • Single platform for metrics, logs, and traces
  • Correlated observability across all pillars
  • Unified dashboards and alerts

Cost-Effective Monitoring

  • Open-source solution
  • Efficient storage
  • No per-GB pricing
  • Self-hosted control

High-Volume Systems

  • Scalable architecture
  • Handles millions of metrics
  • Efficient log aggregation
  • Distributed trace storage

Multi-Tenant Environments

  • Tenant isolation
  • Resource quotas
  • Per-tenant dashboards
  • Access control
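
When multi-tenancy is enabled in Loki and Mimir (it is disabled in the single-tenant configs above), the collector must send a tenant ID with each request via the X-Scope-OrgID header; a sketch with a hypothetical tenant name:

exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    headers:
      X-Scope-OrgID: team-a        # hypothetical tenant ID
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
    headers:
      X-Scope-OrgID: team-a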

Further Reading