Grafana LGTM Stack Setup Example¶
This example demonstrates how to set up and configure the Grafana LGTM Stack (Loki, Grafana, Tempo, Mimir) with the OpenTelemetry Collector as a cost-effective, comprehensive observability solution.
Overview¶
The Grafana LGTM Stack provides a complete observability platform:
- Loki: Log aggregation system (similar to Prometheus but for logs)
- Grafana: Visualization and dashboarding platform
- Tempo: Distributed tracing backend
- Mimir: Long-term metrics storage (Prometheus-compatible)
This stack is designed to be:
- Cost-effective: Open-source, efficient storage
- Scalable: Handles high-volume telemetry
- Integrated: All components work seamlessly together
- OpenTelemetry-friendly: Every signal (logs, traces, metrics) can be shipped through the OpenTelemetry Collector
Prerequisites¶
- Docker Desktop installed and running
- At least 8GB of available RAM (LGTM stack is resource-intensive)
- Ports available: 3000 (Grafana), 3100 (Loki), 3200 (Tempo), 9009 (Mimir), and 4317/4318 (OTLP into the collector); the snippet below can confirm they are free
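A quick pre-flight check along these lines (assuming lsof is available on your machine) confirms Docker is up and the listed ports are free:
# Verify Docker is reachable
docker info > /dev/null 2>&1 && echo "Docker is running" || echo "Docker is not running"
# Flag any required host port that is already in use
for port in 3000 3100 3200 9009 4317 4318; do
  lsof -i ":$port" > /dev/null 2>&1 && echo "Port $port is already in use"
done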
Step 1: Enable LGTM Stack Configuration¶
Update your application configuration (e.g., appsettings.json):
{
  "ObservabilityStack": {
    "Mode": "OtelCollector",
    "CollectorEndpoint": "http://otel-collector:4317"
  },
  "ObservabilityBackend": {
    "EnabledBackends": ["GrafanaLGTM"],
    "GrafanaLGTM": {
      "LokiEndpoint": "http://loki:3100",
      "TempoEndpoint": "http://tempo:3200",
      "MimirEndpoint": "http://mimir:9009",
      "GrafanaEndpoint": "http://grafana:3000"
    }
  }
}
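If the application is instrumented with the OpenTelemetry SDK, the standard OTLP environment variables can be used instead of (or alongside) this file; the localhost endpoint below assumes the app runs on the host rather than inside the Compose network:
# Standard OpenTelemetry SDK settings pointing at the collector
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_SERVICE_NAME=MyApplication
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production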
Step 2: Start the LGTM Stack¶
# Navigate to Docker Compose directory
cd <your-docker-compose-directory>
# Start LGTM stack and collector
docker-compose up -d loki grafana tempo mimir otel-collector
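To confirm the stack came up cleanly, list the containers and tail the collector logs; exporter connection errors to Loki, Tempo, or Mimir will show up here first:
# Show container status for the LGTM stack and collector
docker-compose ps
# Follow the collector's startup output
docker-compose logs -f otel-collector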
Docker Compose Configuration¶
Example docker-compose.yml for LGTM stack:
version: '3.8'

services:
  # Loki - Log aggregation
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/loki-config.yaml
    volumes:
      - ./loki-config.yaml:/etc/loki/loki-config.yaml
      - loki-data:/loki

  # Grafana - Visualization
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki
      - tempo
      - mimir

  # Tempo - Distributed tracing
  # OTLP (4317/4318) is received from the collector over the Docker network, so
  # those ports are not published on the host here; that avoids clashing with
  # the collector's published OTLP ports.
  tempo:
    image: grafana/tempo:latest
    ports:
      - "3200:3200"  # Tempo HTTP API
    command: ["-config.file=/etc/tempo/tempo-config.yaml"]
    volumes:
      - ./tempo-config.yaml:/etc/tempo/tempo-config.yaml
      - tempo-data:/var/tempo

  # Mimir - Metrics storage
  mimir:
    image: grafana/mimir:latest
    ports:
      - "9009:9009"
    command: ["-config.file=/etc/mimir/mimir-config.yaml"]
    volumes:
      - ./mimir-config.yaml:/etc/mimir/mimir-config.yaml
      - mimir-data:/data

  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP
    volumes:
      # The contrib image reads its configuration from /etc/otelcol-contrib/config.yaml
      - ./otel-collector-config-lgtm.yaml:/etc/otelcol-contrib/config.yaml
    depends_on:
      - loki
      - tempo
      - mimir

volumes:
  loki-data:
  grafana-data:
  tempo-data:
  mimir-data:
Step 3: Configure OpenTelemetry Collector¶
Create otel-collector-config-lgtm.yaml:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 5s
    limit_mib: 400
    spike_limit_mib: 100
  resource:
    attributes:
      - key: service.name
        value: MyApplication
        action: upsert
      - key: deployment.environment
        value: ${env:ENVIRONMENT:-production}
        action: upsert

exporters:
  # Loki for logs (the endpoint must be the full push API path)
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    labels:
      resource:
        service.name: "service_name"
        deployment.environment: "deployment_environment"
      attributes:
        http.method: "http_method"
        http.status_code: "http_status_code"

  # Tempo for traces (Tempo's OTLP gRPC receiver listens on 4317, not the 3200 HTTP API port)
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

  # Prometheus Remote Write for Mimir (metrics)
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
    external_labels:
      cluster: production
      environment: ${env:ENVIRONMENT:-production}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [loki]
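Recent collector-contrib images include a validate subcommand, so the file can be checked before (re)starting the collector; the container config path below matches the Docker Compose example above:
# Validate the configuration without starting any pipelines
docker run --rm \
  -v "$(pwd)/otel-collector-config-lgtm.yaml:/etc/otelcol-contrib/config.yaml" \
  otel/opentelemetry-collector-contrib:latest validate --config=/etc/otelcol-contrib/config.yaml
# Apply the configuration
docker-compose up -d otel-collector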
Step 4: Configure Grafana Data Sources¶
Add Loki Data Source¶
- Open http://localhost:3000 in your browser
- Go to Configuration > Data Sources
- Click Add data source
- Select Loki
- Set URL to http://loki:3100
- Click Save & Test
Add Tempo Data Source¶
- Click Add data source
- Select Tempo
- Set URL to http://tempo:3200
- Under Trace to logs, select the Loki data source
- Click Save & Test
Add Mimir Data Source¶
- Click Add data source
- Select Prometheus
- Set URL to http://mimir:9009/prometheus
- Click Save & Test
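As an alternative to the manual steps above, the same three data sources can be provisioned from a file mounted into the Grafana container at /etc/grafana/provisioning/datasources/ (add the mount to the grafana service in docker-compose.yml); a minimal sketch:
# grafana-datasources.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir:9009/prometheus
    isDefault: true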
Step 5: Configure Loki¶
Create loki-config.yaml:
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    # tsdb + schema v13 are expected by recent Loki releases (structured metadata)
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

# By default, Loki sends anonymous, uniquely identifiable usage and configuration
# statistics to Grafana Labs (https://stats.grafana.org/). See
# https://github.com/grafana/loki/blob/main/pkg/analytics/stats.go for what is sent.
# To disable reporting, uncomment the following lines:
#analytics:
#  reporting_enabled: false
Step 6: Configure Tempo¶
Create tempo-config.yaml:
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

ingester:
  max_block_duration: 5m

compactor:
  compaction:
    block_retention: 1h
    compacted_block_retention: 1h

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    pool:
      max_workers: 100
      queue_depth: 10000

overrides:
  defaults:
    ingestion:
      # Roughly 16 MB burst and sustained ingestion per tenant
      burst_size_bytes: 16777216
      rate_limit_bytes: 16777216
Step 7: Configure Mimir¶
Create mimir-config.yaml:
target: all

# Disable multi-tenancy so the collector can push without an X-Scope-OrgID header
multitenancy_enabled: false

server:
  http_listen_port: 9009
  grpc_listen_port: 9095

distributor:
  pool:
    health_check_ingesters: true

ingester:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist
    replication_factor: 1

compactor:
  data_dir: /data/compactor
  sharding_ring:
    kvstore:
      store: memberlist

blocks_storage:
  backend: filesystem
  filesystem:
    dir: /data/mimir
  tsdb:
    dir: /data/tsdb

limits:
  ingestion_rate: 10000
  ingestion_burst_size: 20000
  max_global_series_per_user: 100000
  max_global_series_per_metric: 20000
Step 8: Verify Setup¶
Check Loki¶
Query logs:
# start/end default to the last hour; pass Unix epoch or RFC3339 timestamps to narrow the range
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={service_name="MyApplication"}'
Check Tempo¶
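Tempo's HTTP API can confirm readiness and that spans are arriving; the search below assumes the service.name set by the collector's resource processor:
# Readiness check
curl http://localhost:3200/ready
# Search for recent traces from the instrumented service
curl -G -s "http://localhost:3200/api/search" \
  --data-urlencode 'tags=service.name=MyApplication' \
  --data-urlencode 'limit=5'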
Check Mimir¶
Query metrics:
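Mimir serves the Prometheus HTTP API under the /prometheus prefix; listing metric names is a quick way to confirm remote-write ingestion:
# Readiness check
curl http://localhost:9009/ready
# List metric names Mimir has received via remote write
curl -s "http://localhost:9009/prometheus/api/v1/label/__name__/values"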
Check Grafana¶
- Open http://localhost:3000
- Verify all data sources are connected
- Navigate to Explore to query data
Step 9: Create Grafana Dashboards¶
Logs Dashboard¶
- Go to Dashboards > New Dashboard
- Add panel with Loki query: {service_name="MyApplication"}
- Visualize logs over time
Traces Dashboard¶
- Add panel with Tempo query
- View trace timeline and spans
- Link traces to logs using trace ID
Metrics Dashboard¶
- Add panel with a Prometheus query (Mimir): rate(http_server_request_duration_seconds_count[5m])
- Create graphs for key metrics (see the example queries below)
- Set up alerts
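Example PromQL for those panels, assuming the OpenTelemetry HTTP server histogram above is being written to Mimir; the http_route and http_status_code label names are assumptions that depend on your instrumentation's semantic convention version:
# Request rate per route
sum(rate(http_server_request_duration_seconds_count[5m])) by (http_route)
# 95th percentile request duration per route
histogram_quantile(0.95, sum(rate(http_server_request_duration_seconds_bucket[5m])) by (le, http_route))
# Share of requests returning 5xx
sum(rate(http_server_request_duration_seconds_count{http_status_code=~"5.."}[5m]))
  / sum(rate(http_server_request_duration_seconds_count[5m]))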
Correlated Observability¶
Grafana's Explore view allows you to:
- Start from a trace, jump to related logs
- Start from a log, jump to related traces
- View metrics for the same time period
- Correlate issues across all three pillars
Step 10: Generate Test Data¶
Start your application and generate telemetry:
# Make API calls to generate traces, metrics, and logs
for i in {1..20}; do
  curl http://localhost:8081/api/health
  sleep 0.5
done
Configuration Examples¶
Resource Attributes¶
Add custom resource attributes for better filtering:
processors:
  resource:
    attributes:
      - key: service.name
        value: MyApplication
        action: upsert
      - key: deployment.environment
        value: production
        action: upsert
      - key: service.version
        value: 1.0.0
        action: upsert
      - key: team.name
        value: platform-team
        action: upsert
Log Labeling¶
Configure Loki labels for efficient querying:
exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    labels:
      resource:
        service.name: "service_name"
        deployment.environment: "deployment_environment"
        service.version: "service_version"
      attributes:
        http.method: "http_method"
        http.status_code: "http_status_code"
        http.route: "http_route"
Trace Sampling¶
Reduce trace volume with sampling:
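A minimal sketch using the collector's probabilistic sampler processor, wired into the traces pipeline from Step 3 (the 10% rate is an arbitrary example):
processors:
  probabilistic_sampler:
    # Keep roughly 10% of traces
    sampling_percentage: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, probabilistic_sampler, batch]
      exporters: [otlp/tempo]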
Metrics Aggregation¶
Configure Mimir for long-term storage:
exporters:
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
    external_labels:
      cluster: production
      environment: production
      region: us-east-1
Performance Optimization¶
Loki Optimization¶
- Use label-based indexing (not full-text search)
- Limit label cardinality
- Configure retention policies (see the sketch below)
- Use chunk compression
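As a sketch of the retention bullet above: recent Loki releases apply retention through the compactor plus a limits setting, so something like the following (31 days is an arbitrary example) can be merged into loki-config.yaml:
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem

limits_config:
  retention_period: 744h  # roughly 31 days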
Tempo Optimization¶
- Enable trace compression
- Configure block retention
- Use object storage for long-term retention
- Enable trace sampling
Mimir Optimization¶
- Configure ingestion limits
- Use sharding for scale
- Enable compression
- Configure retention policies
Cost Considerations¶
The LGTM stack is designed to be cost-effective:
- Efficient Storage: Compressed storage formats
- No Vendor Lock-in: Open-source, self-hosted
- Scalable: Handles high volume efficiently
- Resource Efficient: Lower resource requirements than a comparable ELK stack
Storage Optimization¶
- Configure retention policies
- Use compression
- Archive old data to object storage
- Enable downsampling for long-term metrics
Troubleshooting¶
No Logs in Loki¶
Problem: Logs not appearing in Loki
Solution:
- Verify collector is exporting to Loki: Check collector logs
- Check Loki is running: curl http://localhost:3100/ready
- Verify label configuration matches query
- Check time range in Grafana
No Traces in Tempo¶
Problem: Traces not appearing in Tempo
Solution:
- Verify collector is exporting to Tempo
- Check Tempo is running: curl http://localhost:3200/ready
- Verify OTLP endpoint configuration
- Check trace sampling percentage
No Metrics in Mimir¶
Problem: Metrics not appearing in Mimir
Solution:
- Verify Prometheus Remote Write endpoint
- Check Mimir is running: curl http://localhost:9009/ready
- Verify metric format compatibility
- Check ingestion limits
High Resource Usage¶
Problem: Stack using too much memory/CPU
Solution:
- Reduce retention periods
- Enable sampling
- Optimize label cardinality
- Increase resource limits
- Use object storage for long-term data
Use Cases¶
Comprehensive Observability¶
- Single platform for metrics, logs, and traces
- Correlated observability across all pillars
- Unified dashboards and alerts
Cost-Effective Monitoring¶
- Open-source solution
- Efficient storage
- No per-GB pricing
- Self-hosted control
High-Volume Systems¶
- Scalable architecture
- Handles millions of metrics
- Efficient log aggregation
- Distributed trace storage
Multi-Tenant Environments¶
- Tenant isolation
- Resource quotas
- Per-tenant dashboards
- Access control
Related Documentation¶
- Observability Stacks: Observability stack comparison and selection
- OpenTelemetry Collector: OpenTelemetry Collector configuration
- Logging: Logging patterns and best practices
- Metrics: Metrics collection and instrumentation
- Distributed Tracing: Distributed tracing implementation