Self-Hosted Agents - Troubleshooting Guide¶
Overview¶
This guide covers common issues encountered with self-hosted Azure DevOps agents and their solutions.
Agent Not Appearing in Azure DevOps¶
Symptoms¶
- Agent does not appear in agent pool
- Agent shows as "Offline" immediately after installation
Possible Causes¶
- Incorrect PAT token permissions
- Network connectivity issues
- Agent configuration errors
- Service not running
Solutions¶
Check PAT Token Permissions¶
- Verify PAT token has Agent Pools (Read & Manage) scope
- Check token expiration date
- Create new PAT if needed
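The PAT can also be checked from the agent machine before re-running the agent setup. The following is a minimal sketch, not an official procedure: it assumes the PAT is exported as AZP_TOKEN and the organization name as AZP_ORG (both names are illustrative), and it calls the agent pools REST endpoint with api-version 7.1, which you should verify against your organization. The mapping from status code to cause is a rule of thumb.

```shell
#!/usr/bin/env bash
# Sketch: check whether a PAT is accepted by the Azure DevOps REST API.
# AZP_ORG and AZP_TOKEN are assumed variable names, not agent settings.

# Translate an HTTP status code into a likely cause (heuristic).
explain_http_status() {
  case "$1" in
    200)         echo "PAT accepted" ;;
    203|302|401) echo "PAT rejected: likely invalid or expired" ;;
    403)         echo "PAT valid but lacks permission on agent pools" ;;
    *)           echo "Unexpected status $1: check network or organization name" ;;
  esac
}

# List agent pools with the PAT (empty user name, PAT as password) and report the result.
check_pat() {
  local status
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -u ":${AZP_TOKEN}" \
    "https://dev.azure.com/${AZP_ORG}/_apis/distributedtask/pools?api-version=7.1")
  explain_http_status "$status"
}

# Run the check only when both variables are set.
if [ -n "${AZP_ORG:-}" ] && [ -n "${AZP_TOKEN:-}" ]; then
  check_pat
fi
```

Usage: export AZP_ORG and AZP_TOKEN in the current shell, then run the script. Unset AZP_TOKEN afterwards so the PAT does not linger in the session.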
Verify Network Connectivity¶
Linux:
# Test Azure DevOps connectivity
curl -I https://dev.azure.com
# Test DNS resolution
nslookup dev.azure.com
Windows:
# Test Azure DevOps connectivity
Test-NetConnection -ComputerName dev.azure.com -Port 443
# Test DNS resolution
Resolve-DnsName dev.azure.com
Check Agent Configuration¶
Linux:
# View agent configuration
cat ~/azagent/.agent
# Check service status
sudo systemctl status vsts.agent.*.service
Windows:
# View agent configuration
Get-Content C:\azagent\.agent
# Check service status
Get-Service | Where-Object {$_.Name -like "*vsts*"}
Review Agent Logs¶
Linux:
# View recent logs
sudo journalctl -u vsts.agent.*.service -n 100 --no-pager
# Follow logs in real-time
sudo journalctl -u vsts.agent.*.service -f
Windows:
# View recent event logs
Get-EventLog -LogName Application -Source "vsts*" -Newest 50
# View specific error events
Get-EventLog -LogName Application -Source "vsts*" -EntryType Error -Newest 20
Agent Goes Offline¶
Symptoms¶
- Agent was online but now shows as offline
- Agent status changes to offline intermittently
Possible Causes¶
- Service stopped
- Network connectivity lost
- Server rebooted
- PAT token expired
Solutions¶
Check Service Status¶
Linux:
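# Check service status (quote the pattern so the shell does not expand it)
sudo systemctl status 'vsts.agent.*.service'
# Find the exact unit name; glob patterns only match units that are already loaded
systemctl list-unit-files 'vsts.agent.*.service'
# Start service if stopped
sudo systemctl start <unit-name>
# Enable auto-start
sudo systemctl enable <unit-name>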
Windows:
# Check service status
Get-Service | Where-Object {$_.Name -like "*vsts*"}
# Start service if stopped
Start-Service -Name (Get-Service | Where-Object {$_.Name -like "*vsts*"}).Name
Verify Network Connectivity¶
# Linux
ping -c 4 dev.azure.com
curl -I https://dev.azure.com
# Windows
Test-Connection dev.azure.com
Test-NetConnection -ComputerName dev.azure.com -Port 443
Check for Server Reboots¶
Linux:
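On Linux, recent reboots can be checked with standard tools. This is a minimal sketch that assumes the procps and util-linux utilities are installed; the hour calculation is only a convenience.

```shell
# Time of the last boot (procps "uptime -s"), falling back to "who -b"
uptime -s 2>/dev/null || who -b

# Recent reboot and shutdown records from wtmp, newest first
last -x reboot shutdown 2>/dev/null | head -n 5

# Seconds since boot, read from the kernel; a small value means a recent reboot
uptime_seconds=$(cut -d. -f1 /proc/uptime)
echo "Up for $((uptime_seconds / 3600)) hours"
```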
Windows:
# Check the last shutdown or restart request (event 1074 is logged by User32)
Get-EventLog -LogName System -Source "User32" | Where-Object {$_.EventID -eq 1074} | Select-Object -First 1
# Check system uptime
(Get-CimInstance Win32_OperatingSystem).LastBootUpTime
Docker Not Found Error¶
Symptoms¶
- Pipeline fails with error: ##[error]File not found: 'docker'
- Container services fail to start
- Error: docker: command not found
Possible Causes¶
- Docker not installed on agent
- Docker not in PATH
- Agent user not in docker group
- Docker service not running
Solutions¶
Install Docker (Linux)¶
# Add Docker's official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# Set up repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Start Docker service
sudo systemctl start docker
sudo systemctl enable docker
# Add agent user to docker group
sudo usermod -aG docker azdevops
# Verify installation (may need to log out and back in for group changes)
sudo docker run hello-world
Verify Docker Installation¶
# Check Docker version
docker --version
# Check Docker service status
sudo systemctl status docker
# Test Docker (as azdevops user)
docker run hello-world
# If permission denied, log out and back in, or restart agent service
Fix Docker Permissions¶
# Add user to docker group (if not already)
sudo usermod -aG docker azdevops
# Verify user is in docker group
groups azdevops
# Restart agent service to apply group changes
cd ~/azagent
sudo ./svc.sh stop
sudo ./svc.sh start
Verify Docker is Accessible to Agent¶
# Check if agent user can run Docker
sudo -u azdevops docker ps
# If permission denied, ensure:
# 1. User is in docker group: groups azdevops
# 2. Docker socket has correct permissions: ls -la /var/run/docker.sock
# 3. Agent service is restarted after group changes
Note: If your pipelines use container services (redis, mssql, mongodb, etc.), Docker is required, not optional. See the Linux Setup Guide for complete Docker installation instructions.
Build Failures on Agent¶
Symptoms¶
- Builds fail with tool not found errors
- Builds fail with permission errors
- Builds fail with disk space errors
Possible Causes¶
- Required tools not installed
- Insufficient permissions
- Disk space full
- Incorrect agent capabilities
Solutions¶
Verify Required Tools¶
Linux:
# Check .NET SDK
dotnet --version
# Check Docker
docker --version
# If Docker is not found, install it (see Linux Setup Guide)
# Check Node.js
node --version
npm --version
Windows:
# Check .NET SDK
dotnet --version
# Check Git
git --version
# Check Node.js
node --version
npm --version
Check Permissions¶
Linux:
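On Linux, a permission check can look like the following sketch. The account name azdevops and the path ~/azagent/_work follow the rest of this guide; AGENT_USER, AGENT_WORK, and the foreign_files helper are illustrative names.

```shell
AGENT_USER="${AGENT_USER:-azdevops}"             # assumed agent account
AGENT_WORK="${AGENT_WORK:-$HOME/azagent/_work}"  # assumed work directory

# Groups of the agent user
id "$AGENT_USER" 2>/dev/null || echo "User $AGENT_USER not found"

# Owner and mode of the work directory
ls -ld "$AGENT_WORK" 2>/dev/null || echo "$AGENT_WORK not found"

# List up to 20 entries under a directory that are not owned by the given user.
foreign_files() {
  find "$1" ! -user "$2" 2>/dev/null | head -n 20
}

# Entries owned by another account (for example root) often explain
# "permission denied" errors during builds.
if [ -d "$AGENT_WORK" ] && id "$AGENT_USER" >/dev/null 2>&1; then
  foreign_files "$AGENT_WORK" "$AGENT_USER"
fi
```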
Windows:
# Check agent user permissions
whoami /groups
# Check directory permissions
Get-Acl C:\azagent\_work
Check Disk Space¶
Linux:
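On Linux, disk usage can be checked with df and du. A minimal sketch, assuming the agent lives in ~/azagent as elsewhere in this guide; usage_percent is an illustrative helper.

```shell
AGENT_WORK="${AGENT_WORK:-$HOME/azagent/_work}"   # assumed work directory

# Overall usage per filesystem
df -h

# Percentage used on the filesystem that holds a path (POSIX output format).
usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}
echo "Root filesystem: $(usage_percent /)% used"

# Largest entries in the work directory, if it exists
if [ -d "$AGENT_WORK" ]; then
  du -sh "$AGENT_WORK"/* 2>/dev/null | sort -rh | head -n 10
fi
```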
Windows:
# Check disk usage
Get-PSDrive C | Select-Object Used, Free
# Check specific directory
Get-ChildItem C:\azagent\_work -Recurse -File | Measure-Object -Property Length -Sum
Verify Agent Capabilities¶
- In Azure DevOps, navigate to agent pool
- Select agent → Capabilities tab
- Verify required capabilities are present
- Add missing capabilities if needed
Code Coverage Not Found by Build Quality Checks¶
Symptoms¶
- Build Quality Checks shows 0% coverage: Total lines: 0, Covered lines: 0
- Coverage reports are published successfully but Build Quality Checks can't find them
- Error: The code coverage value (0%, 0 lines) is lower than the minimum value
Possible Causes¶
- Case sensitivity on Linux - File paths are case-sensitive on Linux
- Coverage XML files not found by PublishCodeCoverageResults - The glob pattern might not match on Linux
- Coverage files in wrong location - Files might be in a different directory than expected
Solutions¶
Verify Coverage Files Exist¶
Add a diagnostic step before Build Quality Checks to verify coverage files:
- script: |
    echo "Checking for coverage files..."
    find "$(Agent.TempDirectory)" -name "coverage.cobertura.xml" -type f
    find "$(Agent.TempDirectory)" -name "*coverage*" -type f
  displayName: 'Diagnose coverage file locations'
Ensure PublishCodeCoverageResults Finds Files¶
The PublishCodeCoverageResults@2 task looks for coverage.cobertura.xml files below $(Agent.TempDirectory).
On Linux, ensure:
1. The file name is exactly coverage.cobertura.xml (case-sensitive)
2. The file is in a subdirectory of $(Agent.TempDirectory)
3. The file is readable by the agent user
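The checks above can be scripted. The sketch below looks for coverage files whose name differs from coverage.cobertura.xml only by letter case, and for files the current user cannot read. It assumes GNU find; find_case_mismatches is an illustrative helper. AGENT_TEMPDIRECTORY is the environment variable form of $(Agent.TempDirectory) inside a pipeline job.

```shell
# Print coverage files whose name matches only when letter case is ignored.
find_case_mismatches() {
  find "$1" -type f -iname 'coverage.cobertura.xml' ! -name 'coverage.cobertura.xml'
}

# Inside a pipeline job the temp directory is available as AGENT_TEMPDIRECTORY.
if [ -n "${AGENT_TEMPDIRECTORY:-}" ]; then
  echo "Files with unexpected letter case:"
  find_case_mismatches "$AGENT_TEMPDIRECTORY"

  echo "Files the current user cannot read:"
  find "$AGENT_TEMPDIRECTORY" -type f -iname 'coverage.cobertura.xml' ! -readable
fi
```

Empty output under both headings means the file names and permissions are not the cause.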
Fix Coverage File Paths¶
If coverage files are in a different location, you may need to:
- Copy the files to the expected location
- Update the PublishCodeCoverageResults path (if you can modify the template)
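Copying the files to the expected location could look like the following sketch, run as a script step before the coverage results are published. It assumes bash and that the reports sit somewhere under the sources directory. The collect_coverage helper and the coverage subfolder name are illustrative and not part of any template.

```shell
# Copy every Cobertura report found under src into dest/<n>/coverage.cobertura.xml.
# Numbered folders keep reports from different test projects from overwriting each other.
collect_coverage() {
  local src="$1" dest="$2" n=0 file
  while IFS= read -r file; do
    n=$((n + 1))
    mkdir -p "$dest/$n"
    cp "$file" "$dest/$n/coverage.cobertura.xml"
  done < <(find "$src" -type f -iname 'coverage.cobertura.xml' ! -path "$dest/*")
  echo "Copied $n coverage file(s) to $dest"
}

# Inside a pipeline job these variables are set by the agent.
if [ -n "${BUILD_SOURCESDIRECTORY:-}" ] && [ -n "${AGENT_TEMPDIRECTORY:-}" ]; then
  collect_coverage "$BUILD_SOURCESDIRECTORY" "$AGENT_TEMPDIRECTORY/coverage"
fi
```

Because the search ignores letter case and the copy always writes the lowercase name, this also normalizes file names that differ only by case.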
Verify Build Quality Checks Configuration¶
Ensure Build Quality Checks is configured correctly:
- task: mspremier.BuildQualityChecks.QualityChecks-task.BuildQualityChecks@10
  inputs:
    checkCoverage: true
    coverageFailOption: fixed
    coverageType: lines
    coverageThreshold: '76'
Note: Build Quality Checks reads coverage data from PublishCodeCoverageResults, not from file artifacts. The coverage must be published successfully before Build Quality Checks can read it.
Pipeline Cannot Find Agent¶
Symptoms¶
- Pipeline shows "No agent found" error
- Pipeline waits indefinitely for agent
Possible Causes¶
- Pool name mismatch
- Demand requirements not met
- All agents busy or offline
- Agent capabilities don't match demands
Solutions¶
Verify Pool Name¶
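Ensure the pool name in the pipeline YAML matches the agent pool name exactly (case-sensitive). For a pool called Hetzner-Linux, the pipeline must reference name: 'Hetzner-Linux'.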
Check Agent Demands¶
Verify agent capabilities match pipeline demands:
pool:
  name: 'Hetzner-Linux'
  demands:
    - Agent.OS -equals Linux
    - DotNet -equals 9.0.x # Agent must have this capability
Verify Agent Availability¶
- Check agent pool in Azure DevOps
- Verify at least one agent is online
- Check if agents are busy with other jobs
- Consider adding more agents if all are busy
High Disk Usage¶
Symptoms¶
- Builds fail with "No space left on device" errors
- Disk usage shows > 90%
Solutions¶
Clean Up Build Artifacts¶
Linux:
# Clean agent work directory (make sure no job is running first)
rm -rf ~/azagent/_work/*
# Clean old top-level directories (older than 30 days)
find ~/azagent/_work -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
Windows:
# Clean agent work directory
Remove-Item C:\azagent\_work\* -Recurse -Force
# Clean old directories
Get-ChildItem C:\azagent\_work -Directory | Where-Object {$_.LastWriteTime -lt (Get-Date).AddDays(-30)} | Remove-Item -Recurse -Force
Clean Package Caches¶
Linux:
# Clean NuGet cache
rm -rf ~/.nuget/packages/*
# Clean npm cache
npm cache clean --force
# Clean Docker
docker system prune -a --volumes
Windows:
# Clean NuGet cache
Remove-Item "$env:USERPROFILE\.nuget\packages\*" -Recurse -Force
# Clean npm cache
npm cache clean --force
# Clean Docker
docker system prune -a --volumes
Increase Disk Size¶
If using Hetzner Cloud, you can increase disk size:
- Navigate to Hetzner Cloud Console
- Select server → Resize → Increase disk size
- Follow instructions to resize filesystem
Slow Build Performance¶
Symptoms¶
- Builds take longer than expected
- Agent CPU/memory usage is high
Solutions¶
Check System Resources¶
Linux:
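On Linux, the same information is available from /proc and the procps tools. A minimal sketch; iostat comes from the optional sysstat package, so that step is skipped when it is not installed.

```shell
# Load average over 1, 5, and 15 minutes, and the number of CPU cores for comparison
cat /proc/loadavg
nproc

# Memory the kernel estimates is available for new work, in MiB
available_mib=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo)
echo "Available memory: ${available_mib} MiB"

# Top CPU consumers
ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 11

# Disk I/O, only if sysstat is installed
if command -v iostat >/dev/null 2>&1; then
  iostat -x 1 2
fi
```

A load average that stays above the core count during builds suggests the agent is CPU-bound.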
Windows:
# Check CPU and memory
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10
Get-Counter '\Processor(_Total)\% Processor Time'
Get-Counter '\Memory\Available MBytes'
# Check disk I/O
Get-Counter '\PhysicalDisk(*)\Disk Reads/sec'
Get-Counter '\PhysicalDisk(*)\Disk Writes/sec'
Optimize Build Cache¶
- Configure persistent NuGet cache
- Use Docker layer caching
- Cache npm/node_modules
- Cache build artifacts between runs
Upgrade Server Resources¶
If resources are consistently maxed out:
- Consider upgrading to larger server type
- Add more agents to distribute load
- Optimize build processes
Authentication Errors¶
Symptoms¶
- "401 Unauthorized" errors
- "403 Forbidden" errors
- PAT token errors
Solutions¶
Verify PAT Token¶
- Check token expiration date
- Verify token has correct scopes:
- Agent Pools (Read & Manage)
- Build (Read & Execute)
- Create new PAT if needed
Update Agent Configuration¶
Linux:
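On Linux, applying a new PAT usually means removing the old registration and configuring the agent again. The sketch below assumes the agent is installed in ~/azagent and runs as a service, and that the new PAT is exported as AZP_TOKEN. AGENT_DIR, DRY_RUN, run, and reconfigure_agent are illustrative names. By default it only prints the commands so they can be reviewed first.

```shell
AGENT_DIR="${AGENT_DIR:-$HOME/azagent}"   # assumed install directory
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print commands only

# Print a command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Re-register the agent with a new PAT.
# Arguments: organization URL, pool name. The PAT is read from AZP_TOKEN.
reconfigure_agent() {
  local url="$1" pool="$2" token
  [ -n "${AZP_TOKEN:-}" ] || { echo "AZP_TOKEN is not set" >&2; return 1; }
  token="$AZP_TOKEN"
  if [ "$DRY_RUN" = "1" ]; then token="***"; fi   # keep the PAT out of printed output

  run cd "$AGENT_DIR" || return 1
  run sudo ./svc.sh stop
  run sudo ./svc.sh uninstall
  run ./config.sh remove --unattended --auth pat --token "$token"
  run ./config.sh --unattended --url "$url" --auth pat --token "$token" \
      --pool "$pool" --agent "$(uname -n)" --replace
  run sudo ./svc.sh install "$(id -un)"
  run sudo ./svc.sh start
}
```

Usage: after sourcing the file, run reconfigure_agent "https://dev.azure.com/<organization>" "<pool-name>" to preview the steps, then repeat with DRY_RUN=0 to execute them.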
Windows:
Service Won't Start¶
Symptoms¶
- Agent service fails to start
- Service shows as "Failed" status
Solutions¶
Check Service Logs¶
Linux:
# View service logs
sudo journalctl -u vsts.agent.*.service -n 100 --no-pager
# Check service status
sudo systemctl status vsts.agent.*.service
Windows:
# View service logs
Get-EventLog -LogName Application -Source "vsts*" -Newest 50
# Check service status
Get-Service | Where-Object {$_.Name -like "*vsts*"}
Verify Agent Configuration¶
Linux:
# Check configuration file
cat ~/azagent/.agent
# Verify credentials file exists
ls -la ~/azagent/.credentials
Windows:
# Check configuration file
Get-Content C:\azagent\.agent
# Verify credentials file exists
Test-Path C:\azagent\.credentials
Reinstall Service¶
Linux:
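On Linux, the service is managed by the svc.sh script in the agent directory. The sketch below prints the reinstall sequence for review instead of running it directly. The directory ~/azagent and the user azdevops follow the rest of this guide, and reinstall_commands is an illustrative helper.

```shell
# Print the commands that remove and reinstall the agent service.
# Arguments: agent directory, user the service should run as.
reinstall_commands() {
  local dir="$1" user="$2"
  cat <<EOF
cd "$dir" || exit 1
sudo ./svc.sh stop
sudo ./svc.sh uninstall
sudo ./svc.sh install "$user"
sudo ./svc.sh start
sudo ./svc.sh status
EOF
}

# Review the commands first
reinstall_commands "$HOME/azagent" azdevops
```

To execute the sequence after reviewing it, pipe the output to a shell: reinstall_commands "$HOME/azagent" azdevops | bash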
Windows:
Network Connectivity Issues¶
Symptoms¶
- Agent cannot connect to Azure DevOps
- Timeout errors
- SSL/TLS errors
Solutions¶
Test Connectivity¶
# Linux
curl -v https://dev.azure.com
ping -c 4 dev.azure.com
# Windows
Test-NetConnection -ComputerName dev.azure.com -Port 443
Test-Connection dev.azure.com
Check Firewall Rules¶
Linux:
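On Linux, which firewall front end is in use depends on the distribution. The sketch below assumes one of ufw, nft, or iptables is installed, and uses bash's /dev/tcp feature to test outbound HTTPS without extra tools. show_firewall_rules and can_connect are illustrative helpers.

```shell
# Show firewall rules with whichever front end is installed (requires sudo).
show_firewall_rules() {
  if command -v ufw >/dev/null 2>&1; then
    sudo ufw status verbose
  elif command -v nft >/dev/null 2>&1; then
    sudo nft list ruleset
  elif command -v iptables >/dev/null 2>&1; then
    sudo iptables -L OUTPUT -n -v
  else
    echo "No ufw, nft, or iptables command found"
  fi
}

# Return success if a TCP connection to host:port opens within 5 seconds.
can_connect() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Outbound HTTPS is what the agent needs to reach Azure DevOps.
if can_connect dev.azure.com 443; then
  echo "Outbound HTTPS to dev.azure.com is allowed"
else
  echo "Outbound HTTPS to dev.azure.com is blocked or unreachable"
fi
```

Run show_firewall_rules manually when the connection test fails, and look for rules that drop or reject outbound traffic on port 443.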
Windows:
# Check firewall rules
Get-NetFirewallRule | Where-Object {$_.DisplayName -like "*HTTPS*"}
# Allow outbound HTTPS (usually enabled by default)
Check Proxy Settings¶
If behind a proxy:
- Configure proxy in agent environment
- Set HTTP_PROXY and HTTPS_PROXY variables
- Update agent configuration if needed
Git Authentication Errors on Self-Hosted Agents¶
Symptoms¶
- Error: fatal: unable to access 'https://dev.azure.com/...': The requested URL returned error: 400
- Git fetch fails with exit code 128
- Repository checkout fails on self-hosted Linux agents
- Works on Microsoft-hosted agents but fails on self-hosted
- Error occurs after manually installing Git on the agent
Possible Causes¶
- Missing explicit checkout with credentials - Default checkout doesn't persist credentials on self-hosted agents
- Git configuration conflicts - Manual Git installation may have changed global Git config
- Agent permissions - Agent user doesn't have proper repository access
- Stale Git credentials - Old credentials cached in Git config
Solutions¶
Add Explicit Checkout with Credentials (Required)¶
In your pipeline YAML, add explicit checkout step:
steps:
  - checkout: self
    persistCredentials: true
    displayName: 'Checkout repository with credentials'
  # ... rest of your steps
persistCredentials: true keeps the job's OAuth token in the Git configuration after checkout, so later steps that run Git commands against Azure DevOps repositories can still authenticate. This guide treats it as required on self-hosted agents.
Clear Git Configuration on Agent¶
If Git was manually installed and causing issues:
# Connect to agent server
ssh azdevops@<server-ip>
# Check current Git config
git config --global --list
# Remove problematic credentials
git config --global --unset-all http.extraheader
git config --global --unset-all http.https://dev.azure.com.extraheader
# Verify Git version
git --version
Verify Agent Repository Permissions¶
- In Azure DevOps, go to Project Settings → Repositories
- Select your repository
- Go to Security tab
- Ensure Project Collection Build Service has Read permission
- Ensure Project Build Service has Read permission
Configure Git Authentication Manually (If Needed)¶
If persistCredentials: true doesn't work, configure Git manually in pipeline:
- script: |
    # --global persists on a self-hosted agent; unset the header once the job no longer needs it
    git config --global http.extraheader "AUTHORIZATION: bearer $SYSTEM_ACCESSTOKEN"
    git config --global http.version HTTP/1.1
  displayName: 'Configure Git authentication'
  env:
    SYSTEM_ACCESSTOKEN: $(System.AccessToken)
Restart Agent Service¶
After making changes, restart the agent service: from the agent directory (~/azagent), run sudo ./svc.sh stop and then sudo ./svc.sh start.
Note: The persistCredentials: true option is the standard solution for self-hosted agents. Always include this in your pipeline YAML when using self-hosted agents.
Getting Additional Help¶
Azure DevOps Resources¶
Hetzner Cloud Resources¶
Log Collection¶
When seeking help, collect:
- Agent logs (last 100 lines)
- Service status
- System resource usage
- Network connectivity test results
- Agent configuration (sanitized)
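The items above can be gathered into one archive on a Linux agent. This is a minimal sketch: the file names, the collect and collect_diagnostics helpers, and the URL redaction are illustrative, and the archive should still be reviewed before it is shared.

```shell
# Run a command and save its output to a file; record failures instead of aborting.
collect() {
  local outfile="$1"; shift
  "$@" > "$outfile" 2>&1 || echo "(command failed or unavailable: $*)" >> "$outfile"
}

# Gather logs, service status, resource usage, and connectivity results.
# Argument: agent directory. Prints the path of the resulting archive.
collect_diagnostics() {
  local agent_dir="$1" out
  out=$(mktemp -d "${TMPDIR:-/tmp}/agent-diag.XXXXXX")
  collect "$out/agent-log.txt" journalctl -u 'vsts.agent.*.service' -n 100 --no-pager
  collect "$out/service.txt"   systemctl status 'vsts.agent.*.service' --no-pager
  collect "$out/disk.txt"      df -h
  collect "$out/memory.txt"    cat /proc/meminfo
  collect "$out/network.txt"   curl -sS -I --max-time 10 https://dev.azure.com
  # Redact URLs from the agent configuration before sharing it
  collect "$out/agent-config.txt" sed -E 's#https?://[^"]*#<redacted-url>#g' "$agent_dir/.agent"
  tar -czf "$out.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
  echo "$out.tar.gz"
}
```

Usage: collect_diagnostics "$HOME/azagent" prints the archive path. Reading the journal may require sudo, depending on how the system is configured.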
Next Steps¶
- Review Maintenance Guide for preventive measures
- Set up monitoring to catch issues early
- Document your specific troubleshooting procedures