Self-Hosted Agents - Troubleshooting Guide

Overview

This guide covers common issues encountered with self-hosted Azure DevOps agents and their solutions.

Agent Not Appearing in Azure DevOps

Symptoms

  • Agent does not appear in agent pool
  • Agent shows as "Offline" immediately after installation

Possible Causes

  1. Incorrect PAT token permissions
  2. Network connectivity issues
  3. Agent configuration errors
  4. Service not running

Solutions

Check PAT Token Permissions

  1. Verify PAT token has Agent Pools (Read & Manage) scope
  2. Check token expiration date
  3. Create new PAT if needed

Verify Network Connectivity

Linux:

# Test Azure DevOps connectivity
curl -I https://dev.azure.com

# Test DNS resolution
nslookup dev.azure.com

Windows:

# Test Azure DevOps connectivity
Test-NetConnection -ComputerName dev.azure.com -Port 443

# Test DNS resolution
Resolve-DnsName dev.azure.com

Check Agent Configuration

Linux:

# View agent configuration
cat ~/azagent/.agent

# Check service status
sudo systemctl status 'vsts.agent.*.service'

Windows:

# View agent configuration
Get-Content C:\azagent\.agent

# Check service status
Get-Service | Where-Object {$_.Name -like "*vsts*"}

Review Agent Logs

Linux:

# View recent logs
sudo journalctl -u 'vsts.agent.*.service' -n 100 --no-pager

# Follow logs in real-time
sudo journalctl -u 'vsts.agent.*.service' -f

Windows:

# View recent event logs
Get-EventLog -LogName Application -Source "vsts*" -Newest 50

# View specific error events
Get-EventLog -LogName Application -Source "vsts*" -EntryType Error -Newest 20
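
Besides the systemd journal and the Windows event log, the agent also writes its own diagnostic logs to a _diag folder inside the agent directory. The following is a minimal sketch for Linux, assuming the agent lives in ~/azagent as elsewhere in this guide. The helper name latest_diag_log is hypothetical and not part of the agent.

```shell
#!/usr/bin/env bash
# latest_diag_log DIR: print the most recently modified Agent_*.log in DIR.
# Hypothetical helper; DIR is assumed to be the agent's _diag folder.
latest_diag_log() {
  ls -t "$1"/Agent_*.log 2>/dev/null | head -n 1
}
```

Usage: tail -n 100 "$(latest_diag_log ~/azagent/_diag)" shows the end of the newest agent log. The function prints nothing if no matching log exists.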

Agent Goes Offline

Symptoms

  • Agent was online but now shows as offline
  • Agent status changes to offline intermittently

Possible Causes

  1. Service stopped
  2. Network connectivity lost
  3. Server rebooted
  4. PAT token expired

Solutions

Check Service Status

Linux:

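# Check service status
sudo systemctl status 'vsts.agent.*.service'

# Find the exact unit name; glob patterns only match units already loaded in memory,
# so they can miss a stopped service
systemctl list-unit-files 'vsts.agent.*.service'

# Start service if stopped (replace <unit-name> with the name found above)
sudo systemctl start <unit-name>

# Enable auto-start
sudo systemctl enable <unit-name>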

Windows:

# Check service status
Get-Service | Where-Object {$_.Name -like "*vsts*"}

# Start service if stopped
Start-Service -Name (Get-Service | Where-Object {$_.Name -like "*vsts*"}).Name

Verify Network Connectivity

# Linux
ping -c 4 dev.azure.com
curl -I https://dev.azure.com

# Windows
Test-Connection dev.azure.com
Test-NetConnection -ComputerName dev.azure.com -Port 443

Check for Server Reboots

Linux:

# Check last reboot time
last reboot

# Check system uptime
uptime

Windows:

# Check last shutdown/restart event (event ID 1074 is logged by User32, not Kernel-General)
Get-WinEvent -FilterHashtable @{LogName='System'; Id=1074} -MaxEvents 1

# Check system uptime
(Get-CimInstance Win32_OperatingSystem).LastBootUpTime

Docker Not Found Error

Symptoms

  • Pipeline fails with error: ##[error]File not found: 'docker'
  • Container services fail to start
  • Error: docker: command not found

Possible Causes

  1. Docker not installed on agent
  2. Docker not in PATH
  3. Agent user not in docker group
  4. Docker service not running

Solutions

Install Docker (Linux)

# Add Docker's official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Set up repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start Docker service
sudo systemctl start docker
sudo systemctl enable docker

# Add agent user to docker group
sudo usermod -aG docker azdevops

# Verify installation (may need to log out and back in for group changes)
sudo docker run hello-world

Verify Docker Installation

# Check Docker version
docker --version

# Check Docker service status
sudo systemctl status docker

# Test Docker (as azdevops user)
docker run hello-world

# If permission denied, log out and back in, or restart agent service

Fix Docker Permissions

# Add user to docker group (if not already)
sudo usermod -aG docker azdevops

# Verify user is in docker group
groups azdevops

# Restart agent service to apply group changes
cd ~/azagent
sudo ./svc.sh stop
sudo ./svc.sh start

Verify Docker is Accessible to Agent

# Check if agent user can run Docker
sudo -u azdevops docker ps

# If permission denied, ensure:
# 1. User is in docker group: groups azdevops
# 2. Docker socket has correct permissions: ls -la /var/run/docker.sock
# 3. Agent service is restarted after group changes
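
The checklist above can be scripted. This is a minimal sketch, assuming the agent user is azdevops and the default socket path /var/run/docker.sock as used in this guide. The function names in_group and check_docker_access are hypothetical.

```shell
#!/usr/bin/env bash
# in_group GROUP "space separated group list": succeed if GROUP is in the list.
in_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_docker_access USER: report the first obvious reason USER cannot use Docker.
check_docker_access() {
  if ! in_group docker "$(id -nG "$1")"; then
    echo "$1 is not in the docker group"
    return 1
  fi
  if [ ! -S /var/run/docker.sock ]; then
    echo "/var/run/docker.sock is missing (is the Docker service running?)"
    return 1
  fi
  echo "$1 is in the docker group and the Docker socket exists"
}
```

Usage: check_docker_access azdevops. A passing result only reflects the group database; a service that was already running keeps its old group list until it is restarted.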

Note: If your pipelines use container services (redis, mssql, mongodb, etc.), Docker is required, not optional. See the Linux Setup Guide for complete Docker installation instructions.

Build Failures on Agent

Symptoms

  • Builds fail with tool not found errors
  • Builds fail with permission errors
  • Builds fail with disk space errors

Possible Causes

  1. Required tools not installed
  2. Insufficient permissions
  3. Disk space full
  4. Incorrect agent capabilities

Solutions

Verify Required Tools

Linux:

# Check .NET SDK
dotnet --version

# Check Docker
docker --version

# If Docker is not found, install it (see Linux Setup Guide)

# Check Node.js
node --version
npm --version

Windows:

# Check .NET SDK
dotnet --version

# Check Git
git --version

# Check Node.js
node --version
npm --version

Check Permissions

Linux:

# Check agent user permissions
id azdevops

# Check directory permissions
ls -la ~/azagent/_work

Windows:

# Check agent user permissions
whoami /groups

# Check directory permissions
Get-Acl C:\azagent\_work

Check Disk Space

Linux:

# Check disk usage
df -h

# Check specific directory
du -sh ~/azagent/_work/*

Windows:

# Check disk usage
Get-PSDrive C | Select-Object Used, Free

# Check specific directory
Get-ChildItem C:\azagent\_work -Recurse | Measure-Object -Property Length -Sum

Verify Agent Capabilities

  1. In Azure DevOps, navigate to agent pool
  2. Select agent → Capabilities tab
  3. Verify required capabilities are present
  4. Add missing capabilities if needed

Code Coverage Not Found by Build Quality Checks

Symptoms

  • Build Quality Checks shows 0% coverage: Total lines: 0, Covered lines: 0
  • Coverage reports are published successfully but Build Quality Checks can't find them
  • Error: The code coverage value (0%, 0 lines) is lower than the minimum value

Possible Causes

  1. Case sensitivity on Linux - File paths are case-sensitive on Linux
  2. Coverage XML files not found by PublishCodeCoverageResults - The glob pattern might not match on Linux
  3. Coverage files in wrong location - Files might be in a different directory than expected

Solutions

Verify Coverage Files Exist

Add a diagnostic step before Build Quality Checks to verify coverage files:

- script: |
    echo "Checking for coverage files..."
    find "$(Agent.TempDirectory)" -name "coverage.cobertura.xml" -type f
    find "$(Agent.TempDirectory)" -name "*coverage*" -type f
  displayName: 'Diagnose coverage file locations'

Ensure PublishCodeCoverageResults Finds Files

The PublishCodeCoverageResults@2 task uses:

summaryFileLocation: '$(Agent.TempDirectory)/**/coverage.cobertura.xml'

On Linux, ensure:

  1. The file name is exactly coverage.cobertura.xml (case-sensitive)
  2. The file is in a subdirectory of $(Agent.TempDirectory)
  3. The file is readable by the agent user
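
To spot a case mismatch quickly, a sketch like the following lists coverage files whose name differs from the expected one only by letter case. The function name find_case_mismatches is hypothetical, and -iname assumes GNU or BSD find.

```shell
#!/usr/bin/env bash
# find_case_mismatches DIR: list files named like coverage.cobertura.xml
# whose letter case differs from the exact expected name.
find_case_mismatches() {
  find "$1" -type f -iname 'coverage.cobertura.xml' ! -name 'coverage.cobertura.xml'
}
```

In a pipeline script step this could be called as find_case_mismatches "$(Agent.TempDirectory)". Any output points at a file that the case-sensitive glob will not match on Linux.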

Fix Coverage File Paths

If coverage files are in a different location, you may need to:

  1. Copy files to expected location:

    # Find coverage files
    find . -name "coverage.cobertura.xml" -type f
    
    # Copy to expected location if needed
    mkdir -p "$(Agent.TempDirectory)/TestResults"
    cp path/to/coverage.cobertura.xml "$(Agent.TempDirectory)/TestResults/"
    

  2. Update PublishCodeCoverageResults path (if you can modify the template):

    - task: PublishCodeCoverageResults@2
      inputs:
        codeCoverageTool: 'Cobertura'
        summaryFileLocation: '$(Agent.TempDirectory)/TestResults/**/coverage.cobertura.xml'
        # Or use absolute path if known
    

Verify Build Quality Checks Configuration

Ensure Build Quality Checks is configured correctly:

- task: mspremier.BuildQualityChecks.QualityChecks-task.BuildQualityChecks@10
  inputs:
    checkCoverage: true
    coverageFailOption: fixed
    coverageType: lines
    coverageThreshold: '76'

Note: Build Quality Checks reads coverage data from PublishCodeCoverageResults, not from file artifacts. The coverage must be published successfully before Build Quality Checks can read it.

Pipeline Cannot Find Agent

Symptoms

  • Pipeline shows "No agent found" error
  • Pipeline waits indefinitely for agent

Possible Causes

  1. Pool name mismatch
  2. Demand requirements not met
  3. All agents busy or offline
  4. Agent capabilities don't match demands

Solutions

Verify Pool Name

Ensure pool name in pipeline YAML matches exactly (case-sensitive):

pool:
  name: 'Hetzner-Linux'  # Must match exactly

Check Agent Demands

Verify agent capabilities match pipeline demands:

pool:
  name: 'Hetzner-Linux'
  demands:
    - Agent.OS -equals Linux
    - DotNet -equals 9.0.x  # Agent must have this capability

Verify Agent Availability

  1. Check agent pool in Azure DevOps
  2. Verify at least one agent is online
  3. Check if agents are busy with other jobs
  4. Consider adding more agents if all are busy

High Disk Usage

Symptoms

  • Builds fail with "No space left on device" errors
  • Disk usage shows > 90%
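
The 90% figure above can be checked from a script. This is a minimal sketch, assuming POSIX df output and a mount point without spaces in its device name. The function names disk_usage_pct and over_threshold are hypothetical.

```shell
#!/usr/bin/env bash
# disk_usage_pct PATH: print the used percentage (without %) of the
# filesystem that holds PATH, taken from the Capacity column of df -P.
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# over_threshold PCT LIMIT: succeed if PCT is greater than LIMIT.
over_threshold() {
  [ "$1" -gt "$2" ]
}
```

Usage: over_threshold "$(disk_usage_pct ~/azagent/_work)" 90 && echo "Disk usage above 90%". The same check could run from cron to warn before builds start failing.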

Solutions

Clean Up Build Artifacts

Linux:

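# Clean agent work directory (stop the agent first so no running job is using it;
# the && keeps rm from running in the wrong directory if cd fails)
cd ~/azagent/_work && rm -rf -- ./*

# Clean old top-level directories (older than 30 days)
find ~/azagent/_work -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +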
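
Before deleting anything, it can help to preview what an age-based cleanup would remove. This is a minimal dry-run sketch. The function name list_stale_dirs is hypothetical, and -mindepth/-maxdepth assume GNU or BSD find.

```shell
#!/usr/bin/env bash
# list_stale_dirs ROOT DAYS: print top-level directories under ROOT whose
# modification time is more than DAYS days old. Nothing is deleted.
list_stale_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2"
}
```

Usage: list_stale_dirs ~/azagent/_work 30. Review the output, then run the deletion only if the listed directories are safe to remove.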

Windows:

# Clean agent work directory
Remove-Item C:\azagent\_work\* -Recurse -Force

# Clean old directories
Get-ChildItem C:\azagent\_work -Directory | Where-Object {$_.LastWriteTime -lt (Get-Date).AddDays(-30)} | Remove-Item -Recurse -Force

Clean Package Caches

Linux:

# Clean NuGet cache
rm -rf ~/.nuget/packages/*

# Clean npm cache
npm cache clean --force

# Clean Docker
docker system prune -a --volumes

Windows:

# Clean NuGet cache
Remove-Item "$env:USERPROFILE\.nuget\packages\*" -Recurse -Force

# Clean npm cache
npm cache clean --force

# Clean Docker
docker system prune -a --volumes

Increase Disk Size

If using Hetzner Cloud, you can increase disk size:

  1. Navigate to Hetzner Cloud Console
  2. Select server → Resize → Increase disk size
  3. Follow instructions to resize filesystem

Slow Build Performance

Symptoms

  • Builds take longer than expected
  • Agent CPU/memory usage is high

Solutions

Check System Resources

Linux:

# Check CPU and memory
top
htop

# Check disk I/O
iostat -x 1

# Check system load
uptime

Windows:

# Check CPU and memory
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10
Get-Counter '\Processor(_Total)\% Processor Time'
Get-Counter '\Memory\Available MBytes'

# Check disk I/O
Get-Counter '\PhysicalDisk(*)\Disk Reads/sec'
Get-Counter '\PhysicalDisk(*)\Disk Writes/sec'

Optimize Build Cache

  • Configure persistent NuGet cache
  • Use Docker layer caching
  • Cache npm/node_modules
  • Cache build artifacts between runs

Upgrade Server Resources

If resources are consistently maxed out:

  1. Consider upgrading to larger server type
  2. Add more agents to distribute load
  3. Optimize build processes

Authentication Errors

Symptoms

  • "401 Unauthorized" errors
  • "403 Forbidden" errors
  • PAT token errors

Solutions

Verify PAT Token

  1. Check token expiration date
  2. Verify token has correct scopes:
       • Agent Pools (Read & Manage)
       • Build (Read & Execute)
  3. Create new PAT if needed
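
One way to tell an expired token from a missing scope is to call the Azure DevOps REST API with the PAT and inspect the HTTP status. The following is a sketch under assumptions: the organization name is supplied by you, the agent pools endpoint is used because it needs the Agent Pools scope, and the status interpretations are typical rather than guaranteed. The function names explain_status and check_pat are hypothetical.

```shell
#!/usr/bin/env bash
# explain_status CODE: map an HTTP status code to a likely cause.
explain_status() {
  case "$1" in
    200) echo "OK: PAT accepted and can read agent pools" ;;
    401) echo "Unauthorized: PAT is invalid, expired, or revoked" ;;
    403) echo "Forbidden: PAT is valid but lacks the required scope or permission" ;;
    203|302) echo "Redirected to sign-in: credentials were not accepted" ;;
    000) echo "No response: check network, DNS, or proxy" ;;
    *) echo "Unexpected status $1" ;;
  esac
}

# check_pat ORG PAT: query the agent pools API with basic auth (empty user, PAT as password).
check_pat() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 -u ":$2" \
    "https://dev.azure.com/$1/_apis/distributedtask/pools?api-version=7.1")
  explain_status "$code"
}
```

Usage: check_pat <organization> <PAT>. Passing the PAT on the command line exposes it to the shell history, so reading it from a variable or prompt is safer in practice.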

Update Agent Configuration

Linux:

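cd ~/azagent
sudo ./svc.sh stop
sudo ./svc.sh uninstall
# An already-configured agent must be removed before it can be reconfigured
./config.sh remove --auth pat --token <NEW_PAT_TOKEN>
./config.sh --auth pat --token <NEW_PAT_TOKEN> --replace
sudo ./svc.sh install azdevops
sudo ./svc.sh start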

Windows:

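cd C:\azagent
# The Windows agent has no svc.cmd; config.cmd removes and re-creates the service.
# An already-configured agent must be removed before it can be reconfigured.
.\config.cmd remove --auth pat --token <NEW_PAT_TOKEN>
.\config.cmd --auth pat --token <NEW_PAT_TOKEN> --runAsService --replace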

Service Won't Start

Symptoms

  • Agent service fails to start
  • Service shows as "Failed" status

Solutions

Check Service Logs

Linux:

# View service logs
sudo journalctl -u 'vsts.agent.*.service' -n 100 --no-pager

# Check service status
sudo systemctl status 'vsts.agent.*.service'

Windows:

# View service logs
Get-EventLog -LogName Application -Source "vsts*" -Newest 50

# Check service status
Get-Service | Where-Object {$_.Name -like "*vsts*"}

Verify Agent Configuration

Linux:

# Check configuration file
cat ~/azagent/.agent

# Verify credentials file exists
ls -la ~/azagent/.credentials

Windows:

# Check configuration file
Get-Content C:\azagent\.agent

# Verify credentials file exists
Test-Path C:\azagent\.credentials

Reinstall Service

Linux:

cd ~/azagent
sudo ./svc.sh uninstall
sudo ./svc.sh install azdevops
sudo ./svc.sh start

Windows:

cd C:\azagent
# The Windows agent has no svc.cmd; the service is managed through config.cmd
.\config.cmd remove
.\config.cmd --runAsService --replace

Network Connectivity Issues

Symptoms

  • Agent cannot connect to Azure DevOps
  • Timeout errors
  • SSL/TLS errors

Solutions

Test Connectivity

# Linux
curl -v https://dev.azure.com
ping -c 4 dev.azure.com

# Windows
Test-NetConnection -ComputerName dev.azure.com -Port 443
Test-Connection dev.azure.com

Check Firewall Rules

Linux:

# Check firewall status
sudo ufw status

# Allow outbound HTTPS
sudo ufw allow out 443/tcp

Windows:

# Check firewall rules
Get-NetFirewallRule | Where-Object {$_.DisplayName -like "*HTTPS*"}

# Allow outbound HTTPS (usually enabled by default)

Check Proxy Settings

If behind a proxy:

  1. Configure proxy in agent environment
  2. Set HTTP_PROXY and HTTPS_PROXY variables
  3. Update agent configuration if needed
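
Before changing anything, it helps to see which proxy variables the current environment actually exposes. This is a minimal sketch. The function name show_proxy_env is hypothetical, and it masks any user:password part so the output can be shared.

```shell
#!/usr/bin/env bash
# show_proxy_env: print the proxy-related environment variables,
# hiding embedded credentials (the user:password@ part of a URL).
show_proxy_env() {
  local name value
  for name in HTTP_PROXY HTTPS_PROXY NO_PROXY; do
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      printf '%s=%s\n' "$name" "$(printf '%s' "$value" | sed -E 's#//[^/@]*@#//***@#')"
    else
      printf '%s is not set\n' "$name"
    fi
  done
}
```

Run it as the agent user, since a service may not inherit variables from your login shell. The agent's config.sh also accepts --proxyurl (with --proxyusername and --proxypassword) at configuration time, which is the usual way to make the agent itself use a proxy.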

Git Authentication Errors on Self-Hosted Agents

Symptoms

  • Error: fatal: unable to access 'https://dev.azure.com/...': The requested URL returned error: 400
  • Git fetch fails with exit code 128
  • Repository checkout fails on self-hosted Linux agents
  • Works on Microsoft-hosted agents but fails on self-hosted
  • Error occurs after manually installing Git on the agent

Possible Causes

  1. Missing explicit checkout with credentials - Default checkout doesn't persist credentials on self-hosted agents
  2. Git configuration conflicts - Manual Git installation may have changed global Git config
  3. Agent permissions - Agent user doesn't have proper repository access
  4. Stale Git credentials - Old credentials cached in Git config

Solutions

Add Explicit Checkout with Credentials (Required)

In your pipeline YAML, add explicit checkout step:

steps:
  - checkout: self
    persistCredentials: true
    displayName: 'Checkout repository with credentials'
  # ... rest of your steps

This is required for self-hosted agents to authenticate properly. Without this, the agent cannot authenticate to fetch from Azure DevOps repositories.

Clear Git Configuration on Agent

If Git was manually installed and causing issues:

# Connect to agent server
ssh azdevops@<server-ip>

# Check current Git config
git config --global --list

# Remove problematic credentials
git config --global --unset-all http.extraheader
git config --global --unset-all http.https://dev.azure.com.extraheader

# Verify Git version
git --version
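
To see whether any stale authorization headers are configured without printing the tokens themselves, the config listing can be filtered down to key names. This is a minimal sketch. The function name list_extraheader_keys is hypothetical.

```shell
#!/usr/bin/env bash
# list_extraheader_keys: read "git config --list" output on stdin and print
# only the names of http.*extraheader keys, never their values.
list_extraheader_keys() {
  grep -i -E '^http\.(.*\.)?extraheader=' | sed 's/=.*$//'
}
```

Usage: git config --global --list | list_extraheader_keys. Each printed key can then be removed with git config --global --unset-all <key>.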

Verify Agent Repository Permissions

  1. In Azure DevOps, go to Project Settings → Repositories
  2. Select your repository
  3. Go to Security tab
  4. Ensure Project Collection Build Service has Read permission
  5. Ensure Project Build Service has Read permission

Configure Git Authentication Manually (If Needed)

If persistCredentials: true doesn't work, configure Git manually in pipeline:

- script: |
    # Read the token from the mapped environment variable rather than inlining it
    git config --global http.extraheader "AUTHORIZATION: bearer $SYSTEM_ACCESSTOKEN"
    git config --global http.version HTTP/1.1
  displayName: 'Configure Git authentication'
  env:
    SYSTEM_ACCESSTOKEN: $(System.AccessToken)

Restart Agent Service

After making changes, restart the agent:

cd ~/azagent
sudo ./svc.sh stop
sudo ./svc.sh start

Note: The persistCredentials: true option is the standard solution for self-hosted agents. Always include this in your pipeline YAML when using self-hosted agents.

Getting Additional Help

Azure DevOps Resources

Hetzner Cloud Resources

Log Collection

When seeking help, collect:

  1. Agent logs (last 100 lines)
  2. Service status
  3. System resource usage
  4. Network connectivity test results
  5. Agent configuration (sanitized)
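
Most of the items above can be gathered in one pass on Linux. This is a minimal sketch, assuming the systemd unit naming used throughout this guide. The function names run_section and collect_diagnostics are hypothetical, and the agent configuration is deliberately left out so it can be reviewed and sanitized by hand.

```shell
#!/usr/bin/env bash
# run_section TITLE CMD [ARGS...]: print a titled section with the command's
# output, or a note if the command is missing or exits with an error.
run_section() {
  local title=$1
  shift
  printf '=== %s ===\n' "$title"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || printf '(command exited with status %s)\n' "$?"
  else
    printf '(%s not available)\n' "$1"
  fi
  printf '\n'
}

# collect_diagnostics FILE: write all sections to FILE.
collect_diagnostics() {
  {
    run_section "Agent logs (last 100 lines)" journalctl -u 'vsts.agent.*.service' -n 100 --no-pager
    run_section "Service status" systemctl status 'vsts.agent.*.service' --no-pager
    run_section "Disk usage" df -h
    run_section "System load" uptime
    run_section "Connectivity" curl -sS -I --max-time 10 https://dev.azure.com
  } > "$1"
}
```

Usage: collect_diagnostics /tmp/agent-diagnostics.txt, run with sudo if the journal is not readable by your user. Review the file for hostnames or tokens before sharing it.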

Next Steps

  • Review Maintenance Guide for preventive measures
  • Set up monitoring to catch issues early
  • Document your specific troubleshooting procedures