Rate Limiting in ConnectSoft Microservice Template¶
Purpose & Overview¶
Rate Limiting is a mechanism that controls the number of requests a client can make to an API within a specified time window. In the ConnectSoft Microservice Template, rate limiting is implemented using ASP.NET Core's built-in rate limiting middleware, protecting the microservice from abuse, ensuring fair resource allocation, and maintaining system stability under load.
Rate limiting provides:
- Protection Against Abuse: Prevents malicious or misconfigured clients from overwhelming the service
- Fair Resource Allocation: Ensures resources are distributed fairly among clients
- System Stability: Protects backend services from traffic spikes
- Cost Control: Limits API usage to prevent excessive resource consumption
- DDoS Mitigation: First line of defense against denial-of-service attacks
- Compliance: Enables enforcement of usage quotas and service-level agreements
Rate Limiting Philosophy
Rate limiting is a critical security and performance feature that should be configured thoughtfully. The template provides a global rate limiter with configurable limits, while allowing specific endpoints (like health checks) to bypass rate limiting when necessary. Rate limits should be tested under load and adjusted based on actual traffic patterns and system capacity.
Architecture Overview¶
Rate Limiting in the Request Pipeline¶
Incoming Request
    ↓
Rate Limiting Middleware (UseRateLimiter)
    ├── Extract Partition Key (IP, User ID, etc.)
    ├── Check Rate Limit Policy
    └── Acquire Permit
        ├── Success → Continue to endpoint
        └── Failure → Return 429 Too Many Requests
    ↓
Routing Middleware
    ↓
Controller/Endpoint
    ↓
Response
Rate Limiting Components¶
RateLimitingExtensions.cs
├── AddMicroserviceRateLimiting() - Service Registration
│   ├── GlobalLimiter (Fixed Window)
│   │   ├── Partition Key Strategy (IP address or Test ID)
│   │   ├── PermitLimit
│   │   ├── Window
│   │   ├── AutoReplenishment
│   │   └── QueueLimit
│   └── RejectionStatusCode (429)
└── UseMicroserviceRateLimiter() - Middleware
    └── Place after UseRouting()

RateLimitingOptions.cs
├── EnableRateLimiting (bool)
└── GlobalLimiter (GlobalLimiterOptions)
    ├── Window (TimeSpan)
    ├── AutoReplenishment (bool)
    ├── PermitLimit (int)
    └── QueueLimit (int)
Service Registration¶
AddMicroserviceRateLimiting Extension¶
Rate limiting is registered via AddMicroserviceRateLimiting():
Implementation:
// RateLimitingExtensions.cs
internal static IServiceCollection AddMicroserviceRateLimiting(this IServiceCollection services)
{
    ArgumentNullException.ThrowIfNull(services);

    if (OptionsExtensions.RateLimitingOptions.EnableRateLimiting)
    {
        // Add rate limiting services
        services.AddRateLimiter(options =>
        {
            // Set custom status code for rejections
            options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

            // Configure global limiter with partitioning strategy
            options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
            {
                // Prefer test-provided key, fall back to IP for normal traffic
                var key = context.Request.Headers.TryGetValue("X-Test-Id", out var v)
                    ? v.ToString()
                    : context.GetClientIp() ?? "unknown";

                return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = OptionsExtensions.RateLimitingOptions.GlobalLimiter.PermitLimit,
                    Window = OptionsExtensions.RateLimitingOptions.GlobalLimiter.Window,
                    AutoReplenishment = OptionsExtensions.RateLimitingOptions.GlobalLimiter.AutoReplenishment,
                    QueueLimit = OptionsExtensions.RateLimitingOptions.GlobalLimiter.QueueLimit,
                });
            });

            // Add additional rate limiting policies and strategies here
            // See: https://learn.microsoft.com/en-us/aspnet/core/performance/rate-limit?view=aspnetcore-9.0
        });
    }

    return services;
}
UseMicroserviceRateLimiter Middleware¶
Placement in the pipeline:
// Middleware order:
application.UseRouting(); // Before rate limiting
application.UseMicroserviceRateLimiter(); // After routing
application.UseEndpoints(...); // After rate limiting
Important: Rate limiting middleware must be placed:
- After UseRouting() (to access route information)
- Before UseEndpoints() (to intercept requests before endpoint execution)
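As a minimal sketch of how this ordering looks in a hosting setup (the endpoint mapping is illustrative; only the two extension methods come from the template):

```csharp
// Sketch: middleware order for rate limiting in a minimal hosting model.
// AddMicroserviceRateLimiting/UseMicroserviceRateLimiter are the template's
// extension methods; MapControllers is an illustrative endpoint mapping.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddMicroserviceRateLimiting();

var app = builder.Build();

app.UseRouting();                 // 1. routing first, so route data is available
app.UseMicroserviceRateLimiter(); // 2. rate limiting after routing
app.MapControllers();             // 3. endpoints run only if a permit was acquired

app.Run();
```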
Rate Limiting Algorithms¶
Fixed Window Rate Limiter¶
The template uses Fixed Window rate limiting by default:
How It Works:
- Divides time into fixed windows (e.g., 1 minute)
- Allows a fixed number of requests per window (e.g., 100 requests)
- Resets the counter at the start of each new window
- Simple and predictable behavior
Example:
Window: 1 minute
Permit Limit: 5 requests
Time: 00:00:00 - 00:01:00 → 5 requests allowed
Time: 00:01:00 - 00:02:00 → Counter resets, 5 requests allowed again
Advantages:
- Simple to understand and implement
- Predictable reset behavior
- Low memory overhead
- Easy to configure

Disadvantages:
- Can allow bursts at window boundaries (up to twice the limit in a short span straddling two windows)
- May not provide smooth rate limiting
- Less precise than sliding window
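The fixed window behavior can be observed outside of ASP.NET Core by using `System.Threading.RateLimiting` directly. A small console sketch (assumes a .NET 8 console app with implicit usings):

```csharp
// Sketch: the fixed window algorithm via System.Threading.RateLimiting.
// With PermitLimit = 5, the sixth acquisition in the same window fails.
using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 5,
    Window = TimeSpan.FromMinutes(1),
    AutoReplenishment = true,
    QueueLimit = 0,
});

for (int i = 1; i <= 6; i++)
{
    // AttemptAcquire returns immediately without queuing
    using RateLimitLease lease = limiter.AttemptAcquire();
    Console.WriteLine($"Request {i}: {(lease.IsAcquired ? "allowed" : "rejected")}");
}
// Requests 1-5 are allowed; request 6 is rejected until the window resets.
```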
Other Rate Limiting Algorithms (Available in ASP.NET Core)¶
Sliding Window:
- Rolling window of time
- Smoother rate limiting
- More memory intensive

Token Bucket:
- Allows bursts up to bucket size
- Refills tokens at a fixed rate
- Good for bursty traffic patterns

Concurrency Limiter:
- Limits concurrent requests
- Not time-based
- Useful for resource protection
Configuration¶
RateLimitingOptions¶
Configuration Class:
// RateLimitingOptions.cs
public sealed class RateLimitingOptions
{
    public const string RateLimitingOptionsSectionName = "RateLimiting";

    [Required]
    required public bool EnableRateLimiting { get; set; }

    [Required]
    [ValidateObjectMembers]
    required public GlobalLimiterOptions GlobalLimiter { get; set; }

#if UseMCP
    /// <summary>
    /// Gets or sets MCP endpoint rate limiter settings (fixed window rate limiter).
    /// Required when MCP is enabled. Can be removed at template generation time if MCP rate limiting is not needed.
    /// </summary>
    [Required]
    [ValidateObjectMembers]
    required public GlobalLimiterOptions McpLimiter { get; set; }
#endif
}
GlobalLimiterOptions¶
Configuration Class:
// GlobalLimiterOptions.cs
public sealed class GlobalLimiterOptions
{
    /// <summary>
    /// Time window in which requests are counted.
    /// Must be greater than TimeSpan.Zero.
    /// </summary>
    [Required]
    [DataType(DataType.Duration)]
    required public TimeSpan Window { get; set; } = TimeSpan.FromSeconds(1);

    /// <summary>
    /// Whether the fixed window rate limiter automatically refreshes counters,
    /// or whether counters will be replenished by an external caller.
    /// </summary>
    [Required]
    required public bool AutoReplenishment { get; set; } = true;

    /// <summary>
    /// Maximum number of permits allowed in a window.
    /// Must be greater than 0.
    /// </summary>
    [Required]
    required public int PermitLimit { get; set; }

    /// <summary>
    /// Maximum cumulative permit count of queued acquisition requests.
    /// Must be greater than or equal to 0.
    /// </summary>
    [Required]
    required public int QueueLimit { get; set; }
}
appsettings.json Configuration¶
Example Configuration:
{
  "RateLimiting": {
    "EnableRateLimiting": true,
    "GlobalLimiter": {
      "Window": "00:01:00",
      "AutoReplenishment": true,
      "PermitLimit": 100,
      "QueueLimit": 0
    },
    "McpLimiter": {
      "Window": "00:01:00",
      "AutoReplenishment": true,
      "PermitLimit": 100,
      "QueueLimit": 0
    }
  }
}
Note: The McpLimiter section is optional and only used when MCP is enabled. It allows you to configure separate rate limits for MCP endpoints.
Configuration Parameters:
| Parameter | Type | Description | Default | Example |
|---|---|---|---|---|
| `EnableRateLimiting` | `bool` | Enable or disable rate limiting | `false` | `true` |
| `GlobalLimiter` | `GlobalLimiterOptions` | Global rate limiter settings (required) | Required | See below |
| `McpLimiter` | `GlobalLimiterOptions` | MCP endpoint rate limiter settings (required when MCP is enabled) | Required when MCP is enabled | See below |
GlobalLimiter and McpLimiter Options:
| Parameter | Type | Description | Default | Example |
|---|---|---|---|---|
| `Window` | `TimeSpan` | Time window for rate limiting | `00:00:01` | `00:01:00` (1 minute) |
| `AutoReplenishment` | `bool` | Automatically refresh counters | `true` | `true` |
| `PermitLimit` | `int` | Maximum requests per window | Required | `100` |
| `QueueLimit` | `int` | Maximum queued requests | `0` | `0` (no queuing) |
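These options are bound from configuration at startup. As a hedged sketch of how such binding and validation is typically wired (the calls below are standard Microsoft.Extensions.Options APIs, not necessarily the template's exact registration code):

```csharp
// Sketch: bind RateLimitingOptions from the "RateLimiting" section with
// eager data-annotation validation, using standard options APIs.
builder.Services
    .AddOptions<RateLimitingOptions>()
    .BindConfiguration(RateLimitingOptions.RateLimitingOptionsSectionName)
    .ValidateDataAnnotations() // enforces the [Required] attributes shown above
    .ValidateOnStart();        // fail fast at startup instead of on first use
```

With `ValidateOnStart()`, a missing or invalid `RateLimiting` section surfaces as a startup failure rather than a runtime surprise.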
Environment-Specific Configuration¶
Development:
{
  "RateLimiting": {
    "EnableRateLimiting": true,
    "GlobalLimiter": {
      "Window": "00:01:00",
      "AutoReplenishment": true,
      "PermitLimit": 100,
      "QueueLimit": 0
    }
  }
}

Production:

{
  "RateLimiting": {
    "EnableRateLimiting": true,
    "GlobalLimiter": {
      "Window": "00:01:00",
      "AutoReplenishment": true,
      "PermitLimit": 1000,
      "QueueLimit": 0
    }
  }
}

Testing:

{
  "RateLimiting": {
    "EnableRateLimiting": true,
    "GlobalLimiter": {
      "Window": "00:01:00",
      "AutoReplenishment": true,
      "PermitLimit": 5,
      "QueueLimit": 0
    }
  }
}
Partitioning Strategy¶
Partition Key Selection¶
The rate limiter uses a partitioning strategy to group requests:
Current Implementation:
var key = context.Request.Headers.TryGetValue("X-Test-Id", out var v)
    ? v.ToString()
    : context.GetClientIp() ?? "unknown";
Partition Key Priority:
1. X-Test-Id Header: Used for testing (if present)
2. Client IP Address: Extracted from X-Forwarded-For or RemoteIpAddress
3. "unknown": Fallback if IP cannot be determined
Client IP Extraction:
private static string? GetClientIp(this HttpContext httpContext)
{
    // X-Forwarded-For may contain a comma-separated chain; the first entry is the original client
    string? forwardedFor = httpContext.Request.Headers["X-Forwarded-For"].FirstOrDefault();
    if (!string.IsNullOrEmpty(forwardedFor))
    {
        return forwardedFor.Split(',')[0].Trim();
    }

    return httpContext.Connection.RemoteIpAddress?.ToString();
}
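Note that trusting `X-Forwarded-For` directly is only safe behind a trusted proxy, since clients can spoof the header. An alternative sketch is to let ASP.NET Core's forwarded headers middleware (from `Microsoft.AspNetCore.HttpOverrides`) rewrite `RemoteIpAddress` for known proxies and partition on that instead; the proxy address below is illustrative:

```csharp
// Sketch: ForwardedHeadersMiddleware populates RemoteIpAddress from
// X-Forwarded-For only when the request comes from a known proxy,
// so spoofed headers from untrusted sources are ignored.
builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders = ForwardedHeaders.XForwardedFor;
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.1")); // illustrative proxy IP
});

var app = builder.Build();
app.UseForwardedHeaders(); // place early in the pipeline, before rate limiting
```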
Alternative Partitioning Strategies¶
By User Identity:
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
    var key = context.User.Identity?.Name ?? context.GetClientIp() ?? "unknown";
    return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        AutoReplenishment = true,
        QueueLimit = 0
    });
});

By API Key:

options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
    var key = context.Request.Headers["X-API-Key"].FirstOrDefault()
        ?? context.GetClientIp()
        ?? "unknown";
    return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        AutoReplenishment = true,
        QueueLimit = 0
    });
});

By Tenant ID:

options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
    var tenantId = context.User.FindFirst("tenant_id")?.Value
        ?? context.Request.Headers["X-Tenant-Id"].FirstOrDefault()
        ?? "unknown";
    return RateLimitPartition.GetFixedWindowLimiter(tenantId, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        AutoReplenishment = true,
        QueueLimit = 0
    });
});
Response Headers¶
Rate Limit Headers¶
Rate limit headers tell clients how to back off. Note that ASP.NET Core's built-in rate limiting middleware does not emit these headers automatically: the `Retry-After` header can be set from the `OnRejected` callback using the lease's retry-after metadata, and the `X-RateLimit-*` headers are a widely used convention that must be added explicitly.
Common Headers:
- X-RateLimit-Limit: Maximum number of requests allowed per window
- X-RateLimit-Remaining: Number of requests remaining in the current window
- X-RateLimit-Reset: Unix timestamp when the rate limit resets
- Retry-After: Seconds to wait before retrying (in 429 responses)
Example 429 Too Many Requests Response:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1699920000
Retry-After: 60
Content-Type: application/json
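A sketch of how the `Retry-After` header can be emitted on rejection, using the framework's `OnRejected` callback and the lease's retry-after metadata (the template itself only sets `RejectionStatusCode`; this callback is an optional addition):

```csharp
// Sketch: surface Retry-After to clients when a fixed/sliding window
// limiter rejects a request. MetadataName.RetryAfter comes from
// System.Threading.RateLimiting; Headers.RetryAfter is available on
// IHeaderDictionary in .NET 7+.
services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.OnRejected = (context, cancellationToken) =>
    {
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        return ValueTask.CompletedTask;
    };
});
```

Any `X-RateLimit-*` headers would be added in the same callback, but limit and remaining counts must be tracked by the application since the lease does not expose them.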
Endpoint Exemptions¶
Disabling Rate Limiting for Specific Endpoints¶
Certain endpoints should bypass rate limiting:
Health Checks:
// HealthChecksExtensions.cs
var endpointsBuilder = endpoints.MapHealthChecks("/health");
#if RateLimiting
endpointsBuilder.DisableRateLimiting();
#endif
Swagger UI:
// SwaggerExtensions.cs
var endpointsBuilder = endpoints.MapSwagger();
#if RateLimiting
endpointsBuilder.DisableRateLimiting();
#endif
Scalar UI and SignalR Hubs: exempted the same way, by calling DisableRateLimiting() on the endpoint convention builder returned when the endpoint is mapped.
Why Exempt These Endpoints?
- Health Checks: Must be accessible for monitoring and orchestration
- Swagger/Scalar: Documentation endpoints, not production traffic
- SignalR: Real-time connections may require a different rate limiting strategy
Testing¶
Acceptance Tests¶
The template includes acceptance tests for rate limiting:
// RateLimitingAcceptanceTests.cs
[TestClass]
[DoNotParallelize]
public class RateLimitingAcceptanceTests
{
    [TestMethod]
    public async Task GlobalRateLimiterShouldReturn429OnSixthRequestWithinWindow()
    {
        using HttpClient? client = BeforeAfterTestRunHooks.ServerInstance?.CreateClient();
        Assert.IsNotNull(client, "TestServer client was not initialized.");

        // Set unique test ID for isolation
        client.DefaultRequestHeaders.Remove("X-Test-Id");
        client.DefaultRequestHeaders.Add("X-Test-Id", Guid.NewGuid().ToString("N"));

        const string endpoint = "api/FeatureA/FeatureAUseCaseA";
        using var body = new StringContent("{}", Encoding.UTF8, "application/json");

        // Send 5 requests (within limit)
        for (int i = 0; i < 5; i++)
        {
            using var ok = await client.PostAsync(endpoint, body);
            Assert.IsTrue(ok.IsSuccessStatusCode,
                $"Expected success within limit on attempt #{i + 1}, got {(int)ok.StatusCode}");
        }

        // 6th request should be rate limited
        using var limited = await client.PostAsync(endpoint, body);
        Assert.AreEqual(HttpStatusCode.TooManyRequests, limited.StatusCode,
            "Expected HTTP 429 on the 6th request within the window.");
    }
}
Test Configuration:
{
  "RateLimiting": {
    "EnableRateLimiting": true,
    "GlobalLimiter": {
      "Window": "00:01:00",
      "AutoReplenishment": true,
      "PermitLimit": 5,
      "QueueLimit": 0
    }
  }
}
Manual Testing¶
Test Rate Limiting:
# Send multiple requests rapidly
for i in {1..10}; do
  curl -X POST http://localhost:5000/api/FeatureA/FeatureAUseCaseA \
    -H "Content-Type: application/json" \
    -d '{}'
  echo ""
done
Expected Behavior:
- First 100 requests (or configured limit): 200 OK
- Subsequent requests: 429 Too Many Requests
- Wait for window to reset: 200 OK again
Best Practices¶
Do's¶
- Enable Rate Limiting in Production
- Set Appropriate Limits
- Exempt Critical Endpoints
- Use IP-Based Partitioning for Public APIs
- Set QueueLimit to 0 for Immediate Rejection
- Monitor Rate Limit Metrics
  - Track 429 responses
  - Monitor rate limit usage
  - Alert on high rejection rates
Don'ts¶
- Don't Disable Rate Limiting in Production
- Don't Use Overly Restrictive Limits
- Don't Forget to Exempt Health Checks
- Don't Use QueueLimit as a Substitute for Rate Limiting (queuing delays requests instead of rejecting them)
- Don't Skip Load Testing
Advanced Scenarios¶
Custom Rate Limiting Policies¶
Endpoint-Specific Policies:
services.AddRateLimiter(options =>
{
    // Global limiter
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
        RateLimitPartition.GetFixedWindowLimiter(
            context.GetClientIp() ?? "unknown",
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 100,
                Window = TimeSpan.FromMinutes(1),
                AutoReplenishment = true,
                QueueLimit = 0
            }));

    // Endpoint-specific policy
    options.AddFixedWindowLimiter("api", limiterOptions =>
    {
        limiterOptions.PermitLimit = 50;
        limiterOptions.Window = TimeSpan.FromMinutes(1);
        limiterOptions.AutoReplenishment = true;
        limiterOptions.QueueLimit = 0;
    });

#if UseMCP
    // MCP endpoint rate limiting (automatically configured from appsettings.json)
    options.AddFixedWindowLimiter("MCP", limiterOptions =>
    {
        limiterOptions.PermitLimit = OptionsExtensions.RateLimitingOptions.McpLimiter.PermitLimit;
        limiterOptions.Window = OptionsExtensions.RateLimitingOptions.McpLimiter.Window;
        limiterOptions.AutoReplenishment = OptionsExtensions.RateLimitingOptions.McpLimiter.AutoReplenishment;
        limiterOptions.QueueLimit = OptionsExtensions.RateLimitingOptions.McpLimiter.QueueLimit;
    });
#endif
});
Apply to Endpoint:
[EnableRateLimiting("api")]
[HttpPost("orders")]
public async Task<IActionResult> CreateOrder([FromBody] OrderRequest request)
{
    // ...
}

// MCP endpoint - rate limiting is automatically applied when configured
endpoints.MapMcp("/mcp")
    .RequireRateLimiting("MCP");
MCP Endpoint Rate Limiting¶
When both UseMCP and RateLimiting template parameters are enabled, you can configure MCP-specific rate limiting that works independently from the global rate limiter. This allows you to set different rate limits for MCP endpoints (/mcp) compared to other endpoints.
Configuration in appsettings.json:
{
  "RateLimiting": {
    "EnableRateLimiting": true,
    "GlobalLimiter": {
      "Window": "00:01:00",
      "PermitLimit": 5,
      "AutoReplenishment": true,
      "QueueLimit": 0
    },
    "McpLimiter": {
      "Window": "00:01:00",
      "PermitLimit": 100,
      "AutoReplenishment": true,
      "QueueLimit": 0
    }
  }
}
How It Works:
- The `McpLimiter` configuration is required when MCP is enabled (it can be removed at template generation time if MCP rate limiting is not needed)
- When configured, a named rate limiting policy "MCP" is automatically created
- The policy uses the same partitioning strategy as the global limiter (IP address or `X-Test-Id` header)
- Rate limiting is automatically applied to the `/mcp` endpoint when both `EnableRateLimiting` is `true` and `McpLimiter` is configured
- MCP rate limiting is independent of global rate limiting; both can be active simultaneously
Implementation:
The MCP rate limiting policy is configured automatically in RateLimitingExtensions.cs:
#if UseMCP
// Configure MCP-specific rate limiting policy
options.AddFixedWindowLimiter("MCP", limiterOptions =>
{
    limiterOptions.PermitLimit = OptionsExtensions.RateLimitingOptions.McpLimiter.PermitLimit;
    limiterOptions.Window = OptionsExtensions.RateLimitingOptions.McpLimiter.Window;
    limiterOptions.AutoReplenishment = OptionsExtensions.RateLimitingOptions.McpLimiter.AutoReplenishment;
    limiterOptions.QueueLimit = OptionsExtensions.RateLimitingOptions.McpLimiter.QueueLimit;
});
#endif
And automatically applied to the MCP endpoint in ModelContextProtocolExtensions.cs:
var routeHandlerBuilder = endpoints.MapMcp("/mcp");
#if RateLimiting
// Apply MCP-specific rate limiting policy if rate limiting is enabled
if (OptionsExtensions.RateLimitingOptions.EnableRateLimiting)
{
    routeHandlerBuilder.RequireRateLimiting("MCP");
}
#endif
Best Practices:
- Set MCP rate limits higher than global limits to accommodate AI tool invocation patterns
- Monitor MCP endpoint usage to adjust limits based on actual traffic
- Consider per-user or per-session quotas for production environments
- Test rate limiting under load to ensure it doesn't interfere with legitimate AI tool usage
Sliding Window Rate Limiter¶
Configuration:
options.AddSlidingWindowLimiter("sliding", limiterOptions =>
{
    limiterOptions.PermitLimit = 100;
    limiterOptions.Window = TimeSpan.FromMinutes(1);
    limiterOptions.SegmentsPerWindow = 4; // 4 segments = 15-second segments
    limiterOptions.AutoReplenishment = true;
    limiterOptions.QueueLimit = 0;
});
Token Bucket Rate Limiter¶
Configuration:
options.AddTokenBucketLimiter("token", limiterOptions =>
{
    limiterOptions.TokenLimit = 100;
    limiterOptions.ReplenishmentPeriod = TimeSpan.FromMinutes(1);
    limiterOptions.TokensPerPeriod = 10;
    limiterOptions.AutoReplenishment = true;
    limiterOptions.QueueLimit = 0;
});
Concurrency Limiter¶
Configuration:
options.AddConcurrencyLimiter("concurrency", limiterOptions =>
{
    limiterOptions.PermitLimit = 10; // Max 10 concurrent requests
    limiterOptions.QueueLimit = 0;
});
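Limiters can also be composed: `PartitionedRateLimiter.CreateChained` (available in `System.Threading.RateLimiting` since .NET 7) combines multiple limiters so a request must acquire a permit from every one. A sketch with illustrative limits, reusing the template's `GetClientIp()` helper:

```csharp
// Sketch: chain a per-IP short-burst limiter with a coarser per-IP
// sustained limiter. A request is allowed only if both grant a permit.
options.GlobalLimiter = PartitionedRateLimiter.CreateChained(
    PartitionedRateLimiter.Create<HttpContext, string>(context =>
        RateLimitPartition.GetFixedWindowLimiter(
            context.GetClientIp() ?? "unknown",
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 10,                // short-burst cap
                Window = TimeSpan.FromSeconds(1),
                AutoReplenishment = true,
            })),
    PartitionedRateLimiter.Create<HttpContext, string>(context =>
        RateLimitPartition.GetFixedWindowLimiter(
            context.GetClientIp() ?? "unknown",
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 300,               // sustained per-minute cap
                Window = TimeSpan.FromMinutes(1),
                AutoReplenishment = true,
            })));
```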
Troubleshooting¶
Issue: Rate Limiting Not Working¶
Symptoms: Requests not being rate limited, no 429 responses.
Solutions:
1. Verify Rate Limiting is Enabled (`EnableRateLimiting` is `true` in configuration)
2. Check Middleware Order (`UseMicroserviceRateLimiter()` is placed after `UseRouting()`)
3. Verify Configuration is Loaded
   - Check `RateLimitingOptions` is registered
   - Verify `appsettings.json` contains the `RateLimiting` section
   - Check options validation passes
Issue: Too Many 429 Responses¶
Symptoms: Legitimate users receiving 429 responses.
Solutions:
1. Increase Permit Limit
2. Review Partitioning Strategy (clients behind a shared proxy or NAT may collapse into a single IP partition)
3. Check Window Size
Issue: Health Checks Being Rate Limited¶
Symptoms: Health checks returning 429 responses.
Solutions:
1. Disable Rate Limiting for Health Checks
2. Verify Exemption is Applied
   - Check `DisableRateLimiting()` is called
   - Verify conditional compilation (`#if RateLimiting`)
Issue: Rate Limits Not Resetting¶
Symptoms: Rate limits never reset, permanently blocked.
Solutions:
1. Verify `AutoReplenishment` is Enabled
2. Check Window Configuration
Summary¶
Rate limiting in the ConnectSoft Microservice Template provides:
- ✅ Global Rate Limiting: Fixed window rate limiter with configurable limits
- ✅ Partitioning Strategy: IP-based or custom partition key selection
- ✅ Endpoint Exemptions: Health checks and documentation endpoints bypass rate limiting
- ✅ Configurable Limits: Permit limit, window, and queue limit configuration
- ✅ HTTP 429 Responses: Standard rate limit responses with headers
- ✅ Testing Support: Acceptance tests verify rate limiting behavior
- ✅ Production Ready: Configurable for different environments
By following these patterns, teams can:
- Protect Services: Prevent abuse and overload
- Ensure Fairness: Distribute resources fairly among clients
- Maintain Stability: Keep services responsive under load
- Enforce Quotas: Control API usage and costs
- Monitor Usage: Track rate limit metrics and adjust limits
Rate limiting is an essential security and performance feature that protects microservices from abuse while ensuring fair resource allocation and system stability.