Rate Limiting in ConnectSoft Base Template¶
Purpose & Overview¶
Rate Limiting is a mechanism that controls the number of requests a client can make to an API within a specified time window. In the ConnectSoft Base Template, rate limiting is implemented using ASP.NET Core's built-in rate limiting middleware (via ConnectSoft.Extensions.RateLimiting), protecting the microservice from abuse, ensuring fair resource allocation, and maintaining system stability under load.
Target Framework
The Base Template targets .NET 10. Rate limiting uses ASP.NET Core's built-in middleware. See Rate Limiting in ASP.NET Core on Microsoft Learn.
For Base Template-specific configuration, path exclusions, pipeline order, and implementation details, see `docs/Rate Limiting.md` in the `ConnectSoft.BaseTemplate` repository.
Rate limiting provides:
- Protection Against Abuse: Prevents malicious or misconfigured clients from overwhelming the service
- Fair Resource Allocation: Ensures resources are distributed fairly among clients
- System Stability: Protects backend services from traffic spikes
- Cost Control: Limits API usage to prevent excessive resource consumption
- DDoS Mitigation: First line of defense against denial-of-service attacks
- Compliance: Enables enforcement of usage quotas and service-level agreements
Rate Limiting Philosophy
Rate limiting is a critical security and performance feature that should be configured thoughtfully. The Base Template provides a global rate limiter with configurable limits, while allowing specific paths and endpoints (like health checks, Swagger, MCP) to bypass or use separate rate limiting when necessary. Rate limits should be tested under load and adjusted based on actual traffic patterns and system capacity.
Architecture Overview¶
Rate Limiting in the Request Pipeline¶
Incoming Request
↓
Rate Limiting Middleware (UseRateLimiter)
├── Extract Partition Key (IP, User ID, etc.)
├── Check Rate Limit Policy
├── Acquire Permit
│ ├── Success → Continue to endpoint
│ └── Failure → Return 429 Too Many Requests
↓
Routing Middleware
↓
Controller/Endpoint
↓
Response
Rate Limiting Components¶
RateLimitingExtensions.cs
├── AddMicroserviceRateLimiting(configuration) - Service Registration
│ ├── Options from OptionsExtensions.RateLimitingOptions (section: RateLimiting)
│ ├── Delegates to AddConnectSoftRateLimiting(rateLimitingOptions[, excludeFromRateLimiting])
│ ├── Path exclusions: /assets/, /swagger/, /hangfire, /dashboard, /scalar, /mcp, DevUI, health checks
│ ├── MCP policy from OptionsExtensions.McpRateLimitingOptions (section: McpRateLimiting)
│ └── RejectionStatusCode (429)
└── UseMicroserviceRateLimiter(configuration) - Middleware
├── Delegates to UseConnectSoftRateLimiter(rateLimitingOptions)
└── Place after UseMicroserviceRequestTimeouts(), before UseEndpoints()
OptionsExtensions.cs
├── AddConnectSoftRateLimitingOptions(configuration) → RateLimiting section
└── AddConnectSoftMcpRateLimitingOptions(configuration) → McpRateLimiting section (when UseMCP)
Service Registration¶
OptionsExtensions Pattern¶
Options are registered in AddMicroserviceOptions() and consumed via static properties:
- `OptionsExtensions.RateLimitingOptions` — from section `RateLimiting` via `AddConnectSoftRateLimitingOptions(configuration)`
- `OptionsExtensions.McpRateLimitingOptions` — from section `McpRateLimiting` via `AddConnectSoftMcpRateLimitingOptions(configuration)` (when UseMCP and RateLimiting are enabled)
AddMicroserviceRateLimiting Extension¶
Rate limiting is registered via AddMicroserviceRateLimiting(configuration):
Implementation — delegates to ConnectSoft.Extensions.RateLimiting:
// RateLimitingExtensions.cs
var rateLimitingOptions = OptionsExtensions.RateLimitingOptions;
// With path exclusions (e.g., /assets/, /swagger/, /mcp, health checks, etc.);
// excludedPaths is the feature-dependent prefix list described under "Path Exclusions"
services.AddConnectSoftRateLimiting(rateLimitingOptions, excludeFromRateLimiting: context =>
{
    var path = context.Request.Path.Value ?? string.Empty;
    return excludedPaths.Any(p => path.StartsWith(p, StringComparison.OrdinalIgnoreCase));
});
// Or without exclusions
services.AddConnectSoftRateLimiting(rateLimitingOptions);
// MCP policy (when UseMCP) from McpRateLimiting section
var mcpRateLimitingOptions = OptionsExtensions.McpRateLimitingOptions;
// ... Configure "MCP" policy via options.AddFixedWindowLimiter("MCP", ...)
UseMicroserviceRateLimiter Middleware¶
Pipeline Position:
Placement (see Request Timeout for full middleware order):
// Middleware order:
application.UseRouting(); // Before rate limiting
application.UseMicroserviceRequestTimeouts(); // Request Timeout runs before Rate Limiter
application.UseMicroserviceRateLimiter(); // After Request Timeout
application.UseEndpoints(...); // After rate limiting
Important: Rate limiting middleware must be placed:
- After UseRouting() (to access route information)
- After UseMicroserviceRequestTimeouts() (the template's recommended order: request timeouts apply before rate limiting)
- Before UseEndpoints() (to intercept requests before endpoint execution)
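Putting this together, a minimal hedged sketch of the wiring under the minimal hosting model (the extension method names follow the signatures described above; the template's real Program.cs registers far more, and `MapControllers` stands in for its `UseEndpoints` call):

```csharp
// Program.cs (sketch) — only the rate-limiting-relevant wiring, in order.
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddMicroserviceOptions(builder.Configuration);      // binds RateLimiting / McpRateLimiting sections
builder.Services.AddMicroserviceRateLimiting(builder.Configuration); // registers the global limiter + exclusions

var app = builder.Build();

app.UseRouting();                                  // before rate limiting
app.UseMicroserviceRequestTimeouts();              // request timeouts first
app.UseMicroserviceRateLimiter(app.Configuration); // then rate limiting
app.MapControllers();                              // endpoints last

app.Run();
```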
Path Exclusions¶
The following paths are excluded from the global rate limiter (via predicate):
| Path | When |
|---|---|
| `/assets/` | Always (static assets) |
| `/swagger/` | When Swagger is enabled |
| `/hangfire` | When Hangfire is enabled |
| `/dashboard` | When Orleans is enabled |
| `/scalar` | When Scalar is enabled |
| `/mcp` | When MCP is enabled (uses separate MCP policy) |
| DevUI path | When Microsoft Agent Framework DevUI is enabled |
| `/v1/` | When Microsoft Agent Framework DevUI API is enabled |
| Health checks path | When HealthChecks are enabled |
Rate Limiting Algorithms¶
Fixed Window Rate Limiter¶
The template uses Fixed Window rate limiting by default:
How It Works:
- Divides time into fixed windows (e.g., 1 minute)
- Allows a fixed number of requests per window (e.g., 100 requests)
- Resets the counter at the start of each new window
- Simple and predictable behavior
Example:
Window: 1 minute
Permit Limit: 5 requests
Time: 00:00:00 - 00:01:00 → 5 requests allowed
Time: 00:01:00 - 00:02:00 → Counter resets, 5 requests allowed again
Advantages:
- Simple to understand and implement
- Predictable reset behavior
- Low memory overhead
- Easy to configure

Disadvantages:
- Can allow bursts at window boundaries (up to twice the limit in a short span straddling a boundary)
- May not provide smooth rate limiting
- Less precise than sliding window
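The reset behavior can be observed directly with `FixedWindowRateLimiter` from `System.Threading.RateLimiting`, the same limiter type the middleware uses; a minimal console sketch:

```csharp
using System;
using System.Threading.RateLimiting;

// Same limiter type the middleware uses, exercised directly:
// 5 permits per 1-minute window, no queuing.
var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 5,
    Window = TimeSpan.FromMinutes(1),
    AutoReplenishment = true,
    QueueLimit = 0,
});

for (int i = 1; i <= 6; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire();
    // Requests 1-5 are allowed; the 6th is rejected until the window resets.
    Console.WriteLine($"Request {i}: {(lease.IsAcquired ? "allowed" : "rejected")}");
}
```

Disposing a lease does not return permits to a fixed window limiter; counters replenish only when the window elapses, which is why the sixth attempt is rejected.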
Other Rate Limiting Algorithms (Available in ASP.NET Core)¶
Sliding Window:
- Rolling window of time
- Smoother rate limiting
- More memory intensive

Token Bucket:
- Allows bursts up to bucket size
- Refills tokens at fixed rate
- Good for bursty traffic patterns

Concurrency Limiter:
- Limits concurrent requests
- Not time-based
- Useful for resource protection
Configuration¶
RateLimitingOptions (section: RateLimiting)¶
Options are registered via AddConnectSoftRateLimitingOptions(configuration) in OptionsExtensions.AddMicroserviceOptions(). The RateLimiting section contains:
- `EnableRateLimiting` — Master switch for rate limiting
- `GlobalLimiter` — Fixed-window policy (`Window`, `AutoReplenishment`, `PermitLimit`, `QueueLimit`)
GlobalLimiterOptions¶
Configuration Class:
// GlobalLimiterOptions.cs
public sealed class GlobalLimiterOptions
{
/// <summary>
/// Time window that takes in the requests.
/// Must be greater than TimeSpan.Zero.
/// </summary>
[Required]
[DataType(DataType.Duration)]
required public TimeSpan Window { get; set; } = TimeSpan.FromSeconds(1);
/// <summary>
/// Whether the fixed window rate limiter automatically refreshes counters
/// or if someone else will be calling externally to refresh counters.
/// </summary>
[Required]
required public bool AutoReplenishment { get; set; } = true;
/// <summary>
/// Maximum number of permit counters that can be allowed in a window.
/// Must be greater than 0.
/// </summary>
[Required]
required public int PermitLimit { get; set; }
/// <summary>
/// Maximum cumulative permit count of queued acquisition requests.
/// Must be greater than or equal to 0.
/// </summary>
[Required]
required public int QueueLimit { get; set; }
}
appsettings.json Configuration¶
RateLimiting section (global rate limiting):
{
"RateLimiting": {
"EnableRateLimiting": true,
"GlobalLimiter": {
"Window": "00:01:00",
"AutoReplenishment": true,
"PermitLimit": 100,
"QueueLimit": 0
}
}
}
McpRateLimiting section (MCP endpoint rate limiting; separate from RateLimiting; used when UseMCP and RateLimiting are enabled):
{
"McpRateLimiting": {
"McpLimiter": {
"Window": "00:01:00",
"AutoReplenishment": true,
"PermitLimit": 100,
"QueueLimit": 0
}
}
}
Note: MCP rate limiting uses the McpRateLimiting section (not RateLimiting.McpLimiter). When configured, a named policy "MCP" is created and applied to the /mcp endpoint.
Configuration Parameters:
| Parameter | Type | Description | Default | Example |
|---|---|---|---|---|
| `EnableRateLimiting` | `bool` | Enable or disable rate limiting | `false` | `true` |
| `GlobalLimiter` | `GlobalLimiterOptions` | Global rate limiter settings (required) | Required | See below |
GlobalLimiter and McpLimiter Options:
| Parameter | Type | Description | Default | Example |
|---|---|---|---|---|
| `Window` | `TimeSpan` | Time window for rate limiting | `00:00:01` | `00:01:00` (1 minute) |
| `AutoReplenishment` | `bool` | Automatically refresh counters | `true` | `true` |
| `PermitLimit` | `int` | Maximum requests per window | Required | `100` |
| `QueueLimit` | `int` | Maximum queued requests | `0` | `0` (no queuing) |
Environment-Specific Configuration¶
Development:
{
"RateLimiting": {
"EnableRateLimiting": true,
"GlobalLimiter": {
"Window": "00:01:00",
"AutoReplenishment": true,
"PermitLimit": 100,
"QueueLimit": 0
}
}
}
Production:
{
"RateLimiting": {
"EnableRateLimiting": true,
"GlobalLimiter": {
"Window": "00:01:00",
"AutoReplenishment": true,
"PermitLimit": 1000,
"QueueLimit": 0
}
}
}
Testing:
{
"RateLimiting": {
"EnableRateLimiting": true,
"GlobalLimiter": {
"Window": "00:01:00",
"AutoReplenishment": true,
"PermitLimit": 5,
"QueueLimit": 0
}
}
}
Partitioning Strategy¶
Partition Key Selection¶
The rate limiter uses a partitioning strategy to group requests:
Current Implementation:
var key = context.Request.Headers.TryGetValue("X-Test-Id", out var v)
? v.ToString()
: context.GetClientIp() ?? "unknown";
Partition Key Priority:
1. X-Test-Id Header: Used for testing (if present)
2. Client IP Address: Extracted from X-Forwarded-For or RemoteIpAddress
3. "unknown": Fallback if IP cannot be determined
Client IP Extraction:
private static string? GetClientIp(this HttpContext httpContext)
{
    string? forwardedFor = httpContext.Request.Headers["X-Forwarded-For"].FirstOrDefault();
    if (!string.IsNullOrEmpty(forwardedFor))
    {
        // X-Forwarded-For may contain a comma-separated proxy chain;
        // the first entry is the original client.
        return forwardedFor.Split(',')[0].Trim();
    }
    return httpContext.Connection.RemoteIpAddress?.ToString();
}
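A hedged sketch of how this partition key feeds the global limiter, mirroring the alternative examples below (`rateLimitingOptions` is assumed to hold the bound `RateLimiting` section):

```csharp
// Global limiter partitioned by the key above; limits come from configuration.
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
    var key = context.Request.Headers.TryGetValue("X-Test-Id", out var testId)
        ? testId.ToString()
        : context.GetClientIp() ?? "unknown";

    return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = rateLimitingOptions.GlobalLimiter.PermitLimit,
        Window = rateLimitingOptions.GlobalLimiter.Window,
        AutoReplenishment = rateLimitingOptions.GlobalLimiter.AutoReplenishment,
        QueueLimit = rateLimitingOptions.GlobalLimiter.QueueLimit,
    });
});
```

Each distinct key gets its own counter, so one abusive client exhausts only its own partition's permits.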
Alternative Partitioning Strategies¶
By User Identity:
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
var key = context.User.Identity?.Name ?? context.GetClientIp() ?? "unknown";
return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
{
PermitLimit = 100,
Window = TimeSpan.FromMinutes(1),
AutoReplenishment = true,
QueueLimit = 0
});
});
By API Key:
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
var key = context.Request.Headers["X-API-Key"].FirstOrDefault()
?? context.GetClientIp()
?? "unknown";
return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
{
PermitLimit = 100,
Window = TimeSpan.FromMinutes(1),
AutoReplenishment = true,
QueueLimit = 0
});
});
By Tenant ID:
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
var tenantId = context.User.FindFirst("tenant_id")?.Value
?? context.Request.Headers["X-Tenant-Id"].FirstOrDefault()
?? "unknown";
return RateLimitPartition.GetFixedWindowLimiter(tenantId, _ => new FixedWindowRateLimiterOptions
{
PermitLimit = 100,
Window = TimeSpan.FromMinutes(1),
AutoReplenishment = true,
QueueLimit = 0
});
});
Response Headers¶
Rate Limit Headers¶
When rate limiting is enabled, 429 responses can include standard rate limit headers. Note that ASP.NET Core's built-in middleware does not emit the `X-RateLimit-*` headers automatically; they (and `Retry-After`) are typically written in the `OnRejected` callback.
Standard Headers:
- X-RateLimit-Limit: Maximum number of requests allowed per window
- X-RateLimit-Remaining: Number of requests remaining in current window
- X-RateLimit-Reset: Unix timestamp when the rate limit resets
- Retry-After: Seconds to wait before retrying (in 429 responses)
Example 429 Too Many Requests response:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1699920000
Retry-After: 60
Content-Type: application/json
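A hedged sketch of populating `Retry-After` in the `OnRejected` callback (this is not the template's verbatim code; fixed window limiters expose a retry-after hint via lease metadata):

```csharp
// Rejection handling (sketch): return 429 and surface the lease's
// retry-after hint, when the limiter provides one.
services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.OnRejected = (context, cancellationToken) =>
    {
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }
        return ValueTask.CompletedTask;
    };
});
```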
Endpoint Exemptions¶
Disabling Rate Limiting for Specific Endpoints¶
Certain endpoints should bypass rate limiting:
Health Checks:
// HealthChecksExtensions.cs
endpointsBuilder = endpoints.MapHealthChecks("/health");
#if RateLimiting
endpointsBuilder.DisableRateLimiting();
#endif
Swagger UI:
// SwaggerExtensions.cs
var endpointsBuilder = endpoints.MapSwagger();
#if RateLimiting
endpointsBuilder.DisableRateLimiting();
#endif
Scalar UI and SignalR Hubs: exempted the same way, by calling `DisableRateLimiting()` on the endpoint builder returned by the corresponding `Map...` call.

Why Exempt These Endpoints?
- Health Checks: Must be accessible for monitoring and orchestration
- Swagger/Scalar: Documentation endpoints, not production traffic
- SignalR: Real-time connections may require a different rate limiting strategy
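A concrete sketch of the same pattern applied to a SignalR hub (the hub type and path are hypothetical; the `#if RateLimiting` guard matches the health check and Swagger examples above):

```csharp
// SignalRExtensions.cs (sketch) — NotificationsHub is a hypothetical hub type.
var hubBuilder = endpoints.MapHub<NotificationsHub>("/hubs/notifications");
#if RateLimiting
hubBuilder.DisableRateLimiting();
#endif
```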
Testing¶
Acceptance Tests¶
The template includes acceptance tests for rate limiting:
// RateLimitingAcceptanceTests.cs
[TestClass]
[DoNotParallelize]
public class RateLimitingAcceptanceTests
{
[TestMethod]
public async Task GlobalRateLimiterShouldReturn429OnSixthRequestWithinWindow()
{
using HttpClient? client = BeforeAfterTestRunHooks.ServerInstance?.CreateClient();
Assert.IsNotNull(client, "TestServer client was not initialized.");
// Set unique test ID for isolation
client.DefaultRequestHeaders.Remove("X-Test-Id");
client.DefaultRequestHeaders.Add("X-Test-Id", Guid.NewGuid().ToString("N"));
const string endpoint = "api/FeatureA/FeatureAUseCaseA";
using var body = new StringContent("{}", Encoding.UTF8, "application/json");
// Send 5 requests (within limit)
for (int i = 0; i < 5; i++)
{
using var ok = await client.PostAsync(endpoint, body);
Assert.IsTrue(ok.IsSuccessStatusCode,
$"Expected success within limit on attempt #{i + 1}, got {(int)ok.StatusCode}");
}
// 6th request should be rate limited
using var limited = await client.PostAsync(endpoint, body);
Assert.AreEqual(HttpStatusCode.TooManyRequests, limited.StatusCode,
"Expected HTTP 429 on the 6th request within the window.");
}
}
Test Configuration:
{
"RateLimiting": {
"EnableRateLimiting": true,
"GlobalLimiter": {
"Window": "00:01:00",
"AutoReplenishment": true,
"PermitLimit": 5,
"QueueLimit": 0
}
}
}
Manual Testing¶
Test Rate Limiting:
# Send multiple requests rapidly
for i in {1..10}; do
curl -X POST http://localhost:5000/api/FeatureA/FeatureAUseCaseA \
-H "Content-Type: application/json" \
-d '{}'
echo ""
done
Expected Behavior:
- First 100 requests (or configured limit): 200 OK
- Subsequent requests: 429 Too Many Requests
- Wait for window to reset: 200 OK again
Best Practices¶
Do's¶
- Enable Rate Limiting in Production
- Set Appropriate Limits
- Exempt Critical Endpoints
- Use IP-Based Partitioning for Public APIs
- Set QueueLimit to 0 for Immediate Rejection
- Monitor Rate Limit Metrics:
    - Track 429 responses
    - Monitor rate limit usage
    - Alert on high rejection rates
Don'ts¶
- Don't Disable Rate Limiting in Production
- Don't Use Too Restrictive Limits
- Don't Forget to Exempt Health Checks
- Don't Rely on QueueLimit for Rate Limiting (queuing delays requests instead of rejecting them)
- Don't Ignore Load Testing
Advanced Scenarios¶
Custom Rate Limiting Policies¶
Endpoint-Specific Policies:
services.AddRateLimiter(options =>
{
// Global limiter
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
RateLimitPartition.GetFixedWindowLimiter(
context.GetClientIp() ?? "unknown",
_ => new FixedWindowRateLimiterOptions
{
PermitLimit = 100,
Window = TimeSpan.FromMinutes(1),
AutoReplenishment = true,
QueueLimit = 0
}));
// Endpoint-specific policy
options.AddFixedWindowLimiter("api", limiterOptions =>
{
limiterOptions.PermitLimit = 50;
limiterOptions.Window = TimeSpan.FromMinutes(1);
limiterOptions.AutoReplenishment = true;
limiterOptions.QueueLimit = 0;
});
#if UseMCP
// MCP endpoint rate limiting from McpRateLimiting section
var mcpOptions = OptionsExtensions.McpRateLimitingOptions;
if (mcpOptions?.McpLimiter is not null)
{
options.AddFixedWindowLimiter("MCP", limiterOptions =>
{
limiterOptions.PermitLimit = mcpOptions.McpLimiter.PermitLimit;
limiterOptions.Window = mcpOptions.McpLimiter.Window;
limiterOptions.AutoReplenishment = mcpOptions.McpLimiter.AutoReplenishment;
limiterOptions.QueueLimit = mcpOptions.McpLimiter.QueueLimit;
});
}
#endif
});
Apply to Endpoint:
[EnableRateLimiting("api")]
[HttpPost("orders")]
public async Task<IActionResult> CreateOrder([FromBody] OrderRequest request)
{
// ...
}
// MCP endpoint — use ConnectSoft mapping so policy + optional auth are applied consistently.
// In the template this is done inside MapMicroserviceMCPServer(); shown here for reference:
using ConnectSoft.Extensions.ModelContextProtocol;
endpoints.MapConnectSoftModelContextProtocol("/mcp", rateLimitingPolicy: "MCP");
MCP Endpoint Rate Limiting¶
When both UseMCP and RateLimiting template parameters are enabled, you can configure MCP-specific rate limiting that works independently from the global rate limiter. This allows you to set different rate limits for MCP endpoints (/mcp) compared to other endpoints.
Configuration in appsettings.json — use the McpRateLimiting section (separate from RateLimiting):
{
"RateLimiting": {
"EnableRateLimiting": true,
"GlobalLimiter": {
"Window": "00:01:00",
"PermitLimit": 5,
"AutoReplenishment": true,
"QueueLimit": 0
}
},
"McpRateLimiting": {
"McpLimiter": {
"Window": "00:01:00",
"PermitLimit": 100,
"AutoReplenishment": true,
"QueueLimit": 0
}
}
}
How It Works:
- MCP rate limiting uses the `McpRateLimiting` section with `McpLimiter` (not `RateLimiting.McpLimiter`)
- When configured, a named rate limiting policy "MCP" is automatically created
- The policy uses the same partitioning strategy as the global limiter (IP address or `X-Test-Id` header)
- The `/mcp` path is excluded from global rate limiting and uses the MCP policy instead
- MCP rate limiting is independent from global rate limiting; both can be active simultaneously
Implementation:
The MCP rate limiting policy is configured from OptionsExtensions.McpRateLimitingOptions in RateLimitingExtensions.cs:
#if UseMCP
var mcpRateLimitingOptions = OptionsExtensions.McpRateLimitingOptions;
if (mcpRateLimitingOptions?.McpLimiter is not null && rateLimitingOptions.EnableRateLimiting)
{
services.Configure<RateLimiterOptions>(options =>
{
options.AddFixedWindowLimiter("MCP", limiterOptions =>
{
limiterOptions.PermitLimit = mcpRateLimitingOptions.McpLimiter.PermitLimit;
limiterOptions.Window = mcpRateLimitingOptions.McpLimiter.Window;
limiterOptions.AutoReplenishment = mcpRateLimitingOptions.McpLimiter.AutoReplenishment;
limiterOptions.QueueLimit = mcpRateLimitingOptions.McpLimiter.QueueLimit;
});
});
}
#endif
The template applies the MCP policy when mapping the endpoint in ModelContextProtocolExtensions.cs by passing the policy name into MapConnectSoftModelContextProtocol (only when McpServerTransportType is Http):
using ConnectSoft.Extensions.ModelContextProtocol;
// ...
endpoints.MapConnectSoftModelContextProtocol(
"/mcp",
rateLimitingPolicy: "MCP",
requireAuthorization: requireAuthorization);
Best Practices:
- Set MCP rate limits higher than global limits to accommodate AI tool invocation patterns
- Monitor MCP endpoint usage to adjust limits based on actual traffic
- Consider per-user or per-session quotas for production environments
- Test rate limiting under load to ensure it doesn't interfere with legitimate AI tool usage
Sliding Window Rate Limiter¶
Configuration:
options.AddSlidingWindowLimiter("sliding", limiterOptions =>
{
limiterOptions.PermitLimit = 100;
limiterOptions.Window = TimeSpan.FromMinutes(1);
limiterOptions.SegmentsPerWindow = 4; // 4 segments = 15-second segments
limiterOptions.AutoReplenishment = true;
limiterOptions.QueueLimit = 0;
});
Token Bucket Rate Limiter¶
Configuration:
options.AddTokenBucketLimiter("token", limiterOptions =>
{
limiterOptions.TokenLimit = 100;
limiterOptions.ReplenishmentPeriod = TimeSpan.FromMinutes(1);
limiterOptions.TokensPerPeriod = 10;
limiterOptions.AutoReplenishment = true;
limiterOptions.QueueLimit = 0;
});
Concurrency Limiter¶
Configuration:
options.AddConcurrencyLimiter("concurrency", limiterOptions =>
{
limiterOptions.PermitLimit = 10; // Max 10 concurrent requests
limiterOptions.QueueLimit = 0;
});
Troubleshooting¶
Issue: Rate Limiting Not Working¶
Symptoms: Requests not being rate limited, no 429 responses.
Solutions:
1. Verify Rate Limiting is Enabled
2. Check Middleware Order
3. Verify Configuration is Loaded
   - Check `RateLimitingOptions` is registered
   - Verify `appsettings.json` contains the `RateLimiting` section
   - Check options validation passes
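One quick way to confirm the section was bound is a startup diagnostic; a sketch, assuming access to `OptionsExtensions.RateLimitingOptions` after the host is built:

```csharp
// Startup diagnostic (sketch): log the bound options so a missing or
// mistyped "RateLimiting" section is visible immediately.
var opts = OptionsExtensions.RateLimitingOptions;
app.Logger.LogInformation(
    "RateLimiting: Enabled={Enabled}, PermitLimit={PermitLimit}, Window={Window}, QueueLimit={QueueLimit}",
    opts.EnableRateLimiting,
    opts.GlobalLimiter.PermitLimit,
    opts.GlobalLimiter.Window,
    opts.GlobalLimiter.QueueLimit);
```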
Issue: Too Many 429 Responses¶
Symptoms: Legitimate users receiving 429 responses.
Solutions:
1. Increase Permit Limit
2. Review Partitioning Strategy
3. Check Window Size
Issue: Health Checks Being Rate Limited¶
Symptoms: Health checks returning 429 responses.
Solutions:
1. Disable Rate Limiting for Health Checks
2. Verify Exemption is Applied
   - Check `DisableRateLimiting()` is called
   - Verify conditional compilation (`#if RateLimiting`)
Issue: Rate Limits Not Resetting¶
Symptoms: Rate limits never reset, permanently blocked.
Solutions:
1. Verify AutoReplenishment is Enabled
2. Check Window Configuration
Code Standards¶
When implementing or extending rate limiting, follow the Coding Standards. Use consistent naming, XML documentation, and analyzer rules (StyleCop, AspNetCoreAnalyzers).
Summary¶
Rate limiting in the ConnectSoft Base Template provides:
- ✅ Global Rate Limiting: Fixed window rate limiter with configurable limits
- ✅ Partitioning Strategy: IP-based or custom partition key selection
- ✅ Endpoint Exemptions: Health checks and documentation endpoints bypass rate limiting
- ✅ Configurable Limits: Permit limit, window, and queue limit configuration
- ✅ HTTP 429 Responses: Standard rate limit responses with headers
- ✅ Testing Support: Acceptance tests verify rate limiting behavior
- ✅ Production Ready: Configurable for different environments
By following these patterns, teams can:
- Protect Services: Prevent abuse and overload
- Ensure Fairness: Distribute resources fairly among clients
- Maintain Stability: Keep services responsive under load
- Enforce Quotas: Control API usage and costs
- Monitor Usage: Track rate limit metrics and adjust limits
Rate limiting is an essential security and performance feature that protects microservices from abuse while ensuring fair resource allocation and system stability.