Why Polite Retry?
Retry Storm Prevention
Adaptive Retry Budgeting limits retry volume when downstream systems are already struggling.
Circuit Breaker
A built-in circuit breaker stops sending requests to failing services, giving them time to recover.
Smart Jitter
Multiple jitter strategies prevent synchronized retries that cause periodic load spikes.
Backpressure Aware
Respects backpressure signals from downstream services to avoid overwhelming them.
RAF Metrics
Track retry rates, success rates, and the retry amplification factor (RAF) for observability.
AI Infra Ready
Use retry budgets for LLM APIs, embedding services, vector databases, and agent tool calls.
Retries are distributed congestion control
Most retry libraries optimize for one question: how can this request succeed?
Polite Retry optimizes for a different question: how can the overall system remain stable?
When a service experiences partial failure, naive retry policies can make things much worse:
- Service starts failing 50% of requests
- All clients retry failed requests
- Service now receives up to 2x the load
- More requests fail, triggering more retries
- Cascade collapse
In a 3-tier system with a 50% failure rate and 3 retries per tier, request volume can amplify by roughly 6.6x.
Normal: 100 req → Service → 100 responses ✓
With naive retries during 50% failure:
100 req   ─┐
100 retry ─┼──► Service ──► Overload! 💥
100 retry ─┘
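One way to arrive at a figure of about 6.6x is sketched below. It assumes failures are independent, that every tier retries up to its limit, and that per-tier factors multiply; the function names are illustrative and not part of the library.

```typescript
// Expected attempts per request at one tier: the first try plus each retry.
// Retry k only happens if all k earlier attempts failed, so it adds failureRate^k.
function expectedAttempts(failureRate: number, maxRetries: number): number {
  let attempts = 0;
  for (let k = 0; k <= maxRetries; k++) {
    attempts += Math.pow(failureRate, k);
  }
  return attempts;
}

// Assumption: tiers retry independently, so the per-tier factors multiply.
function amplification(tiers: number, failureRate: number, maxRetries: number): number {
  return Math.pow(expectedAttempts(failureRate, maxRetries), tiers);
}

const perTier = expectedAttempts(0.5, 3); // 1 + 0.5 + 0.25 + 0.125 = 1.875
const total = amplification(3, 0.5, 3);   // 1.875 cubed
console.log(perTier, total.toFixed(2));   // prints: 1.875 6.59
```

Under these assumptions, a single tier adds less than 2x, but stacking three tiers compounds the overhead to more than 6x at the deepest service.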
The Solution: Adaptive Retry Budgeting
Polite Retry implements Adaptive Retry Budgeting (ARB), based on research into retry amplification, cascading failures, and system-aware retry control.
Successful requests establish retry capacity. Failures consume it. When failure rates rise or downstream systems signal overload, retry capacity shrinks automatically.
import { retryWithBudget, AdaptiveRetryBudget } from 'polite-retry';

// Create a shared budget (one per downstream service)
const budget = new AdaptiveRetryBudget({
  initialBudget: 0.2,        // Allow 20% retry overhead
  highFailureThreshold: 0.3, // Reduce budget when >30% failing
});

// All requests share this budget
const data = await retryWithBudget(
  async () => {
    const res = await fetch('https://api.example.com/data');
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  },
  budget,
  { maxRetries: 3, jitter: 'full' }
);

// Check metrics
console.log(budget.getMetrics());
// { retryAmplificationFactor: 1.15, failureRate: 0.08, ... }
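To make the budgeting idea concrete, here is a minimal sketch of a token-style retry budget. It is not the internals of `AdaptiveRetryBudget`: the class `SketchRetryBudget`, its method names, the point accounting, and the doubling of retry cost under high failure are all assumptions chosen for illustration. Successes earn capacity, each retry spends it, and retries become more expensive once the observed failure rate crosses the threshold.

```typescript
// Hypothetical sketch of a retry budget; not the polite-retry implementation.
class SketchRetryBudget {
  private points = 0; // retry capacity, in hundredths of one retry
  private successes = 0;
  private failures = 0;

  constructor(
    private readonly retryPercent: number,         // 20 = roughly 20% retry overhead
    private readonly highFailureThreshold: number, // 0.3 = tighten above 30% failures
    private readonly maxPoints: number = 1000      // cap so idle capacity cannot pile up
  ) {}

  // Each success earns a fraction of one retry.
  recordSuccess(): void {
    this.successes++;
    this.points = Math.min(this.maxPoints, this.points + this.retryPercent);
  }

  recordFailure(): void {
    this.failures++;
  }

  failureRate(): number {
    const total = this.successes + this.failures;
    return total === 0 ? 0 : this.failures / total;
  }

  // A retry costs 100 points normally and 200 when the failure rate is high
  // (assumed shrink rule), so capacity drains faster while the service struggles.
  tryAcquireRetry(): boolean {
    const cost = this.failureRate() > this.highFailureThreshold ? 200 : 100;
    if (this.points < cost) return false;
    this.points -= cost;
    return true;
  }
}

const sketch = new SketchRetryBudget(20, 0.3);
for (let i = 0; i < 10; i++) sketch.recordSuccess(); // earns 200 points
sketch.recordFailure();                               // failure rate is 1/11
console.log(sketch.tryAcquireRetry()); // true
console.log(sketch.tryAcquireRetry()); // true
console.log(sketch.tryAcquireRetry()); // false, the budget is spent
```

Integer points are used instead of fractional tokens so that repeated additions of 0.2 do not accumulate floating-point error. Because the budget is shared, one instance per downstream service bounds the total retry volume across all callers rather than per request.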
Built from Research
| Finding | What it means |
|---|---|
| Naive retries can reduce success rates | Retrying more is not always more reliable. |
| Only 4.9% of detected retry configurations used jitter | Many systems remain exposed to synchronized retry waves. |
| Multi-tier retries amplify request volume | Local retry choices become global load problems. |
| Adaptive Retry Budgeting limits retry storms | Retries become a bounded resource instead of an unlimited reaction. |
Choose Your Strategy
| Strategy | Use Case | Amplification Risk | Complexity |
|---|---|---|---|
| `retry()` | Simple retries with backoff | Medium | Low |
| `retryWithCircuitBreaker()` | Stop when service is down | Low | Medium |
| `retryWithBudget()` | Production microservices | Very Low | Medium |
| `retryWithProtection()` | Critical systems | Very Low | Higher |
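For readers new to the circuit breaker pattern behind the second row, the sketch below shows the usual three-state machine (closed, open, half-open). It illustrates the general pattern only: `SketchCircuitBreaker`, its constructor parameters, and its thresholds are hypothetical and do not describe the signature or behavior of `retryWithCircuitBreaker()`.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Hypothetical sketch of the pattern; not the polite-retry API.
class SketchCircuitBreaker {
  private state: BreakerState = 'closed';
  private consecutiveFailures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold: number, // failures in a row before opening
    private readonly resetTimeoutMs: number,   // cool-down before a trial request
    private readonly now: () => number = () => Date.now()
  ) {}

  // Returns false while the breaker is open and the cool-down has not elapsed.
  allowRequest(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open'; // let a trial request through
    }
    return this.state !== 'open';
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.consecutiveFailures++;
    // A failed trial request, or too many failures in a row, opens the breaker.
    if (this.state === 'half-open' || this.consecutiveFailures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }

  // Convenience wrapper: fail fast when open, otherwise run and record the outcome.
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.allowRequest()) {
      throw new Error('Circuit open: request not sent');
    }
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

The clock is injected through `now` so the state transitions can be tested without waiting in real time. Failing fast while open is what gives the downstream service room to recover, which is why the table rates this strategy as lower amplification risk than plain retries.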
Quick Start
import { retry } from 'polite-retry';

// Basic retry with exponential backoff and jitter
const data = await retry(
  async () => {
    const response = await fetch('https://api.example.com/data');
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  },
  {
    maxRetries: 3,
    initialDelayMs: 100,
    jitter: 'full', // Prevents synchronized retry storms
    onRetry: (error, attempt) => {
      console.log(`Attempt ${attempt} failed: ${error.message}`);
    }
  }
);
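The `jitter: 'full'` option in the Quick Start refers to a widely used backoff technique. As a rough sketch of what such strategies compute, the helper below derives a delay from the attempt number. The function `backoffDelayMs`, the `maxDelayMs` cap, the doubling factor, and the `'equal'` and `'none'` strategy names are assumptions for illustration; only `'full'` and `initialDelayMs` appear in the example above, and the library's actual formula may differ.

```typescript
type JitterStrategy = 'none' | 'full' | 'equal';

// Hypothetical helper; not part of the polite-retry API.
function backoffDelayMs(
  attempt: number,        // 1 for the first retry
  initialDelayMs: number,
  maxDelayMs: number,
  jitter: JitterStrategy,
  random: () => number = Math.random
): number {
  // Exponential backoff: the delay doubles on each attempt, up to the cap.
  const exponential = Math.min(maxDelayMs, initialDelayMs * Math.pow(2, attempt - 1));
  switch (jitter) {
    case 'full':
      // Full jitter: uniform over [0, exponential), which spreads clients widely.
      return random() * exponential;
    case 'equal':
      // Equal jitter: keep half the delay fixed and randomize the other half.
      return exponential / 2 + random() * (exponential / 2);
    default:
      // No jitter: every client that failed together retries together.
      return exponential;
  }
}

// Third retry with a 100 ms base: the un-jittered delay is 400 ms.
console.log(backoffDelayMs(3, 100, 10_000, 'none')); // 400
console.log(backoffDelayMs(3, 100, 10_000, 'full')); // somewhere in [0, 400)
```

Without jitter, clients that fail at the same moment retry at the same moment, which recreates the load spike on every backoff interval. Randomizing the delay breaks that synchronization.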