Engineering · 6 min read

API Gateway Patterns That Actually Hold Up Under Production Load

By Romanov Solutions · June 3, 2026

Most API gateways look fine in staging. They fall apart at 9:02 AM on a Tuesday when three upstream services hiccup simultaneously and every retry in the fleet fires at once. The patterns below are what we actually configure for clients — not what the vendor quickstart recommends.

Rate Limiting: Token Bucket Over Fixed Window, Every Time

Fixed-window rate limiting is the wrong default. If your limit is 1,000 requests per minute and a client sends 1,000 in the last two seconds of minute one and 1,000 in the first two seconds of minute two, you've just absorbed 2,000 requests in four seconds, double the intended rate, without the limiter ever firing. A sliding window log closes that boundary exploit outright; a token bucket caps any burst at the bucket's capacity and then throttles the client to the refill rate.
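To make the difference concrete, here is a minimal token bucket sketch. The class name, the injectable `clock` parameter, and the choice of a 1,000-token capacity are illustrative assumptions, not the configuration of any particular gateway.

```python
import time


class TokenBucket:
    """Token bucket: refills continuously and allows bursts up to `capacity`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = capacity      # largest burst the bucket will admit
        self.clock = clock            # injectable so the sketch can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    # Replay the boundary burst: 1,000 requests, then 1,000 more four seconds later.
    fake_now = [0.0]
    bucket = TokenBucket(rate_per_sec=1000 / 60, capacity=1000, clock=lambda: fake_now[0])
    admitted = sum(bucket.allow() for _ in range(1000))
    fake_now[0] = 4.0
    admitted += sum(bucket.allow() for _ in range(1000))
    print(admitted)
```

With these numbers the second burst only gets the tokens that refilled during the four seconds, so the total admitted is the bucket capacity plus roughly 66 requests rather than 2,000.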

In practice, we configure token bucket at the gateway level (Kong, AWS API Gateway, or Envoy depending on the stack) with separate buckets per API key and per IP. Key-level limiting protects your backend from a single abusive key, including one that has leaked and is being used from many addresses at once; IP-level limiting catches credential-stuffing scenarios where one source cycles through many keys or stolen credentials. Neither alone is sufficient.
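A rough sketch of the two-bucket check follows. `DualLimiter`, `Bucket`, and the limit values are hypothetical names and numbers chosen for illustration; a real gateway would express this through its own plugin or filter configuration.

```python
import time


class Bucket:
    """Minimal token bucket state for one API key or one IP."""

    def __init__(self, rate, capacity, now):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), now

    def refill(self, now):
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now


class DualLimiter:
    """Admit a request only if both its API-key bucket and its IP bucket have a token."""

    def __init__(self, key_limit, ip_limit, clock=time.monotonic):
        # Each limit is a (tokens_per_second, burst_capacity) pair.
        self.key_limit, self.ip_limit, self.clock = key_limit, ip_limit, clock
        self.by_key, self.by_ip = {}, {}

    def _bucket(self, table, name, limit, now):
        if name not in table:
            table[name] = Bucket(limit[0], limit[1], now)
        bucket = table[name]
        bucket.refill(now)
        return bucket

    def allow(self, api_key, ip):
        now = self.clock()
        key_bucket = self._bucket(self.by_key, api_key, self.key_limit, now)
        ip_bucket = self._bucket(self.by_ip, ip, self.ip_limit, now)
        # Check both before spending, so a rejected request costs neither bucket.
        if key_bucket.tokens >= 1 and ip_bucket.tokens >= 1:
            key_bucket.tokens -= 1
            ip_bucket.tokens -= 1
            return True
        return False
```

In this sketch the two tables grow without bound; a production limiter would expire idle entries or keep the counters in a shared store.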

Circuit Breakers Belong at the Gateway, Not Just the Service

The classic circuit breaker pattern lives inside a service — Resilience4j, Polly, whatever your language ecosystem offers. That's necessary but not sufficient. Put a second circuit breaker at the gateway layer so that a degraded upstream stops receiving traffic before your service mesh is saturated with in-flight requests that will never resolve.

The configuration that actually works: a half-open state that allows exactly one probe request every 10 seconds, with a success threshold of three consecutive 2xx responses before the breaker closes again and normal traffic resumes. One probe is conservative; it prevents the thundering-herd problem where every waiting client rushes in the moment the breaker starts letting traffic through.
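The probing behavior can be sketched as a small state machine. This is an illustration under stated assumptions: `ProbingBreaker`, `trip()`, and `record()` are hypothetical names, the condition that trips the breaker is left to the caller, and late responses from requests admitted before the trip are not distinguished from probe results.

```python
import time

CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"


class ProbingBreaker:
    """One probe per interval while open; N consecutive successes close the breaker."""

    def __init__(self, probe_interval=10.0, successes_to_close=3, clock=time.monotonic):
        self.probe_interval = probe_interval
        self.successes_to_close = successes_to_close
        self.clock = clock
        self.state = CLOSED
        self.next_probe_at = 0.0
        self.streak = 0

    def trip(self):
        """Open the breaker; the trip condition itself lives outside this sketch."""
        self.state = OPEN
        self.streak = 0
        self.next_probe_at = self.clock() + self.probe_interval

    def allow(self):
        if self.state == CLOSED:
            return True
        now = self.clock()
        if now >= self.next_probe_at:
            # Let exactly one probe through, then make everyone wait for the next slot.
            self.state = HALF_OPEN
            self.next_probe_at = now + self.probe_interval
            return True
        return False

    def record(self, success):
        """Report the outcome of a probe that allow() admitted."""
        if self.state != HALF_OPEN:
            return
        if not success:
            self.trip()  # a failed probe resets the streak and the wait
            return
        self.streak += 1
        if self.streak >= self.successes_to_close:
            self.state = CLOSED
```

Because only one request is admitted per interval, closing the breaker takes at least three probe intervals after the upstream recovers, which is the cost of avoiding a stampede.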

One client we onboarded had a payment processor integration that would occasionally enter a slow-response death spiral — responding in 28 seconds instead of timing out cleanly. Their gateway had no circuit breaker. Every API call stacked up, thread pools exhausted, and the outage cascaded to unrelated endpoints. A 5-second timeout plus a gateway-level circuit breaker with a 50% error threshold over a 30-second window would have contained the blast radius entirely.
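As a sketch of that trip condition: the 5-second timeout, 50% threshold, and 30-second window come from the scenario above, while `ErrorRateWindow`, `is_failure`, and the `min_requests` floor (which stops a single failure on a quiet endpoint from tripping the breaker) are assumptions added for illustration.

```python
import time
from collections import deque


def is_failure(latency_s, status, timeout_s=5.0):
    """Treat a request as failed if it hit the timeout or returned a 5xx."""
    return latency_s >= timeout_s or status >= 500


class ErrorRateWindow:
    """Tracks outcomes over a rolling time window and reports when to trip."""

    def __init__(self, window=30.0, threshold=0.5, min_requests=20, clock=time.monotonic):
        self.window = window
        self.threshold = threshold
        self.min_requests = min_requests  # assumed floor, not from the article
        self.clock = clock
        self.events = deque()  # (timestamp, failed) pairs, oldest first

    def _evict(self, now):
        # Drop outcomes that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def record(self, failed):
        now = self.clock()
        self.events.append((now, bool(failed)))
        self._evict(now)

    def should_trip(self):
        self._evict(self.clock())
        total = len(self.events)
        if total < self.min_requests:
            return False
        failures = sum(1 for _, failed in self.events if failed)
        return failures / total >= self.threshold
```

The important detail is that a slow response counts as a failure: without the timeout, a 28-second response would look like a success to the breaker.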

Auth Chaining: JWT Validation Before You Touch Business Logic

A gateway should validate the JWT signature, check expiry, and assert required claims before the request ever reaches your application layer. This sounds obvious but is frequently skipped in favor of handling auth inside the app — which means every service reimplements validation logic, and one missed check becomes a vulnerability.

The pattern we standardize on:

This separation keeps your gateway fast — JWKS validation with a warm cache adds under 2ms of latency — and keeps your services focused on business logic rather than cryptographic ceremony.
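For illustration, here is a standard-library sketch of the three gateway-side checks: signature, expiry, and required claims. It deliberately swaps in HS256 with a shared secret so the example stays self-contained; a gateway validating against a JWKS endpoint would verify an asymmetric signature with a maintained JWT library instead. `validate_jwt` and `AuthError` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time


class AuthError(Exception):
    pass


def _b64url_decode(segment):
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def validate_jwt(token, secret, required_claims, now=None):
    """Return the claims if signature, expiry, and required claims all check out."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except (ValueError, TypeError) as exc:
        raise AuthError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise AuthError("malformed token")

    # Pin the algorithm; never let the token's header choose it.
    if header.get("alg") != "HS256":
        raise AuthError("unexpected algorithm")

    signed_part = (header_b64 + "." + payload_b64).encode()
    expected = hmac.new(secret, signed_part, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise AuthError("bad signature")

    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise AuthError("expired or missing exp")

    for name, value in required_claims.items():
        if claims.get(name) != value:
            raise AuthError("claim %r missing or mismatched" % name)
    return claims
```

A gateway filter would call something like `validate_jwt(token, secret, {"aud": "orders"})`, reject the request on `AuthError`, and forward only the validated claims upstream; the `aud` value here is made up.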

Observability Is Not Optional

Every gateway decision above is only as good as your ability to see it working. Emit structured logs with request ID, upstream latency, circuit state, rate-limit bucket, and auth claim subset on every request. Correlate those logs with your APM traces. If your gateway is a black box, you will debug the wrong layer during an incident.
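A structured log record with those fields might look like the sketch below. The function name, the field names, and the claim allowlist are assumptions; the point is one machine-parseable line per request, with only a safe subset of claims included.

```python
import json
import time


def gateway_log_line(request_id, upstream_latency_ms, circuit_state,
                     rate_limit_bucket, claims,
                     claim_allowlist=("sub", "aud", "scope")):
    """Build one JSON log line per request; only allowlisted claims are logged."""
    record = {
        "ts": time.time(),
        "request_id": request_id,
        "upstream_latency_ms": upstream_latency_ms,
        "circuit_state": circuit_state,
        "rate_limit_bucket": rate_limit_bucket,
        # Log a subset of claims so personal data in the token stays out of the logs.
        "auth": {name: claims[name] for name in claim_allowlist if name in claims},
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(gateway_log_line("req-1", 42.5, "closed", "key:abc",
                           {"sub": "u1", "email": "user@example.com"}))
```

Carrying the same `request_id` into APM traces is what makes the correlation step possible during an incident.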

We wire gateway logs into a centralized platform (Datadog, Grafana Cloud, or OpenSearch depending on client preference) with two non-negotiable dashboards: a real-time rate-limit saturation view and a circuit-breaker state timeline. Both have saved clients from self-inflicted outages within weeks of going live.

Practical Takeaway

Audit your gateway configuration this week against three questions: Does your rate limiter use a sliding window or token bucket? Does your circuit breaker sit at the gateway layer, not just inside services? Does JWT validation happen before your application code runs? If any answer is no, you have a production incident waiting for the right Tuesday morning to introduce itself. Fix the cheapest one first — it's almost always the rate limiter.
