Rate Limiting in Next.js: Protecting Your API Routes
How to implement production-grade rate limiting in Next.js — with Middleware-level protection, per-user limits, and distributed rate limiting using Upstash Redis.
Why Rate Limiting Matters
Without rate limiting, a single malicious actor or a runaway script can:
- Exhaust your OpenAI or third-party API budget in minutes
- DDoS your database with thousands of concurrent queries
- Scrape your entire product catalog
- Attempt credential stuffing on your auth endpoints
Rate limiting is not optional for production APIs.
The Two Levels
Middleware-level: Runs before your Route Handler, at the edge. Best for blanket protection of entire route groups.
Route Handler-level: Runs inside specific handlers. Best for per-endpoint limits with different thresholds (e.g., AI endpoints get stricter limits than health checks).
Upstash Redis: Distributed Rate Limiting
For serverless and edge deployments, you need a distributed rate limiter — a local in-memory counter won't work because each serverless invocation is isolated.
Upstash Redis provides a serverless Redis with a generous free tier and an official rate-limiting library that works at the edge.
```bash
npm install @upstash/ratelimit @upstash/redis
```

```bash
# .env.local
UPSTASH_REDIS_REST_URL=https://...
UPSTASH_REDIS_REST_TOKEN=...
```
```ts
// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = Redis.fromEnv()

export const rateLimiters = {
  // Global: 100 requests per minute per IP
  global: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(100, '60 s'),
    prefix: 'rl:global',
  }),

  // AI endpoints: 10 requests per minute per user
  ai: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(10, '60 s'),
    prefix: 'rl:ai',
  }),

  // Auth endpoints: 5 attempts per 15 minutes per IP
  auth: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(5, '15 m'),
    prefix: 'rl:auth',
  }),
}
```
Middleware-Level Protection
Apply rate limiting in Middleware for blanket API protection:
```ts
// middleware.ts
import { NextRequest, NextResponse } from 'next/server'
import { rateLimiters } from '@/lib/rate-limit'

export async function middleware(request: NextRequest) {
  // Only rate-limit API routes
  if (!request.nextUrl.pathname.startsWith('/api/')) {
    return NextResponse.next()
  }

  const ip =
    request.headers.get('x-forwarded-for')?.split(',')[0].trim() ??
    request.headers.get('x-real-ip') ??
    'anonymous'

  const { success, limit, remaining, reset } = await rateLimiters.global.limit(ip)

  if (!success) {
    return new NextResponse(
      JSON.stringify({ error: 'Too many requests. Please try again later.' }),
      {
        status: 429,
        headers: {
          'Content-Type': 'application/json',
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': reset.toString(),
          'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
        },
      }
    )
  }

  const response = NextResponse.next()
  response.headers.set('X-RateLimit-Remaining', remaining.toString())
  return response
}

export const config = {
  matcher: ['/api/:path*'],
}
```
Route Handler-Level Limits
For endpoints that need stricter or per-user limits:
```ts
// app/api/chat/route.ts
import { rateLimiters } from '@/lib/rate-limit'
import { getCurrentUser } from '@/lib/auth'
import { getClientIp } from '@/lib/get-ip' // helper shown later in this post; the path is up to you

export async function POST(request: Request) {
  const user = await getCurrentUser(request)

  // Use user ID for authenticated users, IP for anonymous
  const identifier = user?.id ?? getClientIp(request)

  const { success, reset } = await rateLimiters.ai.limit(identifier)

  if (!success) {
    return Response.json(
      { error: 'AI rate limit exceeded. Try again in a moment.' },
      {
        status: 429,
        headers: { 'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString() },
      }
    )
  }

  // ... AI handler
}
```
Algorithms
Sliding Window (recommended for most cases): Counts requests in a rolling time window. Prevents traffic spikes at window boundaries that fixed windows allow.
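To make the algorithm concrete, here is a minimal in-memory sliding-window counter. This is an illustration only (the helper name is mine, and a local counter is exactly what does not work in serverless): it uses the weighted two-window approximation, where requests from the previous window are counted proportionally to how much of that window still overlaps the rolling one.

```typescript
// In-memory sliding-window counter (illustration only; use a Redis-backed
// limiter like @upstash/ratelimit in serverless deployments).
function makeSlidingWindow(limit: number, windowMs: number) {
  let prevCount = 0
  let currCount = 0
  let currStart = 0

  return function allow(now: number): boolean {
    const windowStart = Math.floor(now / windowMs) * windowMs
    if (windowStart !== currStart) {
      // Roll over: the current window becomes the previous one
      prevCount = windowStart - currStart === windowMs ? currCount : 0
      currCount = 0
      currStart = windowStart
    }
    // Weight the previous window by how much of it still overlaps
    const elapsed = (now - currStart) / windowMs
    const estimated = prevCount * (1 - elapsed) + currCount
    if (estimated >= limit) return false
    currCount++
    return true
  }
}
```

Because the previous window's count decays gradually instead of dropping to zero, a burst at a window boundary still gets throttled.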
Fixed Window: Simpler but allows a burst of 2x limit at window boundaries (end of one window + start of next).
Token Bucket: Allows short bursts up to a maximum capacity, then refills at a steady rate. Good for upload endpoints where occasional large bursts are acceptable.
```ts
// Token bucket — allows bursts
new Ratelimit({
  redis,
  limiter: Ratelimit.tokenBucket(10, '10 s', 20), // 10/10s refill, max 20
  prefix: 'rl:upload',
})
```
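Conceptually, a token bucket is just a counter that refills over time. A minimal in-memory sketch of the algorithm (the helper name is mine, and again, in serverless you would use the Redis-backed version above):

```typescript
// Token bucket: `rate` tokens refill per `intervalMs`, capped at `capacity`.
function makeTokenBucket(rate: number, intervalMs: number, capacity: number) {
  let tokens = capacity
  let last = 0

  return function take(now: number): boolean {
    // Refill proportionally to elapsed time, never exceeding capacity
    tokens = Math.min(capacity, tokens + ((now - last) / intervalMs) * rate)
    last = now
    if (tokens < 1) return false
    tokens -= 1
    return true
  }
}
```

The bucket starts full, which is what permits the initial burst up to capacity; sustained traffic is then bounded by the refill rate.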
Getting the Real Client IP
On Vercel and most CDNs, the real IP is in X-Forwarded-For. Be careful — this header can be spoofed in certain configurations:
```ts
function getClientIp(request: Request): string {
  const forwarded = request.headers.get('x-forwarded-for')
  if (forwarded) {
    // Take the first IP — the client's IP before any proxies
    return forwarded.split(',')[0].trim()
  }
  return request.headers.get('x-real-ip') ?? 'unknown'
}
```
On Vercel, x-forwarded-for is set by Vercel's infrastructure and can be trusted. On other platforms, verify the proxy chain.
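If you do control the proxy chain, a safer variant is to count from the right: each trusted proxy appends exactly one entry to the header, so the real client sits a fixed distance from the end and anything a client prepends is ignored. A sketch (the `TRUSTED_PROXIES` value and the helper name are assumptions about your infrastructure, not something a platform provides):

```typescript
// Number of proxies you control in front of the app; adjust for your setup.
const TRUSTED_PROXIES = 1

function getTrustedClientIp(forwardedFor: string | null): string {
  if (!forwardedFor) return 'unknown'
  const ips = forwardedFor.split(',').map((ip) => ip.trim()).filter(Boolean)
  // Each trusted proxy appended one IP; the client is just before them.
  const index = ips.length - 1 - TRUSTED_PROXIES
  return ips[Math.max(index, 0)] ?? 'unknown'
}
```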
Testing Rate Limits Locally
```bash
# Hit the endpoint 15 times quickly (the route only exports POST)
for i in {1..15}; do
  curl -s -o /dev/null -w "%{http_code}\n" -X POST http://localhost:3000/api/chat
done
```
You should see 200 for the first 10 responses and 429 for the last 5.
Rate Limit Headers
Always return standard rate limit headers so clients can handle limits gracefully:
```http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1714857600
Retry-After: 30
```
Well-behaved API clients and SDKs parse Retry-After to implement exponential backoff automatically.
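On the client side, honoring the header can be as simple as a small fetch wrapper. A sketch (the function names and defaults here are illustrative, not from any particular SDK); note that Retry-After may be either a delay in seconds or an HTTP-date, and both forms are handled:

```typescript
// Parse Retry-After: either delay-seconds ("30") or an HTTP-date.
function retryAfterMs(header: string | null, now = Date.now()): number | null {
  if (!header) return null
  const seconds = Number(header)
  if (!Number.isNaN(seconds)) return Math.max(seconds * 1000, 0)
  const date = Date.parse(header)
  return Number.isNaN(date) ? null : Math.max(date - now, 0)
}

async function fetchWithRetry(url: string, init?: RequestInit, retries = 3): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init)
    if (res.status !== 429 || attempt >= retries) return res
    // Prefer the server's hint; fall back to exponential backoff
    const wait = retryAfterMs(res.headers.get('Retry-After')) ?? 2 ** attempt * 1000
    await new Promise((resolve) => setTimeout(resolve, wait))
  }
}
```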
Rate limiting is one of those things that takes half a day to implement and saves you from a crisis at 2am six months later.