
Rate Limiting in Next.js: Protecting Your API Routes

How to implement production-grade rate limiting in Next.js — with Middleware-level protection, per-user limits, and distributed rate limiting using Upstash Redis.

Why Rate Limiting Matters

Without rate limiting, a single malicious actor or a runaway script can:

  • Exhaust your OpenAI or third-party API budget in minutes
  • DDoS your database with thousands of concurrent queries
  • Scrape your entire product catalog
  • Attempt credential stuffing on your auth endpoints

Rate limiting is not optional for production APIs.

The Two Levels

Middleware-level: Runs before your Route Handler, at the edge. Best for blanket protection of entire route groups.

Route Handler-level: Runs inside specific handlers. Best for per-endpoint limits with different thresholds (e.g., AI endpoints get stricter limits than health checks).

Upstash Redis: Distributed Rate Limiting

For serverless and edge deployments, you need a distributed rate limiter — a local in-memory counter won't work because each serverless invocation is isolated.
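
To see why, here is a naive in-memory counter (a sketch only, not something to deploy): each serverless instance gets its own copy of the map, so no single instance ever sees enough traffic to enforce the limit.

code
// Naive in-memory limiter: fine on one long-lived server, useless in serverless
const counts = new Map<string, { count: number; resetAt: number }>()

export function naiveLimit(ip: string, limit = 100, windowMs = 60_000): boolean {
  const now = Date.now()
  const entry = counts.get(ip)

  if (!entry || entry.resetAt < now) {
    counts.set(ip, { count: 1, resetAt: now + windowMs })
    return true
  }

  // Each isolated instance has its own `counts` map, so this only counts
  // the fraction of requests that happened to land on this instance.
  entry.count += 1
  return entry.count <= limit
}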

Upstash Redis provides a serverless Redis with a generous free tier and an official rate-limiting library that works at the edge.

code
npm install @upstash/ratelimit @upstash/redis
code
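# .env.local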
UPSTASH_REDIS_REST_URL=https://...
UPSTASH_REDIS_REST_TOKEN=...
code
// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = Redis.fromEnv()

export const rateLimiters = {
  // Global: 100 requests per minute per IP
  global: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(100, '60 s'),
    prefix: 'rl:global',
  }),

  // AI endpoints: 10 requests per minute per user
  ai: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(10, '60 s'),
    prefix: 'rl:ai',
  }),

  // Auth endpoints: 5 attempts per 15 minutes per IP
  auth: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(5, '15 m'),
    prefix: 'rl:auth',
  }),
}
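
Each limiter shares the same Redis client; the prefix namespaces its keys in Redis so the three counters never collide.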

Middleware-Level Protection

Apply rate limiting in Middleware for blanket API protection:

code
// middleware.ts
import { NextRequest, NextResponse } from 'next/server'
import { rateLimiters } from '@/lib/rate-limit'

export async function middleware(request: NextRequest) {
  // Only rate-limit API routes
  if (!request.nextUrl.pathname.startsWith('/api/')) {
    return NextResponse.next()
  }

  const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim()
    ?? request.headers.get('x-real-ip')
    ?? 'anonymous'

  const { success, limit, remaining, reset } = await rateLimiters.global.limit(ip)

  if (!success) {
    return new NextResponse(
      JSON.stringify({ error: 'Too many requests. Please try again later.' }),
      {
        status: 429,
        headers: {
          'Content-Type': 'application/json',
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': reset.toString(),
          'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
        },
      }
    )
  }

  const response = NextResponse.next()
  response.headers.set('X-RateLimit-Remaining', remaining.toString())
  return response
}

export const config = {
  matcher: ['/api/:path*'],
}

Route Handler-Level Limits

For endpoints that need stricter or per-user limits:

code
// app/api/chat/route.ts
import { rateLimiters } from '@/lib/rate-limit'
import { getCurrentUser } from '@/lib/auth'
import { getClientIp } from '@/lib/get-client-ip' // illustrative path; helper shown in "Getting the Real Client IP" below

export async function POST(request: Request) {
  const user = await getCurrentUser(request)

  // Use user ID for authenticated users, IP for anonymous
  const identifier = user?.id ?? getClientIp(request)

  const { success, reset } = await rateLimiters.ai.limit(identifier)

  if (!success) {
    return Response.json(
      { error: 'AI rate limit exceeded. Try again in a moment.' },
      {
        status: 429,
        headers: { 'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString() },
      }
    )
  }

  // ... AI handler
}

Algorithms

Sliding Window (recommended for most cases): Counts requests in a rolling time window. Prevents traffic spikes at window boundaries that fixed windows allow.

Fixed Window: Simpler, but allows a burst of up to 2x the limit at window boundaries (the end of one window plus the start of the next).
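
For comparison, a fixed-window limiter uses the same constructor shape. This is a sketch that assumes the redis client from lib/rate-limit.ts is in scope:

code
// Fixed window: the counter resets at every window boundary
new Ratelimit({
  redis,
  limiter: Ratelimit.fixedWindow(100, '60 s'), // 100 requests per 60-second window
  prefix: 'rl:fixed',
})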

Token Bucket: Allows short bursts up to a maximum capacity, then refills at a steady rate. Good for upload endpoints where occasional large bursts are acceptable.

code
// Token bucket — allows bursts
new Ratelimit({
  redis,
  limiter: Ratelimit.tokenBucket(10, '10 s', 20), // refills 10 tokens per 10 s, bucket holds at most 20
  prefix: 'rl:upload',
})

Getting the Real Client IP

On Vercel and most CDNs, the real IP is in X-Forwarded-For. Be careful — this header can be spoofed in certain configurations:

code
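// lib/get-client-ip.ts (illustrative path, matching the import in the chat route above)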
function getClientIp(request: Request): string {
  const forwarded = request.headers.get('x-forwarded-for')
  if (forwarded) {
    // Take the first IP — the client's IP before any proxies
    return forwarded.split(',')[0].trim()
  }
  return request.headers.get('x-real-ip') ?? 'unknown'
}

On Vercel, x-forwarded-for is set by Vercel's infrastructure and can be trusted. On other platforms, verify the proxy chain.
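
As one way to adapt the helper to another platform, behind Cloudflare you might prefer CF-Connecting-IP, which the proxy itself sets and a client cannot override. A sketch (the function name is hypothetical, not part of the helper above):

code
// Sketch: Cloudflare variant that trusts the proxy-set CF-Connecting-IP header first
function getClientIpBehindCloudflare(request: Request): string {
  return (
    request.headers.get('cf-connecting-ip') ??                      // set by Cloudflare itself
    request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ??
    'unknown'
  )
}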

Testing Rate Limits Locally

code
# Hit the endpoint 15 times quickly
for i in {1..15}; do
  curl -s -o /dev/null -w "%{http_code}\n" -X POST http://localhost:3000/api/chat
done

You should see 200 (or whatever your handler normally returns) for the first 10 requests and 429 for the last 5.
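
Note that Redis.fromEnv() reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN, so local testing needs both set (for example in .env.local), and the test requests will count against your real Upstash instance.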

Rate Limit Headers

Always return standard rate limit headers so clients can handle limits gracefully:

code
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1714857600
Retry-After: 30

Well-behaved API clients and SDKs parse Retry-After to implement exponential backoff automatically.
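
On the client side, a minimal retry wrapper might look like this. It is a sketch, and fetchWithRetry is a hypothetical helper rather than part of any SDK:

code
// Sketch: retry on 429, preferring the server's Retry-After over exponential backoff
async function fetchWithRetry(url: string, init?: RequestInit, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, init)
    if (res.status !== 429 || attempt === maxRetries) return res

    const retryAfterSec = Number(res.headers.get('Retry-After'))
    const delayMs = Number.isFinite(retryAfterSec) && retryAfterSec > 0
      ? retryAfterSec * 1000          // honor the server's hint (seconds)
      : 2 ** attempt * 1000           // otherwise back off: 1s, 2s, 4s, ...
    await new Promise((resolve) => setTimeout(resolve, delayMs))
  }
  throw new Error('unreachable') // the loop always returns on the final attempt
}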

Rate limiting is one of those things that takes half a day to implement and saves you from a crisis at 2am six months later.
