Integrating OpenAI with Next.js: Streaming, Edge, and Production Patterns
A production guide to building AI-powered features in Next.js — streaming text generation, edge deployment, rate limiting, and cost control.
The Core Pattern
Most AI features in a web application boil down to the same loop: send a prompt to an LLM, stream the response back to the user. Next.js App Router's Route Handlers and the Vercel AI SDK make this loop straightforward to implement correctly.
Setup
npm install ai @ai-sdk/openai zod
# .env.local (the @ai-sdk/openai provider picks this up automatically)
OPENAI_API_KEY=sk-...
Basic Streaming Response
Use the Vercel AI SDK for streaming — it implements the streaming wire protocol between server and client and provides matching React hooks.
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

export const runtime = 'edge'

export async function POST(request: Request) {
  const { messages } = await request.json()

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    maxTokens: 1024,
  })

  return result.toDataStreamResponse()
}
On the client, use the useChat hook:
'use client'

import { useChat } from 'ai/react'

export function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  })

  return (
    <div>
      <div>
        {messages.map((m) => (
          <div key={m.id}>
            <strong>{m.role}:</strong> {m.content}
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  )
}
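Two fields the basic example leaves out: useChat also returns stop, which aborts the in-flight stream, and error, which surfaces failures. Fragments to merge into the component above:

// destructure the extra fields:
const { messages, input, handleInputChange, handleSubmit, isLoading, stop, error } =
  useChat({ api: '/api/chat' })

// and render them alongside the form:
{isLoading && <button type="button" onClick={stop}>Stop</button>}
{error && <p>Something went wrong. Please try again.</p>}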
Structured Output
For AI features that need to return structured data (not just prose), use generateObject:
import { openai } from '@ai-sdk/openai'
import { generateObject } from 'ai'
import { z } from 'zod'

const ProductSchema = z.object({
  name: z.string(),
  description: z.string().max(160),
  tags: z.array(z.string()).max(5),
  targetAudience: z.string(),
})

export async function generateProductListing(prompt: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o'),
    schema: ProductSchema,
    prompt: `Generate a product listing for: ${prompt}`,
  })
  return object // TypeScript knows the exact shape
}
This is far more reliable than asking the model to return JSON and parsing it yourself.
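To expose this from an API route, a minimal sketch (the route path, request shape, and import path are assumptions):

// app/api/product-listing/route.ts (hypothetical route wrapping the helper above)
import { generateProductListing } from '@/lib/generate-product-listing'

export async function POST(request: Request) {
  const { prompt } = await request.json()
  const listing = await generateProductListing(prompt) // typed per ProductSchema
  return Response.json(listing)
}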
Rate Limiting by User
Without rate limiting, a single user can exhaust your OpenAI budget in minutes. Use Upstash Redis for distributed rate limiting that works at the edge:
npm install @upstash/ratelimit @upstash/redis
// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per minute
  analytics: true,
})
// app/api/chat/route.ts
import { ratelimit } from '@/lib/rate-limit'
import { headers } from 'next/headers'

export async function POST(request: Request) {
  const headersList = await headers()
  // x-forwarded-for can hold a comma-separated chain; the first entry is the client
  const ip = headersList.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'anonymous'

  const { success, remaining } = await ratelimit.limit(ip)
  if (!success) {
    return new Response('Too many requests', {
      status: 429,
      headers: { 'X-RateLimit-Remaining': remaining.toString() },
    })
  }

  // ... rest of handler
}
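The heading promises per-user limiting, and for authenticated routes a stable user ID is a better key than an IP, which can be shared behind corporate NATs or rotate on mobile networks. A sketch, assuming the getCurrentUser helper from the next section and a user.id field (an assumption):

// key the limit by user ID instead of IP for signed-in users
const user = await getCurrentUser(request)
const { success } = await ratelimit.limit(`user:${user.id}`)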
System Prompts and Context Injection
Keep system prompts in version-controlled files, not hard-coded strings:
// lib/prompts/support-agent.ts
export const SUPPORT_SYSTEM_PROMPT = `
You are a helpful support agent for Acme Corp.
- Only answer questions about our products
- If you don't know something, say so clearly
- Never make up pricing or feature information
- Escalate billing issues to: billing@acme.com
`.trim()
Inject user context into the system prompt server-side — the client never needs to send it:
export async function POST(request: Request) {
  const { messages } = await request.json()
  const user = await getCurrentUser(request)

  const result = streamText({
    model: openai('gpt-4o'),
    system: `${SUPPORT_SYSTEM_PROMPT}\n\nCurrent user: ${user.name} (plan: ${user.plan})`,
    messages,
  })

  return result.toDataStreamResponse()
}
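getCurrentUser above stands in for whatever session lookup your auth setup provides. A hypothetical sketch, where verifySession is a placeholder for your own auth library:

// hypothetical helper; verifySession is a stand-in for your auth library
async function getCurrentUser(request: Request) {
  const session = await verifySession(request.headers.get('cookie'))
  return { name: session.name, plan: session.plan }
}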
Cost Control
OpenAI costs compound quickly without controls in place:
Set hard token limits:
streamText({
  model: openai('gpt-4o-mini'), // cheaper for most use cases
  messages,
  maxTokens: 512,
})
Log usage for monitoring:
const result = streamText({ ... })

// result.usage resolves once the stream has finished
const usage = await result.usage
await db.insert(aiUsageLogs).values({
  userId: user.id,
  promptTokens: usage.promptTokens,
  completionTokens: usage.completionTokens,
  model: 'gpt-4o-mini',
  createdAt: new Date(),
})
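In a streaming route handler, awaiting result.usage before returning would hold the response until generation finishes. streamText's onFinish callback logs without delaying the stream:

const result = streamText({
  model: openai('gpt-4o-mini'),
  messages,
  onFinish: async ({ usage }) => {
    // runs once the stream completes; the response streamed in the meantime
    await db.insert(aiUsageLogs).values({
      userId: user.id,
      promptTokens: usage.promptTokens,
      completionTokens: usage.completionTokens,
      model: 'gpt-4o-mini',
      createdAt: new Date(),
    })
  },
})

return result.toDataStreamResponse()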
Use gpt-4o-mini by default. It handles most tasks at a fraction of the cost. Escalate to gpt-4o only when output quality visibly matters.
Edge vs Node Runtime
export const runtime = 'edge' // a sensible default for AI routes
Edge runtime benefits:
- Faster cold starts (no Node.js bootstrap)
- Deployed closer to users globally
- Streaming responses are fully supported
Edge runtime constraints:
- No Node.js native modules
- No filesystem access
- Some npm packages aren't edge-compatible (check the package docs)
If a dependency isn't edge-compatible, drop export const runtime = 'edge' and use the Node.js runtime instead.
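Next.js falls back to the Node.js runtime when no runtime export is present, and you can also opt in explicitly:

// app/api/chat/route.ts (Node.js runtime for dependencies that need Node APIs)
export const runtime = 'nodejs'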
Caching AI Responses
For queries where the same input should reuse the same output (tagging, classification, summarization), cache the response rather than paying for a fresh generation:
import { unstable_cache } from 'next/cache'
import { openai } from '@ai-sdk/openai'
import { generateObject } from 'ai'
import { z } from 'zod'

export const generateTagsForPost = unstable_cache(
  async (postContent: string) => {
    const { object } = await generateObject({
      model: openai('gpt-4o-mini'),
      schema: z.object({ tags: z.array(z.string()).max(5) }),
      prompt: `Generate 5 tags for this blog post: ${postContent}`,
    })
    return object.tags
  },
  ['post-tags'],
  { revalidate: 86400 } // cache for 24 hours
)
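unstable_cache folds the function's arguments into the cache key along with the key parts, so each distinct postContent is cached independently. For example, given some post record:

// the first call hits OpenAI; repeat calls with the same content are served from cache
const tags = await generateTagsForPost(post.content)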
AI features are table stakes for SaaS products in 2026. The patterns above — streaming, structured output, rate limiting, caching, and cost controls — are what separates a reliable product from an expensive prototype.