Engineering · 10 min read

Integrating OpenAI with Next.js: Streaming, Edge, and Production Patterns

A production guide to building AI-powered features in Next.js — streaming text generation, edge deployment, rate limiting, and cost control.

The Core Pattern

Every AI feature in a web application boils down to: send a prompt to an LLM, stream the response back to the user. Next.js App Router's Route Handlers and the Vercel AI SDK make this straightforward to implement correctly.

Setup

code
npm install ai @ai-sdk/openai zod
code
# .env.local
OPENAI_API_KEY=sk-...

Basic Streaming Response

Use the Vercel AI SDK for streaming — it handles the server-sent event protocol and React client hooks automatically.

code
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

export const runtime = 'edge'

export async function POST(request: Request) {
  const { messages } = await request.json()

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    maxTokens: 1024,
  })

  return result.toDataStreamResponse()
}

On the client, use the useChat hook:

code
'use client'
import { useChat } from 'ai/react'

export function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  })

  return (
    <div>
      <div>
        {messages.map((m) => (
          <div key={m.id}>
            <strong>{m.role}:</strong> {m.content}
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  )
}

Structured Output

For AI features that need to return structured data (not just prose), use generateObject:

code
import { openai } from '@ai-sdk/openai'
import { generateObject } from 'ai'
import { z } from 'zod'

const ProductSchema = z.object({
  name: z.string(),
  description: z.string().max(160),
  tags: z.array(z.string()).max(5),
  targetAudience: z.string(),
})

export async function generateProductListing(prompt: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o'),
    schema: ProductSchema,
    prompt: `Generate a product listing for: ${prompt}`,
  })

  return object // TypeScript knows the exact shape
}

This is far more reliable than asking the model to return JSON and parsing it yourself.
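
For reference, here's a minimal way to expose this from a Route Handler; the route path is illustrative:

code
// app/api/product-listing/route.ts (illustrative path)
import { generateProductListing } from '@/lib/ai/product-listing' // wherever the helper above lives

export async function POST(request: Request) {
  const { prompt } = await request.json()
  const listing = await generateProductListing(prompt)
  return Response.json(listing) // already shaped by ProductSchema
}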

Rate Limiting by User

Without rate limiting, a single user can exhaust your OpenAI budget in minutes. Use Upstash Redis for distributed rate limiting that works at the edge:

code
npm install @upstash/ratelimit @upstash/redis
code
// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per minute
  analytics: true,
})
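
Redis.fromEnv() reads the Upstash REST credentials from the environment, so set those alongside the OpenAI key:

code
# .env.local
UPSTASH_REDIS_REST_URL=https://...
UPSTASH_REDIS_REST_TOKEN=...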
code
// app/api/chat/route.ts
import { ratelimit } from '@/lib/rate-limit'
import { headers } from 'next/headers'

export async function POST(request: Request) {
  const headersList = await headers()
  // x-forwarded-for can be a comma-separated chain; the first entry is the client IP
  const ip = headersList.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'anonymous'

  const { success, remaining } = await ratelimit.limit(ip)

  if (!success) {
    return new Response('Too many requests', {
      status: 429,
      headers: { 'X-RateLimit-Remaining': remaining.toString() },
    })
  }

  // ... rest of handler
}
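
The handler above keys on IP, which covers anonymous traffic. For signed-in users, a stable user ID is a better key (office networks share IPs); a sketch, assuming a getCurrentUser auth helper like the one used later in this post:

code
// Inside the POST handler, assuming getCurrentUser resolves the session (hypothetical helper)
const user = await getCurrentUser(request)
const key = user ? `user:${user.id}` : `ip:${ip}`

const { success, remaining } = await ratelimit.limit(key)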

System Prompts and Context Injection

Keep system prompts in version-controlled files, not hard-coded strings:

code
// lib/prompts/support-agent.ts
export const SUPPORT_SYSTEM_PROMPT = `
You are a helpful support agent for Acme Corp.
- Only answer questions about our products
- If you don't know something, say so clearly
- Never make up pricing or feature information
- Escalate billing issues to: billing@acme.com
`.trim()

Inject user context into the system prompt server-side — the client never needs to send it:

code
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
import { SUPPORT_SYSTEM_PROMPT } from '@/lib/prompts/support-agent'

export async function POST(request: Request) {
  const { messages } = await request.json()
  const user = await getCurrentUser(request) // your session/auth helper

  const result = streamText({
    model: openai('gpt-4o'),
    system: `${SUPPORT_SYSTEM_PROMPT}\n\nCurrent user: ${user.name} (plan: ${user.plan})`,
    messages,
  })

  return result.toDataStreamResponse()
}

Cost Control

OpenAI costs compound quickly without controls in place:

Set hard token limits:

code
streamText({
  model: openai('gpt-4o-mini'), // cheaper for most use cases
  messages,
  maxTokens: 512,
})

Log usage for monitoring. With streaming, token counts are only available after the stream finishes, so record them in streamText's onFinish callback:

code
const result = streamText({
  model: openai('gpt-4o-mini'),
  messages,
  // usage only resolves once the stream completes, so log it in onFinish
  onFinish: async ({ usage }) => {
    await db.insert(aiUsageLogs).values({
      userId: user.id,
      promptTokens: usage.promptTokens,
      completionTokens: usage.completionTokens,
      model: 'gpt-4o-mini',
      createdAt: new Date(),
    })
  },
})

Use gpt-4o-mini by default. It handles 90% of tasks at a fraction of the cost. Only escalate to gpt-4o when output quality visibly matters.
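
One way to keep that escalation deliberate is a small helper; the tier names here are illustrative:

code
import { openai } from '@ai-sdk/openai'

// Default to the cheap model; escalate only where quality is worth the cost.
export function modelFor(tier: 'default' | 'premium' = 'default') {
  return openai(tier === 'premium' ? 'gpt-4o' : 'gpt-4o-mini')
}

// Usage: streamText({ model: modelFor('premium'), messages })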

Edge vs Node Runtime

code
export const runtime = 'edge' // default for AI routes

Edge runtime benefits:

  • Faster cold starts (no Node.js bootstrap)
  • Deployed closer to users globally
  • Streaming responses work perfectly

Edge runtime constraints:

  • No Node.js native modules
  • No filesystem access
  • Some npm packages aren't edge-compatible (check the package docs)

If a dependency isn't edge-compatible, drop export const runtime = 'edge' and use the Node.js runtime instead.
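
Switching is a one-line change: delete the export (Node.js is the default runtime) or set it explicitly:

code
// app/api/chat/route.ts
export const runtime = 'nodejs'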

Caching AI Responses

LLM output isn't strictly deterministic, but many queries don't need a fresh generation for the same input. When reusing a previous answer is acceptable, cache the response:

code
import { unstable_cache } from 'next/cache'
import { openai } from '@ai-sdk/openai'
import { generateObject } from 'ai'
import { z } from 'zod'

export const generateTagsForPost = unstable_cache(
  async (postContent: string) => {
    const { object } = await generateObject({
      model: openai('gpt-4o-mini'),
      schema: z.object({ tags: z.array(z.string()).max(5) }),
      prompt: `Generate 5 tags for this blog post: ${postContent}`,
    })
    return object.tags
  },
  ['post-tags'],
  { revalidate: 86400 } // cache for 24 hours
)
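
Calling the cached function from a Server Component then costs one generation per post per day; the page and data fetcher below are illustrative:

code
// app/blog/[slug]/page.tsx (illustrative)
import { generateTagsForPost } from '@/lib/ai/tags' // wherever the cached helper lives

export default async function PostPage({ params }: { params: { slug: string } }) {
  const post = await getPostBySlug(params.slug) // hypothetical data fetcher
  const tags = await generateTagsForPost(post.content)

  return (
    <ul>
      {tags.map((tag) => (
        <li key={tag}>{tag}</li>
      ))}
    </ul>
  )
}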

AI features are table stakes for SaaS products in 2026. The patterns above — streaming, structured output, rate limiting, cost controls — are what separates a reliable product from an expensive prototype.
