Engineering · 10 min read

Integrating OpenAI with Next.js: Streaming, Edge, and Production Patterns

A production guide to building AI-powered features in Next.js — streaming text generation, edge deployment, rate limiting, and cost control.

The Core Pattern

Every AI feature in a web application boils down to: send a prompt to an LLM, stream the response back to the user. Next.js App Router's Route Handlers and the Vercel AI SDK make this straightforward to implement correctly.

Setup

code
npm install ai @ai-sdk/openai zod
code
# .env.local
OPENAI_API_KEY=sk-...

Basic Streaming Response

Use the Vercel AI SDK for streaming — it handles the server-sent event protocol and React client hooks automatically.

code
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

export const runtime = 'edge'

export async function POST(request: Request) {
  const { messages } = await request.json()

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    maxTokens: 1024,
  })

  return result.toDataStreamResponse()
}

On the client, use the useChat hook:

code
'use client'
import { useChat } from 'ai/react'

export function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  })

  return (
    <div>
      <div>
        {messages.map((m) => (
          <div key={m.id}>
            <strong>{m.role}:</strong> {m.content}
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  )
}

Structured Output

For AI features that need to return structured data (not just prose), use generateObject:

code
import { openai } from '@ai-sdk/openai'
import { generateObject } from 'ai'
import { z } from 'zod'

const ProductSchema = z.object({
  name: z.string(),
  description: z.string().max(160),
  tags: z.array(z.string()).max(5),
  targetAudience: z.string(),
})

export async function generateProductListing(prompt: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o'),
    schema: ProductSchema,
    prompt: `Generate a product listing for: ${prompt}`,
  })

  return object // TypeScript knows the exact shape
}

This is far more reliable than asking the model to return JSON and parsing it yourself.
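
For reference, here's a minimal way to expose this from a Route Handler; the route path is illustrative:

code
// app/api/product-listing/route.ts (illustrative path)
import { generateProductListing } from '@/lib/ai/product-listing' // wherever the helper above lives

export async function POST(request: Request) {
  const { prompt } = await request.json()
  const listing = await generateProductListing(prompt)
  return Response.json(listing) // already shaped by ProductSchema
}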

Rate Limiting by User

Without rate limiting, a single user can exhaust your OpenAI budget in minutes. Use Upstash Redis for distributed rate limiting that works at the edge:

code
npm install @upstash/ratelimit @upstash/redis
code
// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per minute
  analytics: true,
})
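
Redis.fromEnv() reads the Upstash REST credentials from the environment, so set those alongside the OpenAI key:

code
# .env.local
UPSTASH_REDIS_REST_URL=https://...
UPSTASH_REDIS_REST_TOKEN=...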
code
// app/api/chat/route.ts
import { ratelimit } from '@/lib/rate-limit'
import { headers } from 'next/headers'

export async function POST(request: Request) {
  const headersList = await headers()
  // x-forwarded-for can be a comma-separated chain; the first entry is the client IP
  const ip = headersList.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'anonymous'

  const { success, remaining } = await ratelimit.limit(ip)

  if (!success) {
    return new Response('Too many requests', {
      status: 429,
      headers: { 'X-RateLimit-Remaining': remaining.toString() },
    })
  }

  // ... rest of handler
}
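
The handler above keys on IP, which covers anonymous traffic. For signed-in users, a stable user ID is a better key (office networks share IPs); a sketch, assuming a getCurrentUser auth helper like the one used later in this post:

code
// Inside the POST handler, assuming getCurrentUser resolves the session (hypothetical helper)
const user = await getCurrentUser(request)
const key = user ? `user:${user.id}` : `ip:${ip}`

const { success, remaining } = await ratelimit.limit(key)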

System Prompts and Context Injection

Keep system prompts in version-controlled files, not hard-coded strings:

code
// lib/prompts/support-agent.ts
export const SUPPORT_SYSTEM_PROMPT = `
You are a helpful support agent for Acme Corp.
- Only answer questions about our products
- If you don't know something, say so clearly
- Never make up pricing or feature information
- Escalate billing issues to: billing@acme.com
`.trim()

Inject user context into the system prompt server-side — the client never needs to send it:

code
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
import { SUPPORT_SYSTEM_PROMPT } from '@/lib/prompts/support-agent'

export async function POST(request: Request) {
  const { messages } = await request.json()
  const user = await getCurrentUser(request) // your session/auth helper

  const result = streamText({
    model: openai('gpt-4o'),
    system: `${SUPPORT_SYSTEM_PROMPT}\n\nCurrent user: ${user.name} (plan: ${user.plan})`,
    messages,
  })

  return result.toDataStreamResponse()
}

Cost Control

OpenAI costs compound quickly without controls in place:

Set hard token limits:

code
streamText({
  model: openai('gpt-4o-mini'), // cheaper for most use cases
  messages,
  maxTokens: 512,
})

Log usage for monitoring. With streaming, token counts are only available after the stream finishes, so record them in streamText's onFinish callback:

code
const result = streamText({
  model: openai('gpt-4o-mini'),
  messages,
  // usage only resolves once the stream completes, so log it in onFinish
  onFinish: async ({ usage }) => {
    await db.insert(aiUsageLogs).values({
      userId: user.id,
      promptTokens: usage.promptTokens,
      completionTokens: usage.completionTokens,
      model: 'gpt-4o-mini',
      createdAt: new Date(),
    })
  },
})

Use gpt-4o-mini by default. It handles 90% of tasks at a fraction of the cost. Only escalate to gpt-4o when output quality visibly matters.
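
One way to keep that escalation deliberate is a small helper; the tier names here are illustrative:

code
import { openai } from '@ai-sdk/openai'

// Default to the cheap model; escalate only where quality is worth the cost.
export function modelFor(tier: 'default' | 'premium' = 'default') {
  return openai(tier === 'premium' ? 'gpt-4o' : 'gpt-4o-mini')
}

// Usage: streamText({ model: modelFor('premium'), messages })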

Edge vs Node Runtime

code
export const runtime = 'edge' // default for AI routes

Edge runtime benefits:

  • Faster cold starts (no Node.js bootstrap)
  • Deployed closer to users globally
  • Streaming responses work perfectly

Edge runtime constraints:

  • No Node.js native modules
  • No filesystem access
  • Some npm packages aren't edge-compatible (check the package docs)

If a dependency isn't edge-compatible, drop export const runtime = 'edge' and use the Node.js runtime instead.
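
Switching is a one-line change: delete the export (Node.js is the default runtime) or set it explicitly:

code
// app/api/chat/route.ts
export const runtime = 'nodejs'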

Caching AI Responses

LLM output isn't strictly deterministic, but many queries don't need a fresh generation for the same input. When reusing a previous answer is acceptable, cache the response:

code
import { unstable_cache } from 'next/cache'
import { openai } from '@ai-sdk/openai'
import { generateObject } from 'ai'
import { z } from 'zod'

export const generateTagsForPost = unstable_cache(
  async (postContent: string) => {
    const { object } = await generateObject({
      model: openai('gpt-4o-mini'),
      schema: z.object({ tags: z.array(z.string()).max(5) }),
      prompt: `Generate 5 tags for this blog post: ${postContent}`,
    })
    return object.tags
  },
  ['post-tags'],
  { revalidate: 86400 } // cache for 24 hours
)
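
Calling the cached function from a Server Component then costs one generation per post per day; the page and data fetcher below are illustrative:

code
// app/blog/[slug]/page.tsx (illustrative)
import { generateTagsForPost } from '@/lib/ai/tags' // wherever the cached helper lives

export default async function PostPage({ params }: { params: { slug: string } }) {
  const post = await getPostBySlug(params.slug) // hypothetical data fetcher
  const tags = await generateTagsForPost(post.content)

  return (
    <ul>
      {tags.map((tag) => (
        <li key={tag}>{tag}</li>
      ))}
    </ul>
  )
}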

AI features are table stakes for SaaS products in 2026. The patterns above — streaming, structured output, rate limiting, cost controls — are what separates a reliable product from an expensive prototype.
