Streaming LLM Responses in Next.js 16 App Router using React Server Components

Waiting for a large language model to generate a full response before showing anything to the user feels like 2023. In modern applications, perceived latency is everything. If you’re building with Next.js 16 and the App Router, you shouldn't be buffering LLM outputs; you should be streaming them directly to the client.

I recently refactored a RAG-based dashboard to handle streaming, and the difference in user experience is night and day. Here is how I set it up using the AI SDK and React Server Components.

The Architectural Shift

When we stream LLM responses, we stop waiting for a complete JSON payload. The request is still an ordinary POST, but the response body is a ReadableStream that delivers tokens as the model produces them. In the context of Next.js 16, we leverage the AI SDK’s streamText function, which handles the heavy lifting of interfacing with providers like OpenAI or Anthropic and produces a response in a format the client-side hooks know how to parse.

The key trade-off here is state management. Since we are streaming from a Server Component or a Route Handler, anything already flushed to the client cannot be taken back, so we lose the ability to easily "edit" the message history once it's sent. You have to handle the UI state on the client side, appending to the in-progress message as the chunks arrive.
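To make the client-side bookkeeping concrete, here is a minimal sketch of folding streamed chunks into a message list. The ChatMessage type and the appendChunk and consumeTextStream helpers are hypothetical names for illustration, not part of the AI SDK, and the stream is assumed to carry plain UTF-8 text. A hook like useChat does this work for you and uses its own wire format.

```typescript
// Hypothetical message shape for illustration; the AI SDK defines its own types.
type ChatMessage = { id: string; role: "user" | "assistant"; content: string };

// Return a new array in which the message with `id` has `chunk` appended.
// If no such message exists yet, start a new assistant message.
function appendChunk(messages: ChatMessage[], id: string, chunk: string): ChatMessage[] {
  const exists = messages.some((m) => m.id === id);
  if (!exists) {
    return [...messages, { id, role: "assistant", content: chunk }];
  }
  return messages.map((m) => (m.id === id ? { ...m, content: m.content + chunk } : m));
}

// Read a plain-text byte stream and fold every decoded chunk into the state.
async function consumeTextStream(
  stream: ReadableStream<Uint8Array>,
  initial: ChatMessage[],
  id: string,
  onUpdate: (messages: ChatMessage[]) => void,
): Promise<ChatMessage[]> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let state = initial;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // `stream: true` keeps multi-byte characters intact across chunk boundaries.
    state = appendChunk(state, id, decoder.decode(value, { stream: true }));
    onUpdate(state); // in a React component this would be a setState call
  }
  return state;
}
```

Because appendChunk never mutates its input, each update hands React a fresh array, which is what triggers the re-render for every chunk.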

Implementation: The Route Handler

I prefer using a dedicated Route Handler for the stream. It keeps the Server Component clean and allows for better error handling if the stream drops mid-request.

// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  // We use the 'gpt-4o' model here. 
  // Ensure your OPENAI_API_KEY is defined in your .env.local
  const result = await streamText({
    model: openai('gpt-4o'),
    messages,
    // Add system prompts here to define behavior
    system: 'You are a helpful coding assistant.',
  });

  // The AI SDK converts the response into a stream format
  // that the 'ai/react' hooks can parse on the frontend.
  return result.toDataStreamResponse();
}
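The handler above trusts whatever JSON the client sends. As a rough sketch of the error handling a dedicated route makes room for, the hypothetical parseMessages helper below (not part of the AI SDK) checks the body's shape so the route can answer with a 400 before any model call is made. The IncomingMessage type and the set of allowed roles are assumptions for illustration.

```typescript
// Assumed message shape: a role plus plain string content.
type IncomingMessage = { role: "user" | "assistant" | "system"; content: string };

const ALLOWED_ROLES = new Set<string>(["user", "assistant", "system"]);

// Return the validated messages, or null when the body has the wrong shape.
function parseMessages(body: unknown): IncomingMessage[] | null {
  if (typeof body !== "object" || body === null) return null;
  const messages = (body as { messages?: unknown }).messages;
  if (!Array.isArray(messages) || messages.length === 0) return null;
  const valid = messages.every(
    (m) =>
      typeof m === "object" &&
      m !== null &&
      ALLOWED_ROLES.has((m as { role?: unknown }).role as string) &&
      typeof (m as { content?: unknown }).content === "string",
  );
  return valid ? (messages as IncomingMessage[]) : null;
}

// Build the 400 response a route handler could return for a bad body.
function badRequest(reason: string): Response {
  return new Response(JSON.stringify({ error: reason }), {
    status: 400,
    headers: { "Content-Type": "application/json" },
  });
}
```

Inside POST, the idea would be to pass the parsed JSON through parseMessages and return badRequest('Invalid messages') when it yields null, so malformed requests never reach streamText.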

Consuming the Stream on the Client

On the frontend, the useChat hook from the Vercel AI SDK is the standard for a reason. It manages the message array and the streaming state automatically, and by default it POSTs to /api/chat, which is why the Route Handler above needs no extra wiring.

// app/chat/page.tsx
'use client';

import { useChat } from 'ai/react';

export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div className="max-w-xl mx-auto py-8">
      {messages.map(m => (
        <div key={m.id} className="mb-4">
          <span className="font-bold">{m.role === 'user' ? 'You: ' : 'AI: '}</span>
          {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit} className="fixed bottom-4 w-full max-w-xl">
        <input
          className="w-full border p-2 rounded"
          value={input}
          onChange={handleInputChange}
          placeholder="Ask me anything..."
        />
      </form>
    </div>
  );
}

Debugging and Operational Tips

1. The "Empty Response" Trap

If you see the network request succeed (200 OK) but nothing renders, check the format of the stream. The useChat hook expects the AI SDK's data stream format by default, which is what toDataStreamResponse() produces. If you aren't using that helper, you have to handle the TextEncoder and ReadableStream logic manually and make sure the client is set up to read what you send. Stick to the SDK helpers unless you have a very specific, non-standard streaming requirement.
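For reference, the manual route looks roughly like the sketch below: a ReadableStream that encodes each text chunk with TextEncoder, wrapped in a standard Response. The textStreamResponse helper and the fakeTokens generator are hypothetical stand-ins for a real provider stream. Note that this emits plain text rather than the SDK's data stream format, which is exactly the kind of mismatch that can leave the UI with nothing to render.

```typescript
// Hypothetical token source standing in for a provider's streaming API.
async function* fakeTokens(): AsyncGenerator<string> {
  for (const token of ["Hello", ", ", "world", "!"]) {
    yield token;
  }
}

// Wrap any async iterable of strings in a streaming plain-text Response.
function textStreamResponse(tokens: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = tokens[Symbol.asyncIterator]();
  const body = new ReadableStream<Uint8Array>({
    // pull() runs whenever the consumer is ready for more data (backpressure).
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
    // Called when the consumer cancels, e.g. the client disconnects.
    async cancel() {
      await iterator.return?.();
    },
  });
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

Using pull instead of pushing everything in start means tokens are only requested as fast as the client reads them.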

2. Timeouts and Vercel/Edge Constraints

If you’re hosting on Vercel, keep in mind that every plan, Hobby included, has execution time limits. Streaming helps because the first tokens reach the user right away, but if your LLM call takes 60 seconds to generate a massive report, the function can still be cut off mid-stream when it hits the limit. Use maxDuration in your route config to raise it, up to whatever your plan allows:

// app/api/chat/route.ts
export const maxDuration = 60; // Set to 60 seconds for complex tasks

3. Markdown Rendering

Streaming raw text from an LLM often includes Markdown. Don't try to parse it manually. Integrate react-markdown to render the message content. Because the stream updates the messages state in React, the component re-renders as new tokens arrive; react-markdown re-parses the accumulated text on each render, and React applies only the resulting differences to the DOM.
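One wrinkle with re-rendering on every token is that a half-streamed message can contain an unclosed code fence, which makes everything after it render as code until the closing fence arrives. A possible workaround, sketched below, is to balance the fences before handing the text to the renderer. closeOpenFence is a hypothetical helper, not part of react-markdown or the AI SDK, and it only counts lines that begin with three backticks.

```typescript
// Three backticks, built indirectly so this listing stays easy to embed.
const FENCE = "`".repeat(3);

// If the partial markdown contains an odd number of fence lines,
// append a closing fence so the renderer sees a balanced document.
function closeOpenFence(partial: string): string {
  const fenceLines = partial
    .split("\n")
    .filter((line) => line.trim().startsWith(FENCE)).length;
  if (fenceLines % 2 === 0) return partial;
  const separator = partial.endsWith("\n") ? "" : "\n";
  return partial + separator + FENCE;
}
```

In the chat component, the call would wrap the message text just before it reaches the markdown renderer, for example by rendering closeOpenFence(m.content) instead of m.content.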

4. Handling Interrupted Streams

Users often navigate away from a page while the AI is mid-sentence. The AI SDK hooks usually manage this well out of the box, and useChat also returns a stop function you can wire to a cancel button. If you read the stream yourself, make sure the cleanup function of your useEffect (or equivalent unmount logic) aborts the request so the stream does not keep running in the background.
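For the case where you read a stream by hand, the cleanup can be sketched with a standard AbortController. The readUntilAborted helper below is a hypothetical name, not an AI SDK API; it stops reading and cancels the underlying stream as soon as the signal fires.

```typescript
// Read text chunks until the stream ends or the signal aborts.
async function readUntilAborted(
  stream: ReadableStream<Uint8Array>,
  signal: AbortSignal,
  onChunk: (text: string) => void,
): Promise<"done" | "aborted"> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  // Cancelling the reader settles any pending read and tells the source to stop.
  const onAbort = () => {
    reader.cancel().catch(() => {});
  };
  signal.addEventListener("abort", onAbort, { once: true });
  try {
    while (!signal.aborted) {
      const { done, value } = await reader.read();
      if (done) break;
      onChunk(decoder.decode(value, { stream: true }));
    }
  } finally {
    signal.removeEventListener("abort", onAbort);
    // Covers the case where the signal was already aborted before we started.
    if (signal.aborted) await reader.cancel();
  }
  return signal.aborted ? "aborted" : "done";
}
```

In a component, the pattern would be to create the AbortController inside useEffect, pass controller.signal to the reader, and return () => controller.abort() as the effect's cleanup function.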

By offloading the stream processing to the SDK and managing the UI state with useChat, you get a robust, production-ready implementation that feels fast and responsive. It’s a clean pattern that scales as your app grows.