Building Accessible UIs for AI Outputs: Rendering Markdown and Streaming Text Cleanly

Streaming AI responses into a UI often feels like a race against the network. If you aren't careful, you end up with flickering text, broken syntax highlighting, and a jarring experience for screen reader users. I’ve spent the last few months refining how we handle these streams, and it comes down to balancing raw performance with DOM stability.

The Challenge of Incremental Rendering

When you stream text from an LLM, you are essentially appending chunks to a buffer. If you re-render the entire Markdown tree on every chunk, the parser re-reads the whole response each time, so CPU usage climbs as the response grows. Any state attached to DOM nodes that get replaced, such as a text selection or a screen reader's focus, is destroyed along with them.

The trick is to decouple the "raw stream" from the "DOM update." I treat the incoming stream as a state machine: I buffer the raw string and only trigger a Markdown re-parse when the chunk boundary makes sense, usually on a newline or a sentence terminator.
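
As a rough illustration of the boundary check, here is a minimal sketch of a helper that splits the buffer into a part that is safe to render and a tail to hold back. The name splitAtBoundary and the exact boundary rule (a newline, or a sentence terminator followed by whitespace) are my assumptions for the example, not a fixed recipe.

```typescript
// Hypothetical helper: the name and the boundary rule are illustrative assumptions.
interface SplitResult {
  ready: string;   // text up to and including the last safe boundary
  pending: string; // trailing text held back until more chunks arrive
}

function splitAtBoundary(buffer: string): SplitResult {
  // A boundary is a newline, or one of . ! ? followed by whitespace.
  // The regex is created per call so no lastIndex state leaks between calls.
  const boundary = /\n|[.!?](?=\s)/g;
  let cut = 0;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(buffer)) !== null) {
    cut = match.index + match[0].length;
  }
  return { ready: buffer.slice(0, cut), pending: buffer.slice(cut) };
}

const split = splitAtBoundary('Hello world. This is');
console.log(split.ready);   // "Hello world."
console.log(split.pending); // " This is"
```

Only ready would go to the Markdown renderer. When the stream ends, the remaining pending text has to be rendered as well, since the final sentence may never be followed by whitespace.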

Implementation: The Buffered Renderer

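In my current React-based stack, I use react-markdown paired with remark-gfm for tables and lists. To keep it snappy, I throttle state updates instead of re-rendering on every chunk. Here is how I handle the stream ingestion:

import { useState, useEffect, useRef } from 'react';
import ReactMarkdown from 'react-markdown';
import remarkGfm from 'remark-gfm';

const FLUSH_INTERVAL_MS = 50;

// Custom hook: accumulates decoded chunks in a ref and publishes them to state on a timer
export const useAiStream = (stream: ReadableStream<Uint8Array> | null) => {
  const [content, setContent] = useState('');
  const bufferRef = useRef('');

  useEffect(() => {
    if (!stream) return;

    bufferRef.current = '';
    let cancelled = false;
    const reader = stream.getReader();
    const decoder = new TextDecoder();

    // Optimization: only update state every 50ms to prevent
    // excessive DOM thrashing during high-speed generation.
    // React skips the re-render when the string has not changed.
    const timer = setInterval(() => setContent(bufferRef.current), FLUSH_INTERVAL_MS);

    const read = async () => {
      try {
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          bufferRef.current += decoder.decode(value, { stream: true });
        }
        bufferRef.current += decoder.decode(); // flush any partial multi-byte character
      } catch (err) {
        if (!cancelled) console.error('Stream read failed', err);
      } finally {
        clearInterval(timer);
        if (!cancelled) setContent(bufferRef.current); // publish the final text
      }
    };

    read();

    // Stop the timer and release the stream when the component unmounts
    return () => {
      cancelled = true;
      clearInterval(timer);
      reader.cancel().catch(() => {});
    };
  }, [stream]);

  return content;
};

// Rendering lives in a component; the hook only returns text (component name is illustrative)
export const AiMessage = ({ stream }: { stream: ReadableStream<Uint8Array> | null }) => {
  const content = useAiStream(stream);

  return (
    <div className="prose max-w-none" aria-live="polite">
      <ReactMarkdown remarkPlugins={[remarkGfm]}>
        {content}
      </ReactMarkdown>
    </div>
  );
};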

Accessibility Considerations

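The aria-live="polite" attribute is non-negotiable here. Without it, most screen readers will not announce text that is added to the page after load, so the response arrives silently. With aria-live="assertive", every update interrupts whatever is being read, which is a nightmare for users. Setting it to polite tells the screen reader to wait for a natural pause before announcing the update. It does not batch rapid updates on its own, which is one more reason to keep DOM updates infrequent.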
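
One option worth testing alongside aria-live is the standard aria-busy attribute, which signals that a region is still being modified so assistive technology may wait before exposing it. The sketch below is only an assumption about how the props could be wired up; the helper name liveRegionProps is hypothetical, and screen reader support for aria-busy varies, so verify it with real assistive technology.

```typescript
// Hypothetical helper: the name and shape are illustrative, not part of any library.
interface LiveRegionProps {
  'aria-live': 'polite';
  'aria-busy': boolean;
}

// While chunks are still arriving, mark the region busy so assistive
// technology may hold its announcement until the content settles.
function liveRegionProps(isStreaming: boolean): LiveRegionProps {
  return { 'aria-live': 'polite', 'aria-busy': isStreaming };
}

console.log(liveRegionProps(true));  // { 'aria-live': 'polite', 'aria-busy': true }
console.log(liveRegionProps(false)); // { 'aria-live': 'polite', 'aria-busy': false }
```

In a React component the result can be spread onto the container, for example <div {...liveRegionProps(isStreaming)}>. The trade-off is that a user may hear nothing until the whole response has finished, so for long answers it may be better to clear the busy flag at paragraph boundaries.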

One common mistake I see is failing to handle code blocks. When an AI generates a code block, the opening triple backticks arrive well before the closing ones, so for most of the stream the fence is unterminated. Until it closes, the Markdown parser treats everything after the opening fence as code, the highlighter runs on incomplete input, and a user who copies the block gets a fragment. I’ve started implementing a "Loading" state for code fences specifically, ensuring the syntax highlighter only runs once the block is closed.
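
Detecting an open fence can be sketched roughly as follows. This is a simplification under stated assumptions: it only looks at backtick fences, toggles on every line that starts with three backticks, and ignores tilde fences and longer fences that CommonMark also allows. The function name hasOpenCodeFence is hypothetical.

```typescript
// '\x60' is the backtick character; escaping it keeps a literal fence out of this sample.
const FENCE = '\x60\x60\x60';

// Hypothetical helper: returns true when the text ends inside an unclosed code fence.
function hasOpenCodeFence(markdown: string): boolean {
  let open = false;
  for (const line of markdown.split('\n')) {
    // Simplification: any line starting with three backticks flips the state.
    if (line.trim().indexOf(FENCE) === 0) {
      open = !open;
    }
  }
  return open;
}

const partial = 'Here is the code:\n' + FENCE + 'ts\nconst a = 1;';
console.log(hasOpenCodeFence(partial));                // true
console.log(hasOpenCodeFence(partial + '\n' + FENCE)); // false
```

While this returns true, the UI can show the block in a plain "Loading" style and defer highlighting and the copy button until it flips to false.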

Architectural Trade-offs

I’ve experimented with two approaches for rendering:

Client-side Markdown Parsing: This is what I showed above. It’s fast and reduces server load, but it can hit performance limits if the AI response is massive (e.g., a 5,000-word essay).
Server-side Streaming (The "HTML Pipe"): You can parse the Markdown on the server and stream raw HTML. This is faster for the browser but makes it harder to implement features like "Copy to Clipboard" or inline editing later.

For most AI chat interfaces, I stick to client-side parsing because it allows for better control over the UI components (like injecting custom buttons into code blocks).

Debugging Tips

The Flickering Cursor: If your text jumps around while streaming, check your CSS. Ensure your container has a fixed min-height or use contain: content; to prevent the layout from shifting as the text grows.
Syntax Highlighter Lag: If you are using Prism or Highlight.js, don't run it on the entire document on every update. Use a memoized component that only re-runs highlighting when the code block content changes.
Partial Markdown: If the LLM cuts off mid-word, don't worry about the UI looking broken. Users expect a "typing" experience. However, if it cuts off inside a Markdown link or table, the visual structure will collapse. I handle this by checking for dangling special characters before passing the string to the renderer.
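
For the dangling-character check in the last tip, here is a minimal sketch that covers links and images only (tables would need their own handling). The helper name trimDanglingLink and the regular expression are my assumptions for illustration. It hides an unfinished link at the very end of the buffer until the closing parenthesis arrives.

```typescript
// Matches an unfinished link or image at the end of the text, for example
// "[par", "[text]", "[text](https://exam" or "![alt](sr".
const DANGLING_LINK = /!?\[[^\]\n]*(?:\](?:\([^)\n]*)?)?$/;

// Hypothetical helper: returns the text with a trailing unfinished link removed.
function trimDanglingLink(markdown: string): string {
  const match = DANGLING_LINK.exec(markdown);
  return match ? markdown.slice(0, match.index) : markdown;
}

console.log(trimDanglingLink('See [the docs](https://exam'));
// "See "
console.log(trimDanglingLink('See [the docs](https://example.com) now'));
// "See [the docs](https://example.com) now"
```

The check is deliberately greedy: a trailing "[0]" is also held back for one update because it could still turn into a link. Apply it only to the string passed to the renderer, never to the underlying buffer, and skip it once the stream has ended.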

Building these interfaces is less about the AI itself and more about how you manage the user's perception of time. Keep the DOM lean, respect the browser's paint cycle, and your users will thank you for the stable experience.

Aditya Shenvi