
Streaming Responses

Stream responses in real time to improve user experience and reduce perceived latency.

With streaming enabled, tokens are delivered as they are generated, so users see output as soon as the first tokens are ready.

Overview

Requesty supports Server-Sent Events (SSE) streaming across all major providers (OpenAI, Anthropic, Google, Mistral). Your application can render content progressively while it is being generated without waiting for the full response.
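
To make the SSE mechanics concrete, here is a minimal sketch of how a client could pull text out of the raw event stream. It assumes the gateway emits the OpenAI-compatible wire format (`data: {json}` lines ending with a `data: [DONE]` sentinel); that format, the helper name `iter_sse_deltas`, and the sample events are illustrative assumptions, not something this page confirms.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines (assumed wire format)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, comments, and any field other than "data:".
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        event = json.loads(payload)
        for choice in event.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


# Hand-written sample events; a real stream would come from the HTTP response body.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))
```

In practice the OpenAI client library performs this parsing for you; the sketch only shows what travels over the connection.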

Why Stream?

  • Better UX: Users see output immediately; perceived wait time can drop by up to 80%.
  • Higher engagement: Real-time delivery keeps users engaged during long responses.
  • Fewer timeouts: Data starts arriving early, so slow or complex requests are less likely to hit client or proxy timeouts.
  • Progressive display: Enable incremental UI updates as chunks arrive.
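
As a rough illustration of progressive display, the sketch below accumulates incoming fragments and hands the UI a fresh snapshot only after enough new text has arrived, which avoids re-rendering on every single token. The helper name `progressive_updates` and the `min_chars` threshold are hypothetical choices for this example.

```python
from typing import Iterable, Iterator


def progressive_updates(fragments: Iterable[str], min_chars: int = 16) -> Iterator[str]:
    """Yield the accumulated text each time at least min_chars new characters arrive."""
    text = ""
    pending = 0  # characters received since the last yielded snapshot
    for fragment in fragments:
        text += fragment
        pending += len(fragment)
        if pending >= min_chars:
            yield text
            pending = 0
    if pending:
        # Flush whatever is left once the stream ends.
        yield text


# Each snapshot is the full text so far, ready to hand to a render function.
for snapshot in progressive_updates(["Twinkle, ", "twinkle, ", "little ", "star"], min_chars=10):
    print(repr(snapshot))
```

A time-based throttle would work equally well; a character threshold is used here only because it keeps the example deterministic.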

Implementation

Basic Streaming Setup

Enable streaming by setting the stream parameter to true in your request (stream=True in the Python client):

import openai

# Point the OpenAI client at the gateway.
client = openai.OpenAI(
    api_key="your_requesty_api_key",
    base_url="https://gw.1route.ai/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Write a poem about the stars."}],
    stream=True,
)

# Handle streamed chunks
for chunk in response:
    # Some chunks carry no choices or no text (for example, role-only deltas).
    if chunk.choices and chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        print(content, end="", flush=True)
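
Applications often need the complete text once the stream ends, for example to store it or log it. The sketch below is one way to do that, assuming chunks shaped like those in the loop above (`choices[0].delta.content` and `choices[0].finish_reason`). The helpers `collect_stream` and `fake_chunk` are hypothetical names, and `fake_chunk` only builds stand-in objects so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def collect_stream(stream: Iterable) -> Tuple[str, Optional[str]]:
    """Print fragments as they arrive and return (full_text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in stream:
        if not chunk.choices:
            continue  # tolerate chunks that carry no choices
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
            print(choice.delta.content, end="", flush=True)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


def fake_chunk(content=None, finish_reason=None):
    # Hypothetical stand-in for a streamed chunk, used for offline demonstration only.
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, finish_reason=finish_reason)])


text, reason = collect_stream(
    [fake_chunk("Twinkle"), fake_chunk(", twinkle"), fake_chunk(None, "stop")]
)
```

With a live request, the `response` object from the example above could be passed to `collect_stream` directly in place of the fake chunks.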