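Streaming Responses

Stream responses in real time to improve the user experience and reduce perceived latency.

Streaming delivers content token by token as it is generated, so users get immediate feedback instead of waiting for the complete response.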

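Overview

Requesty supports streaming responses from all major providers (OpenAI, Anthropic, Google, Mistral) using Server-Sent Events (SSE). Your application can display content as it is generated instead of waiting for the complete response.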
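To make the SSE framing concrete, here is a minimal sketch of how a client might pull text out of the raw event stream. It assumes the gateway emits OpenAI-style events (a `data:` line carrying a JSON chunk, and a final `data: [DONE]` sentinel). The helper name `iter_sse_content` and the sample payloads are hypothetical and used only for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, SSE comments, and any non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Assumption: the stream ends with a literal [DONE] sentinel.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hypothetical sample of what the wire format might look like.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Twinkle"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", twinkle"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

In practice the OpenAI SDK does this parsing for you; the sketch is only meant to show what travels over the connection.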

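Why use streaming?

Improved user experience: users see the response immediately, reducing perceived wait time by up to 80%
Better engagement: real-time delivery keeps users engaged during longer responses
Fewer timeouts: avoids timeout issues on slow or complex requests
Progressive display: enables incremental UI updates as content becomes available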
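One way to check the perceived-latency benefit in your own application is to compare the time to the first fragment with the time to the full response. The sketch below uses a simulated stream so it runs without network access. `timed_stream` and `fake_stream` are hypothetical names, and the 50 ms delay is an arbitrary stand-in for model latency.

```python
import time
from typing import Iterable, Iterator, Tuple


def timed_stream(fragments: Iterable[str]) -> Tuple[str, float, float]:
    """Consume a stream; return (text, seconds_to_first_fragment, total_seconds)."""
    start = time.perf_counter()
    first = None
    parts = []
    for fragment in fragments:
        if first is None:
            first = time.perf_counter() - start
        parts.append(fragment)
    total = time.perf_counter() - start
    # If the stream was empty, report the total time for both values.
    return "".join(parts), (first if first is not None else total), total


def fake_stream() -> Iterator[str]:
    # Stand-in for a model stream: three fragments, 50 ms apart.
    for word in ["Stars ", "shine ", "bright"]:
        time.sleep(0.05)
        yield word


text, first, total = timed_stream(fake_stream())
print(f"first fragment after {first:.2f}s, full response after {total:.2f}s")
```

With a real stream, you would pass in the text fragments extracted from each chunk instead of `fake_stream()`.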

Implementation

Basic streaming setup

Enable streaming by setting the stream parameter to true in your request:

Python

import openai

client = openai.OpenAI(
    api_key="your_requesty_api_key",
    base_url="https://gw.1route.ai/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Write a poem about the stars."}],
    stream=True,
)

# Process the streamed response chunk by chunk
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        print(content, end="", flush=True)

JavaScript

const { OpenAI } = require('openai');

const client = new OpenAI({
    apiKey: "your_requesty_api_key",
    baseURL: "https://gw.1route.ai/v1",
});

async function streamResponse() {
    const stream = await client.chat.completions.create({
        model: "openai/gpt-4",
        messages: [{ "role": "user", "content": "Write a poem about the stars." }],
        stream: true
    });

    for await (const chunk of stream) {
        if (chunk.choices[0].delta.content) {
            process.stdout.write(chunk.choices[0].delta.content);
        }
    }
}

streamResponse();

Bash

# Call the model with streaming enabled
API_KEY="your_requesty_api_key"

curl -N -X POST "https://gw.1route.ai/v1/chat/completions" \
 -H "Authorization: Bearer $API_KEY" \
 -H "Content-Type: application/json" \
 -d '{"model": "openai/gpt-4", "messages": [{"role": "user", "content": "Write a poem about the stars."}], "stream": true}'
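Applications often need the complete text as well as the live display, for example to store the reply in conversation history. The following is a minimal sketch of one way to do both with chunks shaped like those the OpenAI Python SDK yields. The helper `collect_stream` and the `fake_chunk` test double are hypothetical. The guard for chunks without choices is a defensive assumption, since some providers may send such chunks.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def collect_stream(chunks: Iterable, on_text: Callable[[str], None] = lambda t: None) -> str:
    """Forward each text fragment to on_text and return the full response text."""
    parts = []
    for chunk in chunks:
        # Defensive: skip chunks that carry no choices (for example, usage-only chunks).
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            on_text(text)
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    # Test double mimicking the chunk.choices[0].delta.content shape.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


chunks = [fake_chunk(None), fake_chunk("Hello"), SimpleNamespace(choices=[]), fake_chunk(", stars")]
full = collect_stream(chunks, on_text=lambda t: print(t, end="", flush=True))
print()
```

With a live request, you would pass the `response` object from `client.chat.completions.create(..., stream=True)` in place of the fake chunks.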
