Grok Console 的 `stream=true` 目前不是上游真流式，建议接入真正的流式响应或明确降级

## 问题

当前 `/v1/chat/completions` 在使用 Grok Console 模型时，例如 `grok-4.3`、`grok-4`，即使调用方传入 `stream=true`，服务端也不是从 Grok Console 上游逐 token 转发。

现有逻辑大致是：

1. `stream_grok_chat_completion()` 判断不是 app-chat 模型后，调用 `grok.console_chat_completion()`。
2. `console_chat_completion()` 内部通过 `GrokConsoleClient.create_response(payload)` 等待上游完整响应。
3. 上游完整响应返回后，再一次性构造 OpenAI SSE chunk。

因此客户端虽然拿到的是 SSE 格式，但首个 token 需要等完整上游响应完成后才出现。对长回答、联网搜索、推理模型来说，这会表现为“流式请求仍长时间无输出”。

## 影响

- `stream=true` 不能降低首 token 延迟。
- 客户端无法实时显示 Grok Console 的生成过程。
- 长请求更容易被网关、代理或客户端误判为无响应。
- 这和 OpenAI-compatible 客户端对 `stream=true` 的预期不一致。

## 期望行为

- 如果 Grok Console 上游支持流式 Responses，应在 `GrokConsoleClient` 中实现真正的流式读取，并逐步转换为 OpenAI SSE chunk。
- 如果上游暂不支持或当前实现暂不准备支持，应在文档或响应中明确说明 Grok Console `stream=true` 是兼容格式，不是上游真流式。
- 更理想的行为是：`stream=true` 时首个 SSE 事件应在上游开始返回内容后尽快输出，而不是等待完整回答。

## 建议改动

1. 在 `GrokConsoleClient` 增加流式方法，例如：

```python
def stream_response(self, payload: dict[str, Any]) -> Iterator[dict[str, Any]]:
    payload = dict(payload)
    payload["stream"] = True
    ...
```

2. `stream_grok_chat_completion()` 对 Console 模型不要调用阻塞式 `console_chat_completion()`，而是调用新的流式方法：

```python
if not is_grok_app_chat_model(spec):
    yield from grok.console_chat_completion_events(body, spec, messages)
```

3. 将 Grok Console 流式事件转换为 OpenAI chat completion chunk：

- 普通文本映射到 `delta.content`。
- 推理文本如果存在，映射到 `delta.reasoning_content`。
- 错误事件转换为 SSE error 或最终错误响应。

4. 如果暂时无法实现真流式，建议至少在代码注释和文档中明确：当前 `stream=true` 只是响应格式兼容，实际仍等待完整上游响应。

## 建议测试

- `/v1/chat/completions` + `model=grok-4.3` + `stream=true` 时，首个 chunk 应早于完整回答结束返回。
- Console 普通文本 delta 能连续输出。
- Console reasoning delta 如果存在，应能连续输出或至少不丢失。
- 上游流式中断时，客户端能收到明确错误，而不是一直挂起。
- `stream=false` 的现有非流式行为不应回归。


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Grok Console 的 `stream=true` 目前不是上游真流式，建议接入真正的流式响应或明确降级 #12

问题

影响

期望行为

建议改动

建议测试

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Grok Console 的 stream=true 目前不是上游真流式，建议接入真正的流式响应或明确降级 #12

Description

问题

影响

期望行为

建议改动

建议测试

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Grok Console 的 `stream=true` 目前不是上游真流式，建议接入真正的流式响应或明确降级 #12