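This gateway is compatible with four protocols: OpenAI, Anthropic, Gemini, and OpenAI Responses. Existing clients only need a new base_url and a new key.
1. Register and create an API key
Visit Register, sign in, then go to User Center and create a key that starts with usr-.
2. Choose a protocol
OpenAI protocol
The most common choice; it can call gpt, claude, and gemini models.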
base_url=https://llm-api.carizon.work/v1
key=usr-...
Anthropic protocol
For the Claude SDK and Claude Code.
base_url=https://llm-api.carizon.work
x-api-key: usr-...
Gemini protocol
For the Google AI SDK.
base_url=https://llm-api.carizon.work/v1beta
?key=usr-...
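The three protocol settings above can be collected in one small helper. This is a minimal sketch rather than anything the gateway ships: `client_settings` and the returned dictionary layout are hypothetical names chosen for illustration. Only the base URLs and the header and query conventions come from the list above.

```python
from urllib.parse import urlencode

GATEWAY = "https://llm-api.carizon.work"

def client_settings(protocol: str, key: str) -> dict:
    """Return base_url, auth headers, and query string for one protocol (hypothetical helper)."""
    if protocol == "openai":
        # OpenAI-style clients send the key as a Bearer token.
        return {"base_url": f"{GATEWAY}/v1",
                "headers": {"Authorization": f"Bearer {key}"},
                "query": ""}
    if protocol == "anthropic":
        # Anthropic SDKs append /v1/messages themselves, so the base has no /v1.
        return {"base_url": GATEWAY,
                "headers": {"x-api-key": key},
                "query": ""}
    if protocol == "gemini":
        # Gemini passes the key in the query string instead of a header.
        return {"base_url": f"{GATEWAY}/v1beta",
                "headers": {},
                "query": "?" + urlencode({"key": key})}
    raise ValueError(f"unknown protocol: {protocol}")
```

Calling `client_settings("gemini", "usr-...")` yields the `/v1beta` base URL together with a `?key=` query string, which keeps per-protocol differences out of the calling code.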
3. First call
Copy the command below and replace $YOUR_KEY with your key:
curl https://llm-api.carizon.work/v1/chat/completions \
-H "Authorization: Bearer $YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-5.5","messages":[{"role":"user","content":"hi"}]}'
from openai import OpenAI
client = OpenAI(api_key="$YOUR_KEY",
base_url="https://llm-api.carizon.work/v1")
r = client.chat.completions.create(
model="gpt-5.5",
messages=[{"role":"user","content":"hi"}]
)
print(r.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "$YOUR_KEY",
baseURL: "https://llm-api.carizon.work/v1",
});
const r = await client.chat.completions.create({
model: "gpt-5.5",
messages: [{role:"user",content:"hi"}],
});
console.log(r.choices[0].message.content);
💡 Prefer a visual view? Model Plaza shows full pricing and lets you filter by vendor.
HTTP status codes
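| HTTP | code | Description |
| 401 | missing_api_key / invalid_api_key | The Bearer header is missing, or the key does not exist or has been revoked |
| 403 | account_suspended | The account was suspended by an administrator. Contact support. |
| 404 | not_found | Wrong path; check that it is /v1/chat/completions or another documented endpoint |
| 429 | user_monthly_cap | This month's $20 quota is used up; it resets at the start of the month (UTC) |
| 429 | rate_limit | Request QPS exceeded; retry after a few seconds |
| 500 | upstream_error | The upstream model service failed |
| 503 | service_unavailable | All upstream accounts are out of quota; retry in a few minutes |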
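The table above suggests a simple retry policy, sketched below. The function name `retry_delay` and the concrete delays (3 s and 180 s) are assumptions standing in for "a few seconds" and "a few minutes". The layout of the error body is not documented here, so the sketch assumes the caller has already extracted the `code` string.

```python
from typing import Optional

def retry_delay(status: int, code: str) -> Optional[float]:
    """Seconds to wait before retrying, or None when a retry will not help."""
    if status == 429 and code == "rate_limit":
        return 3.0    # "a few seconds" in the table; 3 s is an assumed value
    if status == 503 and code == "service_unavailable":
        return 180.0  # "a few minutes" in the table; 180 s is an assumed value
    # 401/403/404 and 429 user_monthly_cap need a fix on the caller's side.
    # 500 upstream_error has no retry guidance in the table, so it is not retried here.
    return None
```

Note that the two 429 cases must be told apart by `code`, not by status: retrying after `user_monthly_cap` cannot succeed until the quota resets.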
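Monthly limit
Each user has an independent total quota of $20 USD per month, accumulated across all of that user's keys:
- Actual cost is computed per token: input tokens × input price + output tokens × output price (see Model Plaza)
- Resets automatically at 00:00 UTC on the 1st of each month
- Once the quota is used up, requests return 429 user_monthly_cap, so there is no silent overspend
- Check /portal for live usage and the breakdown by model and by key
Need a higher limit? Ask an administrator to adjust the cap (admins can change it under user management).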
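To make the cost rule concrete, here is a small sketch of the arithmetic. It assumes prices are quoted per 1M tokens (the unit used in the embeddings price table; the unit for chat models is not stated in this section), and the function names are hypothetical.

```python
MONTHLY_CAP_USD = 20.0  # per-user monthly quota stated above

def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """input tokens x input price + output tokens x output price, prices per 1M tokens (assumed unit)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

def remaining_budget_usd(spent_usd: float) -> float:
    """Quota left this month; never negative, because requests stop at the cap."""
    return max(MONTHLY_CAP_USD - spent_usd, 0.0)
```

For example, with made-up prices of $2 per 1M input tokens and $8 per 1M output tokens, a request with 1,000,000 input and 500,000 output tokens costs $2 + $4 = $6, leaving $14 of the monthly quota.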
/v1/chat/completions
The standard OpenAI chat endpoint. Supports all models, including gpt-*, claude-*, gemini-*, grok-*, and DeepSeek.
Request parameters
| param | Type | Description |
| model (required) | string | Model name (see the /models page) |
| messages (required) | array | Array of [{role, content}] |
| stream | bool | Whether to return an SSE stream; default false |
| temperature | number | 0 to 2, default 1 |
| max_tokens | number | Maximum number of tokens in the response |
| tools | array | Function calling definitions |
| tool_choice | string|object | "auto" / "none" / a specific tool |
| response_format | object | JSON mode: {"type":"json_object"} |
Example
curl https://llm-api.carizon.work/v1/chat/completions \
-X POST \
-H "Authorization: Bearer $YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-6",
"messages": [
{"role":"system","content":"You are helpful."},
{"role":"user","content":"Why is the sky blue?"}
],
"temperature": 0.7,
"max_tokens": 1024,
"stream": false
}'
from openai import OpenAI
client = OpenAI(api_key="$YOUR_KEY",
base_url="https://llm-api.carizon.work/v1")
r = client.chat.completions.create(
model="claude-sonnet-4-6",
messages=[
{"role":"system","content":"You are helpful."},
{"role":"user","content":"Why is the sky blue?"},
],
temperature=0.7,
max_tokens=1024,
)
print(r.choices[0].message.content)
print(r.usage)
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "$YOUR_KEY",
baseURL: "https://llm-api.carizon.work/v1",
});
const r = await client.chat.completions.create({
model: "claude-sonnet-4-6",
messages: [
{role:"system", content:"You are helpful."},
{role:"user", content:"Why is the sky blue?"},
],
temperature: 0.7,
max_tokens: 1024,
});
console.log(r.choices[0].message.content);
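The `tools` and `tool_choice` parameters are listed in the table but not shown in the examples. The sketch below builds such a request body and parses a tool call from a reply, following the standard OpenAI function-calling layout. `get_weather`, `weather_tool_request`, and `parse_tool_call` are hypothetical names used only for illustration.

```python
import json

def weather_tool_request(question: str) -> dict:
    """Build a chat/completions body that offers one tool to the model."""
    return {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool name
                "description": "Look up the current weather for a city.",
                "parameters": {         # JSON Schema describing the arguments
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",
    }

def parse_tool_call(tool_call: dict) -> tuple:
    """Return (name, arguments) from one entry of message.tool_calls."""
    fn = tool_call["function"]
    # In the OpenAI format, arguments arrive as a JSON-encoded string.
    return fn["name"], json.loads(fn["arguments"])
```

The body can be sent with the same curl or SDK calls shown above; the application then runs the named function itself and returns the result in a follow-up message with role "tool".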
/v1/responses
OpenAI's newer API, designed for reasoning models such as gpt-5-pro, o1, and codex, with native support for thinking budget and tool calling.
Request parameters
| param | Type | Description |
| model (required) | string | gpt-5.4-pro / gpt-5.3-codex, etc. |
| input (required) | string|array | Input text or a multimodal array |
| reasoning | object | {"effort":"low|medium|high"} |
| stream | bool | SSE streaming |
Example
curl https://llm-api.carizon.work/v1/responses \
-X POST \
-H "Authorization: Bearer $YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4-pro",
"input": "Solve: 2x + 3 = 11",
"reasoning": {"effort":"medium"}
}'
from openai import OpenAI
client = OpenAI(api_key="$YOUR_KEY",
base_url="https://llm-api.carizon.work/v1")
r = client.responses.create(
model="gpt-5.4-pro",
input="Solve: 2x + 3 = 11",
reasoning={"effort":"medium"},
)
print(r.output_text)
/v1/embeddings
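Text embeddings. The backend routes by model to Azure OpenAI or Azure AI Foundry, and the endpoint stays fully OpenAI-API compatible for callers.
Supported models
| model | Upstream | Dimensions | Price / 1M tokens |
| text-embedding-3-large | Azure OpenAI | 3072 (can be reduced to 256/512/1024) | $0.13 |
| text-embedding-3-small | Azure OpenAI | 1536 (can be reduced) | $0.02 |
| embed-v-4-0 | Azure AI Foundry (Cohere) | 1536 (default) | $0.12 |
Request parameters
| param | Type | Description |
| model (required) | string | See the table above |
| input (required) | string|array | Text or array of texts to embed |
| dimensions | number | Embedding dimensions (supported only by text-embedding-3) |
Example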
curl https://llm-api.carizon.work/v1/embeddings \
-X POST \
-H "Authorization: Bearer $YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-large",
"input": "The quick brown fox"
}'
from openai import OpenAI
client = OpenAI(api_key="$YOUR_KEY",
base_url="https://llm-api.carizon.work/v1")
r = client.embeddings.create(
model="text-embedding-3-large",
input=["The quick brown fox", "Hello world"],
)
for e in r.data:
    print(len(e.embedding), e.embedding[:5])
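Two things the examples above do not show are the optional `dimensions` parameter and what to do with the returned vectors. The sketch below builds a request body with reduced dimensions and compares two vectors by cosine similarity. Both function names are hypothetical, and cosine similarity is a common choice of metric, not something this gateway prescribes.

```python
import math

def embedding_request(texts, dimensions=None) -> dict:
    """Build an /v1/embeddings body; dimensions is only sent when requested."""
    body = {"model": "text-embedding-3-large", "input": list(texts)}
    if dimensions is not None:
        body["dimensions"] = dimensions  # e.g. 256, 512, or 1024 per the table
    return body

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

With vectors from `r.data[0].embedding` and `r.data[1].embedding`, `cosine_similarity` returns a value near 1 for closely related texts and near 0 for unrelated ones.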
/v1/chat/completions (model = gpt-audio-1.5)
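Chat endpoint with audio input/output, based on OpenAI's audio-capable chat-completions protocol (the modalities field decides whether input/output uses audio or text). The model is audio-only; text-only requests are rejected with HTTP 400.
Request parameters (same as chat/completions, plus the following)
| param | Type | Description |
| model (required) | string | gpt-audio-1.5 |
| modalities (required) | array | Either ["text", "audio"] or ["audio"] |
| audio | object | {"voice":"alloy","format":"wav"} |
| messages (required) | array | content wraps {"type":"input_audio","input_audio":{"data":"<base64>","format":"wav"}} |
Example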
curl https://llm-api.carizon.work/v1/chat/completions \
-X POST -H "Authorization: Bearer $YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-audio-1.5",
"modalities": ["text", "audio"],
"audio": {"voice": "alloy", "format": "wav"},
"messages": [
{"role":"user","content":[{"type":"input_audio","input_audio":{"data":"<base64>","format":"wav"}}]}
]
}'
import base64, openai
client = openai.OpenAI(api_key="$YOUR_KEY", base_url="https://llm-api.carizon.work/v1")
audio_b64 = base64.b64encode(open("hello.wav","rb").read()).decode()
r = client.chat.completions.create(
model="gpt-audio-1.5",
modalities=["text", "audio"],
audio={"voice":"alloy","format":"wav"},
messages=[{"role":"user","content":[
{"type":"input_audio","input_audio":{"data":audio_b64,"format":"wav"}}
]}],
)
print(r.choices[0].message.audio.transcript)
open("out.wav","wb").write(base64.b64decode(r.choices[0].message.audio.data))
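Because text-only requests are rejected with 400, it can help to build the audio message with a helper and check locally that a request actually carries audio before sending it. The sketch below is illustrative only: both function names are hypothetical, and the local check is an approximation of the gateway's rule, whose exact conditions are not documented here.

```python
import base64

def audio_user_message(wav_bytes: bytes) -> dict:
    """Wrap raw WAV bytes in the input_audio content part shown in the table above."""
    data = base64.b64encode(wav_bytes).decode("ascii")
    return {"role": "user", "content": [
        {"type": "input_audio", "input_audio": {"data": data, "format": "wav"}},
    ]}

def has_audio_input(messages) -> bool:
    """True if any message contains an input_audio part (approximate pre-flight check)."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list) and any(
                part.get("type") == "input_audio" for part in content):
            return True
    return False
```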
/v1/azure-speech/speechtotext/transcriptions:transcribe?api-version=2024-11-15
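Speech-to-text and speech translation through Azure Speech Fast Transcription (not OpenAI Whisper). The gateway passes Azure's native multipart interface through without translating it to the OpenAI protocol, so send requests as shown in the Azure SDK examples.
Available models
gpt-realtime-whisper: speech-to-text (keeps the original language)
gpt-realtime-translate: speech translated into English
multipart body
| part | Description |
| audio | Audio file (wav/mp3/m4a, etc.; max 100MB) |
| definition | JSON: {"locales":["en-US"],"model":"gpt-realtime-whisper"} |
Example (curl)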
curl "https://llm-api.carizon.work/v1/azure-speech/speechtotext/transcriptions:transcribe?api-version=2024-11-15" \
-X POST -H "Authorization: Bearer $YOUR_KEY" \
-F "audio=@hello.wav" \
-F 'definition={"locales":["en-US"],"model":"gpt-realtime-whisper"}'
# response: {"durationMilliseconds":..., "combinedPhrases":[{"text":"..."}], "phrases":[...]}
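The response shape in the comment above can be reduced to plain text with a few lines. This sketch assumes only the fields shown there (`combinedPhrases` entries with a `text` key); the helper names are hypothetical.

```python
import json

def definition_part(locale: str, model: str) -> str:
    """Serialize the JSON string sent as the multipart 'definition' part."""
    return json.dumps({"locales": [locale], "model": model})

def transcript_text(response: dict) -> str:
    """Join the text of every combinedPhrases entry into one transcript string."""
    return " ".join(p["text"] for p in response.get("combinedPhrases", []))
```

`transcript_text` returns an empty string when `combinedPhrases` is missing, so callers can treat silence and malformed replies the same way if that suits them.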
wss://llm-api.carizon.work/v1/realtime?model=gpt-realtime-2
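WebSocket access to the Azure OpenAI Realtime API for real-time voice conversations (VAD, streaming audio in/out, function calling). Authenticate with Authorization: Bearer, or with the query parameter ?api-key= or ?key= (for browsers).
Connection parameters
| param | Description |
| model | Currently only gpt-realtime-2 is supported |
| voice | Optional; alloy / echo / shimmer, etc. |
Event protocol
Follows the JSON event structure of the OpenAI Realtime API exactly: the client can send session.update / input_audio_buffer.append / response.create and others; the server pushes session.created / response.audio.delta / response.done and others.
Example (Node + ws)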
// npm i ws
import { WebSocket } from "ws";
const ws = new WebSocket(
"wss://llm-api.carizon.work/v1/realtime?model=gpt-realtime-2",
{ headers: { Authorization: `Bearer ${process.env.YOUR_KEY}` } }
);
ws.on("open", () => {
// Configure the session
ws.send(JSON.stringify({
type: "session.update",
session: { instructions: "You are a helpful Mandarin tutor." }
}));
});
ws.on("message", (d) => console.log(JSON.parse(d.toString())));
# pip install websockets
import asyncio, json, os, websockets

async def main():
    uri = "wss://llm-api.carizon.work/v1/realtime?model=gpt-realtime-2"
    headers = {"Authorization": f"Bearer {os.environ['YOUR_KEY']}"}
    # additional_headers is the keyword in websockets >= 14; older releases use extra_headers
    async with websockets.connect(uri, additional_headers=headers) as ws:
        # Configure the session
        await ws.send(json.dumps({"type":"session.update",
                                  "session":{"instructions":"Be concise."}}))
        # Print every server event as it arrives
        async for msg in ws:
            print(json.loads(msg))

asyncio.run(main())
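The examples above only send `session.update` and print whatever comes back. The sketch below shows how the audio events named under "Event protocol" might be built and consumed. The field names (`audio` on `input_audio_buffer.append`, `delta` on `response.audio.delta`) follow the OpenAI Realtime API event reference rather than anything stated on this page, and the helper names are hypothetical.

```python
import base64
import json

def append_audio_event(pcm_chunk: bytes) -> str:
    """Serialize one input_audio_buffer.append event carrying base64 audio."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })

def collect_audio(raw_events) -> bytes:
    """Concatenate audio from response.audio.delta events until response.done."""
    out = bytearray()
    for raw in raw_events:
        event = json.loads(raw)
        if event["type"] == "response.audio.delta":
            out += base64.b64decode(event["delta"])
        elif event["type"] == "response.done":
            break  # the response is complete; ignore anything after it
    return bytes(out)
```

In a live session, `append_audio_event` output would go to `ws.send`, and the messages received from the socket would feed `collect_audio`.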
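Billing
Billed by upstream audio tokens (input $40 / output $80 per 1M tokens, roughly $1 per minute of dense conversation). Session start/end times and upstream/downstream byte counts are recorded in the traffic_realtime_sessions table and can be viewed at /admin#traffic.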
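As a rough illustration of the rates above, the following sketch estimates the cost of a session from its audio token counts. The function name is hypothetical; the two rates are the ones quoted in this section.

```python
INPUT_USD_PER_M = 40.0   # audio input, USD per 1M tokens
OUTPUT_USD_PER_M = 80.0  # audio output, USD per 1M tokens

def realtime_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate session cost from upstream audio token counts."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
```

A session with 10,000 input and 5,000 output audio tokens would come to $0.40 + $0.40 = $0.80 under these rates.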
/v1/images/generations
Image generation. Supports gpt-image-2, grok-imagine-image, and others.
Request parameters
| param | Type | Description |
| model (required) | string | gpt-image-2 / grok-imagine-image |
| prompt (required) | string | Image description |
| size | string | 1024x1024 / 1792x1024 |
| quality | string | standard / hd |
| n | number | Number of images to generate (default 1) |
Example
curl https://llm-api.carizon.work/v1/images/generations \
-X POST \
-H "Authorization: Bearer $YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-image-2",
"prompt": "A serene Japanese garden in spring",
"size": "1024x1024",
"n": 1
}'
from openai import OpenAI
client = OpenAI(api_key="$YOUR_KEY",
base_url="https://llm-api.carizon.work/v1")
r = client.images.generate(
model="gpt-image-2",
prompt="A serene Japanese garden in spring",
size="1024x1024",
)
print(r.data[0].url)
/v1/models
Returns the list of all currently available models (standard OpenAI format).
Example
curl https://llm-api.carizon.work/v1/models \
-H "Authorization: Bearer $YOUR_KEY"
from openai import OpenAI
client = OpenAI(api_key="$YOUR_KEY",
base_url="https://llm-api.carizon.work/v1")
for m in client.models.list().data:
    print(m.id)
/v1/messages
Native Anthropic protocol, used by the Claude SDK and Claude Code. Supports advanced features such as thinking, tools, web_search, code_execution, and files.
Request parameters
| param | Type | Description |
| model (required) | string | claude-sonnet-4-6 / claude-opus-4-7, etc. |
| messages (required) | array | Array of conversation messages |
| max_tokens (required) | number | Maximum number of output tokens |
| system | string | System prompt |
| thinking | object | {"type":"adaptive","budget_tokens":N} |
| tools | array | Includes the native web_search_20250305 / code_execution_20250825 |
| stream | bool | SSE streaming |
Authentication header
Either x-api-key: usr-... or Authorization: Bearer usr-... works. The native Anthropic SDK uses x-api-key automatically.
Example
curl https://llm-api.carizon.work/v1/messages \
-X POST \
-H "x-api-key: $YOUR_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-6",
"max_tokens": 1024,
"messages": [{"role":"user","content":"Hello!"}]
}'
import anthropic
client = anthropic.Anthropic(
api_key="$YOUR_KEY",
base_url="https://llm-api.carizon.work",
)
msg = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{"role":"user","content":"Hello!"}],
)
print(msg.content[0].text)
/v1beta/models/{model}:generateContent
Native Gemini protocol, used by the Google AI SDK. Replace {model} with the model name (for example gemini-2.5-pro).
Request parameters
| param | Type | Description |
| contents (required) | array | [{role,parts:[{text}]}] |
| generationConfig | object | temperature / topP / maxOutputTokens, etc. |
| systemInstruction | object | System prompt |
| tools | array | Function declarations |
Authentication
Gemini uses the query string: ?key=usr-xxxxxxxx
Example
curl "https://llm-api.carizon.work/v1beta/models/gemini-2.5-pro:generateContent?key=$YOUR_KEY" \
-X POST \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"role": "user",
"parts": [{"text": "Hello!"}]
}]
}'
import google.generativeai as genai
genai.configure(
api_key="$YOUR_KEY",
transport="rest",
client_options={"api_endpoint": "https://llm-api.carizon.work"},
)
model = genai.GenerativeModel("gemini-2.5-pro")
r = model.generate_content("Hello!")
print(r.text)
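For callers that use plain HTTP instead of the Google SDK, the sketch below builds the request URL and pulls the text out of a reply. The response layout (`candidates[0].content.parts[].text`) follows Google's Gemini API and is not spelled out on this page, so treat it as an assumption; both function names are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE = "https://llm-api.carizon.work/v1beta"

def generate_content_url(model: str, key: str) -> str:
    """Build the generateContent URL with the key in the query string."""
    return f"{BASE}/models/{quote(model)}:generateContent?{urlencode({'key': key})}"

def response_text(response: dict) -> str:
    """Concatenate the text parts of the first candidate (assumed Gemini layout)."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)
```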
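All protocols support streaming:
- OpenAI: add "stream": true to the request body
- Anthropic: likewise, "stream": true
- Gemini: change the endpoint to :streamGenerateContent?alt=sse
The response is text/event-stream; each event is a data: {...} line, and an OpenAI-protocol stream ends with data: [DONE].
OpenAI streaming chunk structure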
data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"choices":[{"delta":{"content":", "},"index":0}]}
data: {"choices":[{"delta":{"content":"world"},"index":0}]}
data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
r = client.chat.completions.create(
model="claude-sonnet-4-6",
messages=[{"role":"user","content":"hi"}],
stream=True,
)
for chunk in r:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
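When no SDK is available, the chunk format shown above can be parsed by hand. The following is a minimal sketch of such a parser for the OpenAI chunk structure; `collect_stream_text` is a hypothetical name, and it takes an iterable of already-decoded text lines, so reading from the HTTP response is left to the caller.

```python
import json

def collect_stream_text(lines) -> str:
    """Concatenate delta.content from OpenAI-style SSE lines until [DONE]."""
    pieces = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break     # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:  # the final chunk carries an empty delta
                pieces.append(piece)
    return "".join(pieces)
```

Fed the five sample lines above, the function returns "Hello, world".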
Every vision-capable model accepts the standard OpenAI image_url content:
curl https://llm-api.carizon.work/v1/chat/completions \
-X POST \
-H "Authorization: Bearer $YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{
"role": "user",
"content": [
{"type":"text","text":"What is in this image?"},
{"type":"image_url","image_url":{"url":"https://example.com/cat.jpg"}}
]
}]
}'
r = client.chat.completions.create(
model="gpt-4o",
messages=[{
"role":"user",
"content":[
{"type":"text","text":"What is in this image?"},
{"type":"image_url","image_url":{
"url":"https://example.com/cat.jpg"}},
],
}],
)
print(r.choices[0].message.content)
Supported by Claude, GPT-4o/5.x, Gemini, Grok-4.20, and others. Both data: URLs and https://... URLs are accepted.
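The examples above use an https URL. For a local file, a data: URL can be built as in the sketch below; the helper names are hypothetical, and the MIME type must match the actual image format.

```python
import base64

def image_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URL for the image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def image_part(url: str) -> dict:
    """Wrap a data: or https URL in the image_url content part."""
    return {"type": "image_url", "image_url": {"url": url}}
```

For instance, `image_part(image_data_url(open("cat.jpg", "rb").read()))` can replace the https entry in the `content` array of the request above.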
Claude, GPT-5-pro, Grok-reasoning, and similar models support explicit reasoning:
Anthropic: adaptive thinking
curl https://llm-api.carizon.work/v1/messages \
-X POST \
-H "x-api-key: $YOUR_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-7",
"max_tokens": 4096,
"thinking": {"type":"adaptive","budget_tokens":8000},
"messages": [{"role":"user","content":"Prove sqrt(2) is irrational"}]
}'
OpenAI Responses: effort
curl https://llm-api.carizon.work/v1/responses \
-X POST \
-H "Authorization: Bearer $YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4-pro",
"input": "Prove sqrt(2) is irrational",
"reasoning": {"effort":"high"}
}'
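Q: What should I do if a key is leaked?
Go to portal → API Keys, find the key, and revoke it. Revocation takes effect immediately, and you can create a new key afterwards.
Q: How do I check live usage?
Sign in to the portal; the first screen shows this month's usage against the cap. The breakdown by model and by time is under the Usage tab.
Q: Data security: are prompts logged?
Request bodies are not persisted to disk. Only the token count, model, and time are kept for billing. Responses are not stored.
Q: Can a model that is not on the list be added?
Ask an administrator to add an upstream provider (cliproxyapi supports almost all mainstream OAuth/API models).
Q: Is the batch API supported?
The OpenAI batch endpoint is not supported yet. Contact an administrator if you need it.
Q: Is function calling supported?
Yes. The standard OpenAI tools field works with all models; see Tool / Function Calling.
Q: Can it be called directly from mainland China?
The gateway is deployed in Azure Japan East and can be reached from mainland China directly or through an accelerator. If you run into network problems, relay through a VPS in mainland China.
Q: Contact
Feishu → find ops or an administrator. When reporting a problem, include (1) your email, (2) the time of the call, and (3) the HTTP code and response body.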