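API Integration Guide

OpenAI / Anthropic / Gemini / Responses
📮 If you run into problems, need an invitation code, or want a higher quota, contact Carizon IT.

5-Minute Quick Start

This gateway is compatible with four protocols: OpenAI, Anthropic, Gemini, and OpenAI Responses. For an existing client, change the base_url and the key and it should work.

1. Register and create an API key

Go to the registration page and sign up. After logging in, open the user center and create a key; keys start with usr-.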

2. Choose a protocol

OpenAI protocol

The most common choice; it can call gpt, claude, and gemini models.
base_url=https://llm-api.carizon.work/v1
key=usr-...

Anthropic protocol

Use this for the Claude SDK and Claude Code.
base_url=https://llm-api.carizon.work
x-api-key: usr-...

Gemini protocol

Use this for the Google AI SDK.
base_url=https://llm-api.carizon.work/v1beta
?key=usr-...
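
The three protocols above differ only in the base URL and in where the key travels (Bearer header, x-api-key header, or query string). A minimal Python sketch that collects those settings in one place; the function name `auth_for` and the layout of the returned dictionary are illustrative assumptions, not something the gateway defines:

```python
# Hypothetical helper: maps a protocol name to the base URL and to the place
# where the key goes, following the settings listed above.
BASE = "https://llm-api.carizon.work"


def auth_for(protocol: str, key: str) -> dict:
    """Return base_url, headers and query params for one protocol."""
    if protocol == "openai":
        # OpenAI protocol: Bearer token, base URL ends in /v1
        return {
            "base_url": f"{BASE}/v1",
            "headers": {"Authorization": f"Bearer {key}"},
            "params": {},
        }
    if protocol == "anthropic":
        # Anthropic protocol: x-api-key header, bare host as base URL
        return {"base_url": BASE, "headers": {"x-api-key": key}, "params": {}}
    if protocol == "gemini":
        # Gemini protocol: key in the query string, base URL ends in /v1beta
        return {"base_url": f"{BASE}/v1beta", "headers": {}, "params": {"key": key}}
    raise ValueError(f"unknown protocol: {protocol}")
```

For example, `auth_for("gemini", "usr-...")["params"]` gives the query parameters to append to a Gemini request.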

3. Your first call

Copy one of the snippets below and replace $YOUR_KEY with your key:

curl:

curl https://llm-api.carizon.work/v1/chat/completions \
  -H "Authorization: Bearer $YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5.5","messages":[{"role":"user","content":"hi"}]}'

Python:

from openai import OpenAI

client = OpenAI(api_key="$YOUR_KEY", base_url="https://llm-api.carizon.work/v1")
r = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "hi"}],
)
print(r.choices[0].message.content)

Node.js:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "$YOUR_KEY",
  baseURL: "https://llm-api.carizon.work/v1",
});
const r = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [{ role: "user", content: "hi" }],
});
console.log(r.choices[0].message.content);

💡 Prefer a visual view? The Model Plaza page shows full pricing and can be filtered by vendor.

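Error codes and limits

HTTP status codes

HTTP | code | Description
401 | missing_api_key / invalid_api_key | The Bearer header is missing, or the key does not exist or has been revoked
403 | account_suspended | The account has been suspended by an administrator. Contact support.
404 | not_found | Wrong path; check that it is /v1/chat/completions
429 | user_monthly_cap | This month's $20 quota is used up; it resets at the start of the month (UTC)
429 | rate_limit | Request QPS limit exceeded; retry after a few seconds
500 | upstream_error | The upstream model service returned an error
503 | service_unavailable | All upstream accounts have exhausted their quota; retry in a few minutes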
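
The table suggests different client reactions: rate_limit and service_unavailable are worth retrying after a wait, while user_monthly_cap and the 401/403/404 errors are not. A Python sketch of that decision follows. The function name, the base delays (2 seconds for 429, 60 seconds for 503), and the 5-minute cap are assumptions chosen for illustration; treating 500 upstream_error as non-retryable is also an assumption, and how the code string is carried in the response body is not specified here.

```python
from typing import Optional

# Error codes from the table that are worth retrying, with an assumed
# base delay in seconds (the table only says "a few seconds" / "a few minutes").
RETRY_BASE_DELAY = {
    (429, "rate_limit"): 2.0,
    (503, "service_unavailable"): 60.0,
}


def retry_delay(status: int, code: str, attempt: int) -> Optional[float]:
    """Return seconds to wait before retry number `attempt` (0-based),
    or None when the error should not be retried."""
    base = RETRY_BASE_DELAY.get((status, code))
    if base is None:
        # Everything else, e.g. 401, 403, 404, 500, or 429 user_monthly_cap
        return None
    # Exponential backoff, capped at 5 minutes.
    return min(base * (2 ** attempt), 300.0)
```

Note that the two 429 cases have to be told apart by code, not by HTTP status: retrying user_monthly_cap cannot succeed until the quota resets.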

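Monthly quota

Each user has an independent total quota of $20 USD per month, accumulated across all of that user's keys:

  • Actual cost is computed per token: input tokens × input price + output tokens × output price (see Model Plaza)
  • The quota resets automatically at 00:00 UTC on the 1st of each month
  • Once the quota is used up, requests return 429 user_monthly_cap; there is no silent overspend
  • Check real-time usage at /portal, broken down by model and by key
Need a higher quota? Contact an administrator to adjust your cap (admins can change it in user management).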
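
The cost formula above can be sketched in Python as follows. The prices in the usage note are placeholders, not real gateway prices (look them up in Model Plaza), and expressing prices per 1M tokens is an assumption carried over from the pricing tables elsewhere in this document.

```python
MONTHLY_CAP_USD = 20.0  # per-user monthly quota stated above


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request: input x input price + output x output price,
    with both prices given in USD per 1M tokens (assumed unit)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def remaining_quota_usd(spent_usd: float) -> float:
    """Quota left this month; never negative."""
    return max(MONTHLY_CAP_USD - spent_usd, 0.0)
```

With hypothetical prices of $3 (input) and $15 (output) per 1M tokens, `request_cost_usd(1000, 500, 3.0, 15.0)` evaluates to 0.0105 USD.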
Chat Completions

POST /v1/chat/completions

The standard OpenAI chat interface. It supports all models, including gpt-*, claude-*, gemini-*, grok-*, and DeepSeek.

Request parameters

param | Type | Description
model (required) | string | Model name (see the /models page)
messages (required) | array | Array of {role, content} objects
stream | bool | Whether to return an SSE stream; default false
temperature | number | 0–2, default 1
max_tokens | number | Maximum number of tokens in the response
tools | array | Function calling definitions
tool_choice | string or object | "auto" / "none" / a specific tool
response_format | object | JSON mode: {"type":"json_object"}

Examples

curl:

curl https://llm-api.carizon.work/v1/chat/completions \
  -X POST \
  -H "Authorization: Bearer $YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {"role":"system","content":"You are helpful."},
      {"role":"user","content":"Why is the sky blue?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": false
  }'

Python:

from openai import OpenAI

client = OpenAI(api_key="$YOUR_KEY", base_url="https://llm-api.carizon.work/v1")
r = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    temperature=0.7,
    max_tokens=1024,
)
print(r.choices[0].message.content)
print(r.usage)  # token counts for this request

Node.js:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "$YOUR_KEY",
  baseURL: "https://llm-api.carizon.work/v1",
});
const r = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [
    { role: "system", content: "You are helpful." },
    { role: "user", content: "Why is the sky blue?" },
  ],
  temperature: 0.7,
  max_tokens: 1024,
});
console.log(r.choices[0].message.content);
Responses

POST /v1/responses

OpenAI's newer-generation API, designed for reasoning models such as gpt-5-pro, o1, and codex, with native support for thinking budgets and tool calling.

Request parameters

param | Type | Description
model (required) | string | gpt-5.4-pro, gpt-5.3-codex, etc.
input (required) | string or array | Input text or a multimodal array
reasoning | object | {"effort":"low|medium|high"}
stream | bool | SSE streaming

Examples

curl:

curl https://llm-api.carizon.work/v1/responses \
  -X POST \
  -H "Authorization: Bearer $YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-pro",
    "input": "Solve: 2x + 3 = 11",
    "reasoning": {"effort":"medium"}
  }'

Python:

from openai import OpenAI

client = OpenAI(api_key="$YOUR_KEY", base_url="https://llm-api.carizon.work/v1")
r = client.responses.create(
    model="gpt-5.4-pro",
    input="Solve: 2x + 3 = 11",
    reasoning={"effort": "medium"},
)
print(r.output_text)
Embeddings

POST /v1/embeddings

Text embeddings. The backend routes each request to Azure OpenAI or Azure AI Foundry based on model; for the caller the interface is fully OpenAI API compatible.

Supported models

model | Upstream | Dimensions | Price / 1M tokens
text-embedding-3-large | Azure OpenAI | 3072 (can be reduced to 256/512/1024) | $0.13
text-embedding-3-small | Azure OpenAI | 1536 (can be reduced) | $0.02
embed-v-4-0 | Azure AI Foundry (Cohere) | 1536 (default) | $0.12

Request parameters

param | Type | Description
model (required) | string | See the table above
input (required) | string or array | Text, or array of texts, to embed
dimensions | number | Vector dimensions (supported by text-embedding-3 models only)

Examples

curl:

curl https://llm-api.carizon.work/v1/embeddings \
  -X POST \
  -H "Authorization: Bearer $YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-large",
    "input": "The quick brown fox"
  }'

Python:

from openai import OpenAI

client = OpenAI(api_key="$YOUR_KEY", base_url="https://llm-api.carizon.work/v1")
r = client.embeddings.create(
    model="text-embedding-3-large",
    input=["The quick brown fox", "Hello world"],
)
# Print each vector's length and its first five components
for e in r.data:
    print(len(e.embedding), e.embedding[:5])
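
Because dimensions is documented as supported only by the text-embedding-3 models, a client may want to check that before sending. A small Python sketch of such a check; the helper name `embedding_payload` is hypothetical, and how the gateway reacts when dimensions is sent with another model is not documented here, so the client-side error is an assumption about what is convenient:

```python
from typing import List, Optional


def embedding_payload(model: str, texts: List[str],
                      dimensions: Optional[int] = None) -> dict:
    """Build a /v1/embeddings request body, adding `dimensions` only
    for models in the text-embedding-3 family."""
    payload = {"model": model, "input": texts}
    if dimensions is not None:
        if not model.startswith("text-embedding-3"):
            # Assumed client-side guard; server behavior is not documented.
            raise ValueError(f"dimensions is not supported by {model}")
        payload["dimensions"] = dimensions
    return payload
```

The resulting dictionary can be sent as the JSON body of the curl example above, or unpacked into `client.embeddings.create(**payload)`.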
Audio Chat

POST /v1/chat/completions (model = gpt-audio-1.5)

Chat interface with audio input and output, based on OpenAI's audio-capable chat-completions protocol (the modalities field decides whether input and output use audio or text). The model is audio-only: text-only requests are rejected with HTTP 400.

Request parameters (same as chat/completions, plus the following)

param | Type | Description
model (required) | string | gpt-audio-1.5
modalities (required) | array | Either ["text", "audio"] or ["audio"]
audio | object | {"voice":"alloy","format":"wav"}
messages (required) | array | content contains {"type":"input_audio","input_audio":{"data":"<base64>","format":"wav"}}

Examples

curl:

curl https://llm-api.carizon.work/v1/chat/completions \
  -X POST \
  -H "Authorization: Bearer $YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-audio-1.5",
    "modalities": ["text", "audio"],
    "audio": {"voice": "alloy", "format": "wav"},
    "messages": [
      {"role":"user","content":[{"type":"input_audio","input_audio":{"data":"<base64>","format":"wav"}}]}
    ]
  }'

Python:

import base64
import openai

client = openai.OpenAI(api_key="$YOUR_KEY", base_url="https://llm-api.carizon.work/v1")

# Read the input audio and base64-encode it
with open("hello.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

r = client.chat.completions.create(
    model="gpt-audio-1.5",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": [
        {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
    ]}],
)
print(r.choices[0].message.audio.transcript)

# Decode the returned audio and save it
with open("out.wav", "wb") as f:
    f.write(base64.b64decode(r.choices[0].message.audio.data))
Speech transcription / translation

POST /v1/azure-speech/speechtotext/transcriptions:transcribe?api-version=2024-11-15

Speech-to-text and speech translation via Azure Speech Fast Transcription (not OpenAI Whisper). The gateway passes Azure's native multipart interface through unchanged (no translation to the OpenAI protocol), so send requests the way the Azure SDK examples do.

Available models

  • gpt-realtime-whisper: speech to text (keeps the original language)
  • gpt-realtime-translate: speech translated into English

multipart body

part | Description
audio | Audio file (wav, mp3, m4a, etc.; max 100 MB)
definition | JSON: {"locales":["en-US"],"model":"gpt-realtime-whisper"}

Example (curl)

# The URL is quoted so the shell does not interpret the "?"
curl "https://llm-api.carizon.work/v1/azure-speech/speechtotext/transcriptions:transcribe?api-version=2024-11-15" \
  -X POST \
  -H "Authorization: Bearer $YOUR_KEY" \
  -F "audio=@hello.wav" \
  -F 'definition={"locales":["en-US"],"model":"gpt-realtime-whisper"}'
# response: {"durationMilliseconds":..., "combinedPhrases":[{"text":"..."}], "phrases":[...]}
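
To pull the transcript out of a response shaped like the one in the comment above, something along these lines should work. This is a sketch that relies only on the combinedPhrases field shown there; the function name is hypothetical, and joining multiple entries with a space is an assumption.

```python
def transcript_text(response: dict) -> str:
    """Join the text of every entry in combinedPhrases into one string.
    Returns an empty string when the field is missing."""
    phrases = response.get("combinedPhrases", [])
    return " ".join(p.get("text", "") for p in phrases).strip()
```

For instance, passing the parsed JSON of the curl call above to `transcript_text` returns the recognized text as a single string.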
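Realtime (real-time voice conversation)

WS wss://llm-api.carizon.work/v1/realtime?model=gpt-realtime-2

WebSocket access to the Azure OpenAI Realtime API for real-time voice conversation (VAD, streaming audio in and out, function calling). Authenticate with an Authorization: Bearer header, or with the query parameter ?api-key= or ?key= (for browser clients).

Connection parameters

param | Description
model | Currently only gpt-realtime-2 is supported
voice | Optional; alloy / echo / shimmer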
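
A sketch of assembling the connection URL from these parameters in Python. It assumes that voice is passed as a query parameter like model (the table lists it as a connection parameter but does not say where it goes), and it uses the ?key= form for clients that cannot set headers; the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

REALTIME_ENDPOINT = "wss://llm-api.carizon.work/v1/realtime"


def realtime_url(model: str = "gpt-realtime-2",
                 voice: Optional[str] = None,
                 key: Optional[str] = None) -> str:
    """Build the WebSocket URL. `voice` as a query parameter is an assumption;
    `key` is only needed when the Authorization header cannot be used."""
    params = {"model": model}
    if voice:
        params["voice"] = voice
    if key:
        params["key"] = key
    return f"{REALTIME_ENDPOINT}?{urlencode(params)}"
```

Keeping the key in the header is preferable where possible, since URLs tend to end up in logs and browser history.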

Event protocol

The gateway fully follows the JSON event structure of the OpenAI Realtime API: clients can send session.update, input_audio_buffer.append, response.create, and so on; the server pushes session.created, response.audio.delta, response.done, and so on.

Examples

Node + ws:

// npm i ws
import { WebSocket } from "ws";

const ws = new WebSocket(
  "wss://llm-api.carizon.work/v1/realtime?model=gpt-realtime-2",
  { headers: { Authorization: `Bearer ${process.env.YOUR_KEY}` } }
);

ws.on("open", () => {
  // Configure the session
  ws.send(JSON.stringify({
    type: "session.update",
    session: { instructions: "You are a helpful Mandarin tutor." },
  }));
});

ws.on("message", (d) => console.log(JSON.parse(d.toString())));

Python:

# pip install websockets
import asyncio, json, os, websockets

async def main():
    uri = "wss://llm-api.carizon.work/v1/realtime?model=gpt-realtime-2"
    headers = {"Authorization": f"Bearer {os.environ['YOUR_KEY']}"}
    # additional_headers is the keyword in recent websockets releases;
    # older releases call it extra_headers
    async with websockets.connect(uri, additional_headers=headers) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"instructions": "Be concise."},
        }))
        async for msg in ws:
            print(json.loads(msg))

asyncio.run(main())

Billing

Billed by upstream audio tokens (input $40 / output $80 per 1M tokens, roughly $1 per minute of dense conversation). Session start and end times and the upstream and downstream byte counts are recorded in the traffic_realtime_sessions table and can be viewed at /admin#traffic.

Images

POST /v1/images/generations

Image generation. Supports gpt-image-2, grok-imagine-image, and others.

Request parameters

param | Type | Description
model (required) | string | gpt-image-2 / grok-imagine-image
prompt (required) | string | Description of the image
size | string | 1024x1024 / 1792x1024
quality | string | standard / hd
n | number | Number of images to generate (default 1)

Examples

curl:

curl https://llm-api.carizon.work/v1/images/generations \
  -X POST \
  -H "Authorization: Bearer $YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A serene Japanese garden in spring",
    "size": "1024x1024",
    "n": 1
  }'

Python:

from openai import OpenAI

client = OpenAI(api_key="$YOUR_KEY", base_url="https://llm-api.carizon.work/v1")
r = client.images.generate(
    model="gpt-image-2",
    prompt="A serene Japanese garden in spring",
    size="1024x1024",
)
print(r.data[0].url)
List models

GET /v1/models

Returns the list of all currently available models (standard OpenAI format).

Examples

curl:

curl https://llm-api.carizon.work/v1/models \
  -H "Authorization: Bearer $YOUR_KEY"

Python:

from openai import OpenAI

client = OpenAI(api_key="$YOUR_KEY", base_url="https://llm-api.carizon.work/v1")
for m in client.models.list().data:
    print(m.id)
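
Since the response uses the standard OpenAI list format (a top-level data array of objects with an id), the raw JSON from the curl call can be filtered client-side. A minimal Python sketch, with a hypothetical helper name and an abbreviated sample payload in the usage note:

```python
from typing import List


def model_ids(listing: dict, prefix: str = "") -> List[str]:
    """Return the sorted ids from a /v1/models response,
    optionally keeping only those that start with `prefix`."""
    return sorted(
        m["id"] for m in listing.get("data", [])
        if m["id"].startswith(prefix)
    )
```

Given a payload such as `{"object": "list", "data": [{"id": "gpt-5.5"}, {"id": "claude-sonnet-4-6"}]}`, calling `model_ids(payload, "claude-")` keeps only the Claude entry.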
Messages

POST /v1/messages

The native Anthropic protocol. Use this for the Claude SDK and Claude Code. It supports advanced features such as thinking, tools, web_search, code_execution, and files.

Request parameters

param | Type | Description
model (required) | string | claude-sonnet-4-6 / claude-opus-4-7
messages (required) | array | Array of conversation messages
max_tokens (required) | number | Maximum number of output tokens
system | string | System prompt
thinking | object | {"type":"adaptive","budget_tokens":N}
tools | array | Includes the native web_search_20250305 / code_execution_20250825
stream | bool | SSE streaming

Authentication header

Either x-api-key: usr-... or Authorization: Bearer usr-... works. The native Anthropic SDK uses x-api-key automatically.

Examples

curl:

curl https://llm-api.carizon.work/v1/messages \
  -X POST \
  -H "x-api-key: $YOUR_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role":"user","content":"Hello!"}]
  }'

Python:

import anthropic

client = anthropic.Anthropic(
    api_key="$YOUR_KEY",
    base_url="https://llm-api.carizon.work",
)
msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(msg.content[0].text)
generateContent

POST /v1beta/models/{model}:generateContent

The native Gemini protocol. Use this for the Google AI SDK. Replace {model} with the model name (for example gemini-2.5-pro).

Request parameters

param | Type | Description
contents (required) | array | [{role,parts:[{text}]}]
generationConfig | object | temperature / topP / maxOutputTokens, etc.
systemInstruction | object | System prompt
tools | array | Function declarations

Authentication

Gemini passes the key in the query string: ?key=usr-xxxxxxxx

Examples

curl:

curl "https://llm-api.carizon.work/v1beta/models/gemini-2.5-pro:generateContent?key=$YOUR_KEY" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [{"text": "Hello!"}]
    }]
  }'

Python:

import google.generativeai as genai

genai.configure(
    api_key="$YOUR_KEY",
    transport="rest",
    client_options={"api_endpoint": "https://llm-api.carizon.work"},
)
model = genai.GenerativeModel("gemini-2.5-pro")
r = model.generate_content("Hello!")
print(r.text)
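
When calling the Gemini endpoint without the SDK, the URL has to carry the model name, the method, and the key. A Python sketch of building it, covering both the plain call and the streaming variant (:streamGenerateContent with alt=sse); the function name is hypothetical, and placing key before alt in the query string is an arbitrary choice:

```python
from urllib.parse import quote

GEMINI_BASE = "https://llm-api.carizon.work/v1beta"


def gemini_url(model: str, key: str, stream: bool = False) -> str:
    """Build the generateContent URL, or the SSE streaming variant
    when stream=True."""
    method = "streamGenerateContent" if stream else "generateContent"
    url = f"{GEMINI_BASE}/models/{quote(model, safe='')}:{method}?key={quote(key, safe='')}"
    if stream:
        url += "&alt=sse"  # ask for text/event-stream output
    return url
```

The request body is the same JSON as in the curl example above in both cases.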

SSE streaming responses

All protocols support streaming:

  • OpenAI: add "stream": true to the request body
  • Anthropic: likewise, "stream": true
  • Gemini: change the endpoint to :streamGenerateContent?alt=sse

The response is text/event-stream; each event has the form data: {...}, and the stream ends with data: [DONE]

OpenAI streaming chunk structure

data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"choices":[{"delta":{"content":", "},"index":0}]}
data: {"choices":[{"delta":{"content":"world"},"index":0}]}
data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]

Python:

# client is the OpenAI client created in the earlier examples
r = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "hi"}],
    stream=True,
)
for chunk in r:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
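
For clients that read the stream without the SDK, the chunk format above can be parsed by hand. A minimal Python sketch that takes already-decoded lines (for example from an HTTP library's line iterator; the function name is hypothetical) and yields the text pieces until data: [DONE]:

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta.content strings from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Feeding it the five sample lines shown above and joining the results reconstructs the full message text.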

Tool / Function Calling

The standard OpenAI tools parameter is supported. Claude, GPT, and Gemini all use the same format:

curl:

curl https://llm-api.carizon.work/v1/chat/completions \
  -X POST \
  -H "Authorization: Bearer $YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role":"user","content":"北京天气怎么样"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type":"string","description":"City name"}
          },
          "required": ["city"]
        }
      }
    }]
  }'

Python:

# client is the OpenAI client created in the earlier examples.
# The user message asks "What's the weather like in Beijing?"
r = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "北京天气怎么样"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(r.choices[0].message.tool_calls)
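
The examples above stop at printing tool_calls. In the standard OpenAI flow, the client then runs each requested function and sends the result back as a message with role "tool". A Python sketch of that step, working on tool calls in plain-dict form (for SDK objects, `tool_call.model_dump()` gives this shape); the `get_weather` stub, its return value, and the helper names are illustrative assumptions:

```python
import json
from typing import Callable, Dict, List


def get_weather(city: str) -> dict:
    """Stub tool for illustration; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


TOOLS: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def tool_result_messages(tool_calls: List[dict]) -> List[dict]:
    """Run each requested tool and build the role="tool" reply messages."""
    messages = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string
        args = json.loads(call["function"]["arguments"])
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result, ensure_ascii=False),
        })
    return messages
```

These messages are appended after the assistant message that contained the tool calls, and the conversation is sent to /v1/chat/completions again so the model can produce its final answer.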

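Anthropic's built-in tools (web_search / code_execution / computer_use) are called through the native /v1/messages protocol, with type set to the official versioned name (for example web_search_20250305). See the official Anthropic documentation for details.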

Image input (Vision)

All vision-capable models accept the standard OpenAI image_url content part:

curl:

curl https://llm-api.carizon.work/v1/chat/completions \
  -X POST \
  -H "Authorization: Bearer $YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type":"text","text":"这张图是什么?"},
        {"type":"image_url","image_url":{"url":"https://example.com/cat.jpg"}}
      ]
    }]
  }'

Python:

# client is the OpenAI client created in the earlier examples.
# The text part asks "What is in this picture?"
r = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "这张图是什么?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(r.choices[0].message.content)

Supported models include Claude, GPT-4o/5.x, Gemini, and Grok-4.20. Both data: URLs and https://... URLs are accepted.
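
For local files, the data: URL form means base64-encoding the image bytes and prefixing the MIME type. A small Python sketch of building such a content part; the helper name is hypothetical and the default MIME type is an assumption that should match the actual file:

```python
import base64


def image_part_from_bytes(data: bytes, mime: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an image_url content part using a data: URL."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
```

The returned dictionary can take the place of the https:// image part in the content array of the examples above, for example with bytes read via `open("cat.jpg", "rb").read()`.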

Reasoning mode (thinking)

Claude, GPT-5-pro, Grok-reasoning, and similar models support explicit reasoning:

Anthropic: adaptive thinking

curl https://llm-api.carizon.work/v1/messages \
  -X POST \
  -H "x-api-key: $YOUR_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "thinking": {"type":"adaptive","budget_tokens":8000},
    "messages": [{"role":"user","content":"Prove sqrt(2) is irrational"}]
  }'

OpenAI Responses: effort

curl https://llm-api.carizon.work/v1/responses \
  -X POST \
  -H "Authorization: Bearer $YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-pro",
    "input": "Prove sqrt(2) is irrational",
    "reasoning": {"effort":"high"}
  }'

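FAQ

Q: What should I do if a key is leaked?

portal → API Keys → find the key → revoke it. The key stops working immediately, and you can create a new one.

Q: How do I check real-time usage?

Log in to the portal; the first screen shows this month's usage against your cap. A breakdown by model and by time is on the Usage tab.

Q: Data security: are prompts logged?

Request bodies are not persisted to disk. Only the token count, model, and time are kept for billing. Responses are not stored.

Q: Can a model that is not on the list be added?

Contact an administrator to add an upstream provider (cliproxyapi supports almost all mainstream OAuth/API models).

Q: Is the batch API supported?

The OpenAI batch endpoint is not supported for now. Contact an administrator if you need it.

Q: Is function calling supported?

Yes. The standard OpenAI tools field works with all models; see Tool / Function Calling.

Q: Can the gateway be called directly from mainland China?

The gateway is deployed in Azure Japan East; from mainland China you can connect directly or through an accelerator. If you hit network problems, relay the traffic through a VPS in mainland China.

Q: How do I get in touch?

Feishu → contact ops or an administrator. When reporting a problem, include (1) your email, (2) the time of the call, and (3) the HTTP code and response body.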