# Speech Hub API
# https://hub.brewcode.app
#
# Self-hosted speech-to-text service with LLM text cleanup.
# Pipeline: Audio -> FFmpeg -> Whisper large-v3-turbo -> Qwen LLM -> Clean text

## Authentication

API tokens (sh- prefix) go in the X-API-Key header.
JWT tokens go in the Authorization: Bearer header.
Obtain tokens from the web UI at /tokens after login.

## How to use your token

API tokens start with the sh- prefix. Pass yours in the X-API-Key header:

curl -H "X-API-Key: sh-your-token-here" \
  https://hub.brewcode.app/v1/audio/transcriptions \
  -F file=@audio.mp3 -F pipeline=full

## Available models

ASR (speech-to-text):
  whisper-large-v3-turbo -- default Whisper model for transcription

LLM (chat, text cleanup):
  qwen3.5:9b -- vLLM AWQ, pipeline + chat default (INT4, fastest inference)
  qwen2.5:7b -- Ollama, legacy fallback (~160 tok/s)

Image generation (Flux 2 via ComfyUI on A100 80 GB):
  flux2-klein-9b -- default, 9B params, ~12-25s warm
  flux2-dev -- flagship, 32B params, ~50-90s warm, max quality

## Transcribe audio

POST /v1/audio/transcriptions/sync (scope: transcribe) <- PREFERRED for short recordings
  Content-Type: multipart/form-data
  file: audio file (mp3, wav, m4a, ogg, flac, webm; max 500 MB)
  pipeline: fast | full (default: full)
  language: auto | ru | en | de | fr | es | zh | ja | ko
  response_format: json | text | verbose_json | srt | vtt
  prompt: hint text for Whisper (domain terms, abbreviations)
  system_prompt_id: UUID of custom cleanup prompt (pipeline=full only)

Returns HTTP 200 with the transcription in the requested format.
The X-Pipeline-Run-Id header contains the pipeline run UUID (for history lookup).
Use for recordings under ~5 minutes. The connection stays open until complete.

POST /v1/audio/transcriptions (scope: transcribe) <- for long files / fire-and-forget
  Same parameters as above. Returns HTTP 202 with {run_id, status: "processing"}.
  Poll GET /v1/pipelines/{run_id} until status is "completed", then read the artifacts.
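The async flow above (HTTP 202, then poll /v1/pipelines/{run_id}) can be sketched in Python. This is a hedged sketch, not part of the API: `poll_run` and `fetch_status` are illustrative names, and `fetch_status` stands in for whatever HTTP client you use to GET /v1/pipelines/{run_id} with your X-API-Key header.

```python
import time

def poll_run(run_id, fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll a pipeline run until it reaches a terminal status.

    fetch_status(run_id) should GET /v1/pipelines/{run_id} and return the
    decoded JSON body, e.g. {"run_id": ..., "status": "processing"}.
    Returns the final run object once status is "completed" or "failed".
    """
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_status(run_id)
        status = run.get("status")
        if status in ("completed", "failed"):
            return run
        if time.monotonic() > deadline:
            raise TimeoutError(f"run {run_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Injecting `fetch_status` keeps the polling logic independent of the HTTP library; in practice it would send the same X-API-Key header as the curl example above.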
## Batch transcription (pseudo-streaming)

POST /v1/audio/batches (scope: transcribe)
POST /v1/audio/batches/{id}/chunks
POST /v1/audio/batches/{id}/finalize
GET /v1/audio/batches/{id}

Upload audio in chunks while recording. Finalize to merge the chunks and run LLM cleanup.

## Image generation (Flux 2 via ComfyUI, OpenAI-compatible)

POST /v1/images/generations (scope: images) -- txt2img, JSON body
POST /v1/images/edits (scope: images) -- reference-image edit, multipart
GET /v1/images/{id}/file (scope: images) -- PNG bytes, ownership-scoped, 7d TTL

Models: flux2-klein-9b (default, ~15s) | flux2-dev (32B flagship, ~60s)
Sizes: 1024x1024 | 1024x1536 | 1536x1024. n: 1..4. response_format: url | b64_json.
Sync only, timeout 300s. Edit uses ReferenceLatent (full identity preservation, NOT inpainting).
Full schemas in /openapi.yaml; works with the openai-python SDK.

## Chat completions

POST /v1/chat/completions (scope: chat)
Content-Type: application/json
{"messages": [{"role": "user", "content": "text"}], "stream": true}

Content format: messages[].content accepts EITHER a plain string OR an OpenAI-style
array of text parts. Both forms are OpenAI-compatible. Arrays are concatenated
server-side (parts joined with "\n") before forwarding to the upstream model, so the
wire protocol to the model is always a single string.
Supported part type: {"type":"text","text":"..."}.
Other part types (image_url, audio, etc.) return HTTP 422.
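The normalization rule above can be expressed as a small Python sketch. The function name is illustrative, not part of the API; it mirrors the documented server behavior, with ValueError standing in for the HTTP 422 response.

```python
def normalize_content(content):
    """Collapse messages[].content to a single string, as the server does.

    Accepts either a plain string or a list of {"type": "text", "text": ...}
    parts. Text parts are joined with "\n"; any other part type is rejected,
    mirroring the API's HTTP 422.
    """
    if isinstance(content, str):
        return content
    texts = []
    for part in content:
        if part.get("type") != "text":
            raise ValueError(f"unsupported content part type: {part.get('type')!r}")
        texts.append(part["text"])
    return "\n".join(texts)
```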
Examples:
  {"role": "user", "content": "Hello, world"}                                          <- accepted
  {"role": "user", "content": [{"type":"text","text":"Hello, world"}]}                 <- accepted, equivalent
  {"role": "user", "content": [{"type":"text","text":"A"},{"type":"text","text":"B"}]} <- joined as "A\nB"
  {"role": "user", "content": [{"type":"image_url","image_url":"..."}]}                <- HTTP 422

Message roles & tool round-trip:
role: "system" | "user" | "assistant" | "tool"
"tool" messages are replies to assistant tool_calls in a function-calling round-trip;
they MUST include a tool_call_id matching the assistant's tool_calls[].id (HTTP 422 otherwise).

Example multi-turn sequence:
  {"role":"system","content":"..."}
  {"role":"user","content":"..."}
  {"role":"assistant","content":null,"tool_calls":[{"id":"call_1","type":"function","function":{"name":"f","arguments":"{}"}}]}
  {"role":"tool","tool_call_id":"call_1","content":""}

The "function" role (deprecated by OpenAI) is not supported and returns HTTP 422.
An optional "name" field is accepted on any message (participant/tool identifier).

Parameters:
  model: optional -- server picks the default (qwen3.5:9b) if omitted
  use_memory: bool -- include user memory in the context
  think: bool -- enable model reasoning (~26.6x slower on qwen3.5; rejected with HTTP 400 on qwen2.5:7b)
  tools: OpenAI-style function tool definitions (forwarded to both backends)
  tool_choice: "auto" | "required" | "none" | {named} (Ollama 0.20.3: silently omitted)
  response_format: {"type":"text"} | {"type":"json_object"} | {"type":"json_schema","json_schema":{...}}
    json_schema is vLLM-only; routing it to Ollama returns HTTP 400
  parallel_tool_calls: bool (vLLM only; Ollama silently omits)

Response surface: choices[0].message.{reasoning, tool_calls}; finish_reason may be "tool_calls".
Latency: bare ~500-1000 ms; think=true 10-30 s; tools <10% overhead; json_schema vLLM-only.
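The tool_call_id rule above can be checked client-side before sending a request. This is a hedged sketch, not a server implementation: `check_tool_replies` is an illustrative name, and ValueError stands in for the HTTP 422 the API would return.

```python
def check_tool_replies(messages):
    """Verify every "tool" message answers a known assistant tool call.

    Mirrors the documented rules: a "tool" message must carry a tool_call_id
    matching some tool_calls[].id from a preceding assistant message, and the
    deprecated "function" role is rejected. Raises ValueError where the API
    would return HTTP 422.
    """
    seen_call_ids = set()
    for msg in messages:
        role = msg.get("role")
        if role == "assistant":
            for call in msg.get("tool_calls") or []:
                seen_call_ids.add(call["id"])
        elif role == "tool":
            call_id = msg.get("tool_call_id")
            if call_id not in seen_call_ids:
                raise ValueError(f"tool message with unknown tool_call_id: {call_id!r}")
        elif role == "function":
            raise ValueError('deprecated "function" role is not supported')
```

Running this on the multi-turn sequence above passes; a "tool" message with no matching assistant call raises.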
## Pipelines (transcription history)

GET /v1/pipelines (scope: pipelines:read)
GET /v1/pipelines/{id} (scope: pipelines:read)
GET /v1/pipelines/{id}/artifacts/{artifact_id} (scope: pipelines:read)
DELETE /v1/pipelines/{id} (scope: pipelines:write)

## System prompts

GET /v1/prompts (scope: prompts:read)
POST /v1/prompts (scope: prompts:write)
GET /v1/prompts/{id} (scope: prompts:read)
PUT /v1/prompts/{id} (scope: prompts:write)
DELETE /v1/prompts/{id} (scope: prompts:write)
POST /v1/prompts/{id}/reset (scope: prompts:write)

## User dictionary

GET /v1/dictionary (scope: dictionary:read)
PUT /v1/dictionary (scope: dictionary:write)
PATCH /v1/dictionary (scope: dictionary:write)
DELETE /v1/dictionary (scope: dictionary:write)

## User memory

GET /v1/memory (scope: memory:read)
POST /v1/memory (scope: memory:write)
DELETE /v1/memory/{id} (scope: memory:write)

## Tools

POST /v1/tools/optimize (scope: tools:optimize)
POST /v1/tokenize (scope: tools:tokenize)

## Models

GET /v1/models (scope: models:read)

## Auth tokens

POST /auth/tokens (JWT only)
GET /auth/tokens (JWT only)
DELETE /auth/tokens/{id} (JWT only)

## Documentation

GET /docs - interactive API documentation (Scalar)
GET /openapi.json - OpenAPI 3.1 spec (JSON, public)
GET /openapi.yaml - OpenAPI 3.1 spec (YAML, public)
GET /llm.txt - this file