# Speech Hub API
# https://hub.brewcode.app
#
# Self-hosted speech-to-text service with LLM text cleanup.
# Pipeline: Audio -> FFmpeg -> Whisper large-v3-turbo -> Qwen LLM -> Clean text

## Authentication

API tokens (sh- prefix) go in the X-API-Key header.
JWT tokens go in the Authorization: Bearer header.
Obtain tokens from the web UI at /tokens after login.

## How to use your token

API tokens start with the sh- prefix. Pass yours in the X-API-Key header:

curl -H "X-API-Key: sh-your-token-here" \
  https://hub.brewcode.app/v1/audio/transcriptions \
  -F file=@audio.mp3 -F pipeline=full

## Available models

ASR (speech-to-text):
  whisper-large-v3-turbo -- default Whisper model for transcription

LLM (chat, text cleanup):
  qwen3.5:9b -- vLLM AWQ, pipeline + chat default (INT4, fastest inference)
  qwen2.5:7b -- Ollama, legacy fallback (~160 tok/s)

Image generation (Flux 2 via ComfyUI on A100 80 GB):
  flux2-klein-9b -- default, 9B params, ~12-25s warm
  flux2-dev -- flagship, 32B params, ~50-90s warm, max quality

## Transcribe audio

POST /v1/audio/transcriptions/sync (scope: transcribe) <- PREFERRED for short recordings
  Content-Type: multipart/form-data
  file: audio file (mp3, wav, m4a, ogg, flac, webm; max 500 MB)
  pipeline: fast | full (default: full)
  language: auto | ru | en | de | fr | es | zh | ja | ko
  response_format: json | text | verbose_json | srt | vtt
  prompt: hint text for Whisper (domain terms, abbreviations)
  system_prompt_id: UUID of custom cleanup prompt (pipeline=full only)

Returns HTTP 200 with the transcription in the requested format.
The X-Pipeline-Run-Id header contains the pipeline run UUID (for history lookup).
Use for recordings under ~5 minutes. The connection stays open until complete.

POST /v1/audio/transcriptions (scope: transcribe) <- for long files / fire-and-forget
  Same parameters as above. Returns HTTP 202 with {run_id, status: "processing"}.
  Poll GET /v1/pipelines/{run_id} until status is "completed", then read the artifacts.
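The async flow above (HTTP 202, then poll /v1/pipelines/{run_id}) can be sketched in Python. This is a hedged sketch, not part of the API: `poll_run` and `fetch_status` are illustrative names, and `fetch_status` stands in for whatever HTTP client you use to GET /v1/pipelines/{run_id} with your X-API-Key header.

```python
import time

def poll_run(run_id, fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll a pipeline run until it reaches a terminal status.

    fetch_status(run_id) should GET /v1/pipelines/{run_id} and return the
    decoded JSON body, e.g. {"run_id": ..., "status": "processing"}.
    Returns the final run object once status is "completed" or "failed".
    """
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_status(run_id)
        status = run.get("status")
        if status in ("completed", "failed"):
            return run
        if time.monotonic() > deadline:
            raise TimeoutError(f"run {run_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Injecting `fetch_status` keeps the polling logic independent of the HTTP library; in practice it would send the same X-API-Key header as the curl example above.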
## Batch transcription (pseudo-streaming)

POST /v1/audio/batches (scope: transcribe)
POST /v1/audio/batches/{id}/chunks
POST /v1/audio/batches/{id}/finalize
GET /v1/audio/batches/{id}

Upload audio in chunks while recording. Finalize to merge the chunks and run LLM cleanup.

## Image generation (Flux 2 via ComfyUI, OpenAI-compatible)

POST /v1/images/generations (scope: images) -- txt2img, JSON body
POST /v1/images/edits (scope: images) -- reference-image edit, multipart
GET /v1/images/{id}/file (scope: images) -- PNG bytes, ownership-scoped, 7d TTL

Models: flux2-klein-9b (default, ~15s) | flux2-dev (32B flagship, ~60s)
Sizes: 1024x1024 | 1024x1536 | 1536x1024. n: 1..4. response_format: url | b64_json.
Sync only, timeout 300s. Edit uses ReferenceLatent (full identity preservation, NOT inpainting).
Full schemas in /openapi.yaml; works with the openai-python SDK.

## Chat completions

POST /v1/chat/completions (scope: chat)
Content-Type: application/json
{"messages": [{"role": "user", "content": "text"}], "stream": true}

Content format: messages[].content accepts EITHER a plain string OR an OpenAI-style
array of text parts. Both forms are OpenAI-compatible. Arrays are concatenated
server-side (parts joined with "\n") before forwarding to the upstream model, so the
wire protocol to the model is always a single string.
Supported part type: {"type":"text","text":"..."}.
Other part types (image_url, audio, etc.) return HTTP 422.
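The normalization rule above can be expressed as a small Python sketch. The function name is illustrative, not part of the API; it mirrors the documented server behavior, with ValueError standing in for the HTTP 422 response.

```python
def normalize_content(content):
    """Collapse messages[].content to a single string, as the server does.

    Accepts either a plain string or a list of {"type": "text", "text": ...}
    parts. Text parts are joined with "\n"; any other part type is rejected,
    mirroring the API's HTTP 422.
    """
    if isinstance(content, str):
        return content
    texts = []
    for part in content:
        if part.get("type") != "text":
            raise ValueError(f"unsupported content part type: {part.get('type')!r}")
        texts.append(part["text"])
    return "\n".join(texts)
```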
Examples:
  {"role": "user", "content": "Hello, world"}                                          <- accepted
  {"role": "user", "content": [{"type":"text","text":"Hello, world"}]}                 <- accepted, equivalent
  {"role": "user", "content": [{"type":"text","text":"A"},{"type":"text","text":"B"}]} <- joined as "A\nB"
  {"role": "user", "content": [{"type":"image_url","image_url":"..."}]}                <- HTTP 422

Message roles & tool round-trip:
role: "system" | "user" | "assistant" | "tool"
"tool" messages are replies to assistant tool_calls in a function-calling round-trip;
they MUST include a tool_call_id matching the assistant's tool_calls[].id (HTTP 422 otherwise).

Example multi-turn sequence:
  {"role":"system","content":"..."}
  {"role":"user","content":"..."}
  {"role":"assistant","content":null,"tool_calls":[{"id":"call_1","type":"function","function":{"name":"f","arguments":"{}"}}]}
  {"role":"tool","tool_call_id":"call_1","content":""}

The "function" role (deprecated by OpenAI) is not supported and returns HTTP 422.
An optional "name" field is accepted on any message (participant/tool identifier).

Parameters:
  model: optional -- server picks the default (qwen3.5:9b) if omitted
  use_memory: bool -- include user memory in the context
  think: bool -- enable model reasoning (~26.6x slower on qwen3.5; rejected with HTTP 400 on qwen2.5:7b)
  tools: OpenAI-style function tool definitions (forwarded to both backends)
  tool_choice: "auto" | "required" | "none" | {named} (Ollama 0.20.3: silently omitted)
  response_format: {"type":"text"} | {"type":"json_object"} | {"type":"json_schema","json_schema":{...}}
    json_schema is vLLM-only; routing it to Ollama returns HTTP 400
  parallel_tool_calls: bool (vLLM only; Ollama silently omits)

Response surface: choices[0].message.{reasoning, tool_calls}; finish_reason may be "tool_calls".
Latency: bare ~500-1000 ms; think=true 10-30 s; tools <10% overhead; json_schema vLLM-only.
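The tool_call_id rule above can be checked client-side before sending a request. This is a hedged sketch, not a server implementation: `check_tool_replies` is an illustrative name, and ValueError stands in for the HTTP 422 the API would return.

```python
def check_tool_replies(messages):
    """Verify every "tool" message answers a known assistant tool call.

    Mirrors the documented rules: a "tool" message must carry a tool_call_id
    matching some tool_calls[].id from a preceding assistant message, and the
    deprecated "function" role is rejected. Raises ValueError where the API
    would return HTTP 422.
    """
    seen_call_ids = set()
    for msg in messages:
        role = msg.get("role")
        if role == "assistant":
            for call in msg.get("tool_calls") or []:
                seen_call_ids.add(call["id"])
        elif role == "tool":
            call_id = msg.get("tool_call_id")
            if call_id not in seen_call_ids:
                raise ValueError(f"tool message with unknown tool_call_id: {call_id!r}")
        elif role == "function":
            raise ValueError('deprecated "function" role is not supported')
```

Running this on the multi-turn sequence above passes; a "tool" message with no matching assistant call raises.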
## Pipelines (transcription history)

GET /v1/pipelines (scope: pipelines:read)
GET /v1/pipelines/{id} (scope: pipelines:read)
GET /v1/pipelines/{id}/artifacts/{artifact_id} (scope: pipelines:read)
DELETE /v1/pipelines/{id} (scope: pipelines:write)

## System prompts

GET /v1/prompts (scope: prompts:read)
POST /v1/prompts (scope: prompts:write)
GET /v1/prompts/{id} (scope: prompts:read)
PUT /v1/prompts/{id} (scope: prompts:write)
DELETE /v1/prompts/{id} (scope: prompts:write)
POST /v1/prompts/{id}/reset (scope: prompts:write)

## User dictionary

GET /v1/dictionary (scope: dictionary:read)
PUT /v1/dictionary (scope: dictionary:write)
PATCH /v1/dictionary (scope: dictionary:write)
DELETE /v1/dictionary (scope: dictionary:write)

## User memory

GET /v1/memory (scope: memory:read)
POST /v1/memory (scope: memory:write)
DELETE /v1/memory/{id} (scope: memory:write)

## Tools

POST /v1/tools/optimize (scope: tools:optimize)
POST /v1/tokenize (scope: tools:tokenize)

## Models

GET /v1/models (scope: models:read)

## Auth tokens

POST /auth/tokens (JWT only)
GET /auth/tokens (JWT only)
DELETE /auth/tokens/{id} (JWT only)

## Documentation

GET /docs - interactive API documentation (Scalar)
GET /openapi.json - OpenAPI 3.1 spec (JSON, public)
GET /openapi.yaml - OpenAPI 3.1 spec (YAML, public)
GET /llm.txt - this file