SpeakTrue

Speech Workflow Contract

Reader and action

This contract is for native and web engineers implementing the workflow that saves generated speech to Soundboard. After reading it, a caller should be able to generate speech, retain the returned artifact reference, save that artifact to Soundboard, and branch correctly on structured failures without inspecting legacy planning artifacts.

Scope

The workflow has two HTTP operations:

POST /api/text-to-speech generates a reusable speech artifact.
POST /api/soundboard/save-clip saves that generated artifact into a Soundboard category.

Privileged storage writes remain server-side. Browser and native clients only forward artifact metadata returned by the generation operation; they do not write directly to object storage.

Generate speech request

POST /api/text-to-speech

{
  "text": "Text to speak",
  "voice": "voice-id",
  "model": "eleven_multilingual_v2",
  "stability": 0.8,
  "similarity": 0.7,
  "speed": 0.9,
  "style": 0.2,
  "speaker_boost": true
}

Required:

text: non-empty string.

Optional fields fall back to server defaults when omitted. Numeric tuning fields must be parseable as numbers.
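
The request rules above could be enforced on the client before the call is made. The following is a minimal sketch, not the project's actual client code: buildGenerateSpeechBody, GenerateSpeechInput, and TUNING_FIELDS are hypothetical names, and converting string tuning values to numbers on the client is an assumption. Only the field names come from the example request.

```typescript
// Hypothetical client-side helper; field names come from the request example above.
type TuningField = "stability" | "similarity" | "speed" | "style";
const TUNING_FIELDS: TuningField[] = ["stability", "similarity", "speed", "style"];

interface GenerateSpeechInput {
  text: string;
  voice?: string;
  model?: string;
  speaker_boost?: boolean;
  stability?: number | string;
  similarity?: number | string;
  speed?: number | string;
  style?: number | string;
}

function buildGenerateSpeechBody(input: GenerateSpeechInput): Record<string, unknown> {
  // text is the only required field and must be a non-empty string.
  if (typeof input.text !== "string" || input.text.length === 0) {
    throw new Error("text must be a non-empty string");
  }
  const body: Record<string, unknown> = { text: input.text };
  // Omitted optional fields are left out so the server defaults apply.
  if (input.voice !== undefined) body["voice"] = input.voice;
  if (input.model !== undefined) body["model"] = input.model;
  if (input.speaker_boost !== undefined) body["speaker_boost"] = input.speaker_boost;
  for (const field of TUNING_FIELDS) {
    const raw = input[field];
    if (raw === undefined) continue;
    const value = typeof raw === "number" ? raw : Number(raw);
    // Number("") is 0, so blank strings are rejected explicitly.
    const blank = typeof raw === "string" && raw.trim().length === 0;
    if (blank || !isFinite(value)) {
      throw new Error(field + " must be parseable as a number");
    }
    body[field] = value; // assumption: send the normalized number
  }
  return body;
}
```

The returned object can be serialized with JSON.stringify and sent as the body of POST /api/text-to-speech.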

Generate speech success

HTTP status: 200

{
  "success": true,
  "status": 200,
  "artifact_id": "generated-id",
  "artifact_path": "speech/anonymous/generated-id.mp3",
  "audio_path": "/static/speech/anonymous/generated-id.mp3",
  "download_path": "/download-audio?path=speech/anonymous/generated-id.mp3"
}

Client rules:

Treat success: true, numeric status: 200, artifact_id, artifact_path, audio_path, and download_path as required.
Prefer artifact_path when saving; keep audio_path for playback and compatibility.
Clear any stale reusable artifact if the response is missing a required canonical field.
Do not enable save-to-Soundboard until a valid latest artifact exists.
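
One way to apply the client rules above is a strict parser that yields an artifact only when every required field is present. This is a sketch under assumptions: SpeechArtifact, isNonEmptyString, and parseGenerationSuccess are hypothetical names, and treating an empty string as missing is an assumption rather than something the contract states.

```typescript
// Field names come from the generation success example; helper names are hypothetical.
interface SpeechArtifact {
  artifact_id: string;
  artifact_path: string;
  audio_path: string;
  download_path: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Returns the artifact when all required fields are valid, otherwise null.
function parseGenerationSuccess(payload: unknown): SpeechArtifact | null {
  if (typeof payload !== "object" || payload === null) return null;
  const body = payload as Record<string, unknown>;
  // success must be the boolean true and status the number 200.
  if (body["success"] !== true || body["status"] !== 200) return null;
  const artifact_id = body["artifact_id"];
  const artifact_path = body["artifact_path"];
  const audio_path = body["audio_path"];
  const download_path = body["download_path"];
  if (
    !isNonEmptyString(artifact_id) ||
    !isNonEmptyString(artifact_path) ||
    !isNonEmptyString(audio_path) ||
    !isNonEmptyString(download_path)
  ) {
    return null;
  }
  return { artifact_id, artifact_path, audio_path, download_path };
}
```

A null result is the signal to clear any stale reusable artifact and keep save-to-Soundboard disabled.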

Save to Soundboard request

POST /api/soundboard/save-clip

{
  "category": "Lecture1",
  "text": "Text used to generate the speech",
  "artifact_path": "speech/anonymous/generated-id.mp3",
  "artifact_id": "generated-id",
  "audio_path": "/static/speech/anonymous/generated-id.mp3",
  "download_path": "/download-audio?path=speech/anonymous/generated-id.mp3",
  "format": "mp3",
  "bitrate_kbps": 192,
  "normalize": true
}

Required:

category: target Soundboard category path.
text: original generated text.
artifact_path: canonical generated artifact path from the generation response.

Compatibility:

audio_path is accepted by the server, but new callers should send artifact_path.
artifact_id and download_path are forwarded for diagnostics and continuity; they are not privileged storage credentials.
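
The save request can be assembled directly from the stored generation result. The sketch below is illustrative only: buildSaveClipBody, SaveClipBody, and SpeechArtifact are hypothetical names. It leaves out format, bitrate_kbps, and normalize because the contract shows them in the example without describing their rules.

```typescript
// Hypothetical shapes; field names come from the request and response examples above.
interface SpeechArtifact {
  artifact_id: string;
  artifact_path: string;
  audio_path: string;
  download_path: string;
}

interface SaveClipBody {
  category: string;
  text: string;
  artifact_path: string;
  artifact_id: string;
  audio_path: string;
  download_path: string;
}

function buildSaveClipBody(category: string, text: string, artifact: SpeechArtifact): SaveClipBody {
  // The three required fields are checked before anything is sent.
  if (category.length === 0) throw new Error("category is required");
  if (text.length === 0) throw new Error("text is required");
  if (artifact.artifact_path.length === 0) throw new Error("artifact_path is required");
  return {
    category: category,
    text: text,
    // New callers send the canonical artifact_path.
    artifact_path: artifact.artifact_path,
    // Forwarded for diagnostics, continuity, and compatibility only.
    artifact_id: artifact.artifact_id,
    audio_path: artifact.audio_path,
    download_path: artifact.download_path
  };
}
```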

Save to Soundboard success

HTTP status: 200

{
  "success": true,
  "message": "Clip and text saved to Lecture1 category in storage",
  "clip": {
    "name": "Text_to_speak",
    "filename": "Text_to_speak_1712345678.mp3",
    "url": "https://storage.example/soundboard/Lecture1/Text_to_speak_1712345678.mp3",
    "text_url": "https://storage.example/soundboard/Lecture1/Text_to_speak_1712345678.txt",
    "text_filename": "Text_to_speak_1712345678.txt",
    "category": "Lecture1",
    "timestamp": 1712345678
  }
}

Client rules:

Treat success: true, clip.filename, and clip.category as required.
Do not notify or refresh Soundboard when a success response is malformed, that is, when it lacks the required clip metadata.
A missing category refresh hook is diagnostic-only: the save remains successful, and the UI should still surface the success toast.
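
The save-success rules above can be sketched as a single handler that separates a malformed success from a refresh-hook problem. All names here (handleSaveSuccess, SaveOutcome, SavedClip, refreshCategory) are hypothetical, and the hook signature is an assumption.

```typescript
// Hypothetical outcome types; only clip.filename and clip.category come from the contract.
interface SavedClip {
  filename: string;
  category: string;
}

type SaveOutcome =
  | { saved: false }
  | { saved: true; clip: SavedClip; refreshed: boolean; diagnostic?: string };

function handleSaveSuccess(
  payload: unknown,
  refreshCategory?: (category: string) => void // assumed hook shape
): SaveOutcome {
  if (typeof payload !== "object" || payload === null) return { saved: false };
  const body = payload as Record<string, unknown>;
  const clip = body["clip"];
  if (body["success"] !== true || typeof clip !== "object" || clip === null) {
    return { saved: false }; // malformed success: no notify, no refresh
  }
  const meta = clip as Record<string, unknown>;
  const filename = meta["filename"];
  const category = meta["category"];
  if (typeof filename !== "string" || filename.length === 0) return { saved: false };
  if (typeof category !== "string" || category.length === 0) return { saved: false };
  const clipMeta: SavedClip = { filename: filename, category: category };
  if (refreshCategory === undefined) {
    // A missing hook is diagnostic-only; the save still counts as successful.
    return { saved: true, clip: clipMeta, refreshed: false, diagnostic: "refresh hook missing" };
  }
  try {
    refreshCategory(category);
    return { saved: true, clip: clipMeta, refreshed: true };
  } catch (err) {
    // A failing hook is recorded as a diagnostic, not as a save failure.
    return { saved: true, clip: clipMeta, refreshed: false, diagnostic: String(err) };
  }
}
```

A caller would show the success toast whenever saved is true and log diagnostic when it is present.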

Failure envelope

Generation and save failures use the same structured envelope:

{
  "error": "Human-readable failure message",
  "error_code": "SPEECH_ARTIFACT_NOT_FOUND",
  "operation": "save-clip-to-soundboard",
  "status": 404,
  "retryable": false,
  "details": {
    "phase": "audio_upload",
    "backend": "supabase"
  }
}

Required fields:

error: human-readable message.
error_code: stable machine-readable code.
operation: tts-generate or save-clip-to-soundboard.
status: numeric status for Flask speech workflow errors; Supabase Edge provider errors may include provider_status separately.
retryable: boolean retry guidance.

Details rules:

details is optional and bounded.
Details must not expose provider secrets, bearer tokens, filesystem absolute paths, or full upstream provider bodies.
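
A client can decode the envelope once and branch on typed fields afterwards. The sketch below uses hypothetical names (SpeechFailure, parseFailureEnvelope, isSpeechOperation) and assumes that status is always present, as the required-fields list states. The optional provider_status passthrough reflects the note about Supabase Edge provider errors and is an assumption about where that field appears.

```typescript
// Hypothetical decoder for the failure envelope; field names come from the contract.
type SpeechOperation = "tts-generate" | "save-clip-to-soundboard";

interface SpeechFailure {
  error: string;
  error_code: string;
  operation: SpeechOperation;
  status: number;
  retryable: boolean;
  provider_status?: number; // assumption: top-level when present
  details?: Record<string, unknown>;
}

function isSpeechOperation(value: unknown): value is SpeechOperation {
  return value === "tts-generate" || value === "save-clip-to-soundboard";
}

// Returns a typed failure, or null when the body is not a structured envelope.
function parseFailureEnvelope(payload: unknown): SpeechFailure | null {
  if (typeof payload !== "object" || payload === null) return null;
  const body = payload as Record<string, unknown>;
  const error = body["error"];
  const error_code = body["error_code"];
  const operation = body["operation"];
  const status = body["status"];
  const retryable = body["retryable"];
  if (typeof error !== "string" || typeof error_code !== "string") return null;
  if (!isSpeechOperation(operation)) return null;
  if (typeof status !== "number" || typeof retryable !== "boolean") return null;
  const failure: SpeechFailure = { error, error_code, operation, status, retryable };
  const providerStatus = body["provider_status"];
  if (typeof providerStatus === "number") failure.provider_status = providerStatus;
  const details = body["details"];
  // details is optional; anything that is not a plain object is ignored.
  if (typeof details === "object" && details !== null && !Array.isArray(details)) {
    failure.details = details as Record<string, unknown>;
  }
  return failure;
}
```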

Status and retryability expectations

Validation failures: HTTP status and status field 400, retryable: false.
Unauthorized or forbidden calls: 401/403, retryable: false.
Missing generated artifact: 404, retryable: false.
Provider/runtime failures: 500 or 502, usually retryable: true.
Quota or entitlement failures are not retryable without account state changes.
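
The expectations above suggest a small decision function that trusts the retryable flag first and uses status only to pick a non-retry action. This is a sketch: planRecovery and the RecoveryStep labels are hypothetical, and mapping 404 to regeneration is an inference from the missing-artifact case, not a rule the contract states.

```typescript
// Hypothetical recovery planner; the step names are illustrative labels only.
type RecoveryStep = "retry" | "fix-request" | "reauthenticate" | "regenerate" | "stop";

interface FailureSignal {
  status: number;
  retryable: boolean;
}

function planRecovery(failure: FailureSignal): RecoveryStep {
  // The server's retry guidance wins over any status-based guess.
  if (failure.retryable) return "retry";
  switch (failure.status) {
    case 400:
      return "fix-request"; // validation failure: change the input
    case 401:
    case 403:
      return "reauthenticate"; // unauthorized or forbidden
    case 404:
      return "regenerate"; // assumption: the generated artifact is gone
    default:
      // Covers non-retryable provider failures and quota or entitlement failures.
      return "stop";
  }
}
```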

Artifact path rules

artifact_path is a server-relative generated speech path such as speech/anonymous/<id>.mp3.
audio_path is the browser-playable /static/... form of that artifact.
Save requests must not send filesystem paths or traversal paths.
If generation fails or returns malformed success, clients must clear latestTTSSpeechArtifact and block save.
If save fails with a structured API error, clients may keep the current artifact for retry when the artifact itself remains valid.
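
A client-side guard can reject filesystem and traversal paths before a save request is built. The check below is a conservative sketch, not the server's validation logic: isSafeArtifactPath is a hypothetical name and the exact rejection rules are assumptions. It applies to artifact_path only, since audio_path legitimately starts with /static/.

```typescript
// Hypothetical guard for artifact_path values; the server remains the authority.
function isSafeArtifactPath(path: string): boolean {
  if (path.length === 0) return false;
  // Absolute POSIX paths and Windows separators are treated as filesystem paths.
  if (path.charAt(0) === "/" || path.indexOf("\\") !== -1) return false;
  // Drive letters (C:...) and URL schemes (https://...) are rejected.
  if (/^[A-Za-z][A-Za-z0-9+.-]*:/.test(path)) return false;
  // Every segment must be a real name: no empty, "." or ".." segments.
  const segments = path.split("/");
  for (let i = 0; i < segments.length; i++) {
    const segment = segments[i];
    if (segment === "" || segment === "." || segment === "..") return false;
  }
  return true;
}
```

For example, speech/anonymous/generated-id.mp3 passes, while ../secret.mp3 and /var/data/clip.mp3 are rejected.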

Browser diagnostics

The legacy web runtime exposes:

window.latestTTSSpeechArtifact: the current reusable generated artifact or null.
window.__ttsSoundboardWorkflowTestHooks.getState(): latest artifact, generation diagnostic, save diagnostic, and saving flag.

Diagnostics include generation/save phase, operation, error code, status, retryability, message, and bounded details where available.

Native consumer checklist

Validate generation success before storing an artifact reference.
Store artifact_path, artifact_id, audio_path, and download_path together.
Send artifact_path and original text when saving to Soundboard.
Branch on error_code, operation, status, and retryable; do not parse prose messages.
Clear stale latest artifacts on generation failures and malformed generation success.
Keep a valid artifact available for safe retry after save API failures.
Treat Soundboard refresh/open hook failures as diagnostics, not save failures.
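
The artifact-lifecycle items in this checklist can be sketched as a pure state transition, which native consumers may find easier to test than UI code. The event names, nextLatestArtifact, and canSave are hypothetical. The contract does not say what happens to the artifact after a successful save, so that case is deliberately left out.

```typescript
// Hypothetical state transition for the latest reusable artifact.
interface SpeechArtifact {
  artifact_id: string;
  artifact_path: string;
  audio_path: string;
  download_path: string;
}

type WorkflowEvent =
  | { kind: "generation-succeeded"; artifact: SpeechArtifact }
  | { kind: "generation-failed" }
  | { kind: "generation-malformed" }
  | { kind: "save-failed"; artifactStillValid: boolean };

function nextLatestArtifact(
  current: SpeechArtifact | null,
  event: WorkflowEvent
): SpeechArtifact | null {
  switch (event.kind) {
    case "generation-succeeded":
      return event.artifact; // store all four fields together
    case "generation-failed":
    case "generation-malformed":
      return null; // clear the stale artifact and block save
    case "save-failed":
      // Keep the artifact for retry only while it remains valid.
      return event.artifactStillValid ? current : null;
  }
}

// Save stays disabled until a valid latest artifact exists.
function canSave(latest: SpeechArtifact | null): boolean {
  return latest !== null;
}
```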

Executable proof

Run these checks before changing the contract:

node web/python-web-app/tests/js/tts_soundboard_workflow_runtime_check.mjs
web/python-web-app/venv/bin/pytest web/python-web-app/tests/api/test_speech_contract_hardening.py web/python-web-app/tests/api/test_tts_soundboard_workflow_target.py web/python-web-app/tests/api/test_tts.py web/python-web-app/tests/api/test_soundboard_supabase_mode.py
deno test backend/supabase/functions/tts-generate/handler_test.ts
deno test backend/supabase/functions/soundboard-save-generated/handler_test.ts

The graph rebuild command keeps code navigation artifacts current after implementation changes:

python3 -c "from graphify.watch import _rebuild_code; from pathlib import Path; _rebuild_code(Path('.'))"