SpeakTrue

Generate Speech to Soundboard Workflow Target

Reader and post-read action

This brief is for the S02 implementation agent, who is assumed to have no prior context on the product workflow. After reading it, the agent should implement and verify the first end-to-end path in which generated speech is saved into Soundboard and can be reloaded for reuse without repeating discovery.

Initial surface decision

Use the legacy web main TTS screen plus the legacy web Soundboard as the initial implementation surface for R045 and R046. This surface already has the text-to-speech API, the save-to-Soundboard API, the browser save modal, Soundboard category loading, pytest coverage, and the storage-provider seam, so it is the fastest place to prove the user-visible workflow.

Native parity, storage-provider migration execution, and additional speech surfaces are downstream work. They should consume the acceptance contract proven here rather than expand this slice into another platform inventory.

Current evidence files

Only these tracked files are needed for S02 implementation targeting:

web/python-web-app/src/routes/tts.py exposes POST /api/text-to-speech and delegates generation to the service layer.
web/python-web-app/src/services/tts_service.py allocates a generated speech artifact and returns artifact_id, audio_path, and download_path on successful generation.
web/python-web-app/src/legacy_runtime.py handles POST /api/soundboard/save-clip, requires both text and an artifact reference, uploads the audio/text assets to the configured storage provider path, updates clip order metadata, and invalidates category cache.
web/python-web-app/static/js/index_speech.js owns the browser generate-and-save modal flow. Its current save request sends category, text, format, bitrate, and normalize values, but does not yet submit the generated artifact reference returned by /api/text-to-speech.
web/python-web-app/static/js/index_soundboard.js owns Soundboard category load, clip render, play, download, and reload/reuse behavior.
web/python-web-app/tests/api/test_tts.py already asserts generated TTS responses expose artifact fields and that save-to-Soundboard rejects requests missing an explicit artifact reference or text.

Exact user workflow

The user opens the legacy web main TTS surface.
The user enters text and generates speech through POST /api/text-to-speech.
A successful response returns a playable generated artifact with audio_path and related metadata.
The browser stores the latest generated artifact reference for the active TTS result.
The user opens the Save to Soundboard modal, chooses a category, and submits POST /api/soundboard/save-clip.
The save request includes the category, source text, requested format options, and the generated artifact reference as audio_path or artifact_path.
The backend resolves the artifact from the tracked local/static artifact path, uploads audio and text assets to the provider-backed Soundboard path, persists clip-order metadata, and returns success metadata.
The Soundboard category can be reloaded and the saved clip can be played, downloaded, reordered where supported, and reused from the Soundboard UI.
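
The artifact handoff in the workflow above can be sketched as a pure function that turns a successful /api/text-to-speech response into the body for /api/soundboard/save-clip. This is a minimal Python sketch for reasoning about the contract and designing tests, not the browser code in index_speech.js. The function name build_save_clip_payload, the request keys format, bitrate, and normalize, and their default values are assumptions; only the field names category, text, and audio_path come from this brief.

```python
from typing import Any, Dict, Optional


def build_save_clip_payload(
    tts_response: Dict[str, Any],
    category: str,
    text: str,
    fmt: str = "mp3",          # assumed default, not confirmed by the brief
    bitrate: str = "128k",     # assumed default, not confirmed by the brief
    normalize: bool = False,   # assumed default, not confirmed by the brief
) -> Optional[Dict[str, Any]]:
    """Return the save-clip request body, or None when no artifact exists yet."""
    audio_path = tts_response.get("audio_path")
    if not audio_path:
        # No generated artifact: the caller must not submit a save request.
        return None
    return {
        "category": category,
        "text": text,
        "format": fmt,
        "bitrate": bitrate,
        "normalize": normalize,
        # The explicit reference to the artifact the user actually heard.
        "audio_path": audio_path,
    }
```

Returning None when the response has no audio_path keeps the "do not submit without an artifact" rule in one place, so the caller only has to check for None before sending the request.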

Backend and API acceptance criteria

R045 — Generate speech artifact handoff

Success criteria:

POST /api/text-to-speech returns a structured success payload that includes artifact_id, audio_path, and download_path for the generated audio artifact.
The returned audio_path or equivalent artifact reference remains resolvable by the server for a subsequent POST /api/soundboard/save-clip request in the same browser workflow.
The save endpoint accepts the generated artifact reference through artifact_path or audio_path and resolves it through the server-side artifact/download path rules.

Failure criteria:

On the generation side, missing text, overlong text, and provider generation failure must return structured error responses. On the save side, a missing artifact reference or missing text must do the same. None of these may surface as a silent UI failure.
A save request that omits the artifact reference must fail with SPEECH_ARTIFACT_REFERENCE_REQUIRED.
A save request whose artifact reference cannot be resolved must fail with a structured SPEECH_ARTIFACT_NOT_FOUND response.
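
The two artifact failure codes above can be illustrated with a small resolver. This is a sketch under stated assumptions, not the logic in legacy_runtime.py: the function name resolve_artifact_reference, the artifact_root parameter, the error dictionary shape, and the rule that references resolve beneath a single root directory are all hypothetical. Only the error codes and the artifact_path and audio_path field names come from this brief.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Tuple

ErrorBody = Dict[str, str]


def resolve_artifact_reference(
    payload: Dict[str, Any],
    artifact_root: Path,  # hypothetical: directory that holds generated artifacts
) -> Tuple[Optional[Path], Optional[ErrorBody]]:
    """Return (resolved_path, None) on success or (None, error_body) on failure."""
    reference = payload.get("artifact_path") or payload.get("audio_path")
    if not reference:
        return None, {
            "code": "SPEECH_ARTIFACT_REFERENCE_REQUIRED",
            "message": "Save request did not include audio_path or artifact_path.",
        }

    root = artifact_root.resolve()
    # Treat the reference as relative to the root, then normalize it.
    candidate = (root / str(reference).lstrip("/")).resolve()

    # Reject references that escape the root or do not point at a file.
    if root not in candidate.parents or not candidate.is_file():
        return None, {
            "code": "SPEECH_ARTIFACT_NOT_FOUND",
            "message": "Artifact reference could not be resolved.",
        }
    return candidate, None
```

Checking containment after calling resolve() means a reference such as ../secret.mp3 is reported as not found rather than being read from outside the artifact directory.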

R046 — Save generated speech for Soundboard reuse

Success criteria:

POST /api/soundboard/save-clip uploads the generated audio to the selected Soundboard category using the configured storage provider path.
The save path writes the text sidecar, updates or seeds clip-order metadata, and invalidates category cache so reload observes the new clip.
The saved clip appears after Soundboard reload with enough metadata for play, download, and reuse from the existing Soundboard controls.

Failure criteria:

Unsupported output format, invalid bitrate, missing storage backend, audio upload failure, text upload failure, and clip-order persistence failure must return explicit non-success responses.
Provider upload and rollback failures must remain diagnosable through server logs and response error messages.
Cache invalidation failure must be logged but must not make a successfully saved clip look like a failed save.
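
The cache invalidation rule above can be sketched as a final save step that logs the failure and still reports success. This is an illustrative sketch only: the function name finish_save, the logger name, the invalidate_cache callable, and the response keys are hypothetical and are not taken from legacy_runtime.py. It assumes the audio upload, text sidecar upload, and clip-order update have already succeeded.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("soundboard.save")  # hypothetical logger name


def finish_save(
    invalidate_cache: Callable[[str], None],  # hypothetical cache hook
    category: str,
    clip_filename: str,
) -> Dict[str, Any]:
    """Invalidate the category cache without letting a failure fail the save."""
    cache_invalidated = True
    try:
        invalidate_cache(category)
    except Exception:
        # The clip is already stored, so log with a traceback and carry on.
        logger.exception(
            "save-clip-to-soundboard: cache invalidation failed for category %s",
            category,
        )
        cache_invalidated = False
    return {
        "success": True,
        "category": category,
        "filename": clip_filename,
        "cache_invalidated": cache_invalidated,  # hypothetical diagnostic field
    }
```

Exposing the invalidation outcome as a separate field keeps the failure diagnosable in the response while the save itself still reads as successful.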

UI acceptance criteria

Success criteria:

The browser keeps the latest successful /api/text-to-speech artifact reference separate from the text input value.
The Save to Soundboard request body includes audio_path or artifact_path from the latest generated response, plus category, text, format, bitrate, and normalize options.
After a successful save, the UI shows a success toast and notifies the Soundboard category so reload/reuse can surface the new clip.
Reloading the selected Soundboard category displays the saved clip and preserves existing play, download, and category actions.

Failure criteria:

If generation has not produced an artifact yet, the Save to Soundboard action must not submit a request that is missing audio_path or artifact_path.
If the save endpoint returns SPEECH_ARTIFACT_REFERENCE_REQUIRED, SPEECH_ARTIFACT_NOT_FOUND, or another structured error, the UI must show a clear failure toast and reset the Save button state.
Switching categories or reopening the modal must not reuse a stale artifact: after a failed generation, an artifact left over from an earlier generation must not be submitted.
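
The artifact-tracking rules in this section can be modeled independently of the browser code. The sketch below is a Python model intended for reasoning about test cases, not a port of index_speech.js; the class name LatestArtifact, its method names, and the success key on the response are assumptions. It reads the stale-artifact rule as "a failed generation clears any previously stored reference", which is one interpretation of the criterion.

```python
from typing import Any, Dict, Optional


class LatestArtifact:
    """Tracks the artifact reference of the most recent generation attempt."""

    def __init__(self) -> None:
        self._audio_path: Optional[str] = None

    def on_generation_result(self, response: Dict[str, Any]) -> None:
        # A failed or malformed response clears the reference, so a later
        # save cannot pick up an artifact from an earlier generation.
        if response.get("success") and response.get("audio_path"):
            self._audio_path = str(response["audio_path"])
        else:
            self._audio_path = None

    def can_save(self) -> bool:
        # The Save action should stay disabled while this is False.
        return self._audio_path is not None

    def reference(self) -> Optional[str]:
        return self._audio_path
```

Because the reference lives in its own object, editing the text input, switching categories, or reopening the modal does not change it; only a new generation result does.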

Failure and error acceptance

Generation validation errors remain visible as structured /api/text-to-speech failures.
Save validation errors remain visible as structured /api/soundboard/save-clip failures.
Artifact handoff bugs are treated as product failures: a user who heard the generated audio must be able to save that exact artifact, not a regenerated or implicit global last-output file.
The implementation must preserve explicit error codes for future tests and diagnostics, including SPEECH_RUNTIME_ERROR, SPEECH_ARTIFACT_REFERENCE_REQUIRED, and SPEECH_ARTIFACT_NOT_FOUND.

Observability and diagnostics

Future agents must be able to inspect the workflow through pytest failures and runtime diagnostics. Preserve these observable fields and signals:

/api/text-to-speech success payload: artifact_id, audio_path, download_path.
/api/soundboard/save-clip request payload: category, text, and audio_path or artifact_path.
Save backend diagnostics: operation name save-clip-to-soundboard, structured speech error codes, provider upload path, text sidecar path, clip-order update result, and cache invalidation log.
Soundboard reload metadata: category identifier, clip filename, clip name, audio URL, text URL where available, duration where available, and reload/reuse state.
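
The observable fields listed above can be checked with small helpers that a pytest case could call on captured payloads. This is a minimal sketch; the helper names and the constant names are hypothetical, and only the field names themselves come from this brief.

```python
from typing import Any, Dict, List

TTS_SUCCESS_FIELDS = ("artifact_id", "audio_path", "download_path")
SAVE_REQUEST_FIELDS = ("category", "text")
ARTIFACT_REFERENCE_FIELDS = ("audio_path", "artifact_path")


def missing_tts_fields(payload: Dict[str, Any]) -> List[str]:
    """List required /api/text-to-speech success fields that are absent or empty."""
    return [name for name in TTS_SUCCESS_FIELDS if not payload.get(name)]


def missing_save_fields(payload: Dict[str, Any]) -> List[str]:
    """List required /api/soundboard/save-clip request fields that are absent or empty."""
    missing = [name for name in SAVE_REQUEST_FIELDS if not payload.get(name)]
    # Either artifact reference field satisfies the contract.
    if not any(payload.get(name) for name in ARTIFACT_REFERENCE_FIELDS):
        missing.append("audio_path|artifact_path")
    return missing
```

Returning the list of missing names, rather than a boolean, makes a failing assertion message point directly at the field that broke the contract.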

Explicit non-goals

Do not implement native iOS or Android parity in S02.
Do not execute the storage migration or change the selected storage provider as part of this workflow target.
Do not remap the whole speech, Soundboard, storage, or native feature inventory.
Do not replace the legacy web main TTS screen or Soundboard with a new surface before proving this workflow.
Do not rely on ignored local planning artifacts, generated static artifacts, or real environment secrets for acceptance tests.