SpeakTrue

Generate Speech to Soundboard Workflow Target

Reader and post-read action

This brief is for the S02 implementation agent arriving at the product workflow with no prior context. After reading it, the agent should implement and verify the first end-to-end path in which generated speech is saved into Soundboard and can be reloaded for reuse without reopening discovery.

Initial surface decision

Use the legacy web main TTS screen plus the legacy web Soundboard as the initial implementation surface for R045 and R046. This surface already has the text-to-speech API, the save-to-Soundboard API, the browser save modal, Soundboard category loading, pytest coverage, and the storage-provider seam, which makes it the fastest place to prove the user-visible workflow.

Native parity, storage-provider migration execution, and additional speech surfaces are downstream work. They should consume the acceptance contract proven here rather than expand this slice into another platform inventory.

Current evidence files

Only these tracked files are needed for S02 implementation targeting:

Exact user workflow

  1. The user opens the legacy web main TTS surface.
  2. The user enters text and generates speech through POST /api/text-to-speech.
  3. A successful response returns a playable generated artifact with audio_path and related metadata.
  4. The browser stores the latest generated artifact reference for the active TTS result.
  5. The user opens the Save to Soundboard modal, chooses a category, and submits POST /api/soundboard/save-clip.
  6. The save request includes the category, source text, requested format options, and the generated artifact reference as audio_path or artifact_path.
  7. The backend resolves the artifact from the tracked local/static artifact path, uploads audio and text assets to the provider-backed Soundboard path, persists clip-order metadata, and returns success metadata.
  8. The Soundboard category can be reloaded and the saved clip can be played, downloaded, reordered where supported, and reused from the Soundboard UI.
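
The handoff in steps 3 to 6 can be sketched as a small pure function that turns a text-to-speech response into a save-clip request body. This is a minimal illustration under stated assumptions, not the project's actual code: the two endpoint paths and the audio_path field come from the workflow above, while the helper names and the request keys "text" and "format_options" are hypothetical and should be checked against the real API before use.

```python
from typing import Any, Mapping, Optional

# Endpoint paths as named in the workflow above.
TTS_ENDPOINT = "/api/text-to-speech"
SAVE_CLIP_ENDPOINT = "/api/soundboard/save-clip"


def extract_artifact_ref(tts_response: Mapping[str, Any]) -> str:
    """Return the generated artifact reference from a TTS response (steps 3-4).

    Raises ValueError when audio_path is missing or blank, so a failed
    generation cannot be saved to Soundboard by accident.
    """
    ref = tts_response.get("audio_path")
    if not isinstance(ref, str) or not ref.strip():
        raise ValueError("TTS response has no usable audio_path")
    return ref


def build_save_clip_payload(
    tts_response: Mapping[str, Any],
    category: str,
    source_text: str,
    format_options: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Build the body for POST /api/soundboard/save-clip (steps 5-6).

    The keys "text" and "format_options" are assumed names for illustration;
    only "category" and "audio_path" are taken from the workflow description.
    """
    if not category.strip():
        raise ValueError("category is required")
    if not source_text.strip():
        raise ValueError("source text is required")
    return {
        "category": category,
        "text": source_text,  # assumed key name
        "audio_path": extract_artifact_ref(tts_response),
        "format_options": dict(format_options or {}),  # assumed key name
    }
```

In a pytest test, the same helper could be fed the JSON body returned by the text-to-speech endpoint and its result posted to the save-clip endpoint, which keeps the artifact reference handoff explicit and easy to assert on.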

Backend and API acceptance criteria

R045 — Generate speech artifact handoff

Success criteria:

Failure criteria:

R046 — Save generated speech for Soundboard reuse

Success criteria:

Failure criteria:

UI acceptance criteria

Success criteria:

Failure criteria:

Failure and error acceptance

Observability and diagnostics

Future agents must be able to inspect the workflow through pytest failures and runtime diagnostics. Preserve these observable fields and signals:

Explicit non-goals