# Volcengine ASR Stage-1 Replacement Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Replace the current Stage 1 transcription agent with Volcengine's flash ASR API using `audio.data` base64 while keeping the rest of the `4+1` subtitle pipeline unchanged.

**Architecture:** The server will extract WAV audio from uploaded video, base64-encode it, send it to Volcengine's `recognize/flash` endpoint, and map the result into the existing `TranscriptSegment[]` shape. `Segmentation`, `Translation`, `Voice Matching`, and `Validation` will continue to use their existing contracts.

**Tech Stack:** Node.js, Express, TypeScript, fluent-ffmpeg, Vitest, existing server job orchestration

---

### Task 1: Add ASR Configuration Surface

**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\.env.example`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.test.ts`

**Step 1: Write the failing test**

Add a test that expects Volcengine ASR config to resolve from env with:
- app key
- access key
- resource id
- submit/query URLs
- timeout
- polling interval

**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/llmProvider.test.ts`
Expected: FAIL because ASR config fields are missing.

**Step 3: Write minimal implementation**

Add environment parsing for ASR config without disturbing existing LLM provider resolution.

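As a rough illustration of the environment parsing in Step 3, the sketch below resolves the fields listed in Step 1 from a plain key/value map. The interface name, function names, environment variable names, and default values are all assumptions made for this example; they are not taken from the existing `llmProvider.ts`.

```typescript
// Hypothetical config shape mirroring the fields listed in Step 1.
interface VolcengineAsrConfig {
  appKey: string;
  accessKey: string;
  resourceId: string;
  submitUrl: string;
  queryUrl: string;
  timeoutMs: number;
  pollIntervalMs: number;
}

type Env = Record<string, string | undefined>;

// Read a required variable and fail loudly when it is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Parse an optional positive integer, falling back when unset or invalid.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Variable names and defaults below are assumptions for illustration only.
function resolveVolcengineAsrConfig(env: Env): VolcengineAsrConfig {
  return {
    appKey: requireEnv(env, "VOLCENGINE_ASR_APP_KEY"),
    accessKey: requireEnv(env, "VOLCENGINE_ASR_ACCESS_KEY"),
    resourceId: requireEnv(env, "VOLCENGINE_ASR_RESOURCE_ID"),
    submitUrl: requireEnv(env, "VOLCENGINE_ASR_SUBMIT_URL"),
    queryUrl: requireEnv(env, "VOLCENGINE_ASR_QUERY_URL"),
    timeoutMs: parsePositiveInt(env["VOLCENGINE_ASR_TIMEOUT_MS"], 60000),
    pollIntervalMs: parsePositiveInt(env["VOLCENGINE_ASR_POLL_INTERVAL_MS"], 1000),
  };
}
```

Taking the environment as a parameter (for example `resolveVolcengineAsrConfig(process.env)` at the call site) keeps the resolver independent of the existing LLM provider resolution and lets the Vitest case pass a literal object.
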
**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/llmProvider.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/server/llmProvider.ts src/server/llmProvider.test.ts .env.example
git commit -m "feat: add volcengine asr config"
```

### Task 2: Build the Volcengine Flash ASR Client

**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\volcengineAsr.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\volcengineAsr.test.ts`

**Step 1: Write the failing test**

Add tests for:
- request header shape
- request body with `audio.data`
- API error code handling
- success result parsing

**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/volcengineAsr.test.ts`
Expected: FAIL because the client does not exist.

**Step 3: Write minimal implementation**

Implement the flash recognition request/response helper.

**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/volcengineAsr.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/server/volcengineAsr.ts src/server/volcengineAsr.test.ts
git commit -m "feat: add volcengine flash asr client"
```

### Task 3: Map Flash ASR Result to Transcript Segments

**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\stageTypes.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.test.ts`

**Step 1: Write the failing test**

Add a test that feeds a flash ASR result payload with `utterances` and expects normalized `TranscriptSegment[]`.

**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/subtitleStages/transcriptionStage.test.ts`
Expected: FAIL because Stage 1 still expects model JSON output.

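To make the target of the Step 1 test concrete, here is a minimal sketch of normalizing `utterances` into transcript segments. Everything about the shapes is an assumption: the utterance fields (`start_time`, `end_time` in milliseconds, `text`) should be checked against the Volcengine response documentation, and the `TranscriptSegment` fields other than `originalText` are placeholders for whatever `stageTypes.ts` actually defines.

```typescript
// Assumed utterance shape; verify field names and units against the API docs.
interface FlashUtterance {
  start_time: number; // assumed to be milliseconds
  end_time: number; // assumed to be milliseconds
  text: string;
}

// Placeholder for the project's real TranscriptSegment type.
interface TranscriptSegment {
  start: number; // seconds
  end: number; // seconds
  originalText: string;
}

// Convert utterances to segments: ms -> s, trim text, drop empty or
// zero-length entries, and return them in chronological order.
function mapUtterancesToSegments(utterances: FlashUtterance[]): TranscriptSegment[] {
  return utterances
    .map((u) => ({
      start: u.start_time / 1000,
      end: u.end_time / 1000,
      originalText: u.text.trim(),
    }))
    .filter((s) => s.originalText.length > 0 && s.end > s.start)
    .sort((a, b) => a.start - b.start);
}
```

Keeping the mapper a pure function means the Vitest case can feed it a literal payload without mocking the HTTP client or ffmpeg.
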
**Step 3: Write minimal implementation**

Refactor `transcriptionStage.ts` so Stage 1:
- extracts WAV audio
- base64-encodes the audio
- calls `volcengineAsr.ts`
- maps `utterances` into `TranscriptSegment[]`

**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/subtitleStages/transcriptionStage.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/server/subtitleStages/transcriptionStage.ts src/server/subtitleStages/stageTypes.ts src/server/subtitleStages/transcriptionStage.test.ts
git commit -m "feat: switch stage 1 to flash asr"
```

### Task 4: Wire Stage 1 Logging and Cleanup

**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.test.ts`

**Step 1: Write the failing test**

Add assertions that the transcription path logs:
- ASR request start and finish
- API status code failures
- mapped utterance summary

**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts`
Expected: FAIL because these log events are not present yet.

**Step 3: Write minimal implementation**

Add structured logs and ensure temp audio cleanup still runs after extraction.

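One way to guarantee the cleanup in Step 3 is a `try`/`finally` wrapper around the ASR call. The sketch below is only an illustration under assumptions: `withTempAudio`, the `LogFn` type, and the event names are hypothetical, and the extract, cleanup, and run callbacks stand in for the project's real ffmpeg extraction, temp-file removal, and ASR client.

```typescript
// Hypothetical structured-log signature; the project's logger may differ.
type LogFn = (event: string, details?: Record<string, unknown>) => void;

// Extract temp audio, run the ASR step, and always clean up afterwards.
async function withTempAudio<T>(
  extract: () => Promise<string>,
  cleanup: (audioPath: string) => Promise<void>,
  run: (audioPath: string) => Promise<T>,
  log: LogFn,
): Promise<T> {
  const audioPath = await extract();
  try {
    log("asr.request.start", { audioPath });
    const result = await run(audioPath);
    log("asr.request.finish", { audioPath });
    return result;
  } catch (error) {
    // Record the failure, then rethrow so the job orchestration still sees it.
    log("asr.request.error", {
      message: error instanceof Error ? error.message : String(error),
    });
    throw error;
  } finally {
    // Runs on both success and failure once extraction has succeeded.
    await cleanup(audioPath);
  }
}
```

Injecting the logger as a function lets the test collect emitted event names in an array and assert on them directly.
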
**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/server/subtitleStages/transcriptionStage.ts src/server/multiStageSubtitleGeneration.ts src/server/multiStageSubtitleGeneration.test.ts
git commit -m "feat: log flash asr stage lifecycle"
```

### Task 5: Keep the Frontend Contract Stable

**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.test.ts`

**Step 1: Write the failing test**

Add assertions that the transcription path logs:
- ASR submit start/finish
- poll lifecycle
- mapped utterance summary

**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts`
Expected: FAIL because these log events are not present yet.

**Step 3: Write minimal implementation**

Add structured logs and ensure temp audio cleanup always runs after Stage 1 completes or fails.

**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/server/subtitleStages/transcriptionStage.ts src/server/multiStageSubtitleGeneration.ts src/server/multiStageSubtitleGeneration.test.ts
git commit -m "feat: log volcengine asr stage lifecycle"
```

### Task 6: Run Full Targeted Verification

**Files:**
- No production file changes required

**Step 1: Run targeted server and pipeline tests**

Run:

```bash
npm.cmd run test -- src/server/llmProvider.test.ts src/server/tempAudioStore.test.ts src/server/volcengineAsr.test.ts src/server/subtitleStages/transcriptionStage.test.ts src/server/multiStageSubtitleGeneration.test.ts src/server/subtitleGeneration.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx
```

Expected: PASS

**Step 2: Restart the dev server**

Restart the local dev server and confirm port `3000` is listening.

**Step 3: Manual validation**

Use the known failing sample video and verify:
- Stage 1 logs show ASR submit/poll
- `originalText` is closer to the actual dialogue than before
- downstream translation and dubbing still complete

**Step 4: Commit**

```bash
git add .
git commit -m "feat: replace stage 1 transcription with flash asr"
```