Volcengine ASR Stage-1 Replacement Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Replace the current Stage 1 transcription agent with Volcengine's flash ASR API, sending the audio inline as base64 in the audio.data field, while keeping the rest of the 4+1 subtitle pipeline unchanged.
Architecture: The server will extract WAV audio from the uploaded video, base64-encode it, send it to Volcengine's recognize/flash endpoint, and map the result into the existing TranscriptSegment[] shape. Segmentation, Translation, Voice Matching, and Validation keep their existing contracts.
Tech Stack: Node.js, Express, TypeScript, fluent-ffmpeg, Vitest, existing server job orchestration
Task 1: Add ASR Configuration Surface
Files:
- Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.ts
- Modify: E:\Downloads\ai-video-dubbing-&-translation\.env.example
- Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.test.ts
Step 1: Write the failing test
Add a test that expects the Volcengine ASR config to resolve from env with:
- app key
- access key
- resource id
- flash recognition (recognize/flash) URL
- request timeout
Step 2: Run test to verify it fails
Run: npm.cmd run test -- src/server/llmProvider.test.ts
Expected: FAIL because ASR config fields are missing.
Step 3: Write minimal implementation
Add environment parsing for ASR config without disturbing existing LLM provider resolution.
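A minimal sketch of what the config resolver could look like. The env var names (VOLCENGINE_ASR_APP_KEY and so on), the default URL, the default resource id, and the 120 s default timeout are all assumptions rather than values fixed by this plan. Since the flash endpoint returns its result in one synchronous call, the sketch only needs an endpoint URL and a timeout, with no polling settings.

```typescript
// Sketch only: env var names, defaults, and field names are assumptions.
export interface VolcengineAsrConfig {
  appKey: string;
  accessKey: string;
  resourceId: string;
  flashUrl: string;
  timeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Assumed defaults for Volcengine's flash recognition API; verify against its docs.
const DEFAULT_FLASH_URL =
  "https://openspeech.bytedance.com/api/v3/auc/bigmodel/recognize/flash";
const DEFAULT_RESOURCE_ID = "volc.bigasr.auc_turbo";
const DEFAULT_TIMEOUT_MS = 120_000;

// Reads a mandatory variable, treating blank strings as missing.
function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`Missing required env var ${name}`);
  return value;
}

export function resolveVolcengineAsrConfig(env: Env): VolcengineAsrConfig {
  const rawTimeout = env.VOLCENGINE_ASR_TIMEOUT_MS?.trim();
  const timeoutMs = rawTimeout ? Number(rawTimeout) : DEFAULT_TIMEOUT_MS;
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`Invalid VOLCENGINE_ASR_TIMEOUT_MS: ${rawTimeout}`);
  }
  return {
    appKey: requireEnv(env, "VOLCENGINE_ASR_APP_KEY"),
    accessKey: requireEnv(env, "VOLCENGINE_ASR_ACCESS_KEY"),
    resourceId: env.VOLCENGINE_ASR_RESOURCE_ID?.trim() || DEFAULT_RESOURCE_ID,
    flashUrl: env.VOLCENGINE_ASR_FLASH_URL?.trim() || DEFAULT_FLASH_URL,
    timeoutMs,
  };
}
```

In llmProvider.ts this would be called as resolveVolcengineAsrConfig(process.env). Taking env as a parameter keeps the test independent of the real environment and leaves the existing LLM provider resolution untouched.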
Step 4: Run test to verify it passes
Run: npm.cmd run test -- src/server/llmProvider.test.ts
Expected: PASS
Step 5: Commit
git add src/server/llmProvider.ts src/server/llmProvider.test.ts .env.example
git commit -m "feat: add volcengine asr config"
Task 2: Build the Volcengine Flash ASR Client
Files:
- Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\volcengineAsr.ts
- Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\volcengineAsr.test.ts
Step 1: Write the failing test
Add tests for:
- request header shape
- request body with audio.data
- API error code handling
- success result parsing
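The header and body shapes these tests pin down can be sketched as a pure builder, which keeps them assertable without any network call. The header names (X-Api-App-Key, X-Api-Access-Key, X-Api-Resource-Id, X-Api-Request-Id, X-Api-Sequence) and the body fields (user.uid, audio.data, request.model_name) reflect our reading of Volcengine's flash recognition docs and should be treated as assumptions until checked against the live API reference; buildFlashRequest and the default uid are hypothetical.

```typescript
export interface FlashAuth {
  appKey: string;
  accessKey: string;
  resourceId: string;
}

export interface FlashRequest {
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds headers and JSON body for one flash recognition call.
// Header and field names are assumptions based on Volcengine's flash API docs.
export function buildFlashRequest(
  auth: FlashAuth,
  audioBase64: string,
  requestId: string,
  uid = "subtitle-pipeline", // hypothetical caller id
): FlashRequest {
  return {
    headers: {
      "Content-Type": "application/json",
      "X-Api-App-Key": auth.appKey,
      "X-Api-Access-Key": auth.accessKey,
      "X-Api-Resource-Id": auth.resourceId,
      "X-Api-Request-Id": requestId,
      "X-Api-Sequence": "-1",
    },
    body: JSON.stringify({
      user: { uid },
      audio: { data: audioBase64 }, // inline base64 audio instead of a URL
      request: { model_name: "bigmodel" },
    }),
  };
}
```

In production the requestId could come from randomUUID() in node:crypto; logging it alongside Stage 1 events makes it easier to correlate a failure with the provider side.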
Step 2: Run test to verify it fails
Run: npm.cmd run test -- src/server/volcengineAsr.test.ts
Expected: FAIL because the client does not exist.
Step 3: Write minimal implementation
Implement the flash recognition request/response helper.
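One way to shape the request/response helper, sketched under stated assumptions: the outcome is read from an X-Api-Status-Code response header where "20000000" means success, an X-Api-Message header carries the error text, and the JSON body holds result.text and result.utterances. All of these should be confirmed against Volcengine's docs. fetch is injected through a narrow hypothetical FetchLike type so the tests can use a fake instead of the network.

```typescript
export interface FlashUtterance {
  start_time: number; // assumed milliseconds
  end_time: number;
  text: string;
}

export interface FlashResult {
  text: string;
  utterances: FlashUtterance[];
}

// Narrow fetch signature so tests can inject a fake; Node 18+ global fetch should satisfy it.
export type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal?: AbortSignal },
) => Promise<{ headers: { get(name: string): string | null }; json(): Promise<unknown> }>;

// Assumption: "20000000" in X-Api-Status-Code signals success.
const SUCCESS_CODE = "20000000";

export class VolcengineAsrError extends Error {
  readonly statusCode: string;
  constructor(statusCode: string, message: string) {
    super(message);
    this.name = "VolcengineAsrError";
    this.statusCode = statusCode;
  }
}

export async function recognizeFlash(
  fetchImpl: FetchLike,
  url: string,
  request: { headers: Record<string, string>; body: string },
  timeoutMs: number,
): Promise<FlashResult> {
  // Abort if headers and body are not fully received within timeoutMs.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, {
      method: "POST",
      headers: request.headers,
      body: request.body,
      signal: controller.signal,
    });
    const code = res.headers.get("X-Api-Status-Code") ?? "missing";
    if (code !== SUCCESS_CODE) {
      const message = res.headers.get("X-Api-Message") ?? "no message";
      throw new VolcengineAsrError(code, `Volcengine ASR failed (${code}): ${message}`);
    }
    const payload = (await res.json()) as {
      result?: { text?: string; utterances?: FlashUtterance[] };
    };
    return {
      text: payload.result?.text ?? "",
      utterances: payload.result?.utterances ?? [],
    };
  } finally {
    clearTimeout(timer);
  }
}
```

Carrying the provider status code on the error object lets the caller log it as a structured field instead of parsing it back out of the message.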
Step 4: Run test to verify it passes
Run: npm.cmd run test -- src/server/volcengineAsr.test.ts
Expected: PASS
Step 5: Commit
git add src/server/volcengineAsr.ts src/server/volcengineAsr.test.ts
git commit -m "feat: add volcengine flash asr client"
Task 3: Map Flash ASR Result to Transcript Segments
Files:
- Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts
- Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\stageTypes.ts
- Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.test.ts
Step 1: Write the failing test
Add a test that feeds a flash ASR result payload with utterances and expects normalized TranscriptSegment[].
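The normalization under test could look like the sketch below. Only originalText is grounded in this plan (the Task 6 manual check refers to it); the id/start/end fields of TranscriptSegment are hypothetical and should mirror stageTypes.ts. It also assumes Volcengine reports start_time/end_time in milliseconds while segments store seconds.

```typescript
// Hypothetical segment shape; align with the real TranscriptSegment in stageTypes.ts.
export interface TranscriptSegment {
  id: string;
  start: number; // seconds (assumed)
  end: number; // seconds (assumed)
  originalText: string;
}

export interface FlashUtterance {
  start_time: number; // milliseconds (assumed)
  end_time: number;
  text: string;
}

export function mapUtterancesToSegments(utterances: FlashUtterance[]): TranscriptSegment[] {
  return utterances
    .map((u) => ({ ...u, text: u.text.trim() }))
    // Drop blank and zero-length utterances so downstream stages never see empty cues.
    .filter((u) => u.text.length > 0 && u.end_time > u.start_time)
    .sort((a, b) => a.start_time - b.start_time)
    .map((u, index) => ({
      id: `seg-${index + 1}`,
      start: u.start_time / 1000,
      end: u.end_time / 1000,
      originalText: u.text,
    }));
}
```

Sorting before numbering keeps ids stable in playback order even if the provider returns utterances out of order.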
Step 2: Run test to verify it fails
Run: npm.cmd run test -- src/server/subtitleStages/transcriptionStage.test.ts
Expected: FAIL because Stage 1 still expects model JSON output.
Step 3: Write minimal implementation
Refactor transcriptionStage.ts so Stage 1:
- extracts WAV audio
- base64-encodes the audio
- calls volcengineAsr.ts
- maps utterances into TranscriptSegment[]
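A minimal sketch of the refactored Stage 1 flow. The ffmpeg extraction, the file read, and the ASR call are injected as hypothetical dependencies (extractWav, readAudio, recognize) so the stage can be unit-tested without ffmpeg or network access; runTranscriptionStage and the segment fields are illustrative names, not the existing code.

```typescript
import { readFile } from "node:fs/promises";

interface Utterance { start_time: number; end_time: number; text: string } // assumed ms
interface Segment { id: string; start: number; end: number; originalText: string }

// Hypothetical dependency seams for testing.
export interface TranscriptionDeps {
  extractWav: (videoPath: string) => Promise<string>; // resolves to a temp WAV path
  recognize: (audioBase64: string) => Promise<Utterance[]>;
  readAudio?: (path: string) => Promise<Buffer>;
}

export async function runTranscriptionStage(
  videoPath: string,
  deps: TranscriptionDeps,
): Promise<Segment[]> {
  const wavPath = await deps.extractWav(videoPath);
  const readAudio = deps.readAudio ?? ((p: string) => readFile(p));
  // Base64 inflates the payload by about a third; keep this in mind for request size limits.
  const audioBase64 = (await readAudio(wavPath)).toString("base64");
  const utterances = await deps.recognize(audioBase64);
  // Inline ms-to-seconds mapping (assumed units), skipping blank utterances.
  return utterances
    .filter((u) => u.text.trim().length > 0)
    .map((u, i) => ({
      id: `seg-${i + 1}`,
      start: u.start_time / 1000,
      end: u.end_time / 1000,
      originalText: u.text.trim(),
    }));
}
```

In production, extractWav could wrap fluent-ffmpeg, for example ffmpeg(videoPath).noVideo().audioChannels(1).audioFrequency(16000).format("wav").save(wavPath), resolving on the "end" event and rejecting on "error". The 16 kHz mono settings are an assumption about what the ASR model expects.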
Step 4: Run test to verify it passes
Run: npm.cmd run test -- src/server/subtitleStages/transcriptionStage.test.ts
Expected: PASS
Step 5: Commit
git add src/server/subtitleStages/transcriptionStage.ts src/server/subtitleStages/stageTypes.ts src/server/subtitleStages/transcriptionStage.test.ts
git commit -m "feat: switch stage 1 to flash asr"
Task 4: Wire Stage 1 Logging and Cleanup
Files:
- Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts
- Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.ts
- Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.test.ts
Step 1: Write the failing test
Add assertions that the transcription path logs:
- ASR request start and finish
- API status code failures
- mapped utterance summary
Step 2: Run test to verify it fails
Run: npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts
Expected: FAIL because these log events are not present yet.
Step 3: Write minimal implementation
Add structured logs and ensure temp audio cleanup still runs after extraction.
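The structured logs could be emitted by a thin wrapper around the ASR call, sketched here with a hypothetical Logger callback and hypothetical event names (asr.request.start, asr.request.finish, asr.request.error). It assumes the ASR client attaches a statusCode property to its errors; anything else is logged as "unknown".

```typescript
// Hypothetical event names and fields; match the server's existing logger conventions.
export type AsrLogEvent =
  | { event: "asr.request.start"; audioBytes: number }
  | { event: "asr.request.finish"; durationMs: number; utteranceCount: number; lastEndMs: number }
  | { event: "asr.request.error"; statusCode: string; message: string };

export type Logger = (entry: AsrLogEvent) => void;

interface Utterance { start_time: number; end_time: number; text: string }

export async function recognizeWithLogging(
  audioBase64: string,
  recognize: (b64: string) => Promise<Utterance[]>,
  log: Logger,
): Promise<Utterance[]> {
  // Report the decoded audio size, not the larger base64 string length.
  log({ event: "asr.request.start", audioBytes: Buffer.byteLength(audioBase64, "base64") });
  const startedAt = Date.now();
  try {
    const utterances = await recognize(audioBase64);
    // Mapped utterance summary: count plus the latest end time seen.
    log({
      event: "asr.request.finish",
      durationMs: Date.now() - startedAt,
      utteranceCount: utterances.length,
      lastEndMs: utterances.reduce((max, u) => Math.max(max, u.end_time), 0),
    });
    return utterances;
  } catch (err) {
    const statusCode = (err as { statusCode?: string }).statusCode ?? "unknown";
    const message = err instanceof Error ? err.message : String(err);
    log({ event: "asr.request.error", statusCode, message });
    throw err; // rethrow so the pipeline still fails the job
  }
}
```

Passing the logger in as a function lets the test collect events into an array and assert on their order and fields.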
Step 4: Run test to verify it passes
Run: npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts
Expected: PASS
Step 5: Commit
git add src/server/subtitleStages/transcriptionStage.ts src/server/multiStageSubtitleGeneration.ts src/server/multiStageSubtitleGeneration.test.ts
git commit -m "feat: log flash asr stage lifecycle"
Task 5: Keep the Frontend Contract Stable
Files:
- Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts
- Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.ts
- Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.test.ts
Step 1: Write the failing test
Add assertions that the transcription path logs:
- ASR request start/finish
- API status code failures
- mapped utterance summary
Step 2: Run test to verify it fails
Run: npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts
Expected: FAIL because these log events are not present yet.
Step 3: Write minimal implementation
Add structured logs and ensure temp audio cleanup always runs after Stage 1 completes or fails.
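"Cleanup always runs" amounts to a try/finally around Stage 1. The subtle part is that a failing cleanup must not replace the original ASR error or discard a successful result. A minimal sketch with a hypothetical runWithCleanup helper:

```typescript
// Hypothetical helper: runs work, then always runs cleanup, without letting
// a cleanup failure replace the work's result or error.
export async function runWithCleanup<T>(
  work: () => Promise<T>,
  cleanup: () => Promise<void>,
  onCleanupError: (err: unknown) => void = () => {},
): Promise<T> {
  try {
    return await work();
  } finally {
    try {
      await cleanup();
    } catch (err) {
      // Report cleanup problems separately instead of rethrowing them.
      onCleanupError(err);
    }
  }
}
```

A possible call site, where transcribe stands for the Stage 1 work: runWithCleanup(() => transcribe(wavPath), () => rm(wavPath, { force: true }), (err) => logger.warn(err)), using rm from node:fs/promises. With force: true, rm tolerates the file already being gone.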
Step 4: Run test to verify it passes
Run: npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts
Expected: PASS
Step 5: Commit
git add src/server/subtitleStages/transcriptionStage.ts src/server/multiStageSubtitleGeneration.ts src/server/multiStageSubtitleGeneration.test.ts
git commit -m "feat: log volcengine asr stage lifecycle"
Task 6: Run Full Targeted Verification
Files:
- No production file changes required
Step 1: Run targeted server and pipeline tests
Run:
npm.cmd run test -- src/server/llmProvider.test.ts src/server/tempAudioStore.test.ts src/server/volcengineAsr.test.ts src/server/subtitleStages/transcriptionStage.test.ts src/server/multiStageSubtitleGeneration.test.ts src/server/subtitleGeneration.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx
Expected: PASS
Step 2: Restart the dev server
Restart the local dev server and confirm port 3000 is listening.
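If a scripted check is preferred to eyeballing the terminal, a small TCP probe can confirm that port 3000 accepts connections. This is a hypothetical helper using only node:net. It assumes the dev server is reachable on 127.0.0.1; pass "::1" as the host if it binds to IPv6 only.

```typescript
import { createConnection } from "node:net";

// Resolves true if something accepts a TCP connection on host:port within timeoutMs.
export function isPortListening(
  port: number,
  host = "127.0.0.1",
  timeoutMs = 2000,
): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = createConnection({ port, host });
    const finish = (ok: boolean) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED
  });
}
```

Usage: isPortListening(3000).then((up) => console.log(up ? "port 3000 is listening" : "nothing on port 3000")).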
Step 3: Manual validation
Use the known failing sample video and verify that:
- Stage 1 logs show the ASR request start/finish
- originalText is closer to the actual dialogue than before
- downstream translation and dubbing still complete
Step 4: Commit
git add .
git commit -m "feat: replace stage 1 transcription with flash asr"