songjvcheng/video_translate


Song367 85065cbca3


Build multi-stage subtitle and dubbing pipeline

2026-03-20 20:55:40 +08:00

7.2 KiB


Volcengine ASR Stage-1 Replacement Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Replace the current Stage 1 transcription agent with Volcengine's flash ASR API, sending the audio inline as base64 in the audio.data field, while leaving the rest of the 4+1 subtitle pipeline unchanged.

Architecture: The server will extract WAV audio from uploaded video, base64-encode it, send it to Volcengine's recognize/flash endpoint, and map the result into the existing TranscriptSegment[] shape. Segmentation, Translation, Voice Matching, and Validation will continue to use their existing contracts.

Tech Stack: Node.js, Express, TypeScript, fluent-ffmpeg, Vitest, existing server job orchestration

Task 1: Add ASR Configuration Surface

Files:

Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.ts
Modify: E:\Downloads\ai-video-dubbing-&-translation\.env.example
Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.test.ts

Step 1: Write the failing test

Add a test that expects Volcengine ASR config to resolve from env with:

app key
access key
resource id
flash recognize URL
timeout

Step 2: Run test to verify it fails

Run: npm.cmd run test -- src/server/llmProvider.test.ts

Expected: FAIL because ASR config fields are missing.

Step 3: Write minimal implementation

Add environment parsing for ASR config without disturbing existing LLM provider resolution.
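A minimal sketch of that parsing, assuming hypothetical VOLCENGINE_ASR_* variable names (none of them come from the existing code). The default URL and resource id are assumptions about Volcengine's flash endpoint and should be checked against its current documentation. The sketch models what one flash call needs, a single URL plus a request timeout, and reads only its own keys so the LLM provider resolution is not affected.

```typescript
// Hypothetical config shape; field and env-var names are illustrative only.
interface VolcengineAsrConfig {
  appKey: string;
  accessKey: string;
  resourceId: string;
  flashUrl: string;
  timeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Assumed defaults: verify both against Volcengine's current flash ASR docs.
const DEFAULT_FLASH_URL =
  "https://openspeech.bytedance.com/api/v3/auc/bigmodel/recognize/flash";
const DEFAULT_RESOURCE_ID = "volc.bigasr.auc_turbo";
const DEFAULT_TIMEOUT_MS = 120_000;

// Returns a trimmed, non-empty value or throws naming the missing key.
function requireEnv(env: Env, key: string): string {
  const value = env[key]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable ${key}`);
  }
  return value;
}

// Reads only VOLCENGINE_ASR_* keys, leaving LLM provider resolution untouched.
function resolveVolcengineAsrConfig(env: Env): VolcengineAsrConfig {
  const rawTimeout = env.VOLCENGINE_ASR_TIMEOUT_MS?.trim();
  const timeoutMs = rawTimeout ? Number(rawTimeout) : DEFAULT_TIMEOUT_MS;
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`Invalid VOLCENGINE_ASR_TIMEOUT_MS: ${rawTimeout}`);
  }
  return {
    appKey: requireEnv(env, "VOLCENGINE_ASR_APP_KEY"),
    accessKey: requireEnv(env, "VOLCENGINE_ASR_ACCESS_KEY"),
    resourceId: env.VOLCENGINE_ASR_RESOURCE_ID?.trim() || DEFAULT_RESOURCE_ID,
    flashUrl: env.VOLCENGINE_ASR_FLASH_URL?.trim() || DEFAULT_FLASH_URL,
    timeoutMs,
  };
}

// Usage: const asrConfig = resolveVolcengineAsrConfig(process.env);
```

Taking the environment as a parameter rather than reading process.env directly lets the Step 1 test pass a plain object.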

Step 4: Run test to verify it passes

Run: npm.cmd run test -- src/server/llmProvider.test.ts

Expected: PASS

Step 5: Commit

git add src/server/llmProvider.ts src/server/llmProvider.test.ts .env.example
git commit -m "feat: add volcengine asr config"

Task 2: Build the Volcengine Flash ASR Client

Files:

Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\volcengineAsr.ts
Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\volcengineAsr.test.ts

Step 1: Write the failing test

Add tests for:

request header shape
request body with audio.data
API error code handling
success result parsing
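The request those tests pin down might be built as below. The header names (X-Api-App-Key, X-Api-Access-Key, X-Api-Resource-Id, X-Api-Request-Id, X-Api-Sequence) and every body field except audio.data are assumptions based on Volcengine's published flash ASR examples, not something this plan confirms; buildFlashAsrRequest is a hypothetical name. Keeping it a pure function lets the header-shape and audio.data tests run without network access.

```typescript
// Hypothetical auth subset of the ASR config.
interface FlashAsrAuth {
  appKey: string;
  accessKey: string;
  resourceId: string;
}

interface FlashAsrRequest {
  headers: Record<string, string>;
  body: {
    user: { uid: string };
    audio: { data: string };
    request: { model_name: string };
  };
}

// Builds headers and JSON body for one flash call; all names are assumed.
function buildFlashAsrRequest(
  auth: FlashAsrAuth,
  wavBase64: string,
  requestId: string,
): FlashAsrRequest {
  if (wavBase64.length === 0) {
    throw new Error("audio.data must not be empty");
  }
  return {
    headers: {
      "Content-Type": "application/json",
      "X-Api-App-Key": auth.appKey,
      "X-Api-Access-Key": auth.accessKey,
      "X-Api-Resource-Id": auth.resourceId,
      "X-Api-Request-Id": requestId, // e.g. crypto.randomUUID()
      "X-Api-Sequence": "-1",
    },
    body: {
      user: { uid: auth.appKey }, // assumed: any stable caller identifier
      audio: { data: wavBase64 }, // base64 WAV sent inline
      request: { model_name: "bigmodel" }, // assumed model name
    },
  };
}
```

The client would then POST JSON.stringify(body) with these headers to the configured flash URL, for example with Node's built-in fetch and an AbortSignal.timeout(timeoutMs) signal.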

Step 2: Run test to verify it fails

Run: npm.cmd run test -- src/server/volcengineAsr.test.ts

Expected: FAIL because the client does not exist.

Step 3: Write minimal implementation

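Implement the flash recognition helper: build the request headers and audio.data body, send the request, check the API status code, and parse the recognition result.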
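For the status-code handling and result parsing, a sketch assuming the service reports its status in an X-Api-Status-Code response header (20000000 meaning success), an optional X-Api-Message and X-Tt-Logid, and returns result.utterances with millisecond start_time/end_time fields. All of these names and the success code are assumptions to verify against the API reference; VolcengineAsrError and parseFlashAsrResponse are hypothetical.

```typescript
// Assumed flash response fields; verify names and units against the API reference.
interface FlashUtterance {
  start_time: number; // assumed milliseconds
  end_time: number; // assumed milliseconds
  text: string;
}

interface FlashAsrBody {
  result?: { text?: string; utterances?: FlashUtterance[] };
}

// Minimal Headers-like view, satisfied by fetch's Response.headers and by test doubles.
interface HeaderReader {
  get(name: string): string | null;
}

const FLASH_SUCCESS_CODE = "20000000"; // assumed success value of X-Api-Status-Code

class VolcengineAsrError extends Error {
  constructor(
    readonly statusCode: string,
    detail: string,
    readonly logId?: string,
  ) {
    super(`Volcengine ASR failed (${statusCode}): ${detail}`);
    this.name = "VolcengineAsrError";
  }
}

// Throws on a non-success status code, otherwise returns the utterance list.
function parseFlashAsrResponse(headers: HeaderReader, body: unknown): FlashUtterance[] {
  const statusCode = headers.get("X-Api-Status-Code") ?? "missing";
  if (statusCode !== FLASH_SUCCESS_CODE) {
    throw new VolcengineAsrError(
      statusCode,
      headers.get("X-Api-Message") ?? "no message",
      headers.get("X-Tt-Logid") ?? undefined,
    );
  }
  const utterances = (body as FlashAsrBody | null)?.result?.utterances;
  if (!Array.isArray(utterances)) {
    throw new VolcengineAsrError(statusCode, "response has no result.utterances");
  }
  return utterances;
}
```

Carrying the status code and log id on the error gives Task 4's failure logging something concrete to record.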

Step 4: Run test to verify it passes

Run: npm.cmd run test -- src/server/volcengineAsr.test.ts

Expected: PASS

Step 5: Commit

git add src/server/volcengineAsr.ts src/server/volcengineAsr.test.ts
git commit -m "feat: add volcengine flash asr client"

Task 3: Map Flash ASR Result to Transcript Segments

Files:

Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts
Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\stageTypes.ts
Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.test.ts

Step 1: Write the failing test

Add a test that feeds a flash ASR result payload with utterances and expects normalized TranscriptSegment[].
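A sketch of the normalization that test could expect. The TranscriptSegment fields here (id, startTime and endTime in seconds, originalText) are hypothetical stand-ins for whatever stageTypes.ts actually defines, and millisecond utterance timings are an assumption about the flash response. Blank or zero-length utterances are dropped and the rest sorted by start time, so downstream stages receive ordered, non-empty segments.

```typescript
// Assumed utterance shape from the flash response.
interface FlashUtterance {
  start_time: number; // milliseconds (assumed)
  end_time: number; // milliseconds (assumed)
  text: string;
}

// Hypothetical stand-in for the real TranscriptSegment in stageTypes.ts.
interface TranscriptSegment {
  id: string;
  startTime: number; // seconds
  endTime: number; // seconds
  originalText: string;
}

function mapUtterancesToSegments(utterances: FlashUtterance[]): TranscriptSegment[] {
  return utterances
    .map((u) => ({ ...u, text: u.text.trim() })) // copy, never mutate input
    .filter((u) => u.text.length > 0 && u.end_time > u.start_time)
    .sort((a, b) => a.start_time - b.start_time)
    .map((u, index) => ({
      id: `seg-${index + 1}`,
      startTime: u.start_time / 1000,
      endTime: u.end_time / 1000,
      originalText: u.text,
    }));
}
```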

Step 2: Run test to verify it fails

Run: npm.cmd run test -- src/server/subtitleStages/transcriptionStage.test.ts

Expected: FAIL because Stage 1 still expects model JSON output.

Step 3: Write minimal implementation

Refactor transcriptionStage.ts so Stage 1:

extracts WAV audio
base64-encodes the audio
calls volcengineAsr.ts
maps utterances into TranscriptSegment[]
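The four steps above can be sketched as one function with injected dependencies, which also keeps Stage 1 testable without ffmpeg or network access. runTranscriptionStage and Stage1Deps are hypothetical names. In production, extractWav would presumably wrap fluent-ffmpeg (for example noVideo(), audioChannels(1), audioFrequency(16000) and format('wav'); 16 kHz mono is an assumption, not a requirement stated in this plan), and recognize would call the flash ASR client.

```typescript
// Hypothetical seams for Stage 1; the real helpers would live in
// transcriptionStage.ts and volcengineAsr.ts.
interface Stage1Deps<Utterance, Segment> {
  extractWav(videoPath: string): Promise<Buffer>;
  recognize(wavBase64: string): Promise<Utterance[]>;
  mapUtterances(utterances: Utterance[]): Segment[];
}

async function runTranscriptionStage<Utterance, Segment>(
  videoPath: string,
  deps: Stage1Deps<Utterance, Segment>,
): Promise<Segment[]> {
  // 1. Extract the audio track as WAV bytes.
  const wav = await deps.extractWav(videoPath);
  // 2. Base64-encode it for the request's audio.data field.
  const wavBase64 = wav.toString("base64");
  // 3. Send it to the flash ASR client.
  const utterances = await deps.recognize(wavBase64);
  // 4. Normalize utterances into the existing segment shape.
  return deps.mapUtterances(utterances);
}
```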

Step 4: Run test to verify it passes

Run: npm.cmd run test -- src/server/subtitleStages/transcriptionStage.test.ts

Expected: PASS

Step 5: Commit

git add src/server/subtitleStages/transcriptionStage.ts src/server/subtitleStages/stageTypes.ts src/server/subtitleStages/transcriptionStage.test.ts
git commit -m "feat: switch stage 1 to flash asr"

Task 4: Wire Stage 1 Logging and Cleanup

Files:

Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts
Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.ts
Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.test.ts

Step 1: Write the failing test

Add assertions that the transcription path logs:

ASR request start and finish
API status code failures
mapped utterance summary

Step 2: Run test to verify it fails

Run: npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts

Expected: FAIL because these log events are not present yet.

Step 3: Write minimal implementation

Add structured logs and ensure temp audio cleanup still runs after extraction.
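One way to produce those events is a small JSON-lines logger plus a wrapper around the ASR call. The event names (asr_request_start, asr_request_finish, asr_request_failed) and the statusCode property read from thrown errors are hypothetical; the server may already have a logger worth reusing. The mapped utterance summary can be emitted with the same logger after mapping, for example with the segment count.

```typescript
// Hypothetical JSON-lines logger: one object per event, written to a sink.
type LogFields = Record<string, string | number | boolean | undefined>;
type StageLog = (event: string, fields?: LogFields) => void;

function createStageLogger(
  stage: string,
  sink: (line: string) => void = (line) => console.log(line),
): StageLog {
  return (event, fields = {}) =>
    sink(JSON.stringify({ ts: new Date().toISOString(), stage, event, ...fields }));
}

// Reads a statusCode off an error if the ASR client attached one (assumed convention).
function statusCodeOf(error: unknown): string | undefined {
  if (typeof error === "object" && error !== null && "statusCode" in error) {
    return String((error as { statusCode: unknown }).statusCode);
  }
  return undefined;
}

// Wraps a single flash ASR call with start, finish and failure events.
async function recognizeWithLogging<T>(
  log: StageLog,
  audioBytes: number,
  call: () => Promise<T[]>,
): Promise<T[]> {
  const startedAt = Date.now();
  log("asr_request_start", { audioBytes });
  try {
    const utterances = await call();
    log("asr_request_finish", {
      durationMs: Date.now() - startedAt,
      utteranceCount: utterances.length,
    });
    return utterances;
  } catch (error) {
    log("asr_request_failed", {
      durationMs: Date.now() - startedAt,
      statusCode: statusCodeOf(error),
      message: error instanceof Error ? error.message : String(error),
    });
    throw error; // rethrow so the job still fails
  }
}
```

Injecting the sink lets multiStageSubtitleGeneration.test.ts capture lines in an array and assert on parsed events instead of spying on console output.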

Step 4: Run test to verify it passes

Run: npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts

Expected: PASS

Step 5: Commit

git add src/server/subtitleStages/transcriptionStage.ts src/server/multiStageSubtitleGeneration.ts src/server/multiStageSubtitleGeneration.test.ts
git commit -m "feat: log flash asr stage lifecycle"

Task 5: Keep the Frontend Contract Stable

Files:

Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts
Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.ts
Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.test.ts

Step 1: Write the failing test

Add assertions that the transcription path logs:

ASR request start/finish
mapped utterance summary

Step 2: Run test to verify it fails

Run: npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts

Expected: FAIL because these log events are not present yet.

Step 3: Write minimal implementation

Add structured logs and ensure temp audio cleanup always runs after Stage 1 completes or fails.
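A sketch of the "always clean up" guarantee using try/finally around a per-run temp directory. withTempAudio is a hypothetical helper; the project's tempAudioStore module may already provide an equivalent. The return await inside try matters: without it, the finally block would start deleting the directory before run had finished with the file.

```typescript
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: gives Stage 1 a private temp directory for the
// extracted WAV and removes it whether the stage resolves or throws.
async function withTempAudio<T>(run: (wavPath: string) => Promise<T>): Promise<T> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "stage1-asr-"));
  const wavPath = path.join(dir, "audio.wav");
  try {
    // `return await` keeps the finally block from running before `run` settles.
    return await run(wavPath);
  } finally {
    // force: true tolerates a missing file, e.g. if extraction failed early.
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```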

Step 4: Run test to verify it passes

Run: npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts

Expected: PASS

Step 5: Commit

git add src/server/subtitleStages/transcriptionStage.ts src/server/multiStageSubtitleGeneration.ts src/server/multiStageSubtitleGeneration.test.ts
git commit -m "feat: log volcengine asr stage lifecycle"

Task 6: Run Full Targeted Verification

Files:

No production file changes required

Step 1: Run targeted server and pipeline tests

Run:

npm.cmd run test -- src/server/llmProvider.test.ts src/server/tempAudioStore.test.ts src/server/volcengineAsr.test.ts src/server/subtitleStages/transcriptionStage.test.ts src/server/multiStageSubtitleGeneration.test.ts src/server/subtitleGeneration.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx

Expected: PASS

Step 2: Restart the dev server

Restart the local dev server and confirm port 3000 is listening.
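If a scripted check is preferred over reading the console output, a small Node probe can confirm that something accepts TCP connections on port 3000. isPortListening is a hypothetical helper built on node:net, and it assumes the dev server is reachable on 127.0.0.1. On Windows, netstat -ano | findstr :3000 gives the same answer by hand.

```typescript
import * as net from "node:net";

// Resolves true if a TCP connection to host:port succeeds within timeoutMs.
function isPortListening(port: number, host = "127.0.0.1", timeoutMs = 2000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (listening: boolean) => {
      socket.destroy();
      resolve(listening); // later calls are no-ops once the promise settles
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.on("connect", () => finish(true));
    socket.on("error", () => finish(false));
  });
}

// Usage:
// isPortListening(3000).then((up) => console.log(up ? "port 3000 is listening" : "dev server is not up"));
```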

Step 3: Manual validation

Use the known failing sample video and verify:

Stage 1 logs show the ASR request start and finish
originalText matches the spoken dialogue more closely than the previous Stage 1 output
downstream translation and dubbing still complete

Step 4: Commit

git add .
git commit -m "feat: replace stage 1 transcription with flash asr"
