# Volcengine ASR Stage-1 Replacement Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Replace the current Stage 1 transcription agent with Volcengine's flash ASR API using `audio.data` base64 while keeping the rest of the `4+1` subtitle pipeline unchanged.

**Architecture:** The server will extract WAV audio from uploaded video, base64-encode it, send it to Volcengine's `recognize/flash` endpoint, and map the result into the existing `TranscriptSegment[]` shape. `Segmentation`, `Translation`, `Voice Matching`, and `Validation` will continue to use their existing contracts.

**Tech Stack:** Node.js, Express, TypeScript, fluent-ffmpeg, Vitest, existing server job orchestration

---

### Task 1: Add ASR Configuration Surface

**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\.env.example`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.test.ts`

**Step 1: Write the failing test**

Add a test that expects Volcengine ASR config to resolve from env with:

- app key
- access key
- resource id
- `recognize/flash` endpoint URL
- request timeout

**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/llmProvider.test.ts`

Expected: FAIL because ASR config fields are missing.

**Step 3: Write minimal implementation**

Add environment parsing for ASR config without disturbing existing LLM provider resolution.

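A minimal sketch of what this environment parsing could look like, kept separate from LLM provider resolution. The interface, function names, and every `VOLCENGINE_ASR_*` variable name are hypothetical, and the 60-second default timeout is an assumption. It covers the credentials, a single endpoint URL, and the timeout; any further field would follow the same pattern.

```typescript
// Hypothetical config shape; field names are assumptions, not the project's actual types.
interface VolcengineAsrConfig {
  appKey: string;
  accessKey: string;
  resourceId: string;
  endpointUrl: string;
  timeoutMs: number;
}

// The environment is passed in so tests can supply a plain object instead of process.env.
type Env = Record<string, string | undefined>;

// Returns the trimmed value or throws, so a missing key fails at startup.
function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Falls back to the default when the value is absent, non-numeric, or not positive.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// All variable names below are hypothetical.
function resolveVolcengineAsrConfig(env: Env): VolcengineAsrConfig {
  return {
    appKey: requireEnv(env, "VOLCENGINE_ASR_APP_KEY"),
    accessKey: requireEnv(env, "VOLCENGINE_ASR_ACCESS_KEY"),
    resourceId: requireEnv(env, "VOLCENGINE_ASR_RESOURCE_ID"),
    endpointUrl: requireEnv(env, "VOLCENGINE_ASR_ENDPOINT_URL"),
    timeoutMs: parsePositiveInt(env["VOLCENGINE_ASR_TIMEOUT_MS"], 60000),
  };
}
```

Taking the environment as a parameter keeps the failing test from Step 1 free of global state: it can call `resolveVolcengineAsrConfig` with a literal object and assert on the returned fields.
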
**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/llmProvider.test.ts`

Expected: PASS

**Step 5: Commit**

```bash
git add src/server/llmProvider.ts src/server/llmProvider.test.ts .env.example
git commit -m "feat: add volcengine asr config"
```

### Task 2: Build the Volcengine Flash ASR Client

**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\volcengineAsr.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\volcengineAsr.test.ts`

**Step 1: Write the failing test**

Add tests for:

- request header shape
- request body with `audio.data`
- API error code handling
- success result parsing

**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/volcengineAsr.test.ts`

Expected: FAIL because the client does not exist.

**Step 3: Write minimal implementation**

Implement the flash recognition request/response helper.

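One possible shape for this helper is sketched below as two pure functions, so the four test cases from Step 1 can run without network access. Everything here is an assumption to check against Volcengine's documentation before use: the `X-Api-*` header names, the `user`/`request` body fields and the `bigmodel` value, the `20000000` success code, and the `result.utterances` response layout. Only `audio.data` and `utterances` come from this plan. All function and type names are hypothetical.

```typescript
// Hypothetical credential shape, matching the fields Task 1 resolves from env.
interface FlashAsrCredentials {
  appKey: string;
  accessKey: string;
  resourceId: string;
}

interface FlashAsrRequest {
  headers: Record<string, string>;
  body: string;
}

// Assumed utterance fields; times are assumed to be milliseconds.
interface FlashAsrUtterance {
  text: string;
  start_time: number;
  end_time: number;
}

interface FlashAsrResult {
  text: string;
  utterances: FlashAsrUtterance[];
}

// Assumed success value of the API status code.
const ASSUMED_SUCCESS_CODE = "20000000";

// Builds headers and a JSON body that carries the audio inline as base64 in audio.data.
function buildFlashAsrRequest(
  creds: FlashAsrCredentials,
  audioBase64: string,
  requestId: string,
): FlashAsrRequest {
  return {
    headers: {
      "Content-Type": "application/json",
      "X-Api-App-Key": creds.appKey, // assumed header name
      "X-Api-Access-Key": creds.accessKey, // assumed header name
      "X-Api-Resource-Id": creds.resourceId, // assumed header name
      "X-Api-Request-Id": requestId, // assumed header name
    },
    body: JSON.stringify({
      user: { uid: creds.appKey }, // assumed field
      audio: { data: audioBase64 },
      request: { model_name: "bigmodel" }, // assumed field and value
    }),
  };
}

// Throws on any non-success status code, otherwise returns the validated result.
function parseFlashAsrResponse(
  statusCode: string | null,
  message: string | null,
  payload: unknown,
): FlashAsrResult {
  if (statusCode !== ASSUMED_SUCCESS_CODE) {
    throw new Error(
      `Volcengine ASR failed: code=${statusCode ?? "missing"} message=${message ?? ""}`,
    );
  }
  const result = (payload as { result?: Partial<FlashAsrResult> } | null)?.result;
  const utterances = result?.utterances;
  if (!Array.isArray(utterances)) {
    throw new Error("Volcengine ASR response has no result.utterances array");
  }
  return { text: result?.text ?? "", utterances };
}
```

A caller would POST `body` with `headers` to the configured endpoint (for example with `fetch`) and hand the API status code, message, and parsed JSON to `parseFlashAsrResponse`. Where the status code is read from, for instance an `X-Api-Status-Code` response header, is also an assumption.
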
**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/volcengineAsr.test.ts`

Expected: PASS

**Step 5: Commit**

```bash
git add src/server/volcengineAsr.ts src/server/volcengineAsr.test.ts
git commit -m "feat: add volcengine flash asr client"
```

### Task 3: Map Flash ASR Result to Transcript Segments

**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\stageTypes.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.test.ts`

**Step 1: Write the failing test**

Add a test that feeds a flash ASR result payload with `utterances` and expects normalized `TranscriptSegment[]`.

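To make the expected mapping concrete, here is a rough sketch of one way to normalize `utterances`. The `TranscriptSegment` fields shown are hypothetical except `originalText`, which this plan names elsewhere; the real shape lives in `stageTypes.ts`. The utterance field names, the millisecond unit, and the normalization rules (trim text, drop empty or zero-length entries, sort by start time) are likewise assumptions.

```typescript
// Assumed utterance shape from the flash ASR payload; times assumed to be milliseconds.
interface FlashAsrUtterance {
  text: string;
  start_time: number;
  end_time: number;
}

// Hypothetical segment shape; the project's real type is defined in stageTypes.ts.
interface TranscriptSegment {
  id: string;
  startTime: number; // seconds (assumption)
  endTime: number; // seconds (assumption)
  originalText: string;
}

function mapUtterancesToSegments(utterances: FlashAsrUtterance[]): TranscriptSegment[] {
  return utterances
    .map((u) => ({ ...u, text: u.text.trim() }))
    // Drop utterances with no text or a non-positive duration.
    .filter((u) => u.text.length > 0 && u.end_time > u.start_time)
    // Order by start time so downstream stages see a chronological transcript.
    .sort((a, b) => a.start_time - b.start_time)
    // Convert milliseconds to seconds and assign ids after filtering and sorting.
    .map((u, index) => ({
      id: `seg-${index + 1}`,
      startTime: u.start_time / 1000,
      endTime: u.end_time / 1000,
      originalText: u.text,
    }));
}
```

The test payload can then include an out-of-order utterance and a whitespace-only one to pin down the sorting and filtering behavior.
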
**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/subtitleStages/transcriptionStage.test.ts`

Expected: FAIL because Stage 1 still expects model JSON output.

**Step 3: Write minimal implementation**

Refactor `transcriptionStage.ts` so Stage 1:

- extracts WAV audio
- base64-encodes the audio
- calls `volcengineAsr.ts`
- maps `utterances` into `TranscriptSegment[]`

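The four steps above could be wired together roughly as follows. This is a sketch with injected dependencies rather than the project's actual stage signature: `TranscriptionDeps`, `runTranscriptionStage`, and every member name are hypothetical, and the real extraction would go through the existing ffmpeg tooling.

```typescript
// Both shapes are assumptions, repeated here so the sketch stands alone.
interface FlashAsrUtterance {
  text: string;
  start_time: number;
  end_time: number;
}

interface TranscriptSegment {
  startTime: number;
  endTime: number;
  originalText: string;
}

// Hypothetical dependency bundle; injecting it lets tests replace ffmpeg and the network.
interface TranscriptionDeps {
  extractWav(videoPath: string): Promise<Uint8Array>;
  encodeBase64(audio: Uint8Array): string;
  recognize(audioBase64: string): Promise<FlashAsrUtterance[]>;
  mapUtterances(utterances: FlashAsrUtterance[]): TranscriptSegment[];
}

async function runTranscriptionStage(
  videoPath: string,
  deps: TranscriptionDeps,
): Promise<TranscriptSegment[]> {
  // 1. Extract WAV audio from the uploaded video.
  const wav = await deps.extractWav(videoPath);
  if (wav.byteLength === 0) {
    throw new Error(`No audio extracted from ${videoPath}`);
  }
  // 2. Base64-encode the audio for the request body.
  const audioBase64 = deps.encodeBase64(wav);
  // 3. Call the ASR client.
  const utterances = await deps.recognize(audioBase64);
  // 4. Map utterances into the segment shape the later stages consume.
  return deps.mapUtterances(utterances);
}
```

Keeping the stage a thin sequence of calls means the existing Stage 1 test only needs fake dependencies, and the mapping and client logic stay covered by their own tests.
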
**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/subtitleStages/transcriptionStage.test.ts`

Expected: PASS

**Step 5: Commit**

```bash
git add src/server/subtitleStages/transcriptionStage.ts src/server/subtitleStages/stageTypes.ts src/server/subtitleStages/transcriptionStage.test.ts
git commit -m "feat: switch stage 1 to flash asr"
```

### Task 4: Wire Stage 1 Logging and Cleanup

**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.test.ts`

**Step 1: Write the failing test**

Add assertions that the transcription path logs:

- ASR request start and finish
- API status code failures
- mapped utterance summary

**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts`

Expected: FAIL because these log events are not present yet.

**Step 3: Write minimal implementation**

Add structured logs and ensure temp audio cleanup still runs after extraction.

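A small wrapper along these lines is one way to get the log events and guaranteed cleanup in a single place. The event names (`asr.request.start`, `asr.request.finish`, `asr.request.error`), the `LogFn` signature, and the `withAsrLogging` helper are all hypothetical; the project's existing logger and temp-file handling may differ.

```typescript
// Hypothetical structured logger signature: an event name plus optional details.
type LogFn = (event: string, details?: Record<string, unknown>) => void;

async function withAsrLogging<T>(
  log: LogFn,
  run: () => Promise<T>,
  cleanup: () => Promise<void>,
  summarize: (result: T) => Record<string, unknown>,
): Promise<T> {
  const startedAt = Date.now();
  log("asr.request.start");
  try {
    const result = await run();
    // The summary carries the mapped utterance details, e.g. a segment count.
    log("asr.request.finish", { durationMs: Date.now() - startedAt, ...summarize(result) });
    return result;
  } catch (error) {
    // API status code failures surface here through the thrown error's message.
    log("asr.request.error", {
      message: error instanceof Error ? error.message : String(error),
    });
    throw error;
  } finally {
    // Runs on success and on failure, so temp audio is always removed.
    await cleanup();
  }
}
```

Because cleanup sits in `finally`, the test can assert it ran once per call in both the passing and the failing path, alongside the ordered list of log events.
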
**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts`

Expected: PASS

**Step 5: Commit**

```bash
git add src/server/subtitleStages/transcriptionStage.ts src/server/multiStageSubtitleGeneration.ts src/server/multiStageSubtitleGeneration.test.ts
git commit -m "feat: log flash asr stage lifecycle"
```

### Task 5: Keep the Frontend Contract Stable

**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.test.ts`

**Step 1: Write the failing test**

Add assertions that the transcription path logs:

- ASR request start/finish
- mapped utterance summary

**Step 2: Run test to verify it fails**

Run: `npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts`

Expected: FAIL because these log events are not present yet.

**Step 3: Write minimal implementation**

Add structured logs and ensure temp audio cleanup always runs after Stage 1 completes or fails.

**Step 4: Run test to verify it passes**

Run: `npm.cmd run test -- src/server/multiStageSubtitleGeneration.test.ts`

Expected: PASS

**Step 5: Commit**

```bash
git add src/server/subtitleStages/transcriptionStage.ts src/server/multiStageSubtitleGeneration.ts src/server/multiStageSubtitleGeneration.test.ts
git commit -m "feat: log volcengine asr stage lifecycle"
```

### Task 6: Run Full Targeted Verification

**Files:**
- No production file changes required

**Step 1: Run targeted server and pipeline tests**

Run:

```bash
npm.cmd run test -- src/server/llmProvider.test.ts src/server/tempAudioStore.test.ts src/server/volcengineAsr.test.ts src/server/subtitleStages/transcriptionStage.test.ts src/server/multiStageSubtitleGeneration.test.ts src/server/subtitleGeneration.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx
```

Expected: PASS

**Step 2: Restart the dev server**

Restart the local dev server and confirm port `3000` is listening.

**Step 3: Manual validation**

Use the known failing sample video and verify:

- Stage 1 logs show the ASR request start and finish
- `originalText` is closer to the actual dialogue than before
- downstream translation and dubbing still complete

**Step 4: Commit**

```bash
git add .
git commit -m "feat: replace stage 1 transcription with flash asr"
```