video_translate/docs/plans/2026-03-19-4-plus-1-subtitle-pipeline.md
Song367 85065cbca3
Build multi-stage subtitle and dubbing pipeline
2026-03-20 20:55:40 +08:00

4+1 Subtitle Pipeline Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Replace the current one-shot subtitle generation flow with a 4+1 staged pipeline that isolates transcription, segmentation, translation, voice matching, and validation.

Architecture: Introduce stage-specific server modules under a new subtitleStages folder and route the existing /generate-subtitles backend entry point through a new orchestrator. Keep the final SubtitlePipelineResult contract stable for the editor while adding internal stage contracts and diagnostics.

Tech Stack: TypeScript, Node server pipeline, Vitest, React client services, async subtitle job polling


Task 1: Define stage contracts and lock them with tests

Files:

  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\types.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\stageTypes.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\stageTypes.test.ts

Step 1: Write the failing test

  • Assert stage types support:
    • transcription output with confidence and needsReview
    • translated output with ttsText and ttsLanguage
    • validation issue output with code and severity

Step 2: Run test to verify it fails

Run: npm run test -- src/server/subtitleStages/stageTypes.test.ts

Expected: FAIL because the new stage files and contracts do not exist yet.

Step 3: Write minimal implementation

  • Create stageTypes.ts with:
    • TranscriptSegment
    • SegmentedSubtitle
    • TranslatedSubtitle
    • VoiceMatchedSubtitle
    • ValidationIssue
    • any stage diagnostics helpers needed by the orchestrator
  • Extend src/types.ts only where the public result contract needs optional diagnostics.
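
The contracts listed above could be sketched roughly as follows. This is only an illustration: the field names confidence, needsReview, originalText, translatedText, ttsText, ttsLanguage, voiceId, code, and severity come from this plan, while id, startTime, endTime, speaker, sourceSegmentId, subtitleId, message, and the StageDiagnostics shape are assumptions.

```typescript
// Hypothetical sketch of stageTypes.ts. Fields not named in the plan are assumptions.

interface TranscriptSegment {
  id: string;            // assumed identifier
  startTime: number;     // assumed unit: seconds from the start of the video
  endTime: number;
  originalText: string;
  speaker?: string;      // assumed shape of the speaker metadata
  confidence: number;    // assumed range: 0..1
  needsReview: boolean;
}

interface SegmentedSubtitle extends TranscriptSegment {
  sourceSegmentId: string; // assumed back-reference to the transcript segment
}

interface TranslatedSubtitle extends SegmentedSubtitle {
  translatedText: string;
  ttsText: string;
  ttsLanguage: string;
}

interface VoiceMatchedSubtitle extends TranslatedSubtitle {
  voiceId: string;
}

type ValidationSeverity = "info" | "warning" | "error"; // assumed levels

interface ValidationIssue {
  subtitleId: string;    // assumed link back to the affected subtitle
  code: string;
  severity: ValidationSeverity;
  message: string;       // assumed human-readable description
}

// Assumed diagnostics helper shape for the orchestrator.
interface StageDiagnostics {
  stage: string;
  issues: ValidationIssue[];
}

// Sample values showing how the contracts compose.
const sampleSubtitle: VoiceMatchedSubtitle = {
  id: "sub-1",
  sourceSegmentId: "seg-1",
  startTime: 0,
  endTime: 2.5,
  originalText: "你好",
  confidence: 0.92,
  needsReview: false,
  translatedText: "Hello",
  ttsText: "Hello",
  ttsLanguage: "en",
  voiceId: "en-voice-1",
};

const sampleDiagnostics: StageDiagnostics = {
  stage: "validation",
  issues: [
    { subtitleId: "sub-1", code: "low_confidence", severity: "warning", message: "Check transcript" },
  ],
};
```

Extending each interface from the previous stage's output is one possible design; it keeps later stages from dropping fields, at the cost of coupling the stage shapes together.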

Step 4: Run test to verify it passes

Run: npm run test -- src/server/subtitleStages/stageTypes.test.ts

Expected: PASS

Step 5: Commit

Skip commit for now.

Task 2: Add the transcription stage

Files:

  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\transcriptionStage.test.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\videoSubtitleGeneration.ts

Step 1: Write the failing tests

  • Assert the transcription stage prompt asks only for:
    • faithful transcription
    • timestamps
    • speaker metadata
  • Assert it does not request translation or voice selection.
  • Assert the parser normalizes low-confidence segments and missing speaker fields to safe defaults.

Step 2: Run tests to verify they fail

Run: npm run test -- src/server/subtitleStages/transcriptionStage.test.ts src/server/videoSubtitleGeneration.test.ts

Expected: FAIL because the transcription stage does not exist and the current prompt is still all-in-one.

Step 3: Write minimal implementation

  • Extract provider-specific transcription logic from videoSubtitleGeneration.ts into transcriptionStage.ts.
  • Narrow the transcription prompt and JSON schema to transcription-only fields.
  • Return TranscriptSegment[].
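
As a rough illustration of the parser normalization this task tests for, the sketch below turns a provider's JSON reply into TranscriptSegment[]. The function names, the 0.6 threshold, the "unknown" speaker default, and the raw field names are all assumptions, not the existing behavior of videoSubtitleGeneration.ts.

```typescript
// Hypothetical normalization sketch. Names, threshold, and defaults are assumptions.

interface TranscriptSegment {
  id: string;
  startTime: number;
  endTime: number;
  originalText: string;
  speaker: string;
  confidence: number;
  needsReview: boolean;
}

const LOW_CONFIDENCE_THRESHOLD = 0.6; // assumed cut-off

// Return the value if it is a finite number, otherwise the fallback.
function asNumber(value: unknown, fallback: number): number {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

function normalizeTranscriptSegment(raw: unknown, index: number): TranscriptSegment {
  const record = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const speakerRaw = record["speaker"];
  const textRaw = record["originalText"];

  // Missing or malformed confidence is treated as 0 so the segment is flagged.
  const confidence = Math.min(1, Math.max(0, asNumber(record["confidence"], 0)));
  const speaker =
    typeof speakerRaw === "string" && speakerRaw.trim() !== "" ? speakerRaw.trim() : "unknown";

  return {
    id: `seg-${index + 1}`,
    startTime: asNumber(record["startTime"], 0),
    endTime: asNumber(record["endTime"], 0),
    originalText: typeof textRaw === "string" ? textRaw.trim() : "",
    speaker,
    confidence,
    needsReview: confidence < LOW_CONFIDENCE_THRESHOLD,
  };
}

// Parse the provider reply; anything that is not a JSON array yields no segments.
function parseTranscriptionResponse(json: string): TranscriptSegment[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) return [];
  return parsed.map((item: unknown, index: number) => normalizeTranscriptSegment(item, index));
}
```

Treating a missing confidence value as 0 is a deliberately conservative choice here: the segment is kept but surfaces for review rather than being silently trusted.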

Step 4: Run tests to verify they pass

Run: npm run test -- src/server/subtitleStages/transcriptionStage.test.ts src/server/videoSubtitleGeneration.test.ts

Expected: PASS

Step 5: Commit

Skip commit for now.

Task 3: Add the segmentation stage

Files:

  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\segmentationStage.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\segmentationStage.test.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.ts

Step 1: Write the failing tests

  • Assert long transcript segments are split into subtitle-friendly chunks.
  • Assert segmentation preserves originalText, timing order, and speaker identity.
  • Assert no paraphrasing occurs during segmentation.

Step 2: Run tests to verify they fail

Run: npm run test -- src/server/subtitleStages/segmentationStage.test.ts src/server/subtitlePipeline.test.ts

Expected: FAIL because there is no explicit segmentation stage.

Step 3: Write minimal implementation

  • Reuse normalization helpers from subtitlePipeline.ts.
  • Implement deterministic segmentation that:
    • preserves chronology
    • keeps original text intact
    • marks segments that cannot be split cleanly for later review instead of rewriting them
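
One way to picture deterministic segmentation is the sketch below. It assumes a character limit per subtitle (42 here, an arbitrary default), splits only on whitespace, and distributes timing in proportion to chunk length. The type and function names are hypothetical, and the real stage may use different rules.

```typescript
// Hypothetical segmentation sketch. The limit and the timing rule are assumptions.

interface TimedText {
  startTime: number;
  endTime: number;
  originalText: string;
  speaker: string;
  needsReview: boolean;
}

function segmentTranscript(segment: TimedText, maxChars: number = 42): TimedText[] {
  const words = segment.originalText.split(/\s+/).filter((word) => word.length > 0);
  if (words.length === 0) return [];

  // Greedily pack whole words into chunks; words are never altered or reordered.
  const chunks: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (candidate.length <= maxChars || current === "") {
      current = candidate; // a single oversized word is kept intact
    } else {
      chunks.push(current);
      current = word;
    }
  }
  chunks.push(current);

  // Split the original time span proportionally to each chunk's length.
  const totalChars = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const duration = segment.endTime - segment.startTime;
  let cursor = segment.startTime;

  return chunks.map((text, index) => {
    const isLast = index === chunks.length - 1;
    const end = isLast ? segment.endTime : cursor + duration * (text.length / totalChars);
    const piece: TimedText = {
      startTime: cursor,
      endTime: end,
      originalText: text,
      speaker: segment.speaker,
      // Chunks that still exceed the limit are flagged rather than rewritten.
      needsReview: segment.needsReview || text.length > maxChars,
    };
    cursor = end;
    return piece;
  });
}
```

Because the chunks are built only from whole words in their original order, joining them with single spaces reproduces the whitespace-normalized source text, which is the property the "no paraphrasing" test can check.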

Step 4: Run tests to verify they pass

Run: npm run test -- src/server/subtitleStages/segmentationStage.test.ts src/server/subtitlePipeline.test.ts

Expected: PASS

Step 5: Commit

Skip commit for now.

Task 4: Add the translation stage

Files:

  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\translationStage.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\translationStage.test.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleGeneration.ts

Step 1: Write the failing tests

  • Assert translation stage input is originalText from segmentation, not raw provider output.
  • Assert it returns:
    • translatedText
    • ttsText
    • ttsLanguage
  • Assert it never changes timestamps.

Step 2: Run tests to verify they fail

Run: npm run test -- src/server/subtitleStages/translationStage.test.ts src/server/subtitleGeneration.test.ts

Expected: FAIL because translation is not separated yet.

Step 3: Write minimal implementation

  • Build a translation-only stage that consumes segmented subtitles.
  • Keep English subtitle generation and TTS-language generation separate but paired.
  • Return TranslatedSubtitle[].
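
A minimal sketch of such a translation-only stage follows. The provider call is injected as a function so the stage itself only pairs results with the untouched segment. The names runTranslationStage, TranslateFn, and TranslationPair are hypothetical, and the actual provider call in subtitleGeneration.ts is not shown.

```typescript
// Hypothetical translation stage sketch. The translator is injected; names are assumptions.

interface SegmentedSubtitle {
  id: string;
  startTime: number;
  endTime: number;
  originalText: string;
}

interface TranslatedSubtitle extends SegmentedSubtitle {
  translatedText: string;
  ttsText: string;
  ttsLanguage: string;
}

// The two outputs are produced together for one source line, but kept as separate fields.
interface TranslationPair {
  translatedText: string;
  ttsText: string;
}

type TranslateFn = (originalText: string, ttsLanguage: string) => Promise<TranslationPair>;

async function runTranslationStage(
  subtitles: SegmentedSubtitle[],
  ttsLanguage: string,
  translate: TranslateFn
): Promise<TranslatedSubtitle[]> {
  const results: TranslatedSubtitle[] = [];
  for (const subtitle of subtitles) {
    // Only originalText is sent to the translator; timing never leaves this function.
    const pair = await translate(subtitle.originalText, ttsLanguage);
    results.push({
      ...subtitle, // id, startTime, endTime, originalText are copied unchanged
      translatedText: pair.translatedText.trim(),
      ttsText: pair.ttsText.trim(),
      ttsLanguage,
    });
  }
  return results;
}
```

Injecting the translator keeps the stage testable without network access: a test can pass a stub and assert that timestamps come back unchanged.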

Step 4: Run tests to verify they pass

Run: npm run test -- src/server/subtitleStages/translationStage.test.ts src/server/subtitleGeneration.test.ts

Expected: PASS

Step 5: Commit

Skip commit for now.

Task 5: Add voice matching and validation stages

Files:

  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\voiceMatchingStage.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\voiceMatchingStage.test.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\validationStage.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleStages\validationStage.test.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\voices.ts

Step 1: Write the failing tests

  • Assert voice matching only picks from the current language-specific catalog.
  • Assert it falls back safely when gender or speaker tone is missing.
  • Assert validation returns warnings for:
    • low-confidence transcript
    • voice language mismatch
    • empty translation
    • timing overlap

Step 2: Run tests to verify they fail

Run: npm run test -- src/server/subtitleStages/voiceMatchingStage.test.ts src/server/subtitleStages/validationStage.test.ts

Expected: FAIL because neither stage exists yet.

Step 3: Write minimal implementation

  • Implement a pure voice matcher that adds voiceId and never rewrites text.
  • Implement a validator that inspects final subtitles and returns ValidationIssue[].
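
The two stages could look something like the sketch below. It assumes a voice catalog of { id, language, gender } entries, a 0.6 confidence threshold, and the issue codes shown; none of these are taken from src/voices.ts, and the function names are hypothetical.

```typescript
// Hypothetical voice matching and validation sketch. Catalog shape, codes, and threshold are assumptions.

type Gender = "male" | "female";

interface Voice { id: string; language: string; gender: Gender; }

interface TranslatedSubtitle {
  id: string;
  startTime: number;
  endTime: number;
  translatedText: string;
  ttsLanguage: string;
  confidence: number;
  gender?: Gender;
}

interface VoiceMatchedSubtitle extends TranslatedSubtitle { voiceId: string; }

interface ValidationIssue { subtitleId: string; code: string; severity: "warning" | "error"; }

// Pure matcher: adds voiceId and copies every other field unchanged.
function matchVoice(subtitle: TranslatedSubtitle, catalog: Voice[]): VoiceMatchedSubtitle {
  const candidates = catalog.filter((voice) => voice.language === subtitle.ttsLanguage);
  const fallback = candidates[0];
  if (fallback === undefined) {
    throw new Error(`No voices available for language ${subtitle.ttsLanguage}`);
  }
  // When gender is missing or unmatched, fall back to the first voice for the language.
  const chosen = candidates.find((voice) => voice.gender === subtitle.gender) ?? fallback;
  return { ...subtitle, voiceId: chosen.id };
}

// Validator: inspects final subtitles and reports issues without modifying them.
function validateSubtitles(subtitles: VoiceMatchedSubtitle[], catalog: Voice[]): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  const languageByVoice = new Map<string, string>();
  for (const voice of catalog) languageByVoice.set(voice.id, voice.language);

  subtitles.forEach((subtitle, index) => {
    const warn = (code: string): void => {
      issues.push({ subtitleId: subtitle.id, code, severity: "warning" });
    };
    if (subtitle.confidence < 0.6) warn("low_confidence");
    if (languageByVoice.get(subtitle.voiceId) !== subtitle.ttsLanguage) warn("voice_language_mismatch");
    if (subtitle.translatedText.trim() === "") warn("empty_translation");
    const previous = index > 0 ? subtitles[index - 1] : undefined;
    if (previous !== undefined && subtitle.startTime < previous.endTime) warn("timing_overlap");
  });
  return issues;
}
```

Throwing when the language has no voices at all is one possible policy; returning a validation issue instead would also fit the plan's intent of surfacing problems rather than hiding them.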

Step 4: Run tests to verify they pass

Run: npm run test -- src/server/subtitleStages/voiceMatchingStage.test.ts src/server/subtitleStages/validationStage.test.ts

Expected: PASS

Step 5: Commit

Skip commit for now.

Task 6: Integrate the orchestrator and async job progress

Files:

  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\multiStageSubtitleGeneration.test.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleGeneration.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleJobs.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\services\subtitleService.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\types.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\server.ts

Step 1: Write the failing tests

  • Assert the orchestrator runs stages in order:
    • transcription
    • segmentation
    • translation
    • voice matching
    • validation
  • Assert async progress updates expose stage-specific messages.
  • Assert final SubtitlePipelineResult stays backward compatible for the editor.

Step 2: Run tests to verify they fail

Run: npm run test -- src/server/multiStageSubtitleGeneration.test.ts src/server/subtitleJobs.test.ts src/services/subtitleService.test.ts

Expected: FAIL because the orchestrator and new stage progress do not exist yet.

Step 3: Write minimal implementation

  • Add multiStageSubtitleGeneration.ts.
  • Route existing backend entry points through the orchestrator.
  • Keep /generate-subtitles and polling payloads stable.
  • Include optional validation diagnostics in the final result.
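
The ordering and progress reporting described in this task might be wired together as in the sketch below. The stages are passed in as functions, so the sketch makes no claims about their internals. The names runSubtitlePipeline, PipelineStages, and ProgressFn, the progress messages, and the { subtitles, diagnostics } result shape are assumptions; the real SubtitlePipelineResult contract is defined in src/types.ts.

```typescript
// Hypothetical orchestrator sketch. Names, messages, and result shape are assumptions.

type StageName = "transcription" | "segmentation" | "translation" | "voiceMatching" | "validation";

type ProgressFn = (stage: StageName, message: string) => void;

interface PipelineStages<Transcript, Segmented, Translated, Matched, Issue> {
  transcribe: () => Promise<Transcript[]>;
  segment: (input: Transcript[]) => Promise<Segmented[]>;
  translate: (input: Segmented[]) => Promise<Translated[]>;
  matchVoices: (input: Translated[]) => Promise<Matched[]>;
  validate: (input: Matched[]) => Promise<Issue[]>;
}

interface PipelineOutput<Matched, Issue> {
  subtitles: Matched[];
  diagnostics: Issue[]; // optional diagnostics attached alongside the final subtitles
}

async function runSubtitlePipeline<T, S, R, M, I>(
  stages: PipelineStages<T, S, R, M, I>,
  onProgress: ProgressFn
): Promise<PipelineOutput<M, I>> {
  // Each stage receives only the previous stage's output, in a fixed order.
  onProgress("transcription", "Transcribing audio");
  const transcript = await stages.transcribe();

  onProgress("segmentation", "Splitting transcript into subtitles");
  const segmented = await stages.segment(transcript);

  onProgress("translation", "Translating subtitles");
  const translated = await stages.translate(segmented);

  onProgress("voiceMatching", "Matching voices");
  const matched = await stages.matchVoices(translated);

  onProgress("validation", "Validating results");
  const issues = await stages.validate(matched);

  return { subtitles: matched, diagnostics: issues };
}
```

Passing the stages in as an object also makes the ordering test straightforward: stub stages can record when they are called, and the progress callback can be captured the same way.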

Step 4: Run tests to verify they pass

Run: npm run test -- src/server/multiStageSubtitleGeneration.test.ts src/server/subtitleJobs.test.ts src/services/subtitleService.test.ts

Expected: PASS

Step 5: Run focused regression tests

Run: npm run test -- src/server/videoSubtitleGeneration.test.ts src/server/subtitleGeneration.test.ts src/server/subtitleJobs.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx

Expected: PASS

Step 6: Commit

Skip commit for now.