# Precise Dialogue Localization Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Build a high-precision subtitle pipeline that returns accurate sentence boundaries, word-level timings, and real speaker attribution while preserving the current editor flow.
**Architecture:** Keep the React app and `server.ts` as the public entry points, but move timing-critical work into a dedicated alignment adapter. The backend normalizes aligned words into sentence subtitles, translates text without changing timing, and returns quality metadata so the editor can enable or disable precision UI safely.
**Tech Stack:** React 19, TypeScript, Vite, Express, FFmpeg, OpenAI SDK, a new test runner (`vitest`), and a high-precision alignment backend adapter.
---
### Task 1: Add Test Infrastructure
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\package.json`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\vitest.config.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\test\setup.ts`
**Step 1: Write the failing test**
Create a minimal smoke test first so the test runner has a real target.
```ts
import { describe, expect, it } from 'vitest';
describe('test harness', () => {
  it('runs vitest in this workspace', () => {
    expect(true).toBe(true);
  });
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run`
Expected: FAIL because no `test` script or Vitest config exists yet.
**Step 3: Write minimal implementation**
1. Add `test` and `test:watch` scripts to `package.json`.
2. Add dev dependencies for `vitest`.
3. Create `vitest.config.ts` with a Node environment default.
4. Add `src/test/setup.ts` for shared setup.
```ts
import { defineConfig } from 'vitest/config';
export default defineConfig({
  test: {
    environment: 'node',
    setupFiles: ['./src/test/setup.ts'],
  },
});
```
**Step 4: Run test to verify it passes**
Run: `npm test -- --run`
Expected: PASS with the smoke test.
**Step 5: Commit**
```bash
git add package.json vitest.config.ts src/test/setup.ts
git commit -m "test: add vitest infrastructure"
```
### Task 2: Extract Subtitle Pipeline Types and Normalizers
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\types.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\subtitlePipeline.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\subtitlePipeline.test.ts`
**Step 1: Write the failing test**
Write tests for normalization from aligned word payloads to UI-ready subtitles.
```ts
it('derives subtitle boundaries from first and last word', () => {
  const result = normalizeAlignedSentence({
    id: 's1',
    speakerId: 'spk_0',
    words: [
      { text: 'Hello', startTime: 1.2, endTime: 1.5, speakerId: 'spk_0', confidence: 0.99 },
      { text: 'world', startTime: 1.6, endTime: 2.0, speakerId: 'spk_0', confidence: 0.98 },
    ],
    originalText: 'Hello world',
    translatedText: '你好世界',
  });
  expect(result.startTime).toBe(1.2);
  expect(result.endTime).toBe(2.0);
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/lib/subtitlePipeline.test.ts`
Expected: FAIL because the new module and extended types do not exist.
**Step 3: Write minimal implementation**
1. Extend `Subtitle` in `src/types.ts` with `speakerId`, `words`, and `confidence`.
2. Create a pure helper module that normalizes backend payloads into frontend subtitles.
```ts
export const deriveSubtitleBounds = (words: WordTiming[]) => ({
  startTime: words[0]?.startTime ?? 0,
  endTime: words[words.length - 1]?.endTime ?? 0,
});
```
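For reference, the bounds helper above could grow into the full normalizer that the Step 1 test exercises. This is a sketch only: the `AlignedSentence` shape is taken from the test payload, while averaging word confidences into a sentence-level `confidence` is an assumption, not settled API.

```ts
export interface WordTiming {
  text: string;
  startTime: number;
  endTime: number;
  speakerId: string;
  confidence: number;
}

// Shape inferred from the Step 1 test payload.
export interface AlignedSentence {
  id: string;
  speakerId: string;
  words: WordTiming[];
  originalText: string;
  translatedText: string;
}

export const deriveSubtitleBounds = (words: WordTiming[]) => ({
  startTime: words[0]?.startTime ?? 0,
  endTime: words[words.length - 1]?.endTime ?? 0,
});

export const normalizeAlignedSentence = (sentence: AlignedSentence) => ({
  ...sentence,
  ...deriveSubtitleBounds(sentence.words),
  // Assumption: sentence confidence is the mean word confidence.
  confidence:
    sentence.words.reduce((sum, word) => sum + word.confidence, 0) /
    Math.max(sentence.words.length, 1),
});
```

Keeping this module free of I/O is what makes the Step 1 test possible without mocks.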
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/lib/subtitlePipeline.test.ts`
Expected: PASS.
**Step 5: Commit**
```bash
git add src/types.ts src/lib/subtitlePipeline.ts src/lib/subtitlePipeline.test.ts
git commit -m "feat: add subtitle pipeline normalizers"
```
### Task 3: Implement Sentence Reconstruction Helpers
**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\alignment\sentenceReconstruction.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\alignment\sentenceReconstruction.test.ts`
**Step 1: Write the failing test**
Cover pause splitting and speaker splitting.
```ts
it('splits sentences when speaker changes', () => {
  const result = rebuildSentences([
    { text: 'Hi', startTime: 0.0, endTime: 0.2, speakerId: 'spk_0', confidence: 0.9 },
    { text: 'there', startTime: 0.25, endTime: 0.5, speakerId: 'spk_0', confidence: 0.9 },
    { text: 'no', startTime: 0.55, endTime: 0.7, speakerId: 'spk_1', confidence: 0.9 },
  ]);
  expect(result).toHaveLength(2);
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/lib/alignment/sentenceReconstruction.test.ts`
Expected: FAIL because the helper module is missing.
**Step 3: Write minimal implementation**
Implement pure splitting rules:
1. Split on `speakerId` change.
2. Split when the gap between consecutive words exceeds `0.45` seconds.
3. Split when the accumulated sentence duration exceeds `8` seconds.
```ts
if (nextWord.speakerId !== currentSpeakerId) {
  flushSentence();
}
```
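The three rules above can be combined into one pure function. This sketch returns groups of words per sentence, which satisfies the Step 1 test; the real helper would presumably also join each group's text, and the grouped return type is an assumption.

```ts
export interface WordTiming {
  text: string;
  startTime: number;
  endTime: number;
  speakerId: string;
  confidence: number;
}

// Thresholds from the splitting rules above.
const MAX_WORD_GAP_SECONDS = 0.45;
const MAX_SENTENCE_DURATION_SECONDS = 8;

export const rebuildSentences = (words: WordTiming[]): WordTiming[][] => {
  const sentences: WordTiming[][] = [];
  let current: WordTiming[] = [];

  for (const word of words) {
    const last = current[current.length - 1];
    const shouldSplit =
      last !== undefined &&
      (word.speakerId !== last.speakerId ||
        word.startTime - last.endTime > MAX_WORD_GAP_SECONDS ||
        word.endTime - current[0].startTime > MAX_SENTENCE_DURATION_SECONDS);

    if (shouldSplit) {
      // Flush the current sentence and start a new one with this word.
      sentences.push(current);
      current = [];
    }
    current.push(word);
  }
  if (current.length > 0) sentences.push(current);
  return sentences;
};
```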
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/lib/alignment/sentenceReconstruction.test.ts`
Expected: PASS.
**Step 5: Commit**
```bash
git add src/lib/alignment/sentenceReconstruction.ts src/lib/alignment/sentenceReconstruction.test.ts
git commit -m "feat: add sentence reconstruction rules"
```
### Task 4: Implement Speaker Assignment Helpers
**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\alignment\speakerAssignment.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\alignment\speakerAssignment.test.ts`
**Step 1: Write the failing test**
Test overlap-based speaker assignment.
```ts
it('assigns each word to the speaker segment with maximum overlap', () => {
  const word = { text: 'hello', startTime: 1.0, endTime: 1.4 };
  const speakers = [
    { speakerId: 'spk_0', startTime: 0.8, endTime: 1.1 },
    { speakerId: 'spk_1', startTime: 1.1, endTime: 1.6 },
  ];
  expect(assignSpeakerToWord(word, speakers)).toBe('spk_1');
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/lib/alignment/speakerAssignment.test.ts`
Expected: FAIL because speaker assignment logic does not exist.
**Step 3: Write minimal implementation**
Add a pure overlap calculator and default to `unknown` when no segment overlaps.
```ts
const overlap = Math.max(
  0,
  Math.min(word.endTime, segment.endTime) - Math.max(word.startTime, segment.startTime),
);
```
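Wrapping that overlap calculation, the whole helper could look like the following sketch, assuming the segment shape from the Step 1 test and the `unknown` default described above:

```ts
export interface SpeakerSegment {
  speakerId: string;
  startTime: number;
  endTime: number;
}

// Returns the id of the segment with the largest temporal overlap,
// or 'unknown' when no segment overlaps the word at all.
export const assignSpeakerToWord = (
  word: { startTime: number; endTime: number },
  segments: SpeakerSegment[],
): string => {
  let bestId = 'unknown';
  let bestOverlap = 0;
  for (const segment of segments) {
    const overlap = Math.max(
      0,
      Math.min(word.endTime, segment.endTime) - Math.max(word.startTime, segment.startTime),
    );
    if (overlap > bestOverlap) {
      bestOverlap = overlap;
      bestId = segment.speakerId;
    }
  }
  return bestId;
};
```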
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/lib/alignment/speakerAssignment.test.ts`
Expected: PASS.
**Step 5: Commit**
```bash
git add src/lib/alignment/speakerAssignment.ts src/lib/alignment/speakerAssignment.test.ts
git commit -m "feat: add speaker assignment helpers"
```
### Task 5: Isolate Backend Pipeline Logic from `server.ts`
**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.test.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\server.ts`
**Step 1: Write the failing test**
Add tests for orchestration-level fallback behavior.
```ts
it('returns partial quality when diarization is unavailable', async () => {
  const result = await buildSubtitlePayload({
    alignmentResult: {
      words: [{ text: 'hi', startTime: 0, endTime: 0.2, speakerId: 'unknown', confidence: 0.9 }],
      speakerSegments: [],
      quality: 'partial',
    },
  });
  expect(result.quality).toBe('partial');
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/server/subtitlePipeline.test.ts`
Expected: FAIL because orchestration code is still embedded in `server.ts`.
**Step 3: Write minimal implementation**
1. Move payload-building logic into `src/server/subtitlePipeline.ts`.
2. Make `server.ts` call the helper and only handle HTTP concerns.
```ts
export const buildSubtitlePayload = async (deps: SubtitlePipelineDeps) => {
  // normalize alignment result
  // translate text
  // return { subtitles, speakers, quality, ... }
};
```
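One possible shape for this orchestrator, reduced to a single sentence so it stays short: the injected `translate` dependency is hypothetical, and a real implementation would run sentence reconstruction and speaker assignment instead of joining all words into one subtitle.

```ts
export type Quality = 'full' | 'partial' | 'fallback';

export interface SubtitlePipelineDeps {
  alignmentResult: {
    words: Array<{ text: string; startTime: number; endTime: number; speakerId: string; confidence: number }>;
    speakerSegments: Array<{ speakerId: string; startTime: number; endTime: number }>;
    quality: Quality;
  };
  // Hypothetical injected translator; defaults to identity so tests can omit it.
  translate?: (text: string) => Promise<string>;
}

export const buildSubtitlePayload = async ({ alignmentResult, translate }: SubtitlePipelineDeps) => {
  const toTranslated = translate ?? (async (text: string) => text);
  const { words, speakerSegments, quality } = alignmentResult;
  const originalText = words.map((word) => word.text).join(' ');
  return {
    subtitles: [
      {
        id: 'sub_1',
        startTime: words[0]?.startTime ?? 0,
        endTime: words[words.length - 1]?.endTime ?? 0,
        originalText,
        // Translation never changes timing, only text.
        translatedText: await toTranslated(originalText),
        words,
      },
    ],
    speakers: speakerSegments.map((segment) => segment.speakerId),
    // Quality passes through untouched so the UI can gate precision features.
    quality,
  };
};
```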
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/server/subtitlePipeline.test.ts`
Expected: PASS.
**Step 5: Commit**
```bash
git add src/server/subtitlePipeline.ts src/server/subtitlePipeline.test.ts server.ts
git commit -m "refactor: isolate subtitle pipeline orchestration"
```
### Task 6: Add an Alignment Service Adapter
**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\alignmentAdapter.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\alignmentAdapter.test.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\server.ts`
**Step 1: Write the failing test**
Test that the adapter maps raw alignment responses into normalized internal types.
```ts
it('maps aligned words and speaker segments from the adapter response', async () => {
  const result = await parseAlignmentResponse({
    words: [{ word: 'hello', start: 1.0, end: 1.2, speaker: 'spk_0', score: 0.95 }],
    speakers: [{ speaker: 'spk_0', start: 0.8, end: 1.6 }],
  });
  expect(result.words[0].speakerId).toBe('spk_0');
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/server/alignmentAdapter.test.ts`
Expected: FAIL because no adapter exists.
**Step 3: Write minimal implementation**
Create an adapter boundary with one public function such as `requestAlignedTranscript(audioPath)`.
```ts
export const requestAlignedTranscript = async (audioPath: string) => {
  // call local or remote alignment backend
  // normalize response shape
};
```
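The normalization half of the adapter, `parseAlignmentResponse`, can be sketched directly from the Step 1 test, which fixes the raw field names (`word`, `start`, `end`, `speaker`, `score`); the `unknown`/`0` defaults for missing fields are assumptions.

```ts
interface RawAlignmentResponse {
  words: Array<{ word: string; start: number; end: number; speaker?: string; score?: number }>;
  speakers: Array<{ speaker: string; start: number; end: number }>;
}

// Maps the backend's raw naming onto the internal WordTiming/SpeakerSegment naming.
export const parseAlignmentResponse = async (raw: RawAlignmentResponse) => ({
  words: raw.words.map((w) => ({
    text: w.word,
    startTime: w.start,
    endTime: w.end,
    speakerId: w.speaker ?? 'unknown',
    confidence: w.score ?? 0,
  })),
  speakerSegments: raw.speakers.map((s) => ({
    speakerId: s.speaker,
    startTime: s.start,
    endTime: s.end,
  })),
});
```

Keeping the renaming in one place means the rest of the pipeline never sees backend-specific field names.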
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/server/alignmentAdapter.test.ts`
Expected: PASS.
**Step 5: Commit**
```bash
git add src/server/alignmentAdapter.ts src/server/alignmentAdapter.test.ts server.ts
git commit -m "feat: add alignment service adapter"
```
### Task 7: Upgrade `/api/process-audio-pipeline` Response Shape
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\server.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.test.ts`
**Step 1: Write the failing test**
Add a client-side test for parsing `quality`, `speakers`, and `words`.
```ts
it('maps the enriched audio pipeline response into subtitle objects', async () => {
  const payload = {
    subtitles: [
      {
        id: 'sub_1',
        startTime: 1,
        endTime: 2,
        originalText: 'Hello',
        translatedText: '你好',
        speaker: 'Speaker 1',
        speakerId: 'spk_0',
        words: [{ text: 'Hello', startTime: 1, endTime: 2, speakerId: 'spk_0', confidence: 0.9 }],
        confidence: 0.9,
      },
    ],
    speakers: [{ speakerId: 'spk_0', label: 'Speaker 1' }],
    quality: 'full',
  };
  expect(mapPipelineResponse(payload).subtitles[0].words).toHaveLength(1);
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/services/geminiService.test.ts`
Expected: FAIL because the mapping helper does not exist.
**Step 3: Write minimal implementation**
1. Add a response-mapping helper in `src/services/geminiService.ts`.
2. Preserve the existing fallback path.
3. Carry `quality` metadata to the UI.
```ts
const quality = data.quality ?? 'fallback';
const subtitles = (data.subtitles ?? []).map(mapSubtitleFromApi);
```
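Those two lines can be fleshed out into the full mapping helper. In this sketch the API types mirror the Step 1 payload, and the `mapSubtitleFromApi` defaults (empty `words`, `unknown` speaker) are assumptions chosen to keep the existing fallback path working:

```ts
interface ApiWord {
  text: string;
  startTime: number;
  endTime: number;
  speakerId: string;
  confidence: number;
}

interface ApiSubtitle {
  id: string;
  startTime: number;
  endTime: number;
  originalText: string;
  translatedText: string;
  speaker?: string;
  speakerId?: string;
  words?: ApiWord[];
  confidence?: number;
}

// Fill in fields that older (fallback) server responses omit.
const mapSubtitleFromApi = (sub: ApiSubtitle) => ({
  ...sub,
  speakerId: sub.speakerId ?? 'unknown',
  words: sub.words ?? [],
});

export const mapPipelineResponse = (data: {
  subtitles?: ApiSubtitle[];
  speakers?: Array<{ speakerId: string; label: string }>;
  quality?: string;
}) => ({
  subtitles: (data.subtitles ?? []).map(mapSubtitleFromApi),
  speakers: data.speakers ?? [],
  // Servers without the new pipeline return no quality field; treat them as fallback.
  quality: data.quality ?? 'fallback',
});
```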
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/services/geminiService.test.ts`
Expected: PASS.
**Step 5: Commit**
```bash
git add server.ts src/services/geminiService.ts src/services/geminiService.test.ts
git commit -m "feat: return enriched subtitle pipeline payloads"
```
### Task 8: Add Precision Metadata to Editor State
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.tsx`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.test.tsx`
**Step 1: Write the failing test**
Add a test for rendering a fallback warning when `quality` is low.
```tsx
it('shows a low-precision notice for fallback subtitle results', () => {
  // baseEditorProps is a hypothetical fixture supplying the editor's required props.
  render(<EditorScreen {...baseEditorProps} quality="fallback" />);
  expect(
    screen.getByText('Low-precision timing detected. Manual review recommended.'),
  ).toBeInTheDocument();
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/components/EditorScreen.test.tsx`
Expected: FAIL because `EditorScreen` does not accept or render precision metadata yet.
**Step 3: Write minimal implementation**
1. Add `quality` to the editor's props and state.
2. Render the notice whenever `quality` is not `'full'`.
```tsx
{quality !== 'full' && (
  <p role="status">Low-precision timing detected. Manual review recommended.</p>
)}
```
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/components/EditorScreen.test.tsx`
Expected: PASS.
**Step 5: Commit**
```bash
git add src/components/EditorScreen.tsx src/components/EditorScreen.test.tsx
git commit -m "feat: surface subtitle precision status in editor"
```
### Task 9: Add Word-Level Playback Helpers
**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\playback\wordHighlight.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\playback\wordHighlight.test.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.tsx`
**Step 1: Write the failing test**
Test the active-word lookup helper.
```ts
it('returns the active word for the current playback time', () => {
  const activeWord = getActiveWord(
    [{ text: 'Hello', startTime: 1, endTime: 1.3, speakerId: 'spk_0', confidence: 0.9 }],
    1.1,
  );
  expect(activeWord?.text).toBe('Hello');
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/lib/playback/wordHighlight.test.ts`
Expected: FAIL because playback helpers do not exist.
**Step 3: Write minimal implementation**
1. Create a pure helper for active-word lookup.
2. Use it in `EditorScreen.tsx` to render highlighted word spans when `words` are present.
```ts
export const getActiveWord = (words: WordTiming[], currentTime: number) =>
  words.find((word) => currentTime >= word.startTime && currentTime <= word.endTime);
```
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/lib/playback/wordHighlight.test.ts`
Expected: PASS.
**Step 5: Commit**
```bash
git add src/lib/playback/wordHighlight.ts src/lib/playback/wordHighlight.test.ts src/components/EditorScreen.tsx
git commit -m "feat: add word-level playback highlighting"
```
### Task 10: Snap Timeline Edges to Word Boundaries
**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\timeline\snapToWords.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\timeline\snapToWords.test.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.tsx`
**Step 1: Write the failing test**
Test snapping to the nearest word edges.
```ts
it('snaps a dragged start edge to the nearest word boundary', () => {
  const next = snapTimeToNearestWordBoundary(1.34, [
    { text: 'Hello', startTime: 1.0, endTime: 1.3, speakerId: 'spk_0', confidence: 0.9 },
    { text: 'world', startTime: 1.35, endTime: 1.8, speakerId: 'spk_0', confidence: 0.9 },
  ]);
  expect(next).toBe(1.35);
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/lib/timeline/snapToWords.test.ts`
Expected: FAIL because no snapping helper exists.
**Step 3: Write minimal implementation**
1. Add a pure snapping helper with a small tolerance window.
2. Use it in the left and right timeline resize handlers.
```ts
export const snapTimeToNearestWordBoundary = (time: number, words: WordTiming[]) => {
  // choose nearest start or end boundary within tolerance
};
```
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/lib/timeline/snapToWords.test.ts`
Expected: PASS.
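The comment-only stub in Step 3 could be filled in along these lines; the `0.15`-second tolerance is an assumption to tune against real drag behavior.

```ts
export interface WordTiming {
  text: string;
  startTime: number;
  endTime: number;
  speakerId: string;
  confidence: number;
}

// Assumed tolerance: boundaries further away than this leave the time unchanged.
const SNAP_TOLERANCE_SECONDS = 0.15;

export const snapTimeToNearestWordBoundary = (time: number, words: WordTiming[]): number => {
  let best = time;
  let bestDistance = SNAP_TOLERANCE_SECONDS;
  for (const word of words) {
    for (const boundary of [word.startTime, word.endTime]) {
      const distance = Math.abs(boundary - time);
      // <= keeps ties biased toward later boundaries, matching the Step 1 test.
      if (distance <= bestDistance) {
        bestDistance = distance;
        best = boundary;
      }
    }
  }
  return best;
};
```

Returning the original `time` when nothing is within tolerance keeps free-form dragging possible far from any word.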
**Step 5: Commit**
```bash
git add src/lib/timeline/snapToWords.ts src/lib/timeline/snapToWords.test.ts src/components/EditorScreen.tsx
git commit -m "feat: snap subtitle edits to word boundaries"
```
### Task 11: Add Speaker-Aware UI State
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.tsx`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\voices.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\speakers\speakerPresentation.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\speakers\speakerPresentation.test.ts`
**Step 1: Write the failing test**
Test stable color and label generation for speaker tracks.
```ts
it('creates stable display metadata for each speaker id', () => {
  const speaker = buildSpeakerPresentation({ speakerId: 'spk_0', label: 'Speaker 1' });
  expect(speaker.color).toMatch(/^#/);
});
```
**Step 2: Run test to verify it fails**
Run: `npm test -- --run src/lib/speakers/speakerPresentation.test.ts`
Expected: FAIL because no speaker presentation helper exists.
**Step 3: Write minimal implementation**
1. Create a helper that derives a display color and fallback label from `speakerId`.
2. Use it to color sentence chips or timeline items.
3. Keep voice assignment behavior backward compatible.
```ts
export const buildSpeakerPresentation = ({ speakerId, label }: SpeakerTrack) => ({
  speakerId,
  label,
  color: '#1677ff',
});
```
**Step 4: Run test to verify it passes**
Run: `npm test -- --run src/lib/speakers/speakerPresentation.test.ts`
Expected: PASS.
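The single-color stub in Step 3 passes the test but is not "stable per speaker id"; one way to get there is hashing the id into a small palette. The palette and hash scheme here are illustrative assumptions.

```ts
export interface SpeakerTrack {
  speakerId: string;
  label?: string;
}

// Assumed palette; any fixed set of distinguishable colors works.
const SPEAKER_PALETTE = ['#1677ff', '#f5222d', '#52c41a', '#faad14', '#722ed1', '#13c2c2'];

// Deterministic string hash so a speaker keeps its color across reloads.
const hashString = (value: string): number => {
  let hash = 0;
  for (let i = 0; i < value.length; i += 1) {
    hash = (hash * 31 + value.charCodeAt(i)) >>> 0;
  }
  return hash;
};

export const buildSpeakerPresentation = ({ speakerId, label }: SpeakerTrack) => ({
  speakerId,
  label: label ?? speakerId,
  color: SPEAKER_PALETTE[hashString(speakerId) % SPEAKER_PALETTE.length],
});
```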
**Step 5: Commit**
```bash
git add src/components/EditorScreen.tsx src/voices.ts src/lib/speakers/speakerPresentation.ts src/lib/speakers/speakerPresentation.test.ts
git commit -m "feat: add speaker-aware editor presentation"
```
### Task 12: Verify End-to-End Behavior and Update Docs
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\README.md`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\docs\plans\2026-03-17-precise-dialogue-localization-design.md`
**Step 1: Write the failing test**
Write down the manual verification checklist before changing docs so the release criteria are explicit.
```md
- [ ] Single-speaker clip returns `quality: full`
- [ ] Two-speaker clip shows distinct speaker IDs
- [ ] Fallback path shows low-precision notice
- [ ] Timeline resize snaps to word boundaries
```
**Step 2: Run test to verify it fails**
Run: `npm run lint`
Expected: PASS or FAIL depending on in-progress code, but manual verification is still incomplete until the checklist is executed.
**Step 3: Write minimal implementation**
1. Update `README.md` with the new environment requirements and pipeline description.
2. Record the manual verification results in the design document or a linked note.
```md
## High-Precision Subtitle Mode
Set the alignment backend environment variables before running the app.
```
**Step 4: Run test to verify it passes**
Run: `npm test -- --run`
Expected: PASS.
Run: `npm run lint`
Expected: PASS.
Run: `npm run build`
Expected: PASS.
**Step 5: Commit**
```bash
git add README.md docs/plans/2026-03-17-precise-dialogue-localization-design.md
git commit -m "docs: document precise dialogue localization workflow"
```
## Notes for Execution
1. This workspace currently has no `.git` directory, so commit steps cannot be executed until the project is placed in a real Git checkout.
2. Introduce the alignment backend behind environment-based configuration so existing demos can still use the current fallback path.
3. Prefer pure functions for sentence reconstruction, speaker assignment, snapping, and word-highlighting logic so they remain easy to test.