+
+# Run and deploy your AI Studio app
+
+This repository contains everything you need to run the app locally.
+
+View your app in AI Studio: https://ai.studio/apps/a38a3cd5-7f82-49f0-a26e-99be4d77f863
+
+## Run Locally
+
+**Prerequisites:** Node.js
+
+1. Install dependencies:
+ `npm install`
+2. Configure [.env](.env) with:
+ `ARK_API_KEY`
+ `GEMINI_API_KEY`
+ `MINIMAX_API_KEY`
+3. Optional defaults:
+ `DEFAULT_LLM_PROVIDER=doubao`
+ `DOUBAO_MODEL=doubao-seed-2-0-pro-260215`
+4. Run the app:
+ `npm run dev`
+
+## Model Switching
+
+1. Subtitle generation now runs through the server and supports `Doubao` and `Gemini`.
+2. The editor shows an `LLM` selector and defaults to `Doubao`.
+3. `TTS` stays fixed on `MiniMax` regardless of the selected LLM.
+4. All provider keys are read from `.env`; the browser no longer calls LLM providers directly.
+
+## Subtitle Generation
+
+1. Subtitle generation is now driven by server-side multimodal LLM calls on the uploaded video file.
+2. No separate local alignment/ASR backend is required for `/api/generate-subtitles`.
diff --git a/docs/plans/2026-03-17-doubao-llm-provider-design.md b/docs/plans/2026-03-17-doubao-llm-provider-design.md
new file mode 100644
index 0000000..c12ebb2
--- /dev/null
+++ b/docs/plans/2026-03-17-doubao-llm-provider-design.md
@@ -0,0 +1,249 @@
+# Doubao LLM Provider Design
+
+**Date:** 2026-03-17
+
+**Goal:** Add a user-visible LLM switcher so subtitle generation can use either Doubao or Gemini, default to Doubao, and keep TTS fixed on MiniMax.
+
+## Current State
+
+The current project is effectively Gemini-only for subtitle generation and translation.
+
+1. `src/services/geminiService.ts` calls Gemini directly from the browser for subtitle generation and Gemini fallback TTS.
+2. `src/server/geminiTranslation.ts` translates sentence text on the server with Gemini.
+3. `src/server/audioPipelineConfig.ts` only validates `GEMINI_API_KEY`.
+4. `src/components/EditorScreen.tsx` imports a Gemini-specific service and has no model selector.
+5. MiniMax is already independent and used only for TTS through `/api/tts`.
+
+This makes provider switching hard because the LLM choice is not isolated behind a shared contract.
+
+## Product Requirements
+
+1. The editor must show a visible LLM selector.
+2. Available LLM options are `Doubao` and `Gemini`.
+3. The default LLM must be `Doubao`.
+4. TTS must remain fixed to MiniMax and must not participate in provider switching.
+5. API keys must only come from `.env`.
+6. The app must not silently fall back from one LLM provider to the other.
+
+## Chosen Approach
+
+Use a server-side provider abstraction for subtitle generation and translation, with a frontend selector that passes the chosen provider to the server.
+
+This approach keeps secrets on the server, avoids browser-side provider drift, and gives the project one place to add or change LLM providers later.
+
+## Why This Approach
+
+### Option A: Server-side provider abstraction with frontend selector
+
+Recommended.
+
+1. Frontend sends `provider: 'doubao' | 'gemini'`.
+2. Server reads the matching API key from `.env`.
+3. Server routes subtitle text generation through a provider adapter.
+4. Time-critical audio extraction and timeline logic stay outside the provider-specific layer.
+
+Pros:
+
+1. Keeps API keys off the client.
+2. Produces one consistent API contract for the editor.
+3. Makes default-provider behavior easy to enforce.
+4. Prevents Gemini-specific code from leaking further into the app.
+
+Cons:
+
+1. Requires moving browser-side subtitle generation behavior into a server-owned path.
+2. Touches both frontend and backend.
+
+### Option B: Keep Gemini in the browser and add Doubao as a separate server path
+
+Rejected.
+
+Pros:
+
+1. Faster initial implementation.
+
+Cons:
+
+1. Two subtitle-generation architectures would coexist.
+2. Provider behavior would drift over time.
+3. It violates the requirement that keys come only from `.env`.
+
+### Option C: Client-side provider switching
+
+Rejected.
+
+Pros:
+
+1. Minimal backend work.
+
+Cons:
+
+1. Exposes secrets to the browser.
+2. Conflicts with the `.env`-only requirement.
+
+## Architecture
+
+### Frontend
+
+The editor adds an `LLM` selector with the values:
+
+1. `Doubao`
+2. `Gemini`
+
+The default selected value is `Doubao`.
+
+When the user clicks subtitle generation, the frontend sends:
+
+1. the uploaded video
+2. the target language
+3. the selected LLM provider
+4. optional trim metadata if the current flow needs it
+
+The frontend no longer needs to know how Gemini or Doubao are called. It only consumes a normalized subtitle payload.
+
+### Server
+
+The server becomes the single owner of LLM subtitle generation.
+
+Responsibilities:
+
+1. validate the incoming provider
+2. read provider credentials from `.env`
+3. extract audio and prepare subtitle-generation inputs
+4. call the chosen provider adapter
+5. normalize the result into the existing subtitle shape
+
+### Provider Layer
+
+Create a provider abstraction around LLM calls:
+
+1. `resolveLlmProvider(provider, env)`
+2. `geminiProvider`
+3. `doubaoProvider`
+
+Each provider must accept the same logical input and return the same logical output so the rest of the app is provider-agnostic.
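The shared contract can be sketched as a small TypeScript interface. The names below are illustrative, not the project's actual identifiers:

```typescript
// Hypothetical provider contract; names are illustrative, not the project's API.
export type LlmProvider = 'doubao' | 'gemini';

export interface SubtitleLlmInput {
  targetLanguage: string;
  promptContext: string; // e.g. transcript or audio-derived context prepared by the server
}

export interface SubtitleLlmAdapter {
  readonly provider: LlmProvider;
  // Returns raw model output text; the server parses it into subtitles later.
  generateSubtitleText(input: SubtitleLlmInput): Promise<string>;
}

export const resolveLlmProvider = (
  provider: LlmProvider,
  adapters: Record<LlmProvider, SubtitleLlmAdapter>,
): SubtitleLlmAdapter => adapters[provider];
```

Because both adapters satisfy the same interface, nothing downstream of `resolveLlmProvider` needs to branch on the provider name.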
+
+## API Design
+
+Add a dedicated subtitle-generation endpoint rather than overloading the existing audio-extraction endpoint.
+
+### Request
+
+`POST /api/generate-subtitles`
+
+Multipart or JSON payload fields:
+
+1. `video`
+2. `targetLanguage`
+3. `provider`
+4. optional `trimRange`
+
+### Response
+
+Return the same normalized subtitle structure the editor already understands.
+
+At minimum each subtitle object should include:
+
+1. `id`
+2. `startTime`
+3. `endTime`
+4. `originalText`
+5. `translatedText`
+6. `speaker`
+7. `voiceId`
+8. `volume`
+
+If richer timeline metadata already exists in the current server subtitle pipeline, keep it in the response rather than trimming it away.
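The minimum response shape can be written down as a TypeScript type. The field types below are assumptions inferred from the field names:

```typescript
// Sketch of the normalized subtitle shape listed above; field types are assumptions.
export interface Subtitle {
  id: string;
  startTime: number;      // seconds
  endTime: number;        // seconds
  originalText: string;
  translatedText: string;
  speaker: string;
  voiceId: string;
  volume: number;         // playback gain for this subtitle's TTS clip
}

export interface GenerateSubtitlesResponse {
  subtitles: Subtitle[];
}

// A basic sanity check the server could run before responding.
export const isWellFormed = (s: Subtitle): boolean =>
  s.id.length > 0 && s.endTime > s.startTime;
```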
+
+## Subtitle Generation Strategy
+
+The provider switch should affect LLM reasoning, not TTS and not the MiniMax path.
+
+The cleanest boundary is:
+
+1. audio extraction and timeline preparation stay on the server
+2. LLM provider handles translation and label generation
+3. MiniMax remains the only TTS engine
+
+This reduces the risk that switching providers changes subtitle timing behavior unpredictably.
+
+## Doubao Integration Notes
+
+Use the Ark Responses API on the server:
+
+1. host: `https://ark.cn-beijing.volces.com/api/v3/responses`
+2. auth: `Authorization: Bearer ${ARK_API_KEY}`
+3. model: configurable, defaulting to `doubao-seed-2-0-pro-260215`
+
+The provider should treat Doubao as a text-generation backend and extract normalized text from the response payload before JSON parsing.
+
+Implementation detail:
+
+1. the response parser should not assume SDK-specific helpers
+2. it should read the returned response envelope and collect the textual output fragments
+3. the final result should be parsed as JSON only after the output text is reconstructed
+
+This is an implementation inference based on the official Ark Responses API response shape and is meant to keep the parser resilient to wrapper differences.
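A minimal server-side call could look like the following. Only the URL, auth header, and default model come from this document; the request body shape (`input` as the prompt payload) is an assumption modeled on the Responses API style and should be checked against the Ark documentation:

```typescript
// Hypothetical sketch of a Doubao call via the Ark Responses API.
// The body shape ({ model, input }) is an assumption, not a confirmed contract.
export const callDoubao = async (
  apiKey: string,
  prompt: string,
  model = 'doubao-seed-2-0-pro-260215',
  fetchImpl: typeof fetch = fetch,
): Promise<unknown> => {
  const response = await fetchImpl('https://ark.cn-beijing.volces.com/api/v3/responses', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model, input: prompt }),
  });
  if (!response.ok) {
    throw new Error(`Doubao request failed: ${response.status}`);
  }
  return response.json();
};
```

Injecting `fetchImpl` keeps the adapter testable without network access, which matches the deterministic-seams testing strategy below.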
+
+## Configuration
+
+Environment variables:
+
+1. `ARK_API_KEY` for Doubao
+2. `GEMINI_API_KEY` for Gemini
+3. `MINIMAX_API_KEY` for TTS
+4. optional `DOUBAO_MODEL` for server-side model override
+5. optional `DEFAULT_LLM_PROVIDER` with a default value of `doubao`
+
+Rules:
+
+1. No API keys may be embedded in frontend code.
+2. No provider may silently reuse another provider's key.
+3. If the selected provider is missing its key, return a clear error.
+
+## Error Handling
+
+Provider failures must be explicit.
+
+1. If `provider` is invalid, return `400`.
+2. If the selected provider key is missing, return `400`.
+3. If the selected provider returns an auth failure, return `401` or a mapped upstream auth error.
+4. If the selected provider fails unexpectedly, return `502` or `500` with a provider-specific error message.
+5. Do not auto-fallback from Doubao to Gemini or from Gemini to Doubao.
+
+The UI should show which provider failed so the user is never misled about which model generated a subtitle result.
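The status rules above can be captured in one pure mapping (a sketch; the failure categories are hypothetical names, not existing types in the codebase):

```typescript
// Hypothetical status mapping for the error-handling rules above.
export type ProviderFailure =
  | { kind: 'invalid-provider' }
  | { kind: 'missing-key' }
  | { kind: 'upstream-auth' }
  | { kind: 'upstream-error' };

export const failureStatus = (failure: ProviderFailure): number => {
  switch (failure.kind) {
    case 'invalid-provider':
    case 'missing-key':
      return 400;
    case 'upstream-auth':
      return 401;
    case 'upstream-error':
      return 502;
  }
};
```

Keeping the mapping pure lets the route translate any provider failure into a response in one place, with no fallback path between providers.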
+
+## Frontend UX
+
+Add the selector in the editor near the subtitle-generation controls so the choice is visible at generation time.
+
+Rules:
+
+1. Default selection is `Doubao`.
+2. The selector affects each generation request immediately.
+3. The selector does not affect previously generated subtitles until the user regenerates.
+4. The selector does not affect MiniMax TTS generation.
+
+## Testing Strategy
+
+Coverage should focus on deterministic seams.
+
+1. Provider resolution defaults to Doubao.
+2. Invalid provider is rejected.
+3. Missing `ARK_API_KEY` or `GEMINI_API_KEY` returns clear errors.
+4. Doubao response parsing turns Ark response content into normalized subtitle JSON.
+5. Gemini and Doubao providers both satisfy the same interface contract.
+6. Editor defaults to Doubao and sends the selected provider on regenerate.
+7. TTS behavior remains unchanged when the LLM provider changes.
+
+## Rollout Notes
+
+1. Introduce the new endpoint and provider abstraction first.
+2. Switch the editor to the new endpoint second.
+3. Keep MiniMax TTS untouched except for regression checks.
+4. Leave any deeper visual fallback provider work for a later pass if needed.
+
+## Constraints
+
+1. This workspace is not a Git repository, so the design document cannot be committed here.
+2. The user provided an Ark key in chat, but the implementation must still read provider secrets from `.env` and not hardcode them into source files.
diff --git a/docs/plans/2026-03-17-doubao-llm-provider.md b/docs/plans/2026-03-17-doubao-llm-provider.md
new file mode 100644
index 0000000..22cdc07
--- /dev/null
+++ b/docs/plans/2026-03-17-doubao-llm-provider.md
@@ -0,0 +1,472 @@
+# Doubao LLM Provider Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Add a user-visible LLM switcher that lets subtitle generation use Doubao or Gemini, defaults to Doubao, and keeps TTS fixed on MiniMax with all provider keys sourced from `.env`.
+
+**Architecture:** Move subtitle generation behind a new server endpoint, introduce a provider abstraction for Gemini and Doubao, and update the editor to send the selected provider while continuing to use the existing subtitle shape. Keep MiniMax TTS separate and untouched except for regression coverage.
+
+**Tech Stack:** React, TypeScript, Express, multer, fetch, Vitest
+
+---
+
+### Task 1: Add provider types and configuration resolution
+
+**Files:**
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.test.ts`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\audioPipelineConfig.ts`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\audioPipelineConfig.test.ts`
+
+**Step 1: Write the failing test**
+
+```ts
+import { describe, expect, it } from 'vitest';
+import { normalizeLlmProvider, resolveLlmProviderConfig } from './llmProvider';
+
+describe('llmProvider config', () => {
+ it('defaults to doubao when no provider override is set', () => {
+ expect(normalizeLlmProvider(undefined)).toBe('doubao');
+ });
+
+ it('returns the selected provider key from env', () => {
+ expect(
+ resolveLlmProviderConfig('doubao', {
+ ARK_API_KEY: 'ark-key',
+ GEMINI_API_KEY: 'gemini-key',
+ }),
+ ).toEqual(expect.objectContaining({ provider: 'doubao', apiKey: 'ark-key' }));
+ });
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/audioPipelineConfig.test.ts`
+Expected: FAIL because `llmProvider.ts` does not exist and `audioPipelineConfig.ts` still only exposes Gemini config.
+
+**Step 3: Write minimal implementation**
+
+```ts
+export type LlmProvider = 'doubao' | 'gemini';
+
+export const normalizeLlmProvider = (value?: string): LlmProvider => {
+  if (!value) return 'doubao';
+  const normalized = value.toLowerCase();
+  if (normalized === 'doubao' || normalized === 'gemini') return normalized;
+  throw new Error(`Unsupported LLM provider: ${value}`);
+};
+
+export const resolveLlmProviderConfig = (
+ provider: LlmProvider,
+ env: NodeJS.ProcessEnv,
+) => {
+ if (provider === 'doubao') {
+ const apiKey = env.ARK_API_KEY?.trim();
+ if (!apiKey) throw new Error('ARK_API_KEY is required for Doubao subtitle generation.');
+ return {
+ provider,
+ apiKey,
+ model: env.DOUBAO_MODEL?.trim() || 'doubao-seed-2-0-pro-260215',
+ baseUrl: 'https://ark.cn-beijing.volces.com/api/v3/responses',
+ };
+ }
+
+ const apiKey = env.GEMINI_API_KEY?.trim();
+ if (!apiKey) throw new Error('GEMINI_API_KEY is required for Gemini subtitle generation.');
+ return {
+ provider,
+ apiKey,
+ model: 'gemini-2.5-flash',
+ };
+};
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/audioPipelineConfig.test.ts`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add src/server/llmProvider.ts src/server/llmProvider.test.ts src/server/audioPipelineConfig.ts src/server/audioPipelineConfig.test.ts
+git commit -m "feat: add llm provider configuration"
+```
+
+### Task 2: Add the Doubao provider parser and contract tests
+
+**Files:**
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\doubaoProvider.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\doubaoProvider.test.ts`
+
+**Step 1: Write the failing test**
+
+```ts
+import { describe, expect, it } from 'vitest';
+import { extractDoubaoTextOutput } from './doubaoProvider';
+
+describe('extractDoubaoTextOutput', () => {
+ it('reconstructs text from the Ark output array', () => {
+ const text = extractDoubaoTextOutput({
+ output: [
+ {
+ type: 'message',
+ content: [{ type: 'output_text', text: '[{"id":"1","translatedText":"你好"}]' }],
+ },
+ ],
+ });
+
+ expect(text).toContain('translatedText');
+ });
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/doubaoProvider.test.ts`
+Expected: FAIL because `doubaoProvider.ts` does not exist.
+
+**Step 3: Write minimal implementation**
+
+```ts
+export const extractDoubaoTextOutput = (payload: any): string =>
+ (payload?.output ?? [])
+ .flatMap((item: any) => item?.content ?? [])
+ .filter((part: any) => part?.type === 'output_text')
+ .map((part: any) => part.text ?? '')
+ .join('')
+ .trim();
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/doubaoProvider.test.ts`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add src/server/doubaoProvider.ts src/server/doubaoProvider.test.ts
+git commit -m "feat: add doubao response parsing"
+```
+
+### Task 3: Add provider-backed translation adapters
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\geminiTranslation.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\providerTranslation.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\providerTranslation.test.ts`
+- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\geminiTranslation.test.ts`
+
+**Step 1: Write the failing test**
+
+```ts
+import { describe, expect, it } from 'vitest';
+import { createSentenceTranslator } from './providerTranslation';
+
+describe('createSentenceTranslator', () => {
+ it('returns a Doubao translator when provider is doubao', () => {
+ const translator = createSentenceTranslator({
+ provider: 'doubao',
+ apiKey: 'ark-key',
+ model: 'doubao-seed-2-0-pro-260215',
+ });
+
+ expect(typeof translator).toBe('function');
+ });
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/providerTranslation.test.ts src/server/geminiTranslation.test.ts`
+Expected: FAIL because the provider selection layer does not exist.
+
+**Step 3: Write minimal implementation**
+
+```ts
+export const createSentenceTranslator = (config: ProviderConfig) => {
+ if (config.provider === 'doubao') {
+ return createDoubaoSentenceTranslator(config);
+ }
+ return createGeminiSentenceTranslator(config);
+};
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/providerTranslation.test.ts src/server/geminiTranslation.test.ts`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add src/server/providerTranslation.ts src/server/providerTranslation.test.ts src/server/geminiTranslation.ts src/server/geminiTranslation.test.ts
+git commit -m "feat: add provider-based translation adapters"
+```
+
+### Task 4: Add a dedicated subtitle-generation endpoint
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\server.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleRequest.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleRequest.test.ts`
+- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.test.ts`
+
+**Step 1: Write the failing test**
+
+```ts
+import { describe, expect, it } from 'vitest';
+import { parseSubtitleRequest } from './subtitleRequest';
+
+describe('parseSubtitleRequest', () => {
+ it('defaults provider to doubao', () => {
+ expect(parseSubtitleRequest({ body: {} as any }).provider).toBe('doubao');
+ });
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts`
+Expected: FAIL because the request parser does not exist.
+
+**Step 3: Write minimal implementation**
+
+```ts
+import { normalizeLlmProvider } from './llmProvider';
+
+export const parseSubtitleRequest = (req: { body: Record<string, unknown> }) => ({
+  provider: normalizeLlmProvider(String(req.body.provider || 'doubao')),
+  targetLanguage: String(req.body.targetLanguage || ''),
+});
+```
+
+Then update `server.ts` to expose `POST /api/generate-subtitles`, validate input, resolve provider config, and return normalized subtitles.
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add server.ts src/server/subtitleRequest.ts src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts
+git commit -m "feat: add subtitle generation endpoint"
+```
+
+### Task 5: Update the frontend subtitle service to use the new endpoint
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\services\subtitleService.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\services\subtitleService.test.ts`
+- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.test.tsx`
+
+**Step 1: Write the failing test**
+
+```ts
+import { describe, expect, it, vi } from 'vitest';
+import { generateSubtitles } from './subtitleService';
+
+describe('generateSubtitles', () => {
+ it('posts the selected provider to the server', async () => {
+ const fetchMock = vi.fn(async () => ({
+ ok: true,
+ json: async () => ({ subtitles: [] }),
+ }));
+
+ await generateSubtitles(new File(['x'], 'clip.mp4'), 'English', 'doubao', null, fetchMock as any);
+
+ expect(fetchMock.mock.calls[0][0]).toBe('/api/generate-subtitles');
+ });
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx`
+Expected: FAIL because the new service does not exist and the editor still uses the Gemini-specific service directly.
+
+**Step 3: Write minimal implementation**
+
+```ts
+export const generateSubtitles = async (
+ videoFile: File,
+ targetLanguage: string,
+ provider: 'doubao' | 'gemini',
+ trimRange?: { start: number; end: number } | null,
+ fetchImpl: typeof fetch = fetch,
+) => {
+ const formData = new FormData();
+ formData.append('video', videoFile);
+ formData.append('targetLanguage', targetLanguage);
+ formData.append('provider', provider);
+ if (trimRange) {
+ formData.append('trimRange', JSON.stringify(trimRange));
+ }
+
+ const response = await fetchImpl('/api/generate-subtitles', {
+ method: 'POST',
+ body: formData,
+ });
+
+  if (!response.ok) {
+    throw new Error(`Subtitle generation failed: ${response.status}`);
+  }
+
+  return response.json();
+};
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add src/services/subtitleService.ts src/services/subtitleService.test.ts src/services/geminiService.ts src/components/EditorScreen.test.tsx
+git commit -m "feat: route subtitle generation through the server"
+```
+
+### Task 6: Add the editor LLM selector and default it to Doubao
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.tsx`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.test.tsx`
+
+**Step 1: Write the failing test**
+
+```tsx
+it('defaults the llm selector to Doubao', () => {
+  render(<EditorScreen {...requiredEditorProps} />); // requiredEditorProps: placeholder for the component's existing required props
+ expect(screen.getByLabelText(/llm/i)).toHaveValue('doubao');
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/components/EditorScreen.test.tsx`
+Expected: FAIL because the selector does not exist.
+
+**Step 3: Write minimal implementation**
+
+```tsx
+const [llmProvider, setLlmProvider] = useState<'doubao' | 'gemini'>('doubao');
+
+// Selector markup (reconstructed; the original JSX was stripped from this doc):
+<select aria-label="LLM" value={llmProvider}
+        onChange={(e) => setLlmProvider(e.target.value as 'doubao' | 'gemini')}>
+  <option value="doubao">Doubao</option>
+  <option value="gemini">Gemini</option>
+</select>
+```
+
+Then pass `llmProvider` into the subtitle-generation service.
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/components/EditorScreen.test.tsx`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add src/components/EditorScreen.tsx src/components/EditorScreen.test.tsx
+git commit -m "feat: add llm selector to the editor"
+```
+
+### Task 7: Add end-to-end provider and regression coverage
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.test.ts`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.test.ts`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\minimaxTts.test.ts`
+
+**Step 1: Write the failing test**
+
+```ts
+it('does not change TTS behavior when the llm provider changes', async () => {
+  throw new Error('TODO: assert identical TTS requests across LLM providers');
+});
+```
+
+**Step 2: Run test to verify it fails meaningfully**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/subtitlePipeline.test.ts src/services/geminiService.test.ts src/server/minimaxTts.test.ts`
+Expected: FAIL or require stronger assertions until the new provider path is covered.
+
+**Step 3: Write minimal implementation**
+
+Add regression tests that prove:
+
+1. selected provider is forwarded correctly
+2. Doubao auth failures surface clearly
+3. Gemini still works when selected
+4. MiniMax TTS tests continue to pass unchanged
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/doubaoProvider.test.ts src/server/providerTranslation.test.ts src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx src/server/minimaxTts.test.ts src/services/geminiService.test.ts`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add src/server/llmProvider.test.ts src/server/doubaoProvider.test.ts src/server/providerTranslation.test.ts src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx src/server/minimaxTts.test.ts src/services/geminiService.test.ts
+git commit -m "test: cover llm provider switching"
+```
+
+### Task 8: Verify the live app behavior
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\.env.example`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\README.md`
+
+**Step 1: Write the failing doc check**
+
+Add docs assertions by inspection:
+
+1. `.env.example` documents `ARK_API_KEY` and optional `DOUBAO_MODEL`
+2. README explains the editor LLM switcher and that MiniMax remains the TTS engine
+
+**Step 2: Run verification commands**
+
+Run: `node .\node_modules\vitest\vitest.mjs run`
+Expected: PASS for the new targeted suites or clear identification of pre-existing unrelated failures.
+
+Run: `Invoke-WebRequest -UseBasicParsing http://localhost:3000/`
+Expected: `200`
+
+Run manual checks:
+
+1. open the editor
+2. confirm the `LLM` selector defaults to `Doubao`
+3. generate subtitles with `Doubao`
+4. switch to `Gemini`
+5. generate subtitles again
+6. confirm TTS still uses MiniMax
+
+**Step 3: Write minimal documentation updates**
+
+Document:
+
+1. required env keys
+2. default provider
+3. how the editor switcher works
+
+**Step 4: Re-run verification**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/doubaoProvider.test.ts src/server/providerTranslation.test.ts src/server/subtitleRequest.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx src/server/minimaxTts.test.ts`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add .env.example README.md
+git commit -m "docs: document llm provider switching"
+```
+
+## Notes
+
+1. This workspace is not a Git repository, so the commit steps may not be executable here.
+2. Existing unrelated TypeScript baseline issues in `src/lib/*` and `src/server/*` should be treated as pre-existing unless the new work touches them directly.
diff --git a/docs/plans/2026-03-17-export-preview-parity-design.md b/docs/plans/2026-03-17-export-preview-parity-design.md
new file mode 100644
index 0000000..b1cdcbc
--- /dev/null
+++ b/docs/plans/2026-03-17-export-preview-parity-design.md
@@ -0,0 +1,76 @@
+# Export Preview Parity Design
+
+**Date:** 2026-03-17
+
+**Goal:** Make exported videos match the editor preview for audio mixing, subtitle timing, and visible subtitle styling.
+
+## Current State
+
+The editor preview and the export pipeline currently render the same edit session through different implementations:
+
+1. The preview in `src/components/EditorScreen.tsx` overlays subtitle text with React and plays audio using the browser media elements plus per-subtitle `Audio` instances.
+2. The export in `server.ts` rebuilds subtitles as SRT, mixes audio with FFmpeg, and trims the final output after subtitle timing and TTS delays have already been computed.
+
+This creates three deterministic mismatches:
+
+1. Export mixes original audio even when the preview has muted it because instrumental BGM is present.
+2. Export uses relative subtitle times from the trimmed editor session but trims the final video afterward, shifting or cutting subtitle/TTS timing.
+3. Export ignores `textStyles`, so the rendered subtitle look differs from the preview.
+
+## Chosen Approach
+
+Adopt preview-first export semantics:
+
+1. Treat the editor state as the source of truth.
+2. Serialize the preview-visible subtitle data, text styles, and audio volume data explicitly.
+3. Convert preview-relative subtitle timing into export timeline timing before FFmpeg rendering.
+4. Generate styled subtitle overlays in the backend instead of relying on FFmpeg defaults.
+
+## Architecture
+
+### Frontend
+
+The editor passes a richer export payload:
+
+1. Subtitle text
+2. Subtitle timing
+3. Subtitle audio volume
+4. Global text style settings
+5. Trim range
+6. Instrumental BGM base64 when present
+
+The preview itself stays unchanged and remains the reference behavior.
+
+### Backend Export Layer
+
+The export route should move the parity-sensitive logic into pure helpers:
+
+1. Build an export subtitle timeline that shifts relative editor timings back onto the full-video timeline when trimming is enabled.
+2. Build an audio mix plan that mirrors preview rules:
+ - Use instrumental BGM at preview volume when present.
+ - Exclude original source audio when instrumental BGM is present.
+ - Otherwise keep original source audio at preview volume.
+ - Apply each subtitle TTS clip at its configured volume.
+3. Generate ASS subtitle content so font, color, alignment, bold, italic, and underline can be rendered intentionally.
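The audio mix rules in item 2 can be sketched as a pure planner. All names below are hypothetical; they illustrate the intended decision table, not an existing module:

```typescript
// Hypothetical audio mix planner mirroring the preview rules above.
export interface MixInput {
  hasInstrumentalBgm: boolean;
  bgmVolume: number;       // preview BGM volume
  originalVolume: number;  // preview original-audio volume
  ttsClips: { id: string; volume: number }[];
}

export interface MixPlan {
  includeOriginalAudio: boolean;
  originalVolume: number;
  bgmVolume: number | null; // null when no instrumental BGM is present
  ttsClips: { id: string; volume: number }[];
}

export const planAudioMix = (input: MixInput): MixPlan => ({
  includeOriginalAudio: !input.hasInstrumentalBgm,
  originalVolume: input.hasInstrumentalBgm ? 0 : input.originalVolume,
  bgmVolume: input.hasInstrumentalBgm ? input.bgmVolume : null,
  ttsClips: input.ttsClips,
});
```

Because the planner is pure, the parity rules can be locked with unit tests before any FFmpeg filter graph is touched.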
+
+## Data Flow
+
+1. `EditorScreen` passes `textStyles` into `ExportModal`.
+2. `ExportModal` builds a structured export payload instead of manually shaping subtitle fields inline.
+3. `server.ts` parses `textStyles`, normalizes subtitle timing for export, builds ASS subtitle content, and applies the preview-equivalent audio plan.
+4. FFmpeg burns styled subtitles and mixes the planned audio sources.
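The timing normalization in step 3 amounts to shifting preview-relative times back onto the full-video timeline. A minimal sketch (names hypothetical):

```typescript
// Hypothetical timeline normalization: shift preview-relative subtitle times
// onto the full-video timeline when a trim range is active.
export interface ExportSubtitle {
  startTime: number; // seconds, relative to the trimmed preview
  endTime: number;
  [key: string]: unknown; // other subtitle fields pass through unchanged
}

export const toExportTimeline = (
  subtitle: ExportSubtitle,
  trimStart: number, // seconds trimmed from the start of the source video
): ExportSubtitle => ({
  ...subtitle,
  startTime: subtitle.startTime + trimStart,
  endTime: subtitle.endTime + trimStart,
});
```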
+
+## Testing Strategy
+
+Add regression coverage around pure helpers instead of FFmpeg end-to-end tests:
+
+1. Frontend payload builder includes style and volume fields.
+2. Export timeline normalization shifts subtitle timing correctly for trimmed clips.
+3. Audio mix planning excludes original audio when BGM is present and keeps it at preview volume when BGM is absent.
+4. ASS subtitle generation reflects the selected style settings.
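The ASS generation named in item 4 reduces to formatting `Dialogue` lines with `H:MM:SS.cc` timestamps. A sketch of the timestamp and line formatters (the full `[Script Info]`/`[V4+ Styles]` header is omitted, and the style name `Default` is an assumption):

```typescript
// Hypothetical ASS dialogue formatting helpers; header sections are omitted.
export const toAssTime = (seconds: number): string => {
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = Math.floor(seconds % 60);
  const cs = Math.min(99, Math.round((seconds % 1) * 100)); // centiseconds, clamped
  const pad = (n: number) => String(n).padStart(2, '0');
  return `${h}:${pad(m)}:${pad(s)}.${pad(cs)}`;
};

export const toDialogueLine = (startTime: number, endTime: number, text: string): string =>
  `Dialogue: 0,${toAssTime(startTime)},${toAssTime(endTime)},Default,,0,0,0,,${text}`;
```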
+
+## Risks
+
+1. ASS subtitle rendering may still not be pixel-perfect relative to browser CSS.
+2. Existing exports without style payload should remain backward compatible by falling back to safe defaults.
+3. FFmpeg filter graph assembly becomes slightly more complex, so helper-level tests are required before touching route logic.
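The backward-compatibility concern in risk 2 suggests a defaults-merging helper. The field names and default values below are placeholders, not the project's actual style schema:

```typescript
// Hypothetical fallback: exports without a style payload get safe defaults.
export interface TextStyles {
  fontFamily: string;
  fontSize: number;
  color: string;
  bold: boolean;
}

export const DEFAULT_STYLES: TextStyles = {
  fontFamily: 'Arial',
  fontSize: 24,
  color: '#FFFFFF',
  bold: false,
};

export const resolveStyles = (payload?: Partial<TextStyles>): TextStyles => ({
  ...DEFAULT_STYLES,
  ...payload, // missing or absent fields fall back to the defaults
});
```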
diff --git a/docs/plans/2026-03-17-export-preview-parity.md b/docs/plans/2026-03-17-export-preview-parity.md
new file mode 100644
index 0000000..ff6dd32
--- /dev/null
+++ b/docs/plans/2026-03-17-export-preview-parity.md
@@ -0,0 +1,127 @@
+# Export Preview Parity Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Make exported videos match the editor preview for audio behavior, subtitle timing, and visible subtitle styling.
+
+**Architecture:** Keep the editor preview as the source of truth and teach the export pipeline to consume the same state explicitly. Extract pure helpers for export payload building, subtitle timeline normalization, audio mix planning, and ASS subtitle generation so we can lock parity with tests before wiring FFmpeg.
+
+**Tech Stack:** React 19, TypeScript, Express, FFmpeg, Vitest.
+
+---
+
+### Task 1: Add Export Payload Builder Coverage
+
+**Files:**
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\exportPayload.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\exportPayload.test.ts`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\ExportModal.tsx`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.tsx`
+
+**Step 1: Write the failing test**
+
+Cover that export payloads include subtitle audio volume and global text styles.
+
+**Step 2: Run test to verify it fails**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/lib/exportPayload.test.ts`
+Expected: FAIL because the helper does not exist yet.
+
+**Step 3: Write minimal implementation**
+
+Create a small pure builder and wire `ExportModal` to use it. Pass `textStyles` from `EditorScreen`.
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/lib/exportPayload.test.ts`
+Expected: PASS.
+
+### Task 2: Add Export Backend Planning Helpers
+
+**Files:**
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\exportVideo.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\exportVideo.test.ts`
+
+**Step 1: Write the failing test**
+
+Cover:
+
+1. Subtitle times shift by `trimRange.start` for export.
+2. Original source audio is excluded when BGM is present.
+3. Original source audio is kept at preview volume when BGM is absent.
+4. ASS subtitle output reflects selected styles.
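
Points 2 and 3 describe a small decision table; a sketch of the mix-planning helper they pin down (names and shape are assumptions until Step 3):

```typescript
type AudioMixInput = { hasBgm: boolean; originalVolume: number };
type AudioMixPlan = { includeOriginal: boolean; originalVolume: number };

// Mirror the preview: BGM replaces the original source audio entirely;
// without BGM the original track stays in at its preview volume.
export const planOriginalAudio = ({ hasBgm, originalVolume }: AudioMixInput): AudioMixPlan =>
  hasBgm
    ? { includeOriginal: false, originalVolume: 0 }
    : { includeOriginal: true, originalVolume };
```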
+
+**Step 2: Run test to verify it fails**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/exportVideo.test.ts`
+Expected: FAIL because helper module does not exist yet.
+
+**Step 3: Write minimal implementation**
+
+Implement pure helpers for:
+
+1. Subtitle timeline normalization
+2. Audio mix planning
+3. ASS subtitle generation
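
As one example, the timeline normalization from point 1 could look like this (a sketch; the `trimRange` semantics follow the test list in Step 1):

```typescript
type Timed = { id: string; startTime: number; endTime: number };
type TrimRange = { start: number; end: number };

// Shift subtitles into the trimmed clip's timebase, clamp to the clip,
// and drop sentences that fall entirely outside the exported range.
export const normalizeSubtitleTimes = (subtitles: Timed[], trim: TrimRange): Timed[] =>
  subtitles
    .filter((s) => s.endTime > trim.start && s.startTime < trim.end)
    .map((s) => ({
      ...s,
      startTime: Math.max(0, s.startTime - trim.start),
      endTime: Math.min(trim.end, s.endTime) - trim.start,
    }));
```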
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/exportVideo.test.ts`
+Expected: PASS.
+
+### Task 3: Wire Backend Export Route to Helpers
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\server.ts`
+
+**Step 1: Write the failing integration-leaning test**
+
+Extend `src/server/exportVideo.test.ts` if needed to assert the route-facing helper contract.
+
+**Step 2: Run test to verify it fails**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/exportVideo.test.ts`
+Expected: FAIL because current route behavior still assumes SRT/default mixing.
+
+**Step 3: Write minimal implementation**
+
+Update the route to:
+
+1. Parse `textStyles`
+2. Use normalized subtitle times for export
+3. Generate `.ass` instead of `.srt`
+4. Apply preview-equivalent audio mix rules
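
For point 3, `.ass` timestamps use `H:MM:SS.cc` (centisecond precision), so the route needs a formatter roughly like this (a sketch; the `Default` style name is an assumption):

```typescript
// Format seconds as an ASS timestamp: H:MM:SS.cc (centisecond precision).
export const toAssTimestamp = (seconds: number): string => {
  const totalCs = Math.round(Math.max(0, seconds) * 100);
  const cs = totalCs % 100;
  const totalS = Math.floor(totalCs / 100);
  const pad = (n: number) => String(n).padStart(2, '0');
  return `${Math.floor(totalS / 3600)}:${pad(Math.floor(totalS / 60) % 60)}:${pad(totalS % 60)}.${pad(cs)}`;
};

// One Dialogue event per subtitle; visual styling comes from [V4+ Styles].
export const toDialogueLine = (start: number, end: number, text: string): string =>
  `Dialogue: 0,${toAssTimestamp(start)},${toAssTimestamp(end)},Default,,0,0,0,,${text.replace(/\n/g, '\\N')}`;
```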
+
+**Step 4: Run test to verify it passes**
+
+Run: `node .\node_modules\vitest\vitest.mjs run src/server/exportVideo.test.ts`
+Expected: PASS.
+
+### Task 4: Verify End-to-End Regressions
+
+**Files:**
+- Modify if needed: `E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.test.ts`
+- Modify if needed: `E:\Downloads\ai-video-dubbing-&-translation\src\server\minimaxTts.test.ts`
+
+**Step 1: Run focused regression suite**
+
+Run:
+
+```bash
+node .\node_modules\vitest\vitest.mjs run src/lib/exportPayload.test.ts src/server/exportVideo.test.ts src/services/geminiService.test.ts src/server/minimaxTts.test.ts
+```
+
+Expected: PASS.
+
+**Step 2: Run TypeScript check**
+
+Run: `node .\node_modules\typescript\bin\tsc --noEmit`
+Expected: Existing baseline errors may remain; no new export-parity errors should appear.
+
+**Step 3: Smoke-check the running app**
+
+1. Restart the local server.
+2. Export a trimmed clip with BGM and TTS.
+3. Confirm the exported audio and subtitle timing now match preview expectations.
+
diff --git a/docs/plans/2026-03-17-precise-dialogue-localization-design.md b/docs/plans/2026-03-17-precise-dialogue-localization-design.md
new file mode 100644
index 0000000..9013684
--- /dev/null
+++ b/docs/plans/2026-03-17-precise-dialogue-localization-design.md
@@ -0,0 +1,239 @@
+# Precise Dialogue Localization Design
+
+**Date:** 2026-03-17
+
+**Goal:** Upgrade the subtitle pipeline so sentence boundaries are more accurate, word-level timings are available, and speaker attribution is based on audio rather than LLM guesses.
+
+## Current State
+
+The current implementation has two subtitle generation paths:
+
+1. The primary path in `server.ts` extracts audio, calls Whisper with `timestamp_granularities: ['segment']`, then asks an LLM to translate and infer `speaker` and `gender`.
+2. The fallback path in `src/services/geminiService.ts` uses Gemini to infer subtitles from video or sampled frames.
+
+This is enough for rough subtitle generation, but it has three hard limits:
+
+1. Sentence timing is only segment-level, so start and end times drift at pause boundaries.
+2. Word-level timestamps do not exist, so precise editing and karaoke-style highlighting are impossible.
+3. Speaker identity is inferred from text, not measured from audio, so diarization quality is unreliable.
+
+## Chosen Approach
+
+Adopt a high-precision pipeline with a dedicated alignment layer:
+
+1. Extract clean mono audio from the uploaded video.
+2. Use voice activity detection (VAD) to isolate speech regions.
+3. Run ASR for rough transcription.
+4. Run forced alignment to refine every word boundary against the audio.
+5. Run speaker diarization to assign stable `speakerId` values.
+6. Rebuild editable subtitle sentences from aligned words.
+7. Translate only the sentence text while preserving timestamps and speaker assignments.
+
+The existing Node service remains the entry point, but it becomes an orchestration layer instead of doing all timing work itself.
+
+## Architecture
+
+### Frontend
+
+The React editor continues to call `/api/process-audio-pipeline`, but it now receives richer subtitle objects:
+
+1. Sentence-level timing for the timeline.
+2. Word-level timing for precise playback feedback.
+3. Stable `speakerId` values for speaker-aware UI and voice assignment.
+
+The current editor can remain backward compatible by continuing to render sentence-level fields first and gradually enabling word-level behavior.
+
+### Node Orchestration Layer
+
+`server.ts` keeps responsibility for:
+
+1. Receiving uploaded video data.
+2. Extracting audio with FFmpeg.
+3. Calling the alignment service.
+4. Translating sentence text.
+5. Returning a normalized payload to the frontend.
+
+The Node layer must not allow translation to rewrite timing or speaker assignments.
+
+### Alignment Layer
+
+This layer owns all timing-critical operations:
+
+1. VAD
+2. ASR
+3. Forced alignment
+4. Speaker diarization
+
+It can be implemented as a local Python service or a separately managed service as long as it returns deterministic machine-readable JSON.
+
+## Data Model
+
+The current `Subtitle` type should be extended rather than replaced.
+
+```ts
+type WordTiming = {
+ text: string;
+ startTime: number;
+ endTime: number;
+ speakerId: string;
+ confidence: number;
+};
+
+type Subtitle = {
+ id: string;
+ startTime: number;
+ endTime: number;
+ originalText: string;
+ translatedText: string;
+ speaker: string;
+ speakerId: string;
+ voiceId: string;
+ words: WordTiming[];
+ confidence: number;
+ audioUrl?: string;
+ volume?: number;
+};
+
+type SpeakerTrack = {
+ speakerId: string;
+ label: string;
+ gender?: 'male' | 'female' | 'unknown';
+};
+```
+
+Rules:
+
+1. `speakerId` is the stable machine identifier, for example `spk_0`.
+2. `speaker` is a user-facing label and can be renamed.
+3. Sentence `startTime` and `endTime` are derived from the first and last aligned words.
+
+## Processing Rules
+
+### Audio Preparation
+
+1. Convert uploaded video to `16kHz` mono WAV.
+2. Optionally create a denoised or vocal-enhanced copy when the source contains heavy music.
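
Step 1 is a single FFmpeg invocation; a sketch that builds the argument list (the flags are standard FFmpeg options, the function name is illustrative):

```typescript
// Argument list for extracting 16 kHz mono PCM WAV from a video.
// -vn drops the video stream, -ac 1 downmixes to mono,
// -ar 16000 resamples, pcm_s16le keeps uncompressed 16-bit audio.
export const alignmentAudioArgs = (videoPath: string, wavPath: string): string[] => [
  '-y',
  '-i', videoPath,
  '-vn',
  '-ac', '1',
  '-ar', '16000',
  '-c:a', 'pcm_s16le',
  wavPath,
];

// Usage: spawn('ffmpeg', alignmentAudioArgs(inputPath, outputPath))
```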
+
+### VAD
+
+Use VAD to identify speech windows and pad each detected region by about `0.2s`.
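
Padding can make neighboring regions overlap, so they should be merged after padding; a sketch under those assumptions:

```typescript
type Region = { start: number; end: number };

// Pad each detected speech region by `pad` seconds on both sides,
// clamp to the clip, and merge regions that now overlap.
export const padAndMergeRegions = (
  regions: Region[],
  pad = 0.2,
  clipEnd = Number.POSITIVE_INFINITY,
): Region[] => {
  const padded = regions
    .map((r) => ({ start: Math.max(0, r.start - pad), end: Math.min(clipEnd, r.end + pad) }))
    .sort((a, b) => a.start - b.start);

  const merged: Region[] = [];
  for (const r of padded) {
    const last = merged[merged.length - 1];
    if (last && r.start <= last.end) {
      last.end = Math.max(last.end, r.end);
    } else {
      merged.push({ ...r });
    }
  }
  return merged;
};
```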
+
+### ASR and Forced Alignment
+
+1. Use ASR for text hypotheses and rough word order.
+2. Use forced alignment to compute accurate `startTime` and `endTime` for each word.
+3. Treat forced alignment as the source of truth for timing whenever available.
+
+### Diarization
+
+1. Run diarization separately and produce speaker segments.
+2. Assign each word to the speaker with the highest overlap.
+3. If a sentence crosses speakers, split it rather than forcing a mixed-speaker sentence.
+
+### Sentence Reconstruction
+
+Build sentence subtitles from words using conservative rules:
+
+1. Keep words together only when `speakerId` is the same.
+2. Split when adjacent word gaps exceed `0.45s`.
+3. Split when sentence duration would exceed `8s`.
+4. Split on strong punctuation or long pauses.
+5. Avoid returning sentences shorter than `0.6s` unless the source is actually brief.
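
Rules 1 through 4 collapse into a single boundary predicate between adjacent words; a sketch with the thresholds copied from the rules (the punctuation set is an assumption):

```typescript
type Word = { text: string; startTime: number; endTime: number; speakerId: string };

const MAX_GAP = 0.45;               // rule 2
const MAX_DURATION = 8;             // rule 3
const STRONG_PUNCT = /[.!?。！？]$/; // rule 4 (assumed punctuation set)

// Decide whether a sentence boundary falls between `prev` and `next`,
// given the start time of the sentence currently being built.
export const shouldSplit = (prev: Word, next: Word, sentenceStart: number): boolean =>
  next.speakerId !== prev.speakerId ||
  next.startTime - prev.endTime > MAX_GAP ||
  next.endTime - sentenceStart > MAX_DURATION ||
  STRONG_PUNCT.test(prev.text);
```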
+
+## API Design
+
+Reuse `/api/process-audio-pipeline`, but upgrade its payload to:
+
+```json
+{
+ "subtitles": [],
+ "speakers": [],
+ "sourceLanguage": "zh",
+ "targetLanguage": "en",
+ "duration": 123.45,
+ "quality": "full",
+ "alignmentEngine": "whisperx+pyannote"
+}
+```
+
+Quality levels:
+
+1. `full`: sentence timings, word timings, and diarization are all available.
+2. `partial`: word timings are available but diarization is missing or unreliable.
+3. `fallback`: high-precision alignment failed, so the app returned rough timing from the existing path.
+
+## Frontend Behavior
+
+The current editor in `src/components/EditorScreen.tsx` should evolve incrementally:
+
+1. Keep the existing sentence-based timeline as the default view.
+2. Add word-level highlighting during playback when `words` exist.
+3. Add speaker-aware styling and filtering when `speakers` exist.
+4. Preserve manual timeline editing and snap dragged sentence edges to nearest word boundaries when possible.
+
+Fallback behavior:
+
+1. If `quality` is `full`, enable all precision UI.
+2. If `quality` is `partial`, disable speaker-specific UI and keep timing features.
+3. If `quality` is `fallback`, continue with the current editor and show a low-precision notice.
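
These three rules can live in one small helper so components never branch on `quality` directly (a sketch; the flag names are assumptions):

```typescript
export type PipelineQuality = 'full' | 'partial' | 'fallback';

// Map pipeline quality onto the UI capabilities described above.
export const featureFlagsForQuality = (quality: PipelineQuality) => ({
  wordTiming: quality !== 'fallback',         // rules 1-2: timing features need alignment
  speakerUi: quality === 'full',              // rule 2: speaker UI needs reliable diarization
  lowPrecisionNotice: quality === 'fallback', // rule 3: warn on rough timing
});
```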
+
+## Error Handling and Degradation
+
+The product must remain usable even when the high-precision path is incomplete.
+
+1. If forced alignment fails, return sentence-level ASR output instead of failing the whole request.
+2. If diarization fails, keep timings and mark `speakerId` as `unknown`.
+3. If translation fails, return original text with timings intact.
+4. If the alignment layer is unavailable, fall back to the existing visual pipeline and set `quality: "fallback"`.
+5. Preserve low-confidence words and expose their confidence rather than dropping them silently.
+
+## Testing Strategy
+
+Coverage should focus on deterministic logic:
+
+1. Sentence reconstruction from aligned words.
+2. Speaker assignment from overlapping diarization segments.
+3. API normalization and fallback handling.
+4. Frontend word-highlighting and snapping helpers.
+
+End-to-end manual verification should include:
+
+1. Single-speaker clip with pauses.
+2. Two-speaker dialogue with interruptions.
+3. Music-heavy clip.
+4. Alignment failure fallback.
+
+## Rollout Plan
+
+1. Extend types and response normalization first.
+2. Introduce the alignment adapter behind a feature flag or environment guard.
+3. Return richer payloads while keeping the current UI backward compatible.
+4. Add word-level highlighting and speaker-aware UI after the backend contract stabilizes.
+
+## Constraints and Notes
+
+1. This workspace is not a Git repository, so the required design-document commit could not be performed here.
+2. The current project does not yet include a test runner, so the implementation plan includes test infrastructure setup before feature work.
+
+## Implementation Status
+
+Implemented in this workspace:
+
+1. Test infrastructure using Vitest, jsdom, and Testing Library.
+2. Shared subtitle pipeline helpers for normalization, sentence reconstruction, speaker assignment, word highlighting, and timeline snapping.
+3. A backend subtitle orchestration layer plus an alignment-service adapter boundary for local ASR / alignment backends.
+4. Gemini-based sentence translation in the audio pipeline, without relying on OpenAI for ASR or translation.
+5. Frontend pipeline mapping, precision notices, word-level playback feedback, and speaker-aware presentation.
+
+Automated verification completed:
+
+1. `npm test -- --run`
+2. `npm run lint`
+3. `npm run build`
+
+Manual verification still pending:
+
+1. Single-speaker clip with pauses.
+2. Two-speaker dialogue with interruptions.
+3. Music-heavy clip.
+4. Alignment-service unavailable fallback using a real upload.
diff --git a/docs/plans/2026-03-17-precise-dialogue-localization.md b/docs/plans/2026-03-17-precise-dialogue-localization.md
new file mode 100644
index 0000000..de4b5b2
--- /dev/null
+++ b/docs/plans/2026-03-17-precise-dialogue-localization.md
@@ -0,0 +1,650 @@
+# Precise Dialogue Localization Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Build a high-precision subtitle pipeline that returns accurate sentence boundaries, word-level timings, and real speaker attribution while preserving the current editor flow.
+
+**Architecture:** Keep the React app and `server.ts` as the public entry points, but move timing-critical work into a dedicated alignment adapter. The backend normalizes aligned words into sentence subtitles, translates text without changing timing, and returns quality metadata so the editor can enable or disable precision UI safely.
+
+**Tech Stack:** React 19, TypeScript, Vite, Express, FFmpeg, OpenAI SDK, a new test runner (`vitest`), and a high-precision alignment backend adapter.
+
+---
+
+### Task 1: Add Test Infrastructure
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\package.json`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\vitest.config.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\test\setup.ts`
+
+**Step 1: Write the failing test**
+
+Create a minimal smoke test first so the test runner has a real target.
+
+```ts
+import { describe, expect, it } from 'vitest';
+
+describe('test harness', () => {
+ it('runs vitest in this workspace', () => {
+ expect(true).toBe(true);
+ });
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `npm test -- --run`
+Expected: FAIL because no `test` script or Vitest config exists yet.
+
+**Step 3: Write minimal implementation**
+
+1. Add `test` and `test:watch` scripts to `package.json`.
+2. Add dev dependencies for `vitest`.
+3. Create `vitest.config.ts` with a Node environment default.
+4. Add `src/test/setup.ts` for shared setup.
+
+```ts
+import { defineConfig } from 'vitest/config';
+
+export default defineConfig({
+ test: {
+ environment: 'node',
+ setupFiles: ['./src/test/setup.ts'],
+ },
+});
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `npm test -- --run`
+Expected: PASS with the smoke test.
+
+**Step 5: Commit**
+
+```bash
+git add package.json vitest.config.ts src/test/setup.ts
+git commit -m "test: add vitest infrastructure"
+```
+
+### Task 2: Extract Subtitle Pipeline Types and Normalizers
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\types.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\subtitlePipeline.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\subtitlePipeline.test.ts`
+
+**Step 1: Write the failing test**
+
+Write tests for normalization from aligned word payloads to UI-ready subtitles.
+
+```ts
+it('derives subtitle boundaries from first and last word', () => {
+ const result = normalizeAlignedSentence({
+ id: 's1',
+ speakerId: 'spk_0',
+ words: [
+ { text: 'Hello', startTime: 1.2, endTime: 1.5, speakerId: 'spk_0', confidence: 0.99 },
+ { text: 'world', startTime: 1.6, endTime: 2.0, speakerId: 'spk_0', confidence: 0.98 },
+ ],
+ originalText: 'Hello world',
+ translatedText: '你好世界',
+ });
+
+ expect(result.startTime).toBe(1.2);
+ expect(result.endTime).toBe(2.0);
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `npm test -- --run src/lib/subtitlePipeline.test.ts`
+Expected: FAIL because the new module and extended types do not exist.
+
+**Step 3: Write minimal implementation**
+
+1. Extend `Subtitle` in `src/types.ts` with `speakerId`, `words`, and `confidence`.
+2. Create a pure helper module that normalizes backend payloads into frontend subtitles.
+
+```ts
+export const deriveSubtitleBounds = (words: WordTiming[]) => ({
+ startTime: words[0]?.startTime ?? 0,
+ endTime: words[words.length - 1]?.endTime ?? 0,
+});
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `npm test -- --run src/lib/subtitlePipeline.test.ts`
+Expected: PASS.
+
+**Step 5: Commit**
+
+```bash
+git add src/types.ts src/lib/subtitlePipeline.ts src/lib/subtitlePipeline.test.ts
+git commit -m "feat: add subtitle pipeline normalizers"
+```
+
+### Task 3: Implement Sentence Reconstruction Helpers
+
+**Files:**
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\alignment\sentenceReconstruction.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\alignment\sentenceReconstruction.test.ts`
+
+**Step 1: Write the failing test**
+
+Cover pause splitting and speaker splitting.
+
+```ts
+it('splits sentences when speaker changes', () => {
+ const result = rebuildSentences([
+ { text: 'Hi', startTime: 0.0, endTime: 0.2, speakerId: 'spk_0', confidence: 0.9 },
+ { text: 'there', startTime: 0.25, endTime: 0.5, speakerId: 'spk_0', confidence: 0.9 },
+ { text: 'no', startTime: 0.55, endTime: 0.7, speakerId: 'spk_1', confidence: 0.9 },
+ ]);
+
+ expect(result).toHaveLength(2);
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `npm test -- --run src/lib/alignment/sentenceReconstruction.test.ts`
+Expected: FAIL because the helper module is missing.
+
+**Step 3: Write minimal implementation**
+
+Implement pure splitting rules:
+
+1. Split on `speakerId` change.
+2. Split when word gaps exceed `0.45`.
+3. Split when sentence duration exceeds `8`.
+
+```ts
+if (nextWord.speakerId !== currentSpeakerId) {
+ flushSentence();
+}
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `npm test -- --run src/lib/alignment/sentenceReconstruction.test.ts`
+Expected: PASS.
+
+**Step 5: Commit**
+
+```bash
+git add src/lib/alignment/sentenceReconstruction.ts src/lib/alignment/sentenceReconstruction.test.ts
+git commit -m "feat: add sentence reconstruction rules"
+```
+
+### Task 4: Implement Speaker Assignment Helpers
+
+**Files:**
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\alignment\speakerAssignment.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\lib\alignment\speakerAssignment.test.ts`
+
+**Step 1: Write the failing test**
+
+Test overlap-based speaker assignment.
+
+```ts
+it('assigns each word to the speaker segment with maximum overlap', () => {
+ const word = { text: 'hello', startTime: 1.0, endTime: 1.4 };
+ const speakers = [
+ { speakerId: 'spk_0', startTime: 0.8, endTime: 1.1 },
+ { speakerId: 'spk_1', startTime: 1.1, endTime: 1.6 },
+ ];
+
+ expect(assignSpeakerToWord(word, speakers)).toBe('spk_1');
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `npm test -- --run src/lib/alignment/speakerAssignment.test.ts`
+Expected: FAIL because speaker assignment logic does not exist.
+
+**Step 3: Write minimal implementation**
+
+Add a pure overlap calculator and default to `unknown` when no segment overlaps.
+
+```ts
+const overlap = Math.max(
+ 0,
+ Math.min(word.endTime, segment.endTime) - Math.max(word.startTime, segment.startTime),
+);
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `npm test -- --run src/lib/alignment/speakerAssignment.test.ts`
+Expected: PASS.
+
+**Step 5: Commit**
+
+```bash
+git add src/lib/alignment/speakerAssignment.ts src/lib/alignment/speakerAssignment.test.ts
+git commit -m "feat: add speaker assignment helpers"
+```
+
+### Task 5: Isolate Backend Pipeline Logic from `server.ts`
+
+**Files:**
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.test.ts`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\server.ts`
+
+**Step 1: Write the failing test**
+
+Add tests for orchestration-level fallback behavior.
+
+```ts
+it('returns partial quality when diarization is unavailable', async () => {
+ const result = await buildSubtitlePayload({
+ alignmentResult: {
+ words: [{ text: 'hi', startTime: 0, endTime: 0.2, speakerId: 'unknown', confidence: 0.9 }],
+ speakerSegments: [],
+ quality: 'partial',
+ },
+ });
+
+ expect(result.quality).toBe('partial');
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `npm test -- --run src/server/subtitlePipeline.test.ts`
+Expected: FAIL because orchestration code is still embedded in `server.ts`.
+
+**Step 3: Write minimal implementation**
+
+1. Move payload-building logic into `src/server/subtitlePipeline.ts`.
+2. Make `server.ts` call the helper and only handle HTTP concerns.
+
+```ts
+export const buildSubtitlePayload = async (deps: SubtitlePipelineDeps) => {
+ // normalize alignment result
+ // translate text
+ // return { subtitles, speakers, quality, ... }
+};
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `npm test -- --run src/server/subtitlePipeline.test.ts`
+Expected: PASS.
+
+**Step 5: Commit**
+
+```bash
+git add src/server/subtitlePipeline.ts src/server/subtitlePipeline.test.ts server.ts
+git commit -m "refactor: isolate subtitle pipeline orchestration"
+```
+
+### Task 6: Add an Alignment Service Adapter
+
+**Files:**
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\alignmentAdapter.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\alignmentAdapter.test.ts`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\server.ts`
+
+**Step 1: Write the failing test**
+
+Test that the adapter maps raw alignment responses into normalized internal types.
+
+```ts
+it('maps aligned words and speaker segments from the adapter response', async () => {
+ const result = await parseAlignmentResponse({
+ words: [{ word: 'hello', start: 1.0, end: 1.2, speaker: 'spk_0', score: 0.95 }],
+ speakers: [{ speaker: 'spk_0', start: 0.8, end: 1.6 }],
+ });
+
+ expect(result.words[0].speakerId).toBe('spk_0');
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `npm test -- --run src/server/alignmentAdapter.test.ts`
+Expected: FAIL because no adapter exists.
+
+**Step 3: Write minimal implementation**
+
+Create an adapter boundary with one public function such as `requestAlignedTranscript(audioPath)`.
+
+```ts
+export const requestAlignedTranscript = async (audioPath: string) => {
+ // call local or remote alignment backend
+ // normalize response shape
+};
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `npm test -- --run src/server/alignmentAdapter.test.ts`
+Expected: PASS.
+
+**Step 5: Commit**
+
+```bash
+git add src/server/alignmentAdapter.ts src/server/alignmentAdapter.test.ts server.ts
+git commit -m "feat: add alignment service adapter"
+```
+
+### Task 7: Upgrade `/api/process-audio-pipeline` Response Shape
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\server.ts`
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.ts`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.test.ts`
+
+**Step 1: Write the failing test**
+
+Add a client-side test for parsing `quality`, `speakers`, and `words`.
+
+```ts
+it('maps the enriched audio pipeline response into subtitle objects', async () => {
+ const payload = {
+ subtitles: [
+ {
+ id: 'sub_1',
+ startTime: 1,
+ endTime: 2,
+ originalText: 'Hello',
+ translatedText: '你好',
+ speaker: 'Speaker 1',
+ speakerId: 'spk_0',
+ words: [{ text: 'Hello', startTime: 1, endTime: 2, speakerId: 'spk_0', confidence: 0.9 }],
+ confidence: 0.9,
+ },
+ ],
+ speakers: [{ speakerId: 'spk_0', label: 'Speaker 1' }],
+ quality: 'full',
+ };
+
+ expect(mapPipelineResponse(payload).subtitles[0].words).toHaveLength(1);
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `npm test -- --run src/services/geminiService.test.ts`
+Expected: FAIL because the mapping helper does not exist.
+
+**Step 3: Write minimal implementation**
+
+1. Add a response-mapping helper in `src/services/geminiService.ts`.
+2. Preserve the existing fallback path.
+3. Carry `quality` metadata to the UI.
+
+```ts
+const quality = data.quality ?? 'fallback';
+const subtitles = (data.subtitles ?? []).map(mapSubtitleFromApi);
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `npm test -- --run src/services/geminiService.test.ts`
+Expected: PASS.
+
+**Step 5: Commit**
+
+```bash
+git add server.ts src/services/geminiService.ts src/services/geminiService.test.ts
+git commit -m "feat: return enriched subtitle pipeline payloads"
+```
+
+### Task 8: Add Precision Metadata to Editor State
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.tsx`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.test.tsx`
+
+**Step 1: Write the failing test**
+
+Add a test for rendering a fallback warning when `quality` is low.
+
+```tsx
+it('shows a low-precision notice for fallback subtitle results', () => {
+  // Placeholder props; the real prop name is fixed in Step 3.
+  render(<EditorScreen {...baseEditorProps} quality="fallback" />);
+ expect(screen.getByText(/low-precision/i)).toBeInTheDocument();
+});
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `npm test -- --run src/components/EditorScreen.test.tsx`
+Expected: FAIL because the component does not track pipeline quality yet.
+
+**Step 3: Write minimal implementation**
+
+1. Add state for `quality` and `speakers`.
+2. Surface a small status badge or warning banner.
+3. Keep the existing sentence list and timeline intact.
+
+```tsx
+{quality === 'fallback' && (
+