video_translate/docs/plans/2026-03-17-doubao-llm-provider.md
2026-03-18 11:42:00 +08:00


# Doubao LLM Provider Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add a user-visible LLM switcher that lets subtitle generation use Doubao or Gemini, defaults to Doubao, and keeps TTS fixed on MiniMax with all provider keys sourced from `.env`.

**Architecture:** Move subtitle generation behind a new server endpoint, introduce a provider abstraction for Gemini and Doubao, and update the editor to send the selected provider while continuing to use the existing subtitle shape. Keep MiniMax TTS separate and untouched except for regression coverage.

**Tech Stack:** React, TypeScript, Express, multer, fetch, Vitest

---
### Task 1: Add provider types and configuration resolution
**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.test.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\audioPipelineConfig.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\audioPipelineConfig.test.ts`

**Step 1: Write the failing test**
```ts
import { describe, expect, it } from 'vitest';
import { normalizeLlmProvider, resolveLlmProviderConfig } from './llmProvider';

describe('llmProvider config', () => {
  it('defaults to doubao when no provider override is set', () => {
    expect(normalizeLlmProvider(undefined)).toBe('doubao');
  });

  it('returns the selected provider key from env', () => {
    expect(
      resolveLlmProviderConfig('doubao', {
        ARK_API_KEY: 'ark-key',
        GEMINI_API_KEY: 'gemini-key',
      }),
    ).toEqual(expect.objectContaining({ provider: 'doubao', apiKey: 'ark-key' }));
  });
});
```
**Step 2: Run test to verify it fails**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/audioPipelineConfig.test.ts`

Expected: FAIL because `llmProvider.ts` does not exist and `audioPipelineConfig.ts` still only exposes Gemini config.

**Step 3: Write minimal implementation**
```ts
export type LlmProvider = 'doubao' | 'gemini';

// Anything other than an explicit "gemini" (including undefined) falls back to Doubao.
export const normalizeLlmProvider = (value?: string): LlmProvider =>
  value?.trim().toLowerCase() === 'gemini' ? 'gemini' : 'doubao';

// Keys come from .env (loaded into process.env); fail early if the selected provider has none.
export const resolveLlmProviderConfig = (
  provider: LlmProvider,
  env: NodeJS.ProcessEnv,
) => {
  if (provider === 'doubao') {
    const apiKey = env.ARK_API_KEY?.trim();
    if (!apiKey) throw new Error('ARK_API_KEY is required for Doubao subtitle generation.');
    return {
      provider,
      apiKey,
      // DOUBAO_MODEL is optional; fall back to the default Seed model.
      model: env.DOUBAO_MODEL?.trim() || 'doubao-seed-2-0-pro-260215',
      // Full Ark Responses API endpoint, not just a host.
      baseUrl: 'https://ark.cn-beijing.volces.com/api/v3/responses',
    };
  }

  const apiKey = env.GEMINI_API_KEY?.trim();
  if (!apiKey) throw new Error('GEMINI_API_KEY is required for Gemini subtitle generation.');
  return {
    provider,
    apiKey,
    model: 'gemini-2.5-flash',
  };
};
```
**Step 4: Run test to verify it passes**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/audioPipelineConfig.test.ts`

Expected: PASS

**Step 5: Commit**
```bash
git add src/server/llmProvider.ts src/server/llmProvider.test.ts src/server/audioPipelineConfig.ts src/server/audioPipelineConfig.test.ts
git commit -m "feat: add llm provider configuration"
```
### Task 2: Add the Doubao provider parser and contract tests
**Files:**
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\doubaoProvider.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\doubaoProvider.test.ts`

**Step 1: Write the failing test**
```ts
import { describe, expect, it } from 'vitest';
import { extractDoubaoTextOutput } from './doubaoProvider';

describe('extractDoubaoTextOutput', () => {
  it('reconstructs text from the Ark output array', () => {
    const text = extractDoubaoTextOutput({
      output: [
        {
          type: 'message',
          content: [{ type: 'output_text', text: '[{"id":"1","translatedText":"你好"}]' }],
        },
      ],
    });
    expect(text).toContain('translatedText');
  });
});
```
**Step 2: Run test to verify it fails**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/doubaoProvider.test.ts`

Expected: FAIL because `doubaoProvider.ts` does not exist.

**Step 3: Write minimal implementation**
```ts
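// Ark Responses payloads carry text in output[].content[] parts of type "output_text";
// concatenate those parts and ignore every other part type.
export const extractDoubaoTextOutput = (payload: any): string =>
  (payload?.output ?? [])
    .flatMap((item: any) => item?.content ?? [])
    .filter((part: any) => part?.type === 'output_text')
    .map((part: any) => part.text ?? '')
    .join('')
    .trim();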
```
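
Task 3 relies on a `createDoubaoSentenceTranslator` that this plan does not spell out. A minimal sketch of its HTTP half, assuming the Ark Responses endpoint accepts an OpenAI-style JSON body of `{ model, input }` authenticated with `Authorization: Bearer <ARK_API_KEY>` (check this against the Volcengine Ark documentation before relying on it). `requestDoubaoResponse`, `DoubaoConfig`, and `FetchLike` are hypothetical names. The raw payload it returns is what `extractDoubaoTextOutput` consumes, and `fetchImpl` is injected so tests can stub the network.

```typescript
// Hypothetical sketch: function/type names and the request body shape are assumptions.
export type DoubaoConfig = { apiKey: string; model: string; baseUrl: string };

export type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown>; text(): Promise<string> }>;

// POSTs one prompt to the Ark Responses endpoint and returns the raw JSON payload,
// which is the input extractDoubaoTextOutput expects.
export const requestDoubaoResponse = async (
  config: DoubaoConfig,
  prompt: string,
  fetchImpl: FetchLike,
): Promise<unknown> => {
  const response = await fetchImpl(config.baseUrl, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${config.apiKey}`,
      'Content-Type': 'application/json',
    },
    // Assumed body: an OpenAI-Responses-style model id plus a plain-text input.
    body: JSON.stringify({ model: config.model, input: prompt }),
  });

  if (!response.ok) {
    const detail = (await response.text()).slice(0, 500);
    // Turn auth failures into an actionable message instead of a bare status code.
    if (response.status === 401 || response.status === 403) {
      throw new Error(`Doubao rejected the credentials (HTTP ${response.status}); check ARK_API_KEY. ${detail}`.trim());
    }
    throw new Error(`Doubao request failed (HTTP ${response.status}): ${detail}`.trim());
  }
  return response.json();
};
```

On Node 18+ the global `fetch` can be passed as `fetchImpl`. Mapping 401/403 to a message that names `ARK_API_KEY` is one way to meet Task 7's requirement that Doubao auth failures surface clearly.
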
**Step 4: Run test to verify it passes**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/doubaoProvider.test.ts`

Expected: PASS

**Step 5: Commit**
```bash
git add src/server/doubaoProvider.ts src/server/doubaoProvider.test.ts
git commit -m "feat: add doubao response parsing"
```
### Task 3: Add provider-backed translation adapters
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\geminiTranslation.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\providerTranslation.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\providerTranslation.test.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\geminiTranslation.test.ts`

**Step 1: Write the failing test**
```ts
import { describe, expect, it } from 'vitest';
import { createSentenceTranslator } from './providerTranslation';

describe('createSentenceTranslator', () => {
  it('returns a Doubao translator when provider is doubao', () => {
    const translator = createSentenceTranslator({
      provider: 'doubao',
      apiKey: 'ark-key',
      model: 'doubao-seed-2-0-pro-260215',
    });
    expect(typeof translator).toBe('function');
  });
});
```
**Step 2: Run test to verify it fails**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/providerTranslation.test.ts src/server/geminiTranslation.test.ts`

Expected: FAIL because the provider selection layer does not exist.

**Step 3: Write minimal implementation**
```ts
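import type { LlmProvider } from './llmProvider';
// Assumed homes of the per-provider factories; adjust to where they actually live.
import { createDoubaoSentenceTranslator } from './doubaoProvider';
import { createGeminiSentenceTranslator } from './geminiTranslation';

// baseUrl is only present for Doubao (see resolveLlmProviderConfig).
export type ProviderConfig = { provider: LlmProvider; apiKey: string; model: string; baseUrl?: string };

export const createSentenceTranslator = (config: ProviderConfig) =>
  config.provider === 'doubao'
    ? createDoubaoSentenceTranslator(config)
    : createGeminiSentenceTranslator(config);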
```
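
Whichever adapter runs, the model's reply is text like the `[{"id":"1","translatedText":"你好"}]` string in the Task 2 test, so both adapters need to turn it back into rows. A hedged sketch of a shared parser, assuming the `{ id, translatedText }` shape from that fixture and tolerating the Markdown code fence that chat models sometimes wrap around JSON. `parseSentenceTranslations` and `SentenceTranslation` are hypothetical names.

```typescript
// Hypothetical helper: the { id, translatedText } shape follows the Task 2 test fixture.
export interface SentenceTranslation {
  id: string;
  translatedText: string;
}

export const parseSentenceTranslations = (raw: string): SentenceTranslation[] => {
  // Drop an optional leading ``` or ```json line and a trailing ``` before parsing.
  const unfenced = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/\s*```$/, '');
  const parsed: unknown = JSON.parse(unfenced);
  if (!Array.isArray(parsed)) {
    throw new Error('Expected a JSON array of { id, translatedText } objects.');
  }
  // Validate every row so a malformed reply fails loudly instead of producing blank subtitles.
  return parsed.map((row, index) => {
    const id = row?.id;
    const translatedText = row?.translatedText;
    if (typeof id !== 'string' || typeof translatedText !== 'string') {
      throw new Error(`Translation row ${index} is missing a string id or translatedText.`);
    }
    return { id, translatedText };
  });
};
```
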
**Step 4: Run test to verify it passes**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/providerTranslation.test.ts src/server/geminiTranslation.test.ts`

Expected: PASS

**Step 5: Commit**
```bash
git add src/server/providerTranslation.ts src/server/providerTranslation.test.ts src/server/geminiTranslation.ts src/server/geminiTranslation.test.ts
git commit -m "feat: add provider-based translation adapters"
```
### Task 4: Add a dedicated subtitle-generation endpoint
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\server.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleRequest.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleRequest.test.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.test.ts`

**Step 1: Write the failing test**
```ts
import { describe, expect, it } from 'vitest';
import { parseSubtitleRequest } from './subtitleRequest';

describe('parseSubtitleRequest', () => {
  it('defaults provider to doubao', () => {
    expect(parseSubtitleRequest({ body: {} as any }).provider).toBe('doubao');
  });
});
```
**Step 2: Run test to verify it fails**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts`

Expected: FAIL because the request parser does not exist.

**Step 3: Write minimal implementation**
```ts
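import { normalizeLlmProvider } from './llmProvider';

// With multer in front of the route, multipart text fields arrive in req.body as strings.
export const parseSubtitleRequest = (req: { body: Record<string, unknown> }) => ({
  provider: normalizeLlmProvider(String(req.body.provider || 'doubao')),
  targetLanguage: String(req.body.targetLanguage || ''),
});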
```
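Then update `server.ts` to expose `POST /api/generate-subtitles`: accept the multipart `video` field through multer, validate `targetLanguage`, parse the optional JSON `trimRange` field, resolve the provider config, and respond with the normalized subtitles as `{ subtitles }`.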
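
A framework-agnostic sketch of that route, written against minimal request/response shapes so it can be exercised without Express. It assumes multer's `upload.single('video')` has already populated `req.file` and the text fields in `req.body`, that `trimRange` arrives as the JSON string the frontend sends, and that errors are reported as `{ error }`. `createGenerateSubtitlesHandler`, `parseTrimRange`, and the injected `generate` function (standing in for provider resolution plus the subtitle pipeline) are hypothetical.

```typescript
// Hypothetical handler factory; in server.ts it would be wired up roughly as
//   app.post('/api/generate-subtitles', upload.single('video'), createGenerateSubtitlesHandler(generate));
type Provider = 'doubao' | 'gemini';
type TrimRange = { start: number; end: number };

interface SubtitleJob {
  video: unknown; // whatever multer put on req.file
  targetLanguage: string;
  provider: Provider;
  trimRange: TrimRange | null;
}

interface MinimalReq {
  body: Record<string, unknown>;
  file?: unknown;
}

interface MinimalRes {
  status(code: number): MinimalRes;
  json(body: unknown): unknown;
}

// trimRange is optional; reject anything that is not a well-formed 0 <= start < end pair.
export const parseTrimRange = (raw: unknown): TrimRange | null => {
  if (raw === undefined || raw === null || raw === '') return null;
  const value = JSON.parse(String(raw));
  if (typeof value?.start !== 'number' || typeof value?.end !== 'number' || !(value.start >= 0 && value.start < value.end)) {
    throw new Error('trimRange must be JSON like {"start":0,"end":10} with 0 <= start < end.');
  }
  return { start: value.start, end: value.end };
};

export const createGenerateSubtitlesHandler =
  (generate: (job: SubtitleJob) => Promise<unknown[]>) =>
  async (req: MinimalReq, res: MinimalRes): Promise<unknown> => {
    let job: SubtitleJob;
    try {
      const targetLanguage = String(req.body.targetLanguage ?? '').trim();
      if (!req.file) throw new Error('A video file is required in the "video" field.');
      if (!targetLanguage) throw new Error('targetLanguage is required.');
      job = {
        video: req.file,
        targetLanguage,
        // Mirrors normalizeLlmProvider: anything other than "gemini" selects Doubao.
        provider: String(req.body.provider ?? '').trim().toLowerCase() === 'gemini' ? 'gemini' : 'doubao',
        trimRange: parseTrimRange(req.body.trimRange),
      };
    } catch (error) {
      // Bad input, including malformed trimRange JSON, is a client error.
      return res.status(400).json({ error: (error as Error).message });
    }
    try {
      return res.status(200).json({ subtitles: await generate(job) });
    } catch (error) {
      // Provider/config failures (e.g. a missing ARK_API_KEY) keep their message for the editor.
      return res.status(500).json({ error: (error as Error).message });
    }
  };
```

Injecting `generate` keeps the HTTP concerns (status codes, input validation) testable separately from the LLM calls.
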

**Step 4: Run test to verify it passes**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts`

Expected: PASS

**Step 5: Commit**
```bash
git add server.ts src/server/subtitleRequest.ts src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts
git commit -m "feat: add subtitle generation endpoint"
```
### Task 5: Update the frontend subtitle service to use the new endpoint
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\services\subtitleService.ts`
- Create: `E:\Downloads\ai-video-dubbing-&-translation\src\services\subtitleService.test.ts`
- Test: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.test.tsx`

**Step 1: Write the failing test**
```ts
import { describe, expect, it, vi } from 'vitest';
import { generateSubtitles } from './subtitleService';

describe('generateSubtitles', () => {
  it('posts the selected provider to the server', async () => {
    // Typed parameters so mock.calls[0] exposes both the URL and the RequestInit.
    const fetchMock = vi.fn(async (_input: RequestInfo | URL, _init?: RequestInit) => ({
      ok: true,
      json: async () => ({ subtitles: [] }),
    }));

    await generateSubtitles(new File(['x'], 'clip.mp4'), 'English', 'doubao', null, fetchMock as any);

    const [url, init] = fetchMock.mock.calls[0];
    expect(url).toBe('/api/generate-subtitles');
    expect((init?.body as FormData).get('provider')).toBe('doubao');
  });
});
```
**Step 2: Run test to verify it fails**

Run: `node .\node_modules\vitest\vitest.mjs run src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx`

Expected: FAIL because the new service does not exist and the editor still uses the Gemini-specific service directly.

**Step 3: Write minimal implementation**
```ts
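export const generateSubtitles = async (
  videoFile: File,
  targetLanguage: string,
  provider: 'doubao' | 'gemini',
  trimRange?: { start: number; end: number } | null,
  fetchImpl: typeof fetch = fetch,
) => {
  // Multipart body: the server reads "video" as the file and the rest as text fields.
  const formData = new FormData();
  formData.append('video', videoFile);
  formData.append('targetLanguage', targetLanguage);
  formData.append('provider', provider);
  if (trimRange) {
    // FormData only carries strings and blobs, so the range travels as JSON.
    formData.append('trimRange', JSON.stringify(trimRange));
  }

  const response = await fetchImpl('/api/generate-subtitles', {
    method: 'POST',
    body: formData,
  });

  // Surface server-side failures (e.g. a missing or rejected ARK_API_KEY) instead of handing
  // an error payload to the editor as if it were subtitles. Assumes errors come back as { error }.
  if (!response.ok) {
    const body = await response.json().catch(() => null);
    throw new Error(body?.error ?? `Subtitle generation failed with HTTP ${response.status}.`);
  }

  return response.json();
};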
```
**Step 4: Run test to verify it passes**

Run: `node .\node_modules\vitest\vitest.mjs run src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx`

Expected: PASS

**Step 5: Commit**
```bash
git add src/services/subtitleService.ts src/services/subtitleService.test.ts src/services/geminiService.ts src/components/EditorScreen.test.tsx
git commit -m "feat: route subtitle generation through the server"
```
### Task 6: Add the editor LLM selector and default it to Doubao
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.tsx`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.test.tsx`

**Step 1: Write the failing test**
```tsx
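// Lives in EditorScreen.test.tsx next to its existing imports (render, screen, EditorScreen)
// and relies on the jest-dom matchers for toHaveValue.
it('defaults the llm selector to Doubao', () => {
  const file = new File(['x'], 'clip.mp4', { type: 'video/mp4' });
  render(<EditorScreen videoFile={file} targetLanguage="English" onBack={() => {}} />);
  expect(screen.getByLabelText(/llm/i)).toHaveValue('doubao');
});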
```
**Step 2: Run test to verify it fails**

Run: `node .\node_modules\vitest\vitest.mjs run src/components/EditorScreen.test.tsx`

Expected: FAIL because the selector does not exist.

**Step 3: Write minimal implementation**
```tsx
// In the EditorScreen component body (useState imported from 'react'):
const [llmProvider, setLlmProvider] = useState<'doubao' | 'gemini'>('doubao');

// In the returned JSX:
<label>
  LLM
  <select
    aria-label="LLM"
    value={llmProvider}
    onChange={(event) => setLlmProvider(event.target.value as 'doubao' | 'gemini')}
  >
    <option value="doubao">Doubao</option>
    <option value="gemini">Gemini</option>
  </select>
</label>
```
Then pass `llmProvider` as the `provider` argument when the editor calls `generateSubtitles` from `subtitleService.ts`.
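
The `onChange` handler above casts `event.target.value` to `'doubao' | 'gemini'`, which trusts whatever string the DOM returns. A hypothetical alternative keeps the options and the union type in one place and narrows with a type guard instead of a cast. `LLM_OPTIONS`, `LlmProviderOption`, and `isLlmProvider` are illustrative names, not existing code.

```typescript
// Hypothetical single source of truth for the selector; the first entry is the default.
export const LLM_OPTIONS = [
  { value: 'doubao', label: 'Doubao' },
  { value: 'gemini', label: 'Gemini' },
] as const;

// Derived union: 'doubao' | 'gemini'.
export type LlmProviderOption = (typeof LLM_OPTIONS)[number]['value'];

// Narrows an arbitrary <select> value instead of casting it.
export const isLlmProvider = (value: string): value is LlmProviderOption =>
  LLM_OPTIONS.some((option) => option.value === value);
```

In `EditorScreen`, the handler could become `onChange={(event) => { if (isLlmProvider(event.target.value)) setLlmProvider(event.target.value); }}`, and the `<option>` elements could be rendered from `LLM_OPTIONS`, so adding a provider later touches a single list.
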

**Step 4: Run test to verify it passes**

Run: `node .\node_modules\vitest\vitest.mjs run src/components/EditorScreen.test.tsx`

Expected: PASS

**Step 5: Commit**
```bash
git add src/components/EditorScreen.tsx src/components/EditorScreen.test.tsx
git commit -m "feat: add llm selector to the editor"
```
### Task 7: Add end-to-end provider and regression coverage
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.test.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.test.ts`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\src\server\minimaxTts.test.ts`

**Step 1: Write the failing test**
```ts
// Placeholder only: this assertion can never fail. Step 3 replaces it with real checks.
it('does not change TTS behavior when the llm provider changes', async () => {
  expect(true).toBe(true);
});
```
**Step 2: Run the tests to confirm the gap**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/subtitlePipeline.test.ts src/services/geminiService.test.ts src/server/minimaxTts.test.ts`

Expected: the placeholder passes, but only because it cannot fail. Treat that as a gap rather than a green light: Step 3 must replace it with assertions that would fail if provider switching broke.

**Step 3: Write the regression tests**
Add regression tests that prove:
1. selected provider is forwarded correctly
2. Doubao auth failures surface clearly
3. Gemini still works when selected
4. MiniMax TTS tests continue to pass unchanged
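
Items 1 and 2 need a `fetch` stand-in that records what the client sent and can simulate a rejected key. A minimal hypothetical helper for that, typed structurally so it runs without DOM typings. In `subtitleService.test.ts` the stub would be passed as the last argument of `generateSubtitles` (cast as the existing test already does), and `recordedField` reads `provider` back out of the captured `FormData`. `createRecordingFetch` and `recordedField` are illustrative names.

```typescript
// Hypothetical test helper: records each call and answers with a canned status and JSON body.
export interface RecordedCall {
  url: string;
  init?: { method?: string; body?: unknown };
}

export const createRecordingFetch = (status = 200, body: unknown = { subtitles: [] }) => {
  const calls: RecordedCall[] = [];
  const fetchStub = async (url: string, init?: RecordedCall['init']) => {
    calls.push({ url, init });
    return { ok: status >= 200 && status < 300, status, json: async () => body };
  };
  return { calls, fetchStub };
};

// Reads one field from a recorded body; FormData (and a Map, in plain tests) both expose get().
export const recordedField = (call: RecordedCall, name: string): unknown => {
  const form = call.init?.body as { get?: (key: string) => unknown } | undefined;
  if (form && typeof form.get === 'function') return form.get(name);
  return undefined;
};
```
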

**Step 4: Run test to verify it passes**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/doubaoProvider.test.ts src/server/providerTranslation.test.ts src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx src/server/minimaxTts.test.ts src/services/geminiService.test.ts`

Expected: PASS

**Step 5: Commit**
```bash
git add src/server/llmProvider.test.ts src/server/doubaoProvider.test.ts src/server/providerTranslation.test.ts src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx src/server/minimaxTts.test.ts src/services/geminiService.test.ts
git commit -m "test: cover llm provider switching"
```
### Task 8: Verify the live app behavior
**Files:**
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\.env.example`
- Modify: `E:\Downloads\ai-video-dubbing-&-translation\README.md`

**Step 1: Write the failing doc check**
Check by inspection that these doc assertions currently fail:
1. `.env.example` documents `ARK_API_KEY` and optional `DOUBAO_MODEL`
2. README explains the editor LLM switcher and that MiniMax remains the TTS engine

**Step 2: Run verification commands**

Run: `node .\node_modules\vitest\vitest.mjs run`

Expected: PASS for the new targeted suites, or a clear identification of pre-existing unrelated failures.

Run (with the dev server running): `Invoke-WebRequest -UseBasicParsing http://localhost:3000/`

Expected: a `StatusCode` of `200`

Run manual checks:
1. open the editor
2. confirm the `LLM` selector defaults to `Doubao`
3. generate subtitles with `Doubao`
4. switch to `Gemini`
5. generate subtitles again
6. confirm TTS still uses MiniMax

**Step 3: Write minimal documentation updates**

Document:
1. required env keys (`ARK_API_KEY` for Doubao, `GEMINI_API_KEY` for Gemini, optional `DOUBAO_MODEL`)
2. default provider (Doubao)
3. how the editor switcher works

**Step 4: Re-run verification**

Run: `node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/doubaoProvider.test.ts src/server/providerTranslation.test.ts src/server/subtitleRequest.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx src/server/minimaxTts.test.ts`

Expected: PASS

**Step 5: Commit**
```bash
git add .env.example README.md
git commit -m "docs: document llm provider switching"
```
## Notes
1. This workspace is not a Git repository, so the commit steps may not be executable here.
2. Existing unrelated TypeScript baseline issues in `src/lib/*` and `src/server/*` should be treated as pre-existing unless the new work touches them directly.