video_translate/docs/plans/2026-03-17-doubao-llm-provider.md
2026-03-18 11:42:00 +08:00

Doubao LLM Provider Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add a user-visible LLM switcher that lets subtitle generation use Doubao or Gemini, defaults to Doubao, and keeps TTS fixed on MiniMax with all provider keys sourced from .env.

Architecture: Move subtitle generation behind a new server endpoint, introduce a provider abstraction for Gemini and Doubao, and update the editor to send the selected provider while continuing to use the existing subtitle shape. Keep MiniMax TTS separate and untouched except for regression coverage.

Tech Stack: React, TypeScript, Express, multer, fetch, Vitest


Task 1: Add provider types and configuration resolution

Files:

  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\llmProvider.test.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\audioPipelineConfig.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\audioPipelineConfig.test.ts

Step 1: Write the failing test

import { describe, expect, it } from 'vitest';
import { normalizeLlmProvider, resolveLlmProviderConfig } from './llmProvider';

describe('llmProvider config', () => {
  it('defaults to doubao when no provider override is set', () => {
    expect(normalizeLlmProvider(undefined)).toBe('doubao');
  });

  it('returns the selected provider key from env', () => {
    expect(
      resolveLlmProviderConfig('doubao', {
        ARK_API_KEY: 'ark-key',
        GEMINI_API_KEY: 'gemini-key',
      }),
    ).toEqual(expect.objectContaining({ provider: 'doubao', apiKey: 'ark-key' }));
  });
});

Step 2: Run test to verify it fails

Run: node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/audioPipelineConfig.test.ts
Expected: FAIL because llmProvider.ts does not exist and audioPipelineConfig.ts still only exposes Gemini config.

Step 3: Write minimal implementation

export type LlmProvider = 'doubao' | 'gemini';

export const normalizeLlmProvider = (value?: string): LlmProvider =>
  value?.toLowerCase() === 'gemini' ? 'gemini' : 'doubao';

export const resolveLlmProviderConfig = (
  provider: LlmProvider,
  env: NodeJS.ProcessEnv,
) => {
  if (provider === 'doubao') {
    const apiKey = env.ARK_API_KEY?.trim();
    if (!apiKey) throw new Error('ARK_API_KEY is required for Doubao subtitle generation.');
    return {
      provider,
      apiKey,
      model: env.DOUBAO_MODEL?.trim() || 'doubao-seed-2-0-pro-260215',
      baseUrl: 'https://ark.cn-beijing.volces.com/api/v3/responses',
    };
  }

  const apiKey = env.GEMINI_API_KEY?.trim();
  if (!apiKey) throw new Error('GEMINI_API_KEY is required for Gemini subtitle generation.');
  return {
    provider,
    apiKey,
    model: 'gemini-2.5-flash',
  };
};
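
The two functions above can also be exercised outside Vitest. The following is a minimal standalone sketch, assuming a plain `EnvLike` record in place of `NodeJS.ProcessEnv` and a small `messageOf` helper (both are illustrative names that are not part of the plan). It repeats the implementation so the block runs on its own, then shows the case-insensitive override, the fallback for unknown values, and the missing-key error path.

```typescript
// Standalone sketch: mirrors the plan's llmProvider.ts so it runs on its own.
// `EnvLike` is an illustrative stand-in for NodeJS.ProcessEnv.
type LlmProvider = 'doubao' | 'gemini';
type EnvLike = Record<string, string | undefined>;

const normalizeLlmProvider = (value?: string): LlmProvider =>
  value?.toLowerCase() === 'gemini' ? 'gemini' : 'doubao';

const resolveLlmProviderConfig = (provider: LlmProvider, env: EnvLike) => {
  if (provider === 'doubao') {
    const apiKey = env.ARK_API_KEY?.trim();
    if (!apiKey) throw new Error('ARK_API_KEY is required for Doubao subtitle generation.');
    return {
      provider,
      apiKey,
      model: env.DOUBAO_MODEL?.trim() || 'doubao-seed-2-0-pro-260215',
      baseUrl: 'https://ark.cn-beijing.volces.com/api/v3/responses',
    };
  }
  const apiKey = env.GEMINI_API_KEY?.trim();
  if (!apiKey) throw new Error('GEMINI_API_KEY is required for Gemini subtitle generation.');
  return { provider, apiKey, model: 'gemini-2.5-flash' };
};

// Illustrative helper: captures the thrown message, or null when nothing is thrown.
const messageOf = (run: () => unknown): string | null => {
  try {
    run();
    return null;
  } catch (error) {
    return (error as Error).message;
  }
};

const fromOverride = normalizeLlmProvider('GEMINI'); // matched case-insensitively
const fromUnknown = normalizeLlmProvider('openai'); // unknown values fall back to the default
const missingKey = messageOf(() =>
  resolveLlmProviderConfig('doubao', { ARK_API_KEY: '   ' }), // whitespace-only key counts as missing
);
```

A whitespace-only key is rejected because the value is trimmed before the emptiness check, which is worth covering in `llmProvider.test.ts` alongside the two cases from Step 1.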

Step 4: Run test to verify it passes

Run: node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/audioPipelineConfig.test.ts
Expected: PASS

Step 5: Commit

git add src/server/llmProvider.ts src/server/llmProvider.test.ts src/server/audioPipelineConfig.ts src/server/audioPipelineConfig.test.ts
git commit -m "feat: add llm provider configuration"

Task 2: Add the Doubao provider parser and contract tests

Files:

  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\doubaoProvider.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\doubaoProvider.test.ts

Step 1: Write the failing test

import { describe, expect, it } from 'vitest';
import { extractDoubaoTextOutput } from './doubaoProvider';

describe('extractDoubaoTextOutput', () => {
  it('reconstructs text from the Ark output array', () => {
    const text = extractDoubaoTextOutput({
      output: [
        {
          type: 'message',
          content: [{ type: 'output_text', text: '[{"id":"1","translatedText":"你好"}]' }],
        },
      ],
    });

    expect(text).toContain('translatedText');
  });
});

Step 2: Run test to verify it fails

Run: node .\node_modules\vitest\vitest.mjs run src/server/doubaoProvider.test.ts
Expected: FAIL because doubaoProvider.ts does not exist.

Step 3: Write minimal implementation

export const extractDoubaoTextOutput = (payload: any): string =>
  (payload?.output ?? [])
    .flatMap((item: any) => item?.content ?? [])
    .filter((part: any) => part?.type === 'output_text')
    .map((part: any) => part.text ?? '')
    .join('')
    .trim();
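
The `any`-typed parser above can be tightened once the payload shape is pinned down. The sketch below is a typed variant under two assumptions: the type names `ArkContentPart`, `ArkOutputItem`, and `ArkResponsePayload` are illustrative, and the `reasoning` item in the sample is a hypothetical input used to show that items without a `content` array are skipped (it is not taken from the Ark documentation).

```typescript
// Illustrative types: only the fields the plan's test relies on are modelled.
interface ArkContentPart { type?: string; text?: string }
interface ArkOutputItem { type?: string; content?: ArkContentPart[] }
interface ArkResponsePayload { output?: ArkOutputItem[] }

const extractDoubaoTextOutput = (payload: ArkResponsePayload | null | undefined): string => {
  const parts: ArkContentPart[] = [];
  // Collect content parts from every output item; items without content add nothing.
  for (const item of payload?.output ?? []) {
    parts.push(...(item.content ?? []));
  }
  return parts
    .filter((part) => part.type === 'output_text')
    .map((part) => part.text ?? '')
    .join('')
    .trim();
};

// Hypothetical payload: one item without content, one message split into two text parts.
const sample: ArkResponsePayload = {
  output: [
    { type: 'reasoning' },
    {
      type: 'message',
      content: [
        { type: 'output_text', text: '[{"id":"1",' },
        { type: 'output_text', text: '"translatedText":"你好"}]' },
      ],
    },
  ],
};

// The joined text is valid JSON even though it arrived in two parts.
const parsed = JSON.parse(extractDoubaoTextOutput(sample)) as Array<{
  id: string;
  translatedText: string;
}>;
```

Joining with an empty separator matters here: inserting a newline or space between parts could break a JSON token that was split across two `output_text` entries.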

Step 4: Run test to verify it passes

Run: node .\node_modules\vitest\vitest.mjs run src/server/doubaoProvider.test.ts
Expected: PASS

Step 5: Commit

git add src/server/doubaoProvider.ts src/server/doubaoProvider.test.ts
git commit -m "feat: add doubao response parsing"

Task 3: Add provider-backed translation adapters

Files:

  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\geminiTranslation.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\providerTranslation.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\providerTranslation.test.ts
  • Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\geminiTranslation.test.ts

Step 1: Write the failing test

import { describe, expect, it } from 'vitest';
import { createSentenceTranslator } from './providerTranslation';

describe('createSentenceTranslator', () => {
  it('returns a Doubao translator when provider is doubao', () => {
    const translator = createSentenceTranslator({
      provider: 'doubao',
      apiKey: 'ark-key',
      model: 'doubao-seed-2-0-pro-260215',
    });

    expect(typeof translator).toBe('function');
  });
});

Step 2: Run test to verify it fails

Run: node .\node_modules\vitest\vitest.mjs run src/server/providerTranslation.test.ts src/server/geminiTranslation.test.ts
Expected: FAIL because the provider selection layer does not exist.

Step 3: Write minimal implementation

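// Assumed module locations: the plan does not state where these names live.
import type { ProviderConfig } from './llmProvider';
import { createDoubaoSentenceTranslator } from './doubaoProvider';
import { createGeminiSentenceTranslator } from './geminiTranslation';

// Pick the translator implementation that matches the resolved provider.
export const createSentenceTranslator = (config: ProviderConfig) => {
  if (config.provider === 'doubao') {
    return createDoubaoSentenceTranslator(config);
  }
  return createGeminiSentenceTranslator(config);
};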
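
The factory above relies on a `ProviderConfig` type and two adapter constructors whose shapes the plan leaves open. A minimal sketch follows, assuming `ProviderConfig` is a union discriminated on `provider` and that a sentence translator is an async function from a sentence and a target language to translated text. Both shapes, the `SentenceTranslator` name, and the stub adapters that stand in for the real Doubao and Gemini calls are assumptions made for illustration.

```typescript
// Assumed shape: baseUrl is optional because the Step 1 test omits it.
type ProviderConfig =
  | { provider: 'doubao'; apiKey: string; model: string; baseUrl?: string }
  | { provider: 'gemini'; apiKey: string; model: string };

// Assumed translator signature: one sentence in, one translated sentence out.
type SentenceTranslator = (sentence: string, targetLanguage: string) => Promise<string>;

// Stub adapters: they only tag the text with the model name instead of calling an API.
const createDoubaoSentenceTranslator = (config: ProviderConfig): SentenceTranslator =>
  async (sentence, targetLanguage) => `[${config.model} to ${targetLanguage}] ${sentence}`;

const createGeminiSentenceTranslator = (config: ProviderConfig): SentenceTranslator =>
  async (sentence, targetLanguage) => `[${config.model} to ${targetLanguage}] ${sentence}`;

const createSentenceTranslator = (config: ProviderConfig): SentenceTranslator => {
  switch (config.provider) {
    case 'doubao':
      return createDoubaoSentenceTranslator(config);
    case 'gemini':
      return createGeminiSentenceTranslator(config);
    default: {
      // Exhaustiveness check: adding a provider to the union without a case fails to compile.
      const unreachable: never = config;
      throw new Error(`Unsupported provider: ${JSON.stringify(unreachable)}`);
    }
  }
};

const translator = createSentenceTranslator({
  provider: 'doubao',
  apiKey: 'ark-key',
  model: 'doubao-seed-2-0-pro-260215',
});
```

Compared with the `if`/fallthrough form, the `switch` with a `never` check trades a few lines for a compile-time reminder when a third provider is added; either form satisfies the Step 1 test.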

Step 4: Run test to verify it passes

Run: node .\node_modules\vitest\vitest.mjs run src/server/providerTranslation.test.ts src/server/geminiTranslation.test.ts
Expected: PASS

Step 5: Commit

git add src/server/providerTranslation.ts src/server/providerTranslation.test.ts src/server/geminiTranslation.ts src/server/geminiTranslation.test.ts
git commit -m "feat: add provider-based translation adapters"

Task 4: Add a dedicated subtitle-generation endpoint

Files:

  • Modify: E:\Downloads\ai-video-dubbing-&-translation\server.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleRequest.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitleRequest.test.ts
  • Test: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.test.ts

Step 1: Write the failing test

import { describe, expect, it } from 'vitest';
import { parseSubtitleRequest } from './subtitleRequest';

describe('parseSubtitleRequest', () => {
  it('defaults provider to doubao', () => {
    expect(parseSubtitleRequest({ body: {} as any }).provider).toBe('doubao');
  });
});

Step 2: Run test to verify it fails

Run: node .\node_modules\vitest\vitest.mjs run src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts
Expected: FAIL because the request parser does not exist.

Step 3: Write minimal implementation

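import { normalizeLlmProvider } from './llmProvider';

// Extract the provider and target language from the parsed request body.
export const parseSubtitleRequest = (req: { body: Record<string, unknown> }) => ({
  provider: normalizeLlmProvider(String(req.body.provider || 'doubao')),
  targetLanguage: String(req.body.targetLanguage || ''),
});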

Then update server.ts to expose POST /api/generate-subtitles, validate input, resolve provider config, and return normalized subtitles.
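
The validation and config-resolution steps of the endpoint can be sketched without Express. The helper below is hypothetical: the name `prepareSubtitleRequest`, the `PreparedRequest` result shape, the error messages, and the choice of 400 and 500 status codes are all assumptions rather than part of the plan. It takes the parsed form fields and an injected config resolver, and returns either a ready-to-use request or an HTTP-style error that `server.ts` could send back.

```typescript
type LlmProvider = 'doubao' | 'gemini';

interface ResolvedConfig { provider: LlmProvider; apiKey: string; model: string }

type PreparedRequest =
  | { ok: true; targetLanguage: string; config: ResolvedConfig }
  | { ok: false; status: number; error: string };

// Hypothetical helper: validates the form fields, then resolves the provider config.
const prepareSubtitleRequest = (
  body: Record<string, unknown>,
  hasVideo: boolean,
  resolveConfig: (provider: LlmProvider) => ResolvedConfig,
): PreparedRequest => {
  // Same defaulting rule as the plan: anything other than "gemini" means Doubao.
  const provider: LlmProvider =
    String(body.provider ?? '').toLowerCase() === 'gemini' ? 'gemini' : 'doubao';
  const targetLanguage = String(body.targetLanguage ?? '').trim();

  if (!hasVideo) return { ok: false, status: 400, error: 'A video file is required.' };
  if (!targetLanguage) return { ok: false, status: 400, error: 'targetLanguage is required.' };

  try {
    return { ok: true, targetLanguage, config: resolveConfig(provider) };
  } catch (error) {
    // A missing API key is a server configuration problem, so it is reported as a 500.
    return { ok: false, status: 500, error: (error as Error).message };
  }
};

// Stand-in resolver for the example; the real one would read keys from .env.
const fakeResolver = (provider: LlmProvider): ResolvedConfig => ({
  provider,
  apiKey: 'test-key',
  model: 'test-model',
});

const missingLanguage = prepareSubtitleRequest({ provider: 'gemini' }, true, fakeResolver);
const ready = prepareSubtitleRequest({ targetLanguage: 'English' }, true, fakeResolver);
```

Keeping this logic in a pure function lets `subtitleRequest.test.ts` cover the error paths without starting a server or mocking multer.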

Step 4: Run test to verify it passes

Run: node .\node_modules\vitest\vitest.mjs run src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts
Expected: PASS

Step 5: Commit

git add server.ts src/server/subtitleRequest.ts src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts
git commit -m "feat: add subtitle generation endpoint"

Task 5: Update the frontend subtitle service to use the new endpoint

Files:

  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\services\subtitleService.ts
  • Create: E:\Downloads\ai-video-dubbing-&-translation\src\services\subtitleService.test.ts
  • Test: E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.test.tsx

Step 1: Write the failing test

import { describe, expect, it, vi } from 'vitest';
import { generateSubtitles } from './subtitleService';

describe('generateSubtitles', () => {
  it('posts the selected provider to the server', async () => {
    const fetchMock = vi.fn(async () => ({
      ok: true,
      json: async () => ({ subtitles: [] }),
    }));

    await generateSubtitles(new File(['x'], 'clip.mp4'), 'English', 'doubao', null, fetchMock as any);

    expect(fetchMock.mock.calls[0][0]).toBe('/api/generate-subtitles');
  });
});

Step 2: Run test to verify it fails

Run: node .\node_modules\vitest\vitest.mjs run src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx
Expected: FAIL because the new service does not exist and the editor still uses the Gemini-specific service directly.

Step 3: Write minimal implementation

export const generateSubtitles = async (
  videoFile: File,
  targetLanguage: string,
  provider: 'doubao' | 'gemini',
  trimRange?: { start: number; end: number } | null,
  fetchImpl: typeof fetch = fetch,
) => {
  const formData = new FormData();
  formData.append('video', videoFile);
  formData.append('targetLanguage', targetLanguage);
  formData.append('provider', provider);
  if (trimRange) {
    formData.append('trimRange', JSON.stringify(trimRange));
  }

  const response = await fetchImpl('/api/generate-subtitles', {
    method: 'POST',
    body: formData,
  });

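  if (!response.ok) {
    // Surface server-side failures instead of parsing an error body as subtitles.
    throw new Error(`Subtitle generation failed with status ${response.status}.`);
  }

  return response.json();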
};
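
Because the client serializes `trimRange` as a JSON string inside the multipart form, the server has to parse and validate that field before using it. A minimal sketch, assuming a hypothetical `parseTrimRange` helper (the plan does not name one) that treats a missing, malformed, or out-of-order range as "no trim":

```typescript
interface TrimRange { start: number; end: number }

// Hypothetical helper: returns null unless the field holds a valid, non-empty range.
const parseTrimRange = (raw: unknown): TrimRange | null => {
  if (typeof raw !== 'string' || raw.trim() === '') return null;

  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // the field was not valid JSON
  }

  if (typeof value !== 'object' || value === null) return null;
  const { start, end } = value as Partial<Record<keyof TrimRange, unknown>>;

  // Both bounds must be finite numbers, start must be non-negative, and end must follow start.
  if (typeof start !== 'number' || typeof end !== 'number') return null;
  if (!isFinite(start) || !isFinite(end)) return null;
  if (start < 0 || end <= start) return null;

  return { start, end };
};

const valid = parseTrimRange(JSON.stringify({ start: 1.5, end: 4 }));
const reversed = parseTrimRange('{"start":4,"end":1.5}');
```

Falling back to `null` keeps the endpoint lenient. Rejecting a malformed range with a 400 response would be an equally reasonable choice if silent fallbacks are undesirable.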

Step 4: Run test to verify it passes

Run: node .\node_modules\vitest\vitest.mjs run src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx
Expected: PASS

Step 5: Commit

git add src/services/subtitleService.ts src/services/subtitleService.test.ts src/services/geminiService.ts src/components/EditorScreen.test.tsx
git commit -m "feat: route subtitle generation through the server"

Task 6: Add the editor LLM selector and default it to Doubao

Files:

  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.tsx
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\components\EditorScreen.test.tsx

Step 1: Write the failing test

it('defaults the llm selector to Doubao', () => {
  render(<EditorScreen videoFile={file} targetLanguage="English" onBack={() => {}} />);
  expect(screen.getByLabelText(/llm/i)).toHaveValue('doubao');
});

Step 2: Run test to verify it fails

Run: node .\node_modules\vitest\vitest.mjs run src/components/EditorScreen.test.tsx
Expected: FAIL because the selector does not exist.

Step 3: Write minimal implementation

const [llmProvider, setLlmProvider] = useState<'doubao' | 'gemini'>('doubao');

<label>
  LLM
  <select
    aria-label="LLM"
    value={llmProvider}
    onChange={(event) => setLlmProvider(event.target.value as 'doubao' | 'gemini')}
  >
    <option value="doubao">Doubao</option>
    <option value="gemini">Gemini</option>
  </select>
</label>

Then pass llmProvider into the subtitle-generation service.
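
The `as 'doubao' | 'gemini'` cast in the `onChange` handler trusts whatever string the DOM reports. One alternative is a small type guard with a fallback to the default provider. The sketch below uses hypothetical names (`LLM_PROVIDERS`, `isLlmProvider`, `toLlmProvider`), none of which appear in the plan.

```typescript
// Single source of truth for the selectable providers.
const LLM_PROVIDERS = ['doubao', 'gemini'] as const;
type LlmProvider = (typeof LLM_PROVIDERS)[number];

// Type guard: narrows an arbitrary string to LlmProvider when it is a known value.
const isLlmProvider = (value: string): value is LlmProvider =>
  (LLM_PROVIDERS as readonly string[]).indexOf(value) !== -1;

// Falls back to the plan's default provider when the value is not recognised.
const toLlmProvider = (value: string): LlmProvider =>
  isLlmProvider(value) ? value : 'doubao';

const selected = toLlmProvider('gemini');
const fallback = toLlmProvider('something-else');
```

With such a helper, the handler could read `onChange={(event) => setLlmProvider(toLlmProvider(event.target.value))}`, and the `<option>` elements could be rendered from `LLM_PROVIDERS` so the list and the type cannot drift apart.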

Step 4: Run test to verify it passes

Run: node .\node_modules\vitest\vitest.mjs run src/components/EditorScreen.test.tsx
Expected: PASS

Step 5: Commit

git add src/components/EditorScreen.tsx src/components/EditorScreen.test.tsx
git commit -m "feat: add llm selector to the editor"

Task 7: Add end-to-end provider and regression coverage

Files:

  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\subtitlePipeline.test.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\services\geminiService.test.ts
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\src\server\minimaxTts.test.ts

Step 1: Write the failing test

it('does not change TTS behavior when the llm provider changes', async () => {
  expect(true).toBe(true);
});

Step 2: Run test to verify it fails meaningfully

Run: node .\node_modules\vitest\vitest.mjs run src/server/subtitlePipeline.test.ts src/services/geminiService.test.ts src/server/minimaxTts.test.ts
Expected: FAIL once real assertions replace the placeholder. The placeholder itself passes trivially, so strengthen it until the new provider path is covered.

Step 3: Write minimal implementation

Add regression tests that prove:

  1. selected provider is forwarded correctly
  2. Doubao auth failures surface clearly
  3. Gemini still works when selected
  4. MiniMax TTS tests continue to pass unchanged
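
For item 2, one way to make Doubao authentication failures surface clearly is to translate the HTTP status into a provider-specific message before it reaches the editor. The sketch below uses a hypothetical `describeProviderFailure` helper; the function name, the `KEY_NAMES` table, and the message wording are assumptions, and treating 401 and 403 as API-key problems reflects general HTTP semantics rather than documented Ark behavior.

```typescript
type LlmProvider = 'doubao' | 'gemini';

// Env variable that holds each provider's key, as named in Task 1.
const KEY_NAMES: Record<LlmProvider, string> = {
  doubao: 'ARK_API_KEY',
  gemini: 'GEMINI_API_KEY',
};

// Hypothetical helper: turns an upstream HTTP status into a message the editor can show.
const describeProviderFailure = (provider: LlmProvider, status: number): string => {
  if (status === 401 || status === 403) {
    // Authentication or authorization failure: point at the key the user must fix.
    return `${provider} rejected the request (HTTP ${status}). Check ${KEY_NAMES[provider]} in .env.`;
  }
  return `${provider} subtitle generation failed with HTTP ${status}.`;
};

const authMessage = describeProviderFailure('doubao', 401);
const genericMessage = describeProviderFailure('gemini', 502);
```

A regression test can then assert that the message for a 401 from Doubao names `ARK_API_KEY`, which is a stronger check than asserting only that some error was thrown.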

Step 4: Run test to verify it passes

Run: node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/doubaoProvider.test.ts src/server/providerTranslation.test.ts src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx src/server/minimaxTts.test.ts src/services/geminiService.test.ts
Expected: PASS

Step 5: Commit

git add src/server/llmProvider.test.ts src/server/doubaoProvider.test.ts src/server/providerTranslation.test.ts src/server/subtitleRequest.test.ts src/server/subtitlePipeline.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx src/server/minimaxTts.test.ts src/services/geminiService.test.ts
git commit -m "test: cover llm provider switching"

Task 8: Verify the live app behavior

Files:

  • Modify: E:\Downloads\ai-video-dubbing-&-translation\.env.example
  • Modify: E:\Downloads\ai-video-dubbing-&-translation\README.md

Step 1: Write the failing doc check

Add docs assertions by inspection:

  1. .env.example documents ARK_API_KEY and optional DOUBAO_MODEL
  2. README explains the editor LLM switcher and that MiniMax remains the TTS engine

Step 2: Run verification commands

Run: node .\node_modules\vitest\vitest.mjs run
Expected: PASS for the new targeted suites or clear identification of pre-existing unrelated failures.

Run: Invoke-WebRequest -UseBasicParsing http://localhost:3000/
Expected: 200

Run manual checks:

  1. open the editor
  2. confirm the LLM selector defaults to Doubao
  3. generate subtitles with Doubao
  4. switch to Gemini
  5. generate subtitles again
  6. confirm TTS still uses MiniMax

Step 3: Write minimal documentation updates

Document:

  1. required env keys
  2. default provider
  3. how the editor switcher works

Step 4: Re-run verification

Run: node .\node_modules\vitest\vitest.mjs run src/server/llmProvider.test.ts src/server/doubaoProvider.test.ts src/server/providerTranslation.test.ts src/server/subtitleRequest.test.ts src/services/subtitleService.test.ts src/components/EditorScreen.test.tsx src/server/minimaxTts.test.ts
Expected: PASS

Step 5: Commit

git add .env.example README.md
git commit -m "docs: document llm provider switching"

Notes

  1. This workspace is not a Git repository, so the commit steps may not be executable here.
  2. Existing unrelated TypeScript baseline issues in src/lib/* and src/server/* should be treated as pre-existing unless the new work touches them directly.