video_translate/docs/plans/2026-03-19-doubao-file-id-frontend-design.md
Song367 a0c1dc6ad5
File upload
2026-03-19 11:17:10 +08:00

Doubao Frontend File ID Upload Design

Goal: Let the browser upload videos to the Volcengine Ark Files API and send the returned file_id to this app's backend, so that Doubao subtitle generation can call the Responses API with a file_id instead of an inline base64 video payload.

Context

The current subtitle flow uploads the full video to this app's backend, then the backend reads the file and sends a data:video/mp4;base64,... payload to Doubao. That works for smaller files, but it is subject to request body size limits and repeats the full video upload on every subtitle generation request.

The user wants a staged flow:

  1. Frontend uploads the selected video directly to Ark Files API.
  2. Frontend receives a file_id.
  3. Frontend calls this app's /api/generate-subtitles endpoint with that file_id.
  4. Backend keeps ownership of the Doubao Responses API request, logging, normalization, and subtitle result shaping.

Approaches Considered

Option A: Frontend uploads to Files API, backend uses file_id for Doubao

This keeps the current app architecture mostly intact. Only the upload stage moves to the browser. The backend still handles provider selection, subtitle parsing, error mapping, and normalized response shaping.

Pros

  • Smallest architectural change
  • Keeps existing backend logging and response normalization
  • Preserves the existing /api/generate-subtitles contract with a backward-compatible extension
  • Allows a gradual rollout because base64 upload can remain as fallback

Cons

  • Frontend gains Ark-specific upload logic
  • The browser now coordinates two network calls for Doubao

Option B: Frontend uploads to Files API and also calls Doubao Responses API

This removes backend involvement for Doubao subtitle generation, but it pushes subtitle parsing and normalization into the browser.

Pros

  • Shorter network path for Doubao

Cons

  • Large frontend refactor
  • Duplicates provider logic across frontend and backend
  • Loses centralized logging and error handling
  • Makes Gemini and Doubao flows diverge more sharply

Recommendation

Use Option A. It solves the request-size problem without discarding the backend subtitle pipeline that already exists.

Architecture

Frontend

Add a small Ark upload helper that:

  1. Accepts the selected File
  2. Sends FormData to https://ark.cn-beijing.volces.com/api/v3/files
  3. Includes:
    • purpose=user_data
    • file=<selected video File>
    • preprocess_configs[video][fps]=1
  4. Reads the response JSON and returns the Ark file_id
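
The helper steps above could be sketched roughly as follows. The endpoint and the three form fields come from this design; the Bearer authorization header, the id field on the response, and the names buildArkUploadForm, uploadVideoToArk, and FetchLike are assumptions made for illustration. How the browser obtains a credential is not covered by this design, so apiKey is simply a parameter here.

```typescript
// Sketch only: names, the Authorization header, and the `id` response field are assumptions.
const ARK_FILES_URL = "https://ark.cn-beijing.volces.com/api/v3/files";

// Minimal fetch shape so the helper can be exercised with a fake in unit tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: FormData },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the multipart form with the three fields listed in the design.
function buildArkUploadForm(video: Blob, fileName: string): FormData {
  const form = new FormData();
  form.append("purpose", "user_data");
  form.append("file", video, fileName);
  form.append("preprocess_configs[video][fps]", "1");
  return form;
}

// Uploads the video and returns the Ark file id from the response JSON.
async function uploadVideoToArk(
  video: Blob,
  fileName: string,
  apiKey: string,
  fetchImpl: FetchLike = fetch as unknown as FetchLike,
): Promise<string> {
  // Content-Type is left unset so the runtime can add the multipart boundary itself.
  const response = await fetchImpl(ARK_FILES_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildArkUploadForm(video, fileName),
  });
  if (!response.ok) {
    throw new Error(`Ark upload failed with HTTP ${response.status}`);
  }
  const payload = (await response.json()) as { id?: unknown };
  if (typeof payload.id !== "string") {
    throw new Error("Ark upload response did not include a file id");
  }
  return payload.id;
}
```

Passing fetchImpl as a parameter is one way to keep the request shape testable without a network call.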

generateSubtitlePipeline(...) will gain an optional options object. When the provider is doubao, it will:

  1. Upload the file to Ark
  2. Call this app's /api/generate-subtitles with fileId, provider, targetLanguage, and optional trimRange

For gemini, it will keep the current multipart upload path unchanged.
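
A minimal sketch of the provider branch, assuming a hypothetical signature: the real generateSubtitlePipeline signature is not shown in this design, so the function name, the PipelineDeps interface, the TrimRange shape, and the multipart field names other than video are illustrative only.

```typescript
// Sketch only: every type and the injected dependencies are hypothetical.
type Provider = "doubao" | "gemini";

interface TrimRange { start: number; end: number } // assumed shape

interface PipelineOptions {
  provider: Provider;
  targetLanguage: string;
  trimRange?: TrimRange;
}

// Network calls are injected so the branch logic can be tested in isolation.
interface PipelineDeps {
  uploadToArk(video: Blob): Promise<string>;
  postJson(url: string, body: unknown): Promise<unknown>;
  postMultipart(url: string, form: FormData): Promise<unknown>;
}

async function generateSubtitlePipelineSketch(
  video: Blob,
  options: PipelineOptions,
  deps: PipelineDeps,
): Promise<unknown> {
  const { provider, targetLanguage, trimRange } = options;

  if (provider === "doubao") {
    // Doubao: upload to Ark first, then send only the file id to this app's backend.
    const fileId = await deps.uploadToArk(video);
    return deps.postJson("/api/generate-subtitles", {
      fileId,
      provider,
      targetLanguage,
      ...(trimRange ? { trimRange } : {}),
    });
  }

  // Gemini: keep the multipart upload to this app's backend.
  const form = new FormData();
  form.append("video", video);
  form.append("provider", provider);
  form.append("targetLanguage", targetLanguage);
  if (trimRange) form.append("trimRange", JSON.stringify(trimRange));
  return deps.postMultipart("/api/generate-subtitles", form);
}
```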

Backend

The /api/generate-subtitles endpoint will support two request shapes:

  1. Existing multipart upload with video
  2. New JSON or urlencoded body with fileId

The subtitle request parser will be extended to accept optional fileId.

The video subtitle generation pipeline will accept either:

  1. videoPath
  2. fileId
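
One possible way to normalize the two request shapes into a single source value is sketched below. The SubtitleSource type and parseSubtitleSource function are hypothetical names, and the sketch assumes the multipart layer has already stored the upload and exposes its path.

```typescript
// Sketch only: type and function names are hypothetical.
interface SubtitleSource {
  videoPath?: string; // set when a multipart upload stored a local file
  fileId?: string;    // set when the body carried an Ark file id
}

// Accepts the stored upload path (if any) plus the parsed JSON or urlencoded body.
function parseSubtitleSource(
  uploadedVideoPath: string | undefined,
  body: Record<string, unknown>,
): SubtitleSource {
  const source: SubtitleSource = {};
  if (uploadedVideoPath) {
    source.videoPath = uploadedVideoPath;
  }
  const rawFileId = body["fileId"];
  // Ignore non-string and blank values so later validation sees "absent".
  if (typeof rawFileId === "string" && rawFileId.trim() !== "") {
    source.fileId = rawFileId.trim();
  }
  return source;
}
```

Keeping both fields optional leaves the choice between them to the provider-specific code rather than to the parser.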

For Doubao:

  • If fileId is present, send:
    • type: "input_video"
    • file_id: "<ark-file-id>"
    • fps: 1
  • If fileId is absent, preserve the current base64 fallback path
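
The Doubao branch might look something like the sketch below. The file_id part mirrors the fields listed above. The fallback part uses a video_url field carrying the data URL, which is an assumption, since this design does not spell out the existing base64 request shape; buildDoubaoVideoPart is likewise a hypothetical name.

```typescript
// Sketch only: the fallback field name `video_url` and all identifiers are assumptions.
type DoubaoVideoPart =
  | { type: "input_video"; file_id: string; fps: number }
  | { type: "input_video"; video_url: string };

interface DoubaoVideoSource {
  fileId?: string;
  videoBase64?: string; // base64 of the locally uploaded file, encoded by the caller
}

function buildDoubaoVideoPart(source: DoubaoVideoSource): DoubaoVideoPart {
  if (source.fileId) {
    // New path: reference the file already uploaded to Ark.
    return { type: "input_video", file_id: source.fileId, fps: 1 };
  }
  if (source.videoBase64) {
    // Fallback path: inline the video as a data URL.
    return {
      type: "input_video",
      video_url: `data:video/mp4;base64,${source.videoBase64}`,
    };
  }
  throw new Error("Doubao requires either fileId or an uploaded video");
}
```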

For Gemini:

  • Continue requiring a local uploaded file path
  • Return a clear error if Gemini is requested without video

Data Flow

Doubao Path

  1. User selects video in the browser
  2. EditorScreen triggers subtitle generation
  3. Frontend uploads the File to Ark Files API
  4. Frontend receives file_id
  5. Frontend posts fileId to /api/generate-subtitles
  6. Backend resolves Doubao provider config
  7. Backend calls Ark Responses API with file_id
  8. Backend parses and normalizes subtitle JSON
  9. Frontend renders normalized subtitles

Gemini Path

  1. User selects video in the browser
  2. Frontend posts multipart form data with video
  3. Backend sends inline video bytes to Gemini as today

Error Handling

Frontend Upload Errors

If the Ark Files API call fails, the frontend should report the upload error directly and skip the call to this app's backend. The user should see the upstream message when possible.
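
As an illustration of showing the upstream message when possible, the sketch below assumes Ark error bodies follow an { error: { message } } shape. That shape is not confirmed by this design and should be checked against real responses; describeArkUploadError is a hypothetical helper name.

```typescript
// Sketch only: the { error: { message } } body shape is an assumption.
function describeArkUploadError(status: number, payload: unknown): string {
  const upstream = (payload as { error?: { message?: unknown } } | null)?.error?.message;
  if (typeof upstream === "string" && upstream.trim() !== "") {
    // Prefer the upstream message so the user sees what Ark reported.
    return `Ark upload failed: ${upstream}`;
  }
  // Fall back to the HTTP status when no readable message is available.
  return `Ark upload failed with HTTP ${status}`;
}
```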

Backend Request Validation

The backend should reject requests when:

  • Neither video nor fileId is provided
  • targetLanguage is missing
  • gemini is requested with fileId only

Provider-Specific Behavior

  • doubao + fileId uses the new Ark file reference path
  • doubao + video remains supported as fallback
  • gemini + video remains unchanged
  • gemini + fileId returns a clear validation error
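
The validation rules and the provider matrix above can be condensed into a single check. This is a sketch under assumed names (SubtitleRequest, validateSubtitleRequest) and with illustrative error strings, not the existing parser.

```typescript
// Sketch only: names and error messages are illustrative.
interface SubtitleRequest {
  provider: "doubao" | "gemini";
  targetLanguage?: string;
  videoPath?: string;
  fileId?: string;
}

// Returns an error message for invalid requests, or null when the request is acceptable.
function validateSubtitleRequest(request: SubtitleRequest): string | null {
  if (!request.videoPath && !request.fileId) {
    return "Either video or fileId is required";
  }
  if (!request.targetLanguage) {
    return "targetLanguage is required";
  }
  if (request.provider === "gemini" && !request.videoPath) {
    return "Gemini requires an uploaded video; fileId is only supported for Doubao";
  }
  return null;
}
```

With this check, doubao + fileId, doubao + video, and gemini + video all pass, and gemini with only a fileId is rejected.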

Testing Strategy

Frontend

  • Unit test Ark file upload helper request shape
  • Unit test generateSubtitlePipeline uses fileId for Doubao and skips multipart video upload to this app's backend
  • Unit test generateSubtitlePipeline keeps multipart upload for Gemini
  • UI test EditorScreen still passes the selected provider through subtitle generation

Backend

  • Unit test subtitle request parsing with fileId
  • Unit test Doubao video generation uses file_id when present
  • Unit test base64 fallback remains intact
  • Unit test Gemini path rejects fileId-only requests

Rollout Notes

Keep the base64 Doubao fallback during this change. That makes the new flow additive instead of a risky cutover and keeps local tests simpler while the frontend upload path settles.