Doubao Frontend File ID Upload Design
Goal: Let the browser upload videos to Volcengine Ark Files API, then send the returned file_id to this app's backend so Doubao subtitle generation can use Responses API with file_id instead of inline base64 video payloads.
Context
The current subtitle flow uploads the full video to this app's backend, then the backend reads the file and sends a data:video/mp4;base64,... payload to Doubao. That works for smaller files, but it inherits request body size limits and repeats the full video upload on every subtitle generation request.
The user wants a staged flow:
- Frontend uploads the selected video directly to Ark Files API.
- Frontend receives a file_id.
- Frontend calls this app's /api/generate-subtitles endpoint with that file_id.
- Backend keeps ownership of the Doubao Responses API request, logging, normalization, and subtitle result shaping.
Approaches Considered
Option A: Frontend uploads to Files API, backend uses file_id for Doubao
This keeps the current app architecture mostly intact. Only the upload stage moves to the browser. The backend still handles provider selection, subtitle parsing, error mapping, and normalized response shaping.
Pros
- Smallest architectural change
- Keeps existing backend logging and response normalization
- Preserves the existing /api/generate-subtitles contract with a backward-compatible extension
- Allows a gradual rollout because base64 upload can remain as fallback
Cons
- Frontend gains Ark-specific upload logic
- The browser now coordinates two network calls for Doubao
Option B: Frontend uploads to Files API and also calls Doubao Responses API
This removes backend involvement for Doubao subtitle generation, but it pushes subtitle parsing and normalization into the browser.
Pros
- Shorter network path for Doubao
Cons
- Large frontend refactor
- Duplicates provider logic across frontend and backend
- Loses centralized logging and error handling
- Makes Gemini and Doubao flows diverge more sharply
Recommendation
Use Option A. It solves the request-size problem without discarding the backend subtitle pipeline that already exists.
Architecture
Frontend
Add a small Ark upload helper that:
- Accepts the selected File
- Sends FormData to https://ark.cn-beijing.volces.com/api/v3/files
- Includes:
  - purpose=user_data
  - file=@<video>
  - preprocess_configs[video][fps]=1
- Reads the response JSON and returns the Ark file_id
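The helper described above could look roughly like the following. This is a minimal sketch, not the final implementation: the names buildArkUploadForm and uploadVideoToArk are hypothetical, the bearer-token header and the apiKey parameter are assumptions (this design does not say how the browser obtains a credential), and reading the file id from an id field in the response JSON is also an assumption to confirm against the Ark Files API reference.

```typescript
// Endpoint and form fields are taken from the design above.
const ARK_FILES_URL = "https://ark.cn-beijing.volces.com/api/v3/files";

// Build the multipart body for an Ark Files API upload.
function buildArkUploadForm(video: Blob, filename: string): FormData {
  const form = new FormData();
  form.append("purpose", "user_data");
  form.append("file", video, filename);
  form.append("preprocess_configs[video][fps]", "1");
  return form;
}

// Upload the selected video and return the Ark file id.
// ASSUMPTIONS: bearer-token auth, and a JSON response whose `id` field
// carries the file id. Neither is specified by this design.
async function uploadVideoToArk(video: File, apiKey: string): Promise<string> {
  const response = await fetch(ARK_FILES_URL, {
    method: "POST",
    // No Content-Type header: the runtime adds the multipart boundary itself.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildArkUploadForm(video, video.name),
  });
  if (!response.ok) {
    throw new Error(`Ark upload failed with HTTP ${response.status}`);
  }
  const payload = (await response.json()) as { id?: unknown };
  if (typeof payload.id !== "string" || payload.id.length === 0) {
    throw new Error("Ark upload response did not include a file id");
  }
  return payload.id;
}
```

Keeping the form construction in its own function lets the request-shape unit test run without any network mocking.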
generateSubtitlePipeline(...) will gain an optional options object. When the provider is doubao, it will:
- Upload the file to Ark
- Call this app's /api/generate-subtitles with fileId, provider, targetLanguage, and optional trimRange
For gemini, it will keep the current multipart upload path unchanged.
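The provider branch could be sketched as below. The function is named generateSubtitlePipelineSketch to make clear it is not the existing signature. The TrimRange shape, the uploadToArk callback, the helper names, and the exact multipart field names on the Gemini path are all assumptions made for illustration.

```typescript
type Provider = "doubao" | "gemini";

// Hypothetical shapes: the design names these fields but not their types.
interface TrimRange { start: number; end: number }
interface SubtitleOptions {
  provider: Provider;
  targetLanguage: string;
  trimRange?: TrimRange;
}

// JSON body for the Doubao path: the backend gets a file id, not video bytes.
function buildDoubaoBody(fileId: string, options: SubtitleOptions): string {
  return JSON.stringify({
    fileId,
    provider: options.provider,
    targetLanguage: options.targetLanguage,
    ...(options.trimRange ? { trimRange: options.trimRange } : {}),
  });
}

// Multipart body for the Gemini path, which keeps uploading the video itself.
function buildGeminiForm(video: Blob, filename: string, options: SubtitleOptions): FormData {
  const form = new FormData();
  form.append("video", video, filename);
  form.append("provider", options.provider);
  form.append("targetLanguage", options.targetLanguage);
  if (options.trimRange) {
    form.append("trimRange", JSON.stringify(options.trimRange));
  }
  return form;
}

// uploadToArk is injected so the Ark upload can be stubbed in unit tests.
async function generateSubtitlePipelineSketch(
  video: File,
  options: SubtitleOptions,
  uploadToArk: (video: File) => Promise<string>,
): Promise<unknown> {
  const url = "/api/generate-subtitles";
  const response =
    options.provider === "doubao"
      ? await fetch(url, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: buildDoubaoBody(await uploadToArk(video), options),
        })
      : await fetch(url, { method: "POST", body: buildGeminiForm(video, video.name, options) });
  if (!response.ok) {
    throw new Error(`Subtitle generation failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

If uploadToArk rejects, the function throws before any request reaches this app's backend.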
Backend
The /api/generate-subtitles endpoint will support two request shapes:
- Existing multipart upload with video
- New JSON or urlencoded body with fileId
The subtitle request parser will be extended to accept optional fileId.
The video subtitle generation pipeline will accept either:
- videoPath
- fileId
For Doubao:
- If fileId is present, send:
  - type: "input_video"
  - file_id: "<ark-file-id>"
  - fps: 1
- If fileId is absent, preserve the current base64 fallback path
For Gemini:
- Continue requiring a local uploaded file path
- Return a clear error if Gemini is requested without video
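One way to model the two sources on the backend is a discriminated union, sketched below. The type and function names are hypothetical. The file_id part mirrors the fields listed above; the shape of the base64 fallback part (a video_url field holding the data URL) is an assumption about the existing code, and readBase64 is injected so the sketch does not depend on the file system.

```typescript
// The pipeline accepts exactly one of these sources.
type VideoSource =
  | { kind: "file"; videoPath: string }
  | { kind: "ark"; fileId: string };

interface DoubaoVideoPart {
  type: "input_video";
  file_id?: string;
  fps?: number;
  video_url?: string; // ASSUMPTION: field used by the existing base64 fallback
}

// Build the video content part for the Doubao Responses API request.
function buildDoubaoVideoPart(
  source: VideoSource,
  readBase64: (videoPath: string) => string,
): DoubaoVideoPart {
  if (source.kind === "ark") {
    // New path: reference the file already uploaded to Ark.
    return { type: "input_video", file_id: source.fileId, fps: 1 };
  }
  // Fallback path: inline the video as a data URL, as the backend does today.
  return {
    type: "input_video",
    video_url: `data:video/mp4;base64,${readBase64(source.videoPath)}`,
  };
}

// Gemini still needs a local uploaded file; an Ark file id alone is rejected.
function requireGeminiVideoPath(source: VideoSource): string {
  if (source.kind !== "file") {
    throw new Error("Gemini requires an uploaded video file; fileId is only supported for Doubao.");
  }
  return source.videoPath;
}
```

In a Node backend, readBase64 could be backed by readFileSync from node:fs with the "base64" encoding.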
Data Flow
Doubao Path
- User selects video in the browser
- EditorScreen triggers subtitle generation
- Frontend uploads the File to Ark Files API
- Frontend receives file_id
- Frontend posts fileId to /api/generate-subtitles
- Backend resolves Doubao provider config
- Backend calls Ark Responses API with file_id
- Backend parses and normalizes subtitle JSON
- Frontend renders normalized subtitles
Gemini Path
- User selects video in the browser
- Frontend posts multipart form data with video
- Backend sends inline video bytes to Gemini as today
Error Handling
Frontend Upload Errors
If Ark Files API fails, the frontend should surface a direct upload error and avoid calling this app's backend. The user should see the upstream message when possible.
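Surfacing the upstream message could be handled by a small formatter like the one below. The function name is hypothetical, and the error body shape ({ "error": { "message": ... } }) is an assumption; any other shape falls back to the HTTP status so the user still sees something concrete.

```typescript
// Turn a failed Ark upload response into a user-facing message.
// ASSUMPTION: error bodies look like { "error": { "message": "..." } }.
function describeArkUploadError(status: number, body: unknown): string {
  if (typeof body === "object" && body !== null && "error" in body) {
    const error = (body as { error: unknown }).error;
    if (typeof error === "object" && error !== null && "message" in error) {
      const message = (error as { message: unknown }).message;
      if (typeof message === "string" && message.trim() !== "") {
        return `Video upload failed: ${message}`;
      }
    }
  }
  // Unknown or empty body: report the status code instead of guessing.
  return `Video upload failed (HTTP ${status})`;
}
```

The caller would throw or display this message and return early, so no request is sent to /api/generate-subtitles after a failed upload.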
Backend Request Validation
The backend should reject requests when:
- Neither video nor fileId is provided
- targetLanguage is missing
- gemini is requested with fileId only
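The rejection rules above fit in one pure function, sketched here. The SubtitleRequest shape, the hasVideo flag, and the error wording are assumptions for illustration; the real parser may represent the multipart upload differently.

```typescript
type Provider = "doubao" | "gemini";

// Hypothetical parsed request: hasVideo is true when a multipart `video` part arrived.
interface SubtitleRequest {
  provider: Provider;
  targetLanguage?: string;
  hasVideo: boolean;
  fileId?: string;
}

// Returns a validation error message, or null when the request is acceptable.
function validateSubtitleRequest(req: SubtitleRequest): string | null {
  const hasFileId = typeof req.fileId === "string" && req.fileId.trim() !== "";
  if (!req.hasVideo && !hasFileId) {
    return "Provide either a video upload or a fileId.";
  }
  if (!req.targetLanguage || req.targetLanguage.trim() === "") {
    return "targetLanguage is required.";
  }
  if (req.provider === "gemini" && !req.hasVideo) {
    return "Gemini requires a video upload; fileId is only supported for Doubao.";
  }
  return null;
}
```

Returning a message instead of throwing leaves the choice of HTTP status to the route handler.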
Provider-Specific Behavior
- doubao + fileId uses the new Ark file reference path
- doubao + video remains supported as fallback
- gemini + video remains unchanged
- gemini + fileId returns a clear validation error
Testing Strategy
Frontend
- Unit test Ark file upload helper request shape
- Unit test generateSubtitlePipeline uses fileId for Doubao and skips multipart video upload to this app's backend
- Unit test generateSubtitlePipeline keeps multipart upload for Gemini
- UI test EditorScreen still passes the selected provider through subtitle generation
Backend
- Unit test subtitle request parsing with fileId
- Unit test Doubao video generation uses file_id when present
- Unit test base64 fallback remains intact
- Unit test Gemini path rejects fileId-only requests
Rollout Notes
Keep the base64 Doubao fallback during this change. That makes the new flow additive instead of a risky cutover and keeps local tests simpler while the frontend upload path settles.