video_translate/docs/plans/2026-03-19-doubao-file-id-frontend-design.md
Song367 a0c1dc6ad5
File upload
2026-03-19 11:17:10 +08:00


# Doubao Frontend File ID Upload Design
**Goal:** Let the browser upload videos to Volcengine Ark Files API, then send the returned `file_id` to this app's backend so Doubao subtitle generation can use `Responses API` with `file_id` instead of inline base64 video payloads.
## Context
The current subtitle flow uploads the full video to this app's backend, then the backend reads the file and sends a `data:video/mp4;base64,...` payload to Doubao. That works for smaller files, but it inherits request body size limits and repeats the full video upload on every subtitle generation request.

The user wants a staged flow:
1. Frontend uploads the selected video directly to Ark Files API.
2. Frontend receives a `file_id`.
3. Frontend calls this app's `/api/generate-subtitles` endpoint with that `file_id`.
4. Backend keeps ownership of the Doubao `Responses API` request, logging, normalization, and subtitle result shaping.
## Approaches Considered
### Option A: Frontend uploads to Files API, backend uses `file_id` for Doubao
This keeps the current app architecture mostly intact. Only the upload stage moves to the browser. The backend still handles provider selection, subtitle parsing, error mapping, and normalized response shaping.

**Pros**

- Smallest architectural change
- Keeps existing backend logging and response normalization
- Preserves the existing `/api/generate-subtitles` contract with a backward-compatible extension
- Allows a gradual rollout because base64 upload can remain as fallback

**Cons**

- Frontend gains Ark-specific upload logic
- The browser now coordinates two network calls for Doubao
### Option B: Frontend uploads to Files API and also calls Doubao `Responses API`
This removes backend involvement for Doubao subtitle generation, but it pushes subtitle parsing and normalization into the browser.

**Pros**

- Shorter network path for Doubao

**Cons**

- Large frontend refactor
- Duplicates provider logic across frontend and backend
- Loses centralized logging and error handling
- Makes Gemini and Doubao flows diverge more sharply
### Recommendation
Use **Option A**. It solves the request-size problem without discarding the backend subtitle pipeline that already exists.
## Architecture
### Frontend
Add a small Ark upload helper that:
1. Accepts the selected `File`
2. Sends `FormData` to `https://ark.cn-beijing.volces.com/api/v3/files`
3. Includes:
   - `purpose=user_data`
   - `file=@<video>`
   - `preprocess_configs[video][fps]=1`
4. Reads the response JSON and returns the Ark `file_id`
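The four steps above could be sketched roughly as follows. The endpoint and the three form fields come from this design; the `Authorization: Bearer` header, the `id` field in the response body, and the names `buildArkUploadForm`, `uploadVideoToArk`, and `FetchLike` are assumptions made for illustration and should be checked against the Ark Files API reference. How the browser obtains the API key is outside this sketch.

```typescript
// Sketch only. Endpoint and form fields are taken from this design doc;
// the bearer header and the `id` response field are ASSUMPTIONS.
const ARK_FILES_URL = "https://ark.cn-beijing.volces.com/api/v3/files";

// Minimal fetch-compatible signature so the helper can be tested with a fake.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: FormData },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the multipart body described in step 3.
function buildArkUploadForm(file: Blob, fileName: string): FormData {
  const form = new FormData();
  form.append("purpose", "user_data");
  form.append("file", file, fileName);
  form.append("preprocess_configs[video][fps]", "1");
  return form;
}

// Uploads the video and returns the Ark file id (step 4).
async function uploadVideoToArk(
  file: Blob,
  fileName: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<string> {
  const response = await fetchImpl(ARK_FILES_URL, {
    method: "POST",
    // No Content-Type header: the runtime adds the multipart boundary itself.
    headers: { Authorization: `Bearer ${apiKey}` }, // assumed auth scheme
    body: buildArkUploadForm(file, fileName),
  });
  if (!response.ok) {
    throw new Error(`Ark file upload failed with status ${response.status}`);
  }
  // Assumption: the created file object exposes its identifier as `id`.
  const payload = (await response.json()) as { id?: unknown };
  if (typeof payload.id !== "string" || payload.id.length === 0) {
    throw new Error("Ark file upload response did not include a file id");
  }
  return payload.id;
}
```

In the browser the last argument would normally be the global `fetch`; injecting it keeps the request shape unit-testable without network access.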
`generateSubtitlePipeline(...)` will gain an optional `options` object. When the provider is `doubao`, it will:

1. Upload the file to Ark
2. Call this app's `/api/generate-subtitles` with `fileId`, `provider`, `targetLanguage`, and optional `trimRange`

For `gemini`, it will keep the current multipart upload path unchanged.
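One way to picture the provider branching is the sketch below. It is not the existing `generateSubtitlePipeline` signature: the `PipelineOptions`, `TrimRange`, and `PipelineDeps` shapes are hypothetical, and the network calls are injected so the branching can be exercised without a browser. Serializing `trimRange` as JSON in the multipart case is also an assumption.

```typescript
type Provider = "doubao" | "gemini";

// Hypothetical shapes: the design names these fields but does not define them.
interface TrimRange {
  start: number;
  end: number;
}

interface PipelineOptions {
  provider: Provider;
  targetLanguage: string;
  trimRange?: TrimRange;
}

// Injected network operations (hypothetical names).
interface PipelineDeps<TFile> {
  uploadToArk(file: TFile): Promise<string>;
  postJson(path: string, body: Record<string, unknown>): Promise<unknown>;
  postMultipart(path: string, fields: Record<string, unknown>): Promise<unknown>;
}

async function generateSubtitlePipeline<TFile>(
  file: TFile,
  options: PipelineOptions,
  deps: PipelineDeps<TFile>,
): Promise<unknown> {
  const { provider, targetLanguage, trimRange } = options;

  if (provider === "doubao") {
    // Step 1: upload to Ark. Step 2: send only the file reference to our backend.
    const fileId = await deps.uploadToArk(file);
    return deps.postJson("/api/generate-subtitles", {
      fileId,
      provider,
      targetLanguage,
      ...(trimRange ? { trimRange } : {}),
    });
  }

  // Gemini keeps the multipart path: the video itself goes to our backend.
  return deps.postMultipart("/api/generate-subtitles", {
    video: file,
    provider,
    targetLanguage,
    // Assumption: trimRange travels as a JSON string in multipart form data.
    ...(trimRange ? { trimRange: JSON.stringify(trimRange) } : {}),
  });
}
```

Because the Ark upload happens before any call to this app's backend, an upload failure rejects the promise and the backend is never contacted.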
### Backend
The `/api/generate-subtitles` endpoint will support two request shapes:

1. Existing multipart upload with `video`
2. New JSON or urlencoded body with `fileId`

The subtitle request parser will be extended to accept optional `fileId`.
The video subtitle generation pipeline will accept either:

1. `videoPath`
2. `fileId`

For Doubao:

- If `fileId` is present, send:
  - `type: "input_video"`
  - `file_id: "<ark-file-id>"`
  - `fps: 1`
- If `fileId` is absent, preserve the current base64 fallback path

For Gemini:

- Continue requiring a local uploaded file path
- Return a clear error if Gemini is requested without `video`
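As an illustration of the Doubao branch, the video content part could be built by a small pure function like the one below. The `file_id` variant mirrors the fields listed above. The fallback variant is a guess at how the current base64 path looks: the `video_url` field name is an assumption, and `VideoSource` and `buildDoubaoVideoPart` are hypothetical names.

```typescript
// Where the video comes from on the backend (hypothetical shape).
type VideoSource =
  | { kind: "fileId"; fileId: string }
  | { kind: "inline"; base64: string; mimeType?: string };

type DoubaoVideoPart =
  | { type: "input_video"; file_id: string; fps: number }
  | { type: "input_video"; video_url: string }; // ASSUMED field name for the fallback

function buildDoubaoVideoPart(source: VideoSource): DoubaoVideoPart {
  if (source.kind === "fileId") {
    // New path: reference the file already uploaded to Ark.
    return { type: "input_video", file_id: source.fileId, fps: 1 };
  }
  // Fallback path: inline data URL, as in the current base64 flow.
  const mimeType = source.mimeType ?? "video/mp4";
  return {
    type: "input_video",
    video_url: `data:${mimeType};base64,${source.base64}`,
  };
}
```

Keeping this as a pure function makes the "uses `file_id` when present" and "base64 fallback remains intact" cases easy to unit test without calling Ark.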
## Data Flow
### Doubao Path
1. User selects video in the browser
2. `EditorScreen` triggers subtitle generation
3. Frontend uploads the `File` to Ark Files API
4. Frontend receives `file_id`
5. Frontend posts `fileId` to `/api/generate-subtitles`
6. Backend resolves Doubao provider config
7. Backend calls Ark `Responses API` with `file_id`
8. Backend parses and normalizes subtitle JSON
9. Frontend renders normalized subtitles
### Gemini Path
1. User selects video in the browser
2. Frontend posts multipart form data with `video`
3. Backend sends inline video bytes to Gemini as today
## Error Handling
### Frontend Upload Errors
If Ark Files API fails, the frontend should surface a direct upload error and avoid calling this app's backend. The user should see the upstream message when possible.
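A minimal sketch of "show the upstream message when possible" follows. It assumes, without confirmation from this document, that Ark error bodies are JSON shaped like `{ "error": { "message": "..." } }`; the function name `describeArkUploadFailure` and the wording of the messages are hypothetical.

```typescript
// Turns a failed upload response into a user-facing message.
// ASSUMPTION: error bodies look like { "error": { "message": "..." } }.
function describeArkUploadFailure(status: number, bodyText: string): string {
  const fallback = `Video upload to Ark failed (HTTP ${status}).`;
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (typeof parsed === "object" && parsed !== null) {
      const error = (parsed as { error?: unknown }).error;
      if (typeof error === "object" && error !== null) {
        const message = (error as { message?: unknown }).message;
        if (typeof message === "string" && message.trim() !== "") {
          return `Video upload to Ark failed: ${message.trim()}`;
        }
      }
    }
  } catch {
    // Body was not JSON; fall through to the generic message.
  }
  return fallback;
}
```

The caller would throw or display this message and return before issuing the `/api/generate-subtitles` request.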
### Backend Request Validation
The backend should reject requests when:
- Neither `video` nor `fileId` is provided
- `targetLanguage` is missing
- `gemini` is requested with `fileId` only
### Provider-Specific Behavior
- `doubao + fileId` uses the new Ark file reference path
- `doubao + video` remains supported as fallback
- `gemini + video` remains unchanged
- `gemini + fileId` returns a clear validation error
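The validation rules and the provider matrix above can be expressed as a single decision function. This is a sketch under stated assumptions: the `SubtitleRequest` shape, the `hasVideo` flag, the route names, and the error strings are all hypothetical, and letting `fileId` take precedence when Doubao receives both inputs is one reading of "if `fileId` is present".

```typescript
type Provider = "doubao" | "gemini";

// Hypothetical parsed request: `hasVideo` is true when a multipart `video` file arrived.
interface SubtitleRequest {
  provider: Provider;
  targetLanguage?: string;
  hasVideo: boolean;
  fileId?: string;
}

type Validation =
  | { ok: true; route: "doubao-file-id" | "doubao-base64" | "gemini-inline" }
  | { ok: false; error: string };

function validateSubtitleRequest(req: SubtitleRequest): Validation {
  const hasFileId = typeof req.fileId === "string" && req.fileId.trim() !== "";

  if (!req.hasVideo && !hasFileId) {
    return { ok: false, error: "Provide either a video upload or a fileId." };
  }
  if (!req.targetLanguage || req.targetLanguage.trim() === "") {
    return { ok: false, error: "targetLanguage is required." };
  }
  if (req.provider === "gemini") {
    // Gemini needs the local uploaded file; a fileId alone is not enough.
    return req.hasVideo
      ? { ok: true, route: "gemini-inline" }
      : { ok: false, error: "Gemini requires a video upload; fileId is only supported for Doubao." };
  }
  // Doubao: prefer the Ark file reference, otherwise fall back to base64.
  return { ok: true, route: hasFileId ? "doubao-file-id" : "doubao-base64" };
}
```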
## Testing Strategy
### Frontend
- Unit test Ark file upload helper request shape
- Unit test `generateSubtitlePipeline` uses `fileId` for Doubao and skips multipart video upload to this app's backend
- Unit test `generateSubtitlePipeline` keeps multipart upload for Gemini
- UI test `EditorScreen` still passes the selected provider through subtitle generation
### Backend
- Unit test subtitle request parsing with `fileId`
- Unit test Doubao video generation uses `file_id` when present
- Unit test base64 fallback remains intact
- Unit test Gemini path rejects `fileId`-only requests
## Rollout Notes
Keep the base64 Doubao fallback during this change. That makes the new flow additive instead of a risky cutover and keeps local tests simpler while the frontend upload path settles.