# Doubao Frontend File ID Upload Design

**Goal:** Let the browser upload videos to Volcengine Ark Files API, then send the returned `file_id` to this app's backend so Doubao subtitle generation can use `Responses API` with `file_id` instead of inline base64 video payloads.

## Context

The current subtitle flow uploads the full video to this app's backend, then the backend reads the file and sends a `data:video/mp4;base64,...` payload to Doubao. That works for smaller files, but it inherits request body size limits and repeats the full video upload on every subtitle generation request.

The user wants a staged flow:

1. Frontend uploads the selected video directly to Ark Files API.
2. Frontend receives a `file_id`.
3. Frontend calls this app's `/api/generate-subtitles` endpoint with that `file_id`.
4. Backend keeps ownership of the Doubao `Responses API` request, logging, normalization, and subtitle result shaping.

## Approaches Considered

### Option A: Frontend uploads to Files API, backend uses `file_id` for Doubao

This keeps the current app architecture mostly intact. Only the upload stage moves to the browser. The backend still handles provider selection, subtitle parsing, error mapping, and normalized response shaping.

**Pros**

- Smallest architectural change
- Keeps existing backend logging and response normalization
- Preserves the existing `/api/generate-subtitles` contract with a backward-compatible extension
- Allows a gradual rollout because base64 upload can remain as fallback

**Cons**

- Frontend gains Ark-specific upload logic
- The browser now coordinates two network calls for Doubao

### Option B: Frontend uploads to Files API and also calls Doubao `Responses API`

This removes backend involvement for Doubao subtitle generation, but it pushes subtitle parsing and normalization into the browser.

**Pros**

- Shorter network path for Doubao

**Cons**

- Large frontend refactor
- Duplicates provider logic across frontend and backend
- Loses centralized logging and error handling
- Makes Gemini and Doubao flows diverge more sharply

### Recommendation

Use **Option A**. It solves the request-size problem without discarding the backend subtitle pipeline that already exists.

## Architecture

### Frontend

Add a small Ark upload helper that:

1. Accepts the selected `File`
2. Sends `FormData` to `https://ark.cn-beijing.volces.com/api/v3/files`
3. Includes:
   - `purpose=user_data`
   - `file=@<video>`
   - `preprocess_configs[video][fps]=1`
4. Reads the response JSON and returns the Ark `file_id`

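
The upload helper steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the function names, the Bearer-token `Authorization` header, and the idea that the response JSON exposes the identifier in an `id` field are all assumptions to verify against the Ark Files API documentation. Only the URL and the three form fields come from this design.

```typescript
// Endpoint taken from the design; all helper names below are hypothetical.
const ARK_FILES_URL = "https://ark.cn-beijing.volces.com/api/v3/files";

// Build the multipart body with the three fields listed in the design.
function buildArkUploadForm(video: Blob, filename: string): FormData {
  const form = new FormData();
  form.append("purpose", "user_data");
  form.append("preprocess_configs[video][fps]", "1");
  form.append("file", video, filename);
  return form;
}

// Assumption: the upload response carries the file identifier as a string `id`.
function extractArkFileId(payload: unknown): string {
  if (typeof payload === "object" && payload !== null) {
    const id = (payload as { id?: unknown }).id;
    if (typeof id === "string" && id.length > 0) {
      return id;
    }
  }
  throw new Error("Ark upload response did not include a file id");
}

// Assumption: Bearer-token auth. How the browser obtains `apiKey` is out of scope.
// `fetchImpl` is injectable so unit tests can assert on the request shape.
async function uploadVideoToArk(
  video: Blob,
  filename: string,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  const response = await fetchImpl(ARK_FILES_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildArkUploadForm(video, filename),
  });
  if (!response.ok) {
    throw new Error(`Ark upload failed with status ${response.status}`);
  }
  return extractArkFileId(await response.json());
}
```

Splitting form construction and response parsing into pure functions keeps the request-shape unit test free of network mocking. A browser `File` is a `Blob`, so the selected file can be passed as `uploadVideoToArk(file, file.name, apiKey)`.
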

`generateSubtitlePipeline(...)` will gain an optional `options` object. When the provider is `doubao`, it will:

1. Upload the file to Ark
2. Call this app's `/api/generate-subtitles` with `fileId`, `provider`, `targetLanguage`, and optional `trimRange`

For `gemini`, it will keep the current multipart upload path unchanged.

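
As an illustration of the provider branch described above, the request body for the `fileId` call might be assembled like this. The helper names and the `TrimRange` shape are hypothetical; the design only names the fields `fileId`, `provider`, `targetLanguage`, and optional `trimRange`.

```typescript
type Provider = "doubao" | "gemini";

// Hypothetical shape; the design only says `trimRange` is optional.
interface TrimRange {
  start: number;
  end: number;
}

interface SubtitleParams {
  provider: Provider;
  targetLanguage: string;
  trimRange?: TrimRange;
}

// Only Doubao takes the Ark upload path; Gemini keeps the multipart upload.
function usesArkFileId(provider: Provider): boolean {
  return provider === "doubao";
}

// JSON body for the fileId request. The video bytes are not included.
function buildFileIdRequestBody(fileId: string, params: SubtitleParams): string {
  const body: Record<string, unknown> = {
    fileId,
    provider: params.provider,
    targetLanguage: params.targetLanguage,
  };
  if (params.trimRange !== undefined) {
    body.trimRange = params.trimRange;
  }
  return JSON.stringify(body);
}
```

The pipeline would check `usesArkFileId(provider)` first, and fall through to the existing multipart code when it returns `false`.
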
### Backend

The `/api/generate-subtitles` endpoint will support two request shapes:

1. Existing multipart upload with `video`
2. New JSON or urlencoded body with `fileId`

The subtitle request parser will be extended to accept optional `fileId`.

The video subtitle generation pipeline will accept either:

1. `videoPath`
2. `fileId`

For Doubao:

- If `fileId` is present, send:
  - `type: "input_video"`
  - `file_id: "<ark-file-id>"`
  - `fps: 1`
- If `fileId` is absent, preserve the current base64 fallback path

For Gemini:

- Continue requiring a local uploaded file path
- Return a clear error if Gemini is requested without `video`

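
The Doubao branch above might look like the following sketch. The `file_id` item mirrors the three fields listed in this section. The fallback item is an assumption: this document only says the current path sends a `data:video/mp4;base64,...` payload, so the `video_url` field name should be checked against the existing backend code. The type and function names are hypothetical.

```typescript
// Content item placed in the Responses API input for a video.
type DoubaoVideoInput =
  | { type: "input_video"; file_id: string; fps: number }
  | { type: "input_video"; video_url: string };

interface VideoSource {
  fileId?: string; // Ark file_id uploaded by the browser
  base64Video?: string; // base64 of the locally uploaded video (fallback)
}

function buildDoubaoVideoInput(source: VideoSource): DoubaoVideoInput {
  if (source.fileId) {
    // New path: reference the file already stored in Ark.
    return { type: "input_video", file_id: source.fileId, fps: 1 };
  }
  if (source.base64Video) {
    // Assumption: the existing fallback passes the data URL in `video_url`.
    return {
      type: "input_video",
      video_url: `data:video/mp4;base64,${source.base64Video}`,
    };
  }
  throw new Error("Either fileId or a local video is required");
}
```

Giving `fileId` precedence keeps the fallback reachable only when the browser did not upload to Ark, which matches the additive rollout.
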
## Data Flow

### Doubao Path

1. User selects video in the browser
2. `EditorScreen` triggers subtitle generation
3. Frontend uploads the `File` to Ark Files API
4. Frontend receives `file_id`
5. Frontend posts `fileId` to `/api/generate-subtitles`
6. Backend resolves Doubao provider config
7. Backend calls Ark `Responses API` with `file_id`
8. Backend parses and normalizes subtitle JSON
9. Frontend renders normalized subtitles

### Gemini Path

1. User selects video in the browser
2. Frontend posts multipart form data with `video`
3. Backend sends inline video bytes to Gemini as today

## Error Handling

### Frontend Upload Errors

If Ark Files API fails, the frontend should surface a direct upload error and avoid calling this app's backend. The user should see the upstream message when possible.

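
One way to surface the upstream message is sketched below. It assumes Ark error responses are JSON shaped like `{ "error": { "message": "..." } }`; that shape is an assumption to confirm against real error responses. The function name and message wording are hypothetical.

```typescript
// Turn a failed Ark upload response into a user-facing message.
function describeArkUploadFailure(status: number, bodyText: string): string {
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (typeof parsed === "object" && parsed !== null) {
      const error = (parsed as { error?: unknown }).error;
      if (typeof error === "object" && error !== null) {
        const message = (error as { message?: unknown }).message;
        if (typeof message === "string" && message.length > 0) {
          // Prefer the upstream message when the body provides one.
          return `Ark upload failed (${status}): ${message}`;
        }
      }
    }
  } catch {
    // Body was not JSON; fall through to the generic message.
  }
  return `Ark upload failed with status ${status}`;
}
```

The caller would throw or display this message and return before issuing the `/api/generate-subtitles` request.
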
### Backend Request Validation

The backend should reject requests when:

- Neither `video` nor `fileId` is provided
- `targetLanguage` is missing
- `gemini` is requested with `fileId` only

### Provider-Specific Behavior

- `doubao + fileId` uses the new Ark file reference path
- `doubao + video` remains supported as fallback
- `gemini + video` remains unchanged
- `gemini + fileId` returns a clear validation error

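
The validation rules and the provider matrix above can be captured in one pure function. This is a sketch under assumptions: the `ParsedSubtitleRequest` shape, the function name, and the error strings are hypothetical, and the real parser may represent the uploaded `video` differently.

```typescript
type SubtitleProvider = "doubao" | "gemini";

interface ParsedSubtitleRequest {
  provider: SubtitleProvider;
  targetLanguage?: string;
  hasVideo: boolean; // true when the multipart `video` part was uploaded
  fileId?: string;
}

// Returns an error message, or null when the request is acceptable.
function validateSubtitleRequest(req: ParsedSubtitleRequest): string | null {
  if (!req.hasVideo && !req.fileId) {
    return "Either video or fileId is required";
  }
  if (!req.targetLanguage) {
    return "targetLanguage is required";
  }
  if (req.provider === "gemini" && !req.hasVideo) {
    return "Gemini requires an uploaded video; fileId is only supported for Doubao";
  }
  return null;
}
```

Returning a message instead of throwing lets the endpoint map every failure to the same validation error response.
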
## Testing Strategy

### Frontend

- Unit test Ark file upload helper request shape
- Unit test `generateSubtitlePipeline` uses `fileId` for Doubao and skips multipart video upload to this app's backend
- Unit test `generateSubtitlePipeline` keeps multipart upload for Gemini
- UI test `EditorScreen` still passes the selected provider through subtitle generation

### Backend

- Unit test subtitle request parsing with `fileId`
- Unit test Doubao video generation uses `file_id` when present
- Unit test base64 fallback remains intact
- Unit test Gemini path rejects `fileId`-only requests

## Rollout Notes

Keep the base64 Doubao fallback during this change. That makes the new flow additive instead of a risky cutover and keeps local tests simpler while the frontend upload path settles.