# Doubao Frontend File ID Upload Design

**Goal:** Let the browser upload videos to Volcengine Ark Files API, then send the returned `file_id` to this app's backend so Doubao subtitle generation can use `Responses API` with `file_id` instead of inline base64 video payloads.

## Context

The current subtitle flow uploads the full video to this app's backend, then the backend reads the file and sends a `data:video/mp4;base64,...` payload to Doubao. That works for smaller files, but it inherits request body size limits and repeats the full video upload on every subtitle generation request.

The user wants a staged flow:

1. Frontend uploads the selected video directly to Ark Files API.
2. Frontend receives a `file_id`.
3. Frontend calls this app's `/api/generate-subtitles` endpoint with that `file_id`.
4. Backend keeps ownership of the Doubao `Responses API` request, logging, normalization, and subtitle result shaping.

## Approaches Considered

### Option A: Frontend uploads to Files API, backend uses `file_id` for Doubao

This keeps the current app architecture mostly intact. Only the upload stage moves to the browser. The backend still handles provider selection, subtitle parsing, error mapping, and normalized response shaping.

**Pros**

- Smallest architectural change
- Keeps existing backend logging and response normalization
- Preserves the existing `/api/generate-subtitles` contract with a backward-compatible extension
- Allows a gradual rollout because base64 upload can remain as fallback

**Cons**

- Frontend gains Ark-specific upload logic
- The browser now coordinates two network calls for Doubao

### Option B: Frontend uploads to Files API and also calls Doubao `Responses API`

This removes backend involvement for Doubao subtitle generation, but it pushes subtitle parsing and normalization into the browser.

**Pros**

- Shorter network path for Doubao

**Cons**

- Large frontend refactor
- Duplicates provider logic across frontend and backend
- Loses centralized logging and error handling
- Makes Gemini and Doubao flows diverge more sharply

### Recommendation

Use **Option A**. It solves the request-size problem without discarding the backend subtitle pipeline that already exists.

## Architecture

### Frontend

Add a small Ark upload helper that:

1. Accepts the selected `File`
2. Sends `FormData` to `https://ark.cn-beijing.volces.com/api/v3/files`
3. Includes:
   - `purpose=user_data`
   - `file=@