diff --git a/.env b/.env
index 3ea8e64..8aff301 100644
--- a/.env
+++ b/.env
@@ -2,4 +2,6 @@ GEMINI_API_KEY="AIzaSyAex0MkGj_X-h3L38334xVdZsFzOcU9cC0"
 ARK_API_KEY="e96194a9-8eda-4a90-a211-6db288045bdc"
 MINIMAX_API_KEY="eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJHcm91cE5hbWUiOiLkuIrmtbfpopzpgJTnp5HmioDmnInpmZDlhazlj7giLCJVc2VyTmFtZSI6IuadqOmqpSIsIkFjY291bnQiOiIiLCJTdWJqZWN0SUQiOiIxNzI4NzEyMzI0OTc5NjI2ODM5IiwiUGhvbmUiOiIxMzM4MTU1OTYxOCIsIkdyb3VwSUQiOiIxNzI4NzEyMzI0OTcxMjM4MjMxIiwiUGFnZU5hbWUiOiIiLCJNYWlsIjoiIiwiQ3JlYXRlVGltZSI6IjIwMjUtMDYtMDYgMTU6MDU6NTUiLCJUb2tlblR5cGUiOjEsImlzcyI6Im1pbmltYXgifQ.aw1AUJnBYxXerJ4qNUaXM3DqPTd94WSVHWRiIpnjImhuCia3Ta1AyANTQTx__2CF5eByHOaHJFHhBCg6KgHUEaR6TiWFn0fWwXaU7XgnHwbvD4pNAmF_uYxMKbi-a6IyIGNyFdEMy22V5JEqfY4okAco5U96cnSOQZH7lyIBpvOsesjZU6L9q6Tf2jvlcnO9QG8GPg2DVpeL8Q3zLuYWezN4Wk6N-ISwQmZUwBYL3BhYamsFqCdSEyMd_uYQ_aQJa5tmlQqpimtALiutFshPUXB6VsvXEO6q-lCZ6Tg8QWwlFHkmEtUMQw4pWoX25d7Us06VFUhvV6pOzvM7yqCaWw"
 VITE_BASE_URL=/video_translate/
-VITE_API_BASE_PATH=/video_translate/api
\ No newline at end of file
+VITE_API_BASE_PATH=/video_translate/api
+DOUBAO_TIMEOUT_MS=900000
+VITE_ARK_API_KEY="e96194a9-8eda-4a90-a211-6db288045bdc"
diff --git a/.env.example b/.env.example
index 2db7bb0..0d11e06 100644
--- a/.env.example
+++ b/.env.example
@@ -4,6 +4,10 @@ GEMINI_API_KEY="MY_GEMINI_API_KEY"
 # ARK_API_KEY: Required when the editor LLM is set to Doubao.
 ARK_API_KEY="YOUR_ARK_API_KEY"
 
+# VITE_ARK_API_KEY: Required only if the browser uploads videos directly to Ark Files API.
+# This exposes the key to the frontend and should only be used in trusted environments.
+# VITE_ARK_API_KEY="YOUR_ARK_API_KEY"
+
 # DEFAULT_LLM_PROVIDER: Optional editor default. Supported values: doubao, gemini.
 # Defaults to doubao.
 DEFAULT_LLM_PROVIDER="doubao"
@@ -12,11 +16,19 @@ DEFAULT_LLM_PROVIDER="doubao"
 # Defaults to doubao-seed-2-0-pro-260215.
 DOUBAO_MODEL="doubao-seed-2-0-pro-260215"
 
+# DOUBAO_TIMEOUT_MS: Optional timeout for Doubao subtitle requests in milliseconds.
+# Defaults to 600000 (10 minutes).
+# DOUBAO_TIMEOUT_MS="600000"
+
 # VITE_API_BASE_PATH: Optional frontend API base path.
 # Defaults to /api.
 # Set to /video_translate/api when the app is served under /video_translate.
 # VITE_API_BASE_PATH="/video_translate/api"
 
+# VITE_ALLOWED_HOSTS: Optional comma-separated hostnames allowed by the Vite dev server.
+# Useful when exposing the dev server through a tunnel such as cpolar.
+# VITE_ALLOWED_HOSTS="ced4302.r20.vip.cpolar.cn"
+
 # MINIMAX_API_KEY: Required for MiniMax TTS API calls.
 # Use a MiniMax API secret key that has TTS access enabled.
 MINIMAX_API_KEY="YOUR_MINIMAX_API_KEY"
diff --git a/docs/plans/2026-03-18-ubuntu-start-script.md b/docs/plans/2026-03-18-ubuntu-start-script.md
new file mode 100644
index 0000000..038e0db
--- /dev/null
+++ b/docs/plans/2026-03-18-ubuntu-start-script.md
@@ -0,0 +1,62 @@
+# Ubuntu Start Script Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Add Ubuntu development scripts that can start the app in the background and stop it later without requiring the caller to `cd` first.
+
+**Architecture:** Keep Bash entrypoints in the repository root. `start-dev.sh` resolves the project directory, creates a `run/` working area, launches `npm run dev` in a dedicated process group, and records the group leader PID and log path. `stop.sh` reads the recorded PID, stops the whole process group, and removes stale state.
+
+**Tech Stack:** Bash, npm
+
+---
+
+### Task 1: Add Ubuntu start and stop scripts
+
+**Files:**
+- Modify: `E:\Downloads\ai-video-dubbing-&-translation\start-dev.sh`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\stop.sh`
+- Create: `E:\Downloads\ai-video-dubbing-&-translation\docs\plans\2026-03-18-ubuntu-start-script.md`
+
+**Step 1: Define the verification target**
+
+Run: `bash -n ./start-dev.sh`
+Expected: exit code 0 after the script is updated
+
+Run: `bash -n ./stop.sh`
+Expected: exit code 0 after the script is added
+
+**Step 2: Write the minimal implementation**
+
+Update `start-dev.sh` so it:
+- uses `#!/usr/bin/env bash`
+- enables `set -euo pipefail`
+- resolves the script directory
+- changes into that directory
+- creates `run/`
+- starts `npm run dev` in the background as its own process group
+- writes the process id to `run/dev.pid`
+- writes logs to `run/dev.log`
+- refuses to start a second copy if the PID is still alive
+
+Create `stop.sh` so it:
+- resolves the script directory
+- reads `run/dev.pid`
+- sends `TERM` to the whole process group if it is running
+- waits briefly and escalates to `KILL` only if needed
+- removes stale `run/dev.pid`
+
+**Step 3: Run syntax verification**
+
+Run: `bash -n ./start-dev.sh`
+Expected: exit code 0 with no syntax errors
+
+Run: `bash -n ./stop.sh`
+Expected: exit code 0 with no syntax errors
+
+**Step 4: Run an execution smoke check**
+
+Run: `bash ./start-dev.sh`
+Expected: npm starts the development server in the background and prints the PID/log location
+
+Run: `bash ./stop.sh`
+Expected: the background dev process stops and the PID file is removed
diff --git a/docs/plans/2026-03-19-doubao-file-id-frontend-design.md b/docs/plans/2026-03-19-doubao-file-id-frontend-design.md
new file mode 100644
index 0000000..3a0f0d7
--- /dev/null
+++ b/docs/plans/2026-03-19-doubao-file-id-frontend-design.md
@@ -0,0 +1,156 @@
+# Doubao Frontend File ID Upload Design
+
+**Goal:** Let the browser upload videos to Volcengine Ark Files API, then send the returned `file_id` to this app's backend so Doubao subtitle generation can use `Responses API` with `file_id` instead of inline base64 video payloads.
+
+## Context
+
+The current subtitle flow uploads the full video to this app's backend, then the backend reads the file and sends a `data:video/mp4;base64,...` payload to Doubao. That works for smaller files, but it inherits request body size limits and repeats the full video upload on every subtitle generation request.
+
+The user wants a staged flow:
+
+1. Frontend uploads the selected video directly to Ark Files API.
+2. Frontend receives a `file_id`.
+3. Frontend calls this app's `/api/generate-subtitles` endpoint with that `file_id`.
+4. Backend keeps ownership of the Doubao `Responses API` request, logging, normalization, and subtitle result shaping.
+
+## Approaches Considered
+
+### Option A: Frontend uploads to Files API, backend uses `file_id` for Doubao
+
+This keeps the current app architecture mostly intact. Only the upload stage moves to the browser. The backend still handles provider selection, subtitle parsing, error mapping, and normalized response shaping.
+
+**Pros**
+- Smallest architectural change
+- Keeps existing backend logging and response normalization
+- Preserves the existing `/api/generate-subtitles` contract with a backward-compatible extension
+- Allows a gradual rollout because base64 upload can remain as fallback
+
+**Cons**
+- Frontend gains Ark-specific upload logic
+- The browser now coordinates two network calls for Doubao
+
+### Option B: Frontend uploads to Files API and also calls Doubao `Responses API`
+
+This removes backend involvement for Doubao subtitle generation, but it pushes subtitle parsing and normalization into the browser.
+
+**Pros**
+- Shorter network path for Doubao
+
+**Cons**
+- Large frontend refactor
+- Duplicates provider logic across frontend and backend
+- Loses centralized logging and error handling
+- Makes Gemini and Doubao flows diverge more sharply
+
+### Recommendation
+
+Use **Option A**. It solves the request-size problem without discarding the backend subtitle pipeline that already exists.
+
+## Architecture
+
+### Frontend
+
+Add a small Ark upload helper that:
+
+1. Accepts the selected `File`
+2. Sends `FormData` to `https://ark.cn-beijing.volces.com/api/v3/files`
+3. Includes:
+   - `purpose=user_data`
+   - `file=@