
Export Preview Parity Design

Date: 2026-03-17

Goal: Make exported videos match the editor preview for audio mixing, subtitle timing, and visible subtitle styling.

Current State

The editor preview and the export pipeline currently render the same edit session through different implementations:

  1. The preview in src/components/EditorScreen.tsx overlays subtitle text with React and plays audio using the browser media elements plus per-subtitle Audio instances.
  2. The export in server.ts rebuilds subtitles as SRT, mixes audio with FFmpeg, and trims the final output after subtitle timing and TTS delays have already been computed.

This creates three deterministic mismatches:

  1. Export mixes original audio even when the preview has muted it because instrumental BGM is present.
  2. Export treats the trim-relative subtitle times from the editor session as full-video times and trims the final video afterward, so subtitles and TTS clips are shifted by the trim offset or cut off.
  3. Export ignores textStyles, so the rendered subtitle look differs from the preview.

Chosen Approach

Adopt preview-first export semantics:

  1. Treat the editor state as the source of truth.
  2. Serialize the preview-visible subtitle data, text styles, and audio volume data explicitly.
  3. Convert preview-relative subtitle timing into export timeline timing before FFmpeg rendering.
  4. Generate styled subtitle overlays in the backend instead of relying on FFmpeg defaults.
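
The timing conversion in step 3 could look roughly like the sketch below. The names SubtitleCue, TrimRange, and toExportTimeline are hypothetical, and the sketch assumes that times are in seconds and that the trim is still applied to the final output after rendering, as described under Current State.

```typescript
// Hypothetical types: field names are illustrative, not taken from server.ts.
interface SubtitleCue {
  text: string;
  start: number; // seconds, relative to the trimmed preview
  end: number; // seconds, relative to the trimmed preview
}

interface TrimRange {
  enabled: boolean;
  start: number; // seconds on the full-video timeline
  end: number; // seconds on the full-video timeline
}

// Shift preview-relative cues onto the full-video timeline so that a trim
// applied after rendering leaves them where the preview showed them.
function toExportTimeline(cues: SubtitleCue[], trim: TrimRange): SubtitleCue[] {
  if (!trim.enabled) {
    return cues.map((cue) => ({ ...cue }));
  }
  const shifted: SubtitleCue[] = [];
  for (const cue of cues) {
    const start = cue.start + trim.start;
    // Assumption: a cue that runs past the trim end is clamped, not dropped.
    const end = Math.min(cue.end + trim.start, trim.end);
    if (start < trim.end && end > start) {
      shifted.push({ ...cue, start, end });
    }
  }
  return shifted;
}

const previewCues: SubtitleCue[] = [{ text: "Hello", start: 2, end: 4 }];
const exportCues = toExportTimeline(previewCues, { enabled: true, start: 10, end: 30 });
console.log(exportCues[0].start, exportCues[0].end); // 12 14
```

With a trim starting at 10 s, a cue shown at 2 s to 4 s in the preview is rendered at 12 s to 14 s on the full video and lands back at 2 s to 4 s once the output is trimmed.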

Architecture

Frontend

The editor passes a richer export payload:

  1. Subtitle text
  2. Subtitle timing
  3. Subtitle audio volume
  4. Global text style settings
  5. Trim range
  6. Instrumental BGM as base64 data, when present

The preview itself stays unchanged and remains the reference behavior.
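
As an illustration, the payload fields listed above might be typed and assembled as follows. This is a sketch only: every type and field name here (EditorSubtitle, TextStylesPayload, ExportPayload, buildExportPayload) is an assumption, and the actual shapes used by ExportModal may differ.

```typescript
// Hypothetical shapes: all field names below are assumptions for illustration.
interface EditorSubtitle {
  text: string;
  start: number; // seconds, relative to the trimmed preview
  end: number; // seconds, relative to the trimmed preview
  volume?: number; // TTS clip volume used by the preview, 0..1
}

interface TextStylesPayload {
  fontFamily: string;
  fontSize: number;
  color: string;
  bold: boolean;
  italic: boolean;
  underline: boolean;
  alignment: "left" | "center" | "right";
}

interface ExportPayload {
  subtitles: { text: string; start: number; end: number; volume: number }[];
  textStyles: TextStylesPayload;
  trim: { start: number; end: number } | null;
  instrumentalBgmBase64: string | null;
}

// Pure builder: takes editor state in, returns a serializable payload.
function buildExportPayload(
  subtitles: EditorSubtitle[],
  textStyles: TextStylesPayload,
  trim: { start: number; end: number } | null,
  instrumentalBgmBase64: string | null
): ExportPayload {
  return {
    subtitles: subtitles.map((s) => ({
      text: s.text,
      start: s.start,
      end: s.end,
      // Assumed default: a subtitle without an explicit volume plays at full volume.
      volume: s.volume === undefined ? 1 : s.volume,
    })),
    textStyles: { ...textStyles },
    trim: trim === null ? null : { start: trim.start, end: trim.end },
    instrumentalBgmBase64,
  };
}

const samplePayload = buildExportPayload(
  [{ text: "Hello", start: 2, end: 4, volume: 0.5 }],
  { fontFamily: "Arial", fontSize: 48, color: "#FFFFFF", bold: false, italic: false, underline: false, alignment: "center" },
  { start: 10, end: 30 },
  null
);
console.log(JSON.stringify(samplePayload.subtitles[0]));
```

Keeping the builder pure makes it straightforward to assert that style and volume fields are present without mounting any React component.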

Backend Export Layer

The export route should move the parity-sensitive logic into pure helpers:

  1. Build an export subtitle timeline that shifts relative editor timings back onto the full-video timeline when trimming is enabled.
  2. Build an audio mix plan that mirrors preview rules:
    • Use instrumental BGM at preview volume when present.
    • Exclude original source audio when instrumental BGM is present.
    • Otherwise keep original source audio at preview volume.
    • Apply each subtitle TTS clip at its configured volume.
  3. Generate ASS subtitle content so font, color, alignment, bold, italic, and underline can be rendered intentionally.
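
The audio rules in item 2 lend themselves to a small pure planner. The sketch below is one possible shape, not the actual server.ts code: MixInput, MixSource, and planAudioMix are hypothetical names, and the plan deliberately stops short of building any FFmpeg arguments.

```typescript
// Hypothetical input: the volumes and clip start times the preview is using.
interface MixInput {
  hasInstrumentalBgm: boolean;
  bgmVolume: number;
  originalVolume: number;
  ttsClips: { startSeconds: number; volume: number }[];
}

// One entry per audio source that should end up in the export mix.
type MixSource =
  | { kind: "bgm"; volume: number }
  | { kind: "original"; volume: number }
  | { kind: "tts"; clipIndex: number; delayMs: number; volume: number };

function planAudioMix(input: MixInput): MixSource[] {
  const sources: MixSource[] = [];
  if (input.hasInstrumentalBgm) {
    // BGM present: use it at preview volume and leave the original audio out.
    sources.push({ kind: "bgm", volume: input.bgmVolume });
  } else {
    // No BGM: keep the original source audio at preview volume.
    sources.push({ kind: "original", volume: input.originalVolume });
  }
  input.ttsClips.forEach((clip, clipIndex) => {
    sources.push({
      kind: "tts",
      clipIndex,
      delayMs: Math.round(clip.startSeconds * 1000),
      volume: clip.volume,
    });
  });
  return sources;
}

const samplePlan = planAudioMix({
  hasInstrumentalBgm: true,
  bgmVolume: 0.6,
  originalVolume: 1,
  ttsClips: [{ startSeconds: 1.5, volume: 0.8 }],
});
console.log(samplePlan.map((s) => s.kind).join(",")); // bgm,tts
```

Returning a plain list of sources keeps the mute-original rule testable on its own, and the route only has to translate each entry into the corresponding FFmpeg input and filter.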

Data Flow

  1. EditorScreen passes textStyles into ExportModal.
  2. ExportModal builds a structured export payload instead of manually shaping subtitle fields inline.
  3. server.ts parses textStyles, normalizes subtitle timing for export, builds ASS subtitle content, and applies the preview-equivalent audio plan.
  4. FFmpeg burns styled subtitles and mixes the planned audio sources.
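
For step 3, ASS content can be produced by plain string assembly. The following is a minimal sketch under several assumptions: AssStyle, AssCue, and buildAss are hypothetical names, the 1920x1080 script resolution and the outline and margin values are placeholders, and subtitles are always placed at the bottom of the frame.

```typescript
// Hypothetical style and cue shapes; cue times are seconds on the export timeline.
interface AssStyle {
  fontFamily: string;
  fontSize: number;
  color: string; // CSS hex, "#RRGGBB"
  bold: boolean;
  italic: boolean;
  underline: boolean;
  alignment: "left" | "center" | "right";
}

interface AssCue {
  text: string;
  start: number;
  end: number;
}

// ASS colours are written as &HAABBGGRR, so the CSS byte order is reversed.
function cssHexToAssColour(hex: string): string {
  const m = /^#?([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})$/.exec(hex);
  if (!m) return "&H00FFFFFF"; // assumed fallback: opaque white
  return ("&H00" + m[3] + m[2] + m[1]).toUpperCase();
}

// ASS timestamps use H:MM:SS.cc (centiseconds).
function toAssTime(seconds: number): string {
  const totalCs = Math.max(0, Math.round(seconds * 100));
  const pad = (n: number): string => (n < 10 ? "0" + n : "" + n);
  const totalS = Math.floor(totalCs / 100);
  const h = Math.floor(totalS / 3600);
  const m = Math.floor(totalS / 60) % 60;
  return h + ":" + pad(m) + ":" + pad(totalS % 60) + "." + pad(totalCs % 100);
}

function buildAss(cues: AssCue[], style: AssStyle): string {
  const flag = (on: boolean): string => (on ? "-1" : "0"); // ASS uses -1 for true
  // Numpad-style alignment on the bottom row: 1 left, 2 center, 3 right.
  const align = style.alignment === "left" ? 1 : style.alignment === "right" ? 3 : 2;
  const styleLine = [
    "Default", style.fontFamily, String(style.fontSize),
    cssHexToAssColour(style.color), "&H000000FF", "&H00000000", "&H00000000",
    flag(style.bold), flag(style.italic), flag(style.underline), "0",
    "100", "100", "0", "0", "1", "2", "0", String(align), "10", "10", "10", "1",
  ].join(",");
  const events = cues.map((cue) =>
    "Dialogue: 0," + toAssTime(cue.start) + "," + toAssTime(cue.end) +
    ",Default,,0,0,0,," + cue.text.replace(/\r?\n/g, "\\N")
  );
  return [
    "[Script Info]",
    "ScriptType: v4.00+",
    "PlayResX: 1920",
    "PlayResY: 1080",
    "",
    "[V4+ Styles]",
    "Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding",
    "Style: " + styleLine,
    "",
    "[Events]",
    "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text",
  ].concat(events).join("\n") + "\n";
}

console.log(toAssTime(12.5)); // 0:00:12.50
```

The sketch does not escape override-tag braces in subtitle text, and it maps font size one-to-one from the preview value; both would need attention before the output can be compared against browser rendering.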

Testing Strategy

Add regression coverage around pure helpers instead of FFmpeg end-to-end tests:

  1. Frontend payload builder includes style and volume fields.
  2. Export timeline normalization shifts subtitle timing correctly for trimmed clips.
  3. Audio mix planning excludes original audio when BGM is present and keeps it at preview volume when BGM is absent.
  4. ASS subtitle generation reflects the selected style settings.

Risks

  1. ASS subtitle rendering may still not be pixel-perfect relative to browser CSS.
  2. Export requests that carry no style payload must stay backward compatible by falling back to safe defaults.
  3. FFmpeg filter graph assembly becomes slightly more complex, so helper-level tests are required before touching route logic.
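
The fallback described in risk 2 could be handled by a tolerant parser along these lines. This is a sketch under assumptions: TextStyles, DEFAULT_TEXT_STYLES, and parseTextStyles are hypothetical names, the default values are placeholders, and the styles are assumed to arrive as a JSON string field on the export request.

```typescript
// Hypothetical style shape and placeholder defaults.
interface TextStyles {
  fontFamily: string;
  fontSize: number;
  color: string;
  bold: boolean;
  italic: boolean;
  underline: boolean;
  alignment: "left" | "center" | "right";
}

const DEFAULT_TEXT_STYLES: TextStyles = {
  fontFamily: "Arial",
  fontSize: 48,
  color: "#FFFFFF",
  bold: false,
  italic: false,
  underline: false,
  alignment: "center",
};

function pickString(value: unknown, fallback: string): string {
  return typeof value === "string" && value.length > 0 ? value : fallback;
}

function pickPositiveNumber(value: unknown, fallback: number): number {
  return typeof value === "number" && isFinite(value) && value > 0 ? value : fallback;
}

function pickBoolean(value: unknown, fallback: boolean): boolean {
  return typeof value === "boolean" ? value : fallback;
}

// Accepts a missing, malformed, or partial payload and always returns a
// complete style object, so older clients keep exporting with defaults.
function parseTextStyles(raw: string | undefined): TextStyles {
  let parsed: unknown = null;
  if (typeof raw === "string") {
    try {
      parsed = JSON.parse(raw);
    } catch {
      parsed = null;
    }
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ...DEFAULT_TEXT_STYLES };
  }
  const fields = parsed as Record<string, unknown>;
  const alignment = fields["alignment"];
  return {
    fontFamily: pickString(fields["fontFamily"], DEFAULT_TEXT_STYLES.fontFamily),
    fontSize: pickPositiveNumber(fields["fontSize"], DEFAULT_TEXT_STYLES.fontSize),
    color: pickString(fields["color"], DEFAULT_TEXT_STYLES.color),
    bold: pickBoolean(fields["bold"], DEFAULT_TEXT_STYLES.bold),
    italic: pickBoolean(fields["italic"], DEFAULT_TEXT_STYLES.italic),
    underline: pickBoolean(fields["underline"], DEFAULT_TEXT_STYLES.underline),
    alignment:
      alignment === "left" || alignment === "center" || alignment === "right"
        ? alignment
        : DEFAULT_TEXT_STYLES.alignment,
  };
}

console.log(parseTextStyles(undefined).fontFamily); // Arial
console.log(parseTextStyles('{"bold":true,"fontSize":"big"}').bold); // true
```

Validating field by field means a single bad value falls back on its own instead of discarding the whole style payload.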