Web-based voice editor — extract audio from YouTube, transcribe with word-level timestamps, edit with synced waveform + text UI, and remove background music with AI

VoiceEditor


A web-based voice extraction and editing tool. Collect audio from YouTube or system output, generate word-level transcripts with STT, edit with a synchronized waveform + text UI, cut and rearrange segments, and remove background music with AI — all in one place.

Features

  • YouTube Audio Extraction — Download audio from any YouTube URL (yt-dlp)
  • System Audio Recording — Record system output via BlackHole (macOS)
  • File Upload — Drag-and-drop local audio files
  • Speech-to-Text — Word-level timestamps via faster-whisper
  • Waveform + Text Sync — wavesurfer.js waveform with per-word highlighting and click-to-seek
  • Inline Text Editing — Edit transcript text per segment
  • Cut & Rearrange — Select regions on the waveform, cut into segments, drag to reorder
  • Background Removal — Vocal/music separation with Demucs AI, switch between stems
  • Export — Download as WAV/MP3 audio or TXT/SRT transcript
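Per-word highlighting comes down to mapping the current playback time to a word index. The sketch below illustrates that lookup in Python with a binary search over word start times. It is not the project's code (the actual frontend is TypeScript), and the `Word` type and `word_index_at` function are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record with timestamps in seconds."""
    text: str
    start: float
    end: float


def word_index_at(words: list[Word], t: float) -> int | None:
    """Return the index of the word spoken at time t, or None if t is in a gap."""
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word that starts at or before t
    if i >= 0 and t < words[i].end:
        return i
    return None  # before the first word, after the last, or between words


words = [Word("hello", 0.0, 0.4), Word("world", 0.5, 0.9)]
```

Click-to-seek is the inverse direction: clicking word `i` would seek playback to `words[i].start`.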

Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | Python + FastAPI + SQLite (SQLAlchemy async) |
| Frontend | React + TypeScript + Vite + TailwindCSS v4 |
| State | Zustand |
| Audio | yt-dlp, sounddevice, ffmpeg |
| STT | faster-whisper (word_timestamps) |
| Separation | Demucs (htdemucs) + torchcodec |
| Waveform | wavesurfer.js + RegionsPlugin |
| Drag & Drop | @dnd-kit/core + @dnd-kit/sortable |

Prerequisites

  • Python 3.11–3.13 (recommended) or Python 3.14+ (requires separate Python 3.11–3.13 for Demucs)
  • Node.js 18+
  • ffmpeg (brew install ffmpeg / sudo apt install ffmpeg)
  • BlackHole (macOS only, required for system audio recording)

Python version note: With Python 3.11–3.13, all dependencies (including Demucs) are installed in a single venv. Python 3.14+ is incompatible with Demucs, so the setup script automatically creates a separate venv.
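The version rule above can be written as a small decision function. This is a minimal Python sketch of that rule only; the real check lives in the shell setup script, and the function name, the return labels, and the "unsupported" branch for versions below 3.11 are assumptions for illustration.

```python
from __future__ import annotations

import sys

DEMUCS_MIN = (3, 11)  # lowest Python version stated as Demucs-compatible
DEMUCS_MAX = (3, 13)  # highest Python version stated as Demucs-compatible


def venv_strategy(version: tuple[int, int] = sys.version_info[:2]) -> str:
    """Return 'single', 'dual', or 'unsupported' for a (major, minor) version."""
    if version < DEMUCS_MIN:
        return "unsupported"  # assumption: the README does not cover older versions
    if version <= DEMUCS_MAX:
        return "single"  # Demucs installs into the main venv
    return "dual"  # Demucs needs its own venv with a compatible interpreter
```

For example, `venv_strategy((3, 12))` gives `"single"` and `venv_strategy((3, 14))` gives `"dual"`.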

Quick Start

Automated Setup

git clone https://github.com/chadingTV/voiceeditor.git
cd voiceeditor
./scripts/setup.sh

The setup script automatically:

  1. Checks prerequisites (python3, node, npm, ffmpeg)
  2. Detects Python version → single venv or separate Demucs venv
  3. Creates backend Python venv and installs dependencies
  4. (Python 3.14+ only) Creates Demucs venv with compatible Python
  5. Installs SwitchAudioSource (macOS, for system audio recording)
  6. Installs frontend npm packages

Run

# Start both backend and frontend
./scripts/dev.sh

Open http://localhost:5173 in your browser.

Manual Setup


Backend

cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# If Python 3.11-3.13, also install demucs:
pip install -r requirements-demucs.txt

Demucs Separate Env (Python 3.14+ only)

cd backend
python3.12 -m venv .venv-demucs  # or python3.11, python3.13
source .venv-demucs/bin/activate
pip install -r requirements-demucs.txt
deactivate

Frontend

cd frontend
npm install

Run Individually

# Backend
cd backend && source .venv/bin/activate && uvicorn main:app --reload --port 8000

# Frontend
cd frontend && npm run dev

Project Structure

voiceeditor/
├── backend/
│   ├── main.py                  # FastAPI app
│   ├── config.py                # Configuration
│   ├── requirements.txt         # Main backend dependencies
│   ├── requirements-demucs.txt  # Demucs-specific dependencies
│   ├── routers/                 # API routers
│   │   ├── projects.py          # Project CRUD
│   │   ├── audio.py             # Audio import (YouTube/upload/recording)
│   │   ├── transcription.py     # STT + text editing + TXT/SRT download
│   │   ├── separation.py        # Background removal (Demucs subprocess)
│   │   └── editor.py            # Segment editing & export
│   ├── services/                # Business logic
│   ├── models/                  # DB models & schemas
│   └── tasks/                   # Background task manager
├── frontend/
│   └── src/
│       ├── api/                 # API client modules
│       ├── stores/              # Zustand stores
│       ├── components/
│       │   ├── layout/          # AppShell, Header, Sidebar
│       │   ├── import/          # YouTube, upload, recording UI
│       │   └── editor/          # Waveform editor, transcript panel, segment timeline
│       ├── hooks/               # Custom hooks
│       └── types/               # TypeScript types
└── scripts/
    ├── setup.sh                 # Automated setup script
    └── dev.sh                   # Dev server launcher

Usage

  1. Create a project — Click "New Project" in the sidebar
  2. Import audio — Paste a YouTube URL, upload a file, or record system audio
  3. Generate transcript — Click "Generate STT" in the editor
  4. Review & edit text — Click the pencil icon on any segment to edit inline
  5. Cut segments — Drag-select a region on the waveform → "Cut Selection"
  6. Reorder — Drag segments in the timeline to rearrange
  7. Remove background — Click "Remove Background" → select Vocals/No Vocals stem
  8. Export — Download audio (WAV/MP3) or transcript (TXT/SRT)
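Steps 5, 6, and 8 together mean that export has to cut the chosen regions and join them in the order shown in the timeline. One way to do that with a single ffmpeg call is an `atrim` + `concat` filter graph. The sketch below only builds the command line; it is an illustration under that assumption, not the project's export code, and `build_export_command` is a hypothetical name.

```python
from __future__ import annotations


def build_export_command(
    src: str, dst: str, segments: list[tuple[float, float]]
) -> list[str]:
    """Build an ffmpeg command that concatenates (start, end) cuts in list order."""
    if not segments:
        raise ValueError("at least one segment is required")
    parts = []
    labels = []
    for i, (start, end) in enumerate(segments):
        if end <= start:
            raise ValueError(f"segment {i} has non-positive duration")
        # Trim one region and reset its timestamps so it can be concatenated.
        parts.append(
            f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[s{i}]"
        )
        labels.append(f"[s{i}]")
    # Join the trimmed pieces as audio only, in the order they were given.
    parts.append("".join(labels) + f"concat=n={len(segments)}:v=0:a=1[out]")
    graph = ";".join(parts)
    return ["ffmpeg", "-y", "-i", src, "-filter_complex", graph, "-map", "[out]", dst]
```

Passing segments in timeline order, for example `[(5.0, 7.5), (0.0, 2.0)]`, makes the later region play first in the output. The resulting list can be handed to `subprocess.run`.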

Architecture Notes

Demucs Execution

Demucs always runs as a subprocess. The Python executable is auto-detected based on the environment:

| System Python | Demucs Strategy |
| --- | --- |
| 3.11–3.13 | Runs directly from the main venv (single venv) |
| 3.14+ | Runs from .venv-demucs with compatible Python (dual venv) |

Backend (separation.py)
    │
    ├── _find_demucs_python()  ← auto-detect
    │       │
    │       ├── .venv-demucs exists? → .venv-demucs/bin/python3
    │       └── otherwise → try current python's demucs
    │
    └── subprocess.run([python, "-m", "demucs", ...])
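The detection logic in the diagram can be sketched as follows. This is an approximation of what `_find_demucs_python()` might do, not the actual function from `separation.py`. The `backend_dir` parameter is hypothetical, the path is POSIX-only, and the Demucs flags shown (`-n htdemucs`, `--two-stems vocals`, `-o`) are an assumption about what the backend passes.

```python
from __future__ import annotations

import sys
from pathlib import Path


def find_demucs_python(backend_dir: Path) -> str:
    """Prefer the dedicated Demucs venv if present, else the current interpreter."""
    candidate = backend_dir / ".venv-demucs" / "bin" / "python3"
    if candidate.exists():
        return str(candidate)
    return sys.executable  # single-venv case: Demucs is installed alongside the app


def demucs_command(python: str, audio_path: str, out_dir: str) -> list[str]:
    """Build the argument list for running Demucs as a subprocess."""
    return [
        python, "-m", "demucs",
        "-n", "htdemucs",          # model named in the tech stack
        "--two-stems", "vocals",   # assumed: yields vocals and no_vocals stems
        "-o", out_dir,
        audio_path,
    ]
```

Running Demucs out of process is what lets the main backend stay on a Python version that Demucs itself cannot be installed into.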

Changelog

Bug Fixes

  • Stem selector reset — Switching between Original/Vocals/No Vocals no longer resets to Original
  • Export wrong audio — Export now correctly uses the current audio file, not the first one in the project
  • Export ignoring reorder — Exported audio now respects the drag-and-drop segment order
  • Export ignoring active stem — Exporting in Vocals mode now exports vocals only, not the original
  • pydub crash on Python 3.14 — Replaced pydub (which depends on the audioop module, removed from the standard library as of Python 3.13) with direct ffmpeg subprocess calls
  • Demucs torchcodec missing — Added torchcodec to demucs dependencies for audio saving
  • System recording silence — Auto-switch to multi-output device when recording starts
  • Audio output stuck on multi-output — Fallback to built-in speaker when previous output device is disconnected
  • Audio output not restored on crash — Added atexit handler to restore output on server shutdown
  • DndContext hijacking clicks — Added pointer distance threshold so buttons work alongside drag-and-drop
  • Transcript edit not displaying — Edited text now correctly shown instead of original words
  • Download encoding error — Fixed Korean filename encoding in Content-Disposition header (RFC 5987)
  • STT infinite loading — Added error handling for failed background tasks
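The download encoding fix refers to the RFC 5987 `filename*` form of the Content-Disposition header, which carries non-ASCII names as percent-encoded UTF-8. The following is a minimal sketch of how such a header value can be built. It is not the project's code, and `content_disposition` is a hypothetical helper name.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment header value with an ASCII fallback and a UTF-8 name."""
    # Fallback for clients that ignore filename*: non-ASCII characters become '?'.
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace("\\", "_").replace('"', "_")
    # RFC 5987 extended value: charset, empty language tag, percent-encoded bytes.
    encoded = quote(filename, safe="")
    return 'attachment; filename="' + fallback + '"; filename*=UTF-8\'\'' + encoded
```

The returned string is pure ASCII, so it can be set as a response header without triggering an encoding error for Korean filenames.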

Features Added

  • Audio file rename (inline edit) and delete
  • Transcript download in TXT and SRT formats
  • Audio file download
  • Automated setup script with Python version detection
  • Cross-platform Demucs path auto-detection
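For reference, the SRT format used for transcript download is a sequence of numbered blocks, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the text. A small sketch of the conversion, assuming segments arrive as `(start, end, text)` tuples in seconds (the project's actual data model may differ), is shown here.

```python
from __future__ import annotations


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```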

License

This project is licensed under the MIT License.

If you redistribute or use this project in derivative works, please include the following attribution:

Original project: VoiceEditor by chadingTV https://github.com/chadingTV/voiceeditor
