⚠️ The main branch is bleeding edge: expect frequent updates and breaking changes between commits.
This project provides a comprehensive platform for training RVC models and generating AI voice covers. Before first use, download the required files through the app, or download them manually here: https://huggingface.co/datasets/SayanoAI/RVC-Studio/tree/main
- YouTube music downloader: download any music video from YouTube as an mp3 file with just one click.
- 1-click AI song covers: easily create AI song covers using RVC.
- RVC Model fine-tuning: fine-tune an RVC model to mimic any voice you want using your own data.
- 1-click TTS using RVC model: convert any text to speech using the fine-tuned VC model with just one click.
- Built-in TensorBoard: monitor the training progress and performance of your VC model on a built-in TensorBoard dashboard.
- LLM integration: chat with your RVC model in real time using popular LLMs.
- Auto-Playlist: let your RVC model sing songs from your favourite playlist.
- Demucs: Meta's vocals and instrumental music source separation.
- Audio post-processing: enhance the quality of your generated songs by adding reverb, echo, and other effects.
- TTS using cloud API: use a cloud-based text-to-speech service to generate high-quality and natural-sounding speech from any text.
- Real-time VC interface: convert your voice using your favourite RVC model.
- Python 3.6 or higher (developed and tested on v3.8.17)
- Pip
- Virtualenv or conda package manager
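A quick way to confirm that the interpreter meets the stated minimum before installing anything is a small check like the one below. This is only a sketch: the helper name `meets_minimum` is hypothetical and not part of this project, and the 3.6 floor is taken from the requirement listed above.

```python
import sys

# Minimum version taken from the requirements list above.
MIN_VERSION = (3, 6)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter is at least `minimum`.

    `meets_minimum` is a hypothetical helper used for illustration only.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so patch levels do not matter.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print("Python {}: {}".format(current, status))
```

Passing an explicit tuple, for example `meets_minimum((3, 8, 17))`, checks a version other than the one currently running.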
- Clone this repository or download the zip file and extract it.
- Double-click "conda-installer.bat" to install the latest version of the conda package manager.
- Double-click "conda-start.bat" (if you skipped the previous step).
- Clone this repository or download the zip file.
- Navigate to the project directory and create a virtual environment with the command `virtualenv venv`.
- Activate the virtual environment with the command `source venv/bin/activate` on Linux/Mac or `venv\Scripts\activate` on Windows. Or use `conda create -n RVC-Studio` followed by `conda activate RVC-Studio` if you're using the conda package manager.
- Install the required packages with the command `pip install -r requirements.txt`.
- Run the streamlit app with the command `streamlit run Home.py`.
Or run it in Google Colab
- Download all the required models on the webui page or here: https://huggingface.co/datasets/SayanoAI/RVC-Studio/tree/main
- Put your favourite songs in the ./songs folder
- Navigate to "RVC Server" page and start the server
- Navigate to "Inference" page and press "Refresh Data" button
- Select a song (only wav/flac/ogg/mp3 are supported for now)
- Select a voice model (put your RVC v2 models in ./models/RVC/ and index file in ./models/RVC/.index/)
- Choose a vocal extraction model (preprocessing model is optional)
- Click "Save Options" and "1-Click VC" to get started
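Before opening the "Inference" page, it can help to check that files sit where the steps above expect them. The sketch below assumes the folder layout named in those steps (`./songs`, `./models/RVC/`, `./models/RVC/.index/`). The function names, the `.pth` model extension, and the `.index` file extension are assumptions made for illustration; none of this is part of the app's own API.

```python
from pathlib import Path

# Audio extensions listed as supported in the steps above.
SUPPORTED_AUDIO = {".wav", ".flac", ".ogg", ".mp3"}


def _list_files(folder, extensions):
    """Return sorted names of files in `folder` whose suffix is in `extensions`."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


def find_songs(root="."):
    """List supported audio files in <root>/songs."""
    return _list_files(Path(root) / "songs", SUPPORTED_AUDIO)


def find_voice_models(root=".", model_ext=".pth"):
    """List model files in <root>/models/RVC (the `.pth` suffix is an assumption)."""
    return _list_files(Path(root) / "models" / "RVC", {model_ext})


def find_index_files(root=".", index_ext=".index"):
    """List index files in <root>/models/RVC/.index (the suffix is an assumption)."""
    return _list_files(Path(root) / "models" / "RVC" / ".index", {index_ext})


if __name__ == "__main__":
    print("Songs:      ", find_songs())
    print("Models:     ", find_voice_models())
    print("Index files:", find_index_files())
```

Run from the project folder, an empty list for any of the three lines suggests the corresponding files are missing or in a different location.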
Chat functionality has been migrated to RVC-Chat.
Feel free to use larger versions of these models if your computer can handle them (you will have to build your own config).
Run `docker compose up --build` in the main project folder.
Known issue (fixed in commit 8b720936b4dab347cba0e4a791330fb533bfdf1d): Tensorboard previously did not work inside a Docker container.
- Trouble with ffmpeg/espeak? Read this
This project is for educational and research purposes only. The generated voice overs are not intended to infringe on any copyrights or trademarks of the original songs or text. The project does not endorse or promote any illegal or unethical use of the generative AI technology. The project is not responsible for any damages or liabilities arising from the use or misuse of the generated voice overs.
This project uses code and AI models from the following repositories:
- Karafan by Captain-FLAM.
- Retrieval-based Voice Conversion WebUI by RVC-Project.
- Ultimate Vocal Remover GUI by Anjok07.
- Streamlit by streamlit.
- Tacotron 2 - PyTorch implementation with faster-than-realtime inference by NVIDIA.
- Bark: Text-Prompted Generative Audio Model by suno-ai.
- SpeechT5: A Self-Supervised Pre-training Model for Speech Recognition and Generation by microsoft.
- VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech by jaywalnut310 et al.
- Applio-RVC-Fork by IAHispano
We thank all the authors and contributors of these repositories for their amazing work and for making their code and models publicly available.