
Refining AI TTS voices

Sep 28, 2025

I wanted to set up real-time TTS, so I couldn’t use the RVC method to refine the voice, since that would be too slow.

So, I ended up using the following steps to help improve the TTS a bit:

The audio I had had an echo, so I cleaned it up with lalal.ai
I went through and removed all the breath sounds from the audio with Audacity
Also with Audacity, I used Label Sounds to break the audio up by silence into smaller WAV files, and used that folder as the audio source
I went with these settings: 20 epochs, batch size 6, 10 gradient accumulation steps, 11 seconds max audio length
I also made a custom “reference audio” after the training was finished. The more monotone the reference, the better the voice, but also the more boring. So it’s a bit of work balancing it out.
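
For anyone who would rather script the split-by-silence step than click through Audacity, here is a rough Python sketch of the same idea. It is not what Audacity's Label Sounds does internally, just a simple amplitude-threshold approximation. The function names, the threshold of 500, the 0.3 second silence gap, and the 0.5 second minimum clip length are all made-up values for illustration, and it assumes a mono 16-bit WAV. The 11 second cap assumes the "max audio" setting above is measured in seconds.

```python
import array
import sys
import wave
from pathlib import Path


def split_on_silence(samples, sample_rate, threshold=500,
                     min_silence_s=0.3, min_clip_s=0.5):
    """Return (start, end) sample index pairs for the non-silent regions.

    A region is closed once min_silence_s worth of consecutive samples
    stay below `threshold` in absolute value. Regions shorter than
    min_clip_s are dropped. All defaults are illustrative guesses.
    """
    min_silence = int(min_silence_s * sample_rate)
    min_clip = int(min_clip_s * sample_rate)
    regions = []
    start = None      # index where the current sound region began
    last_loud = None  # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_silence:
            if last_loud + 1 - start >= min_clip:
                regions.append((start, last_loud + 1))
            start = None
    # Close a region that runs to the end of the audio.
    if start is not None and last_loud + 1 - start >= min_clip:
        regions.append((start, last_loud + 1))
    return regions


def split_wav(src, out_dir, max_clip_s=11.0, **kwargs):
    """Write each non-silent region of a mono 16-bit WAV to its own file.

    Clips longer than max_clip_s are skipped rather than truncated.
    """
    with wave.open(str(src), "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit WAV")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for n, (a, b) in enumerate(split_on_silence(samples, rate, **kwargs)):
        if (b - a) / rate > max_clip_s:
            continue  # too long for the assumed training cap
        clip = samples[a:b]
        if sys.byteorder == "big":
            clip.byteswap()
        path = out_dir / f"clip_{n:04d}.wav"
        with wave.open(str(path), "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(clip.tobytes())
        written.append(path)
    return written
```

Calling something like `split_wav("cleaned.wav", "clips")` (both names are placeholders) would fill a `clips` folder with numbered WAV files. The threshold and gap values would need tuning by ear for each recording, which is the part Audacity makes easier since the labels can be checked visually before exporting.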

Sources

https://www.reddit.com/r/Oobabooga/comments/1c09ank/so_you_want_to_finetune_an_xtts_model_let_me_help/

https://www.lalal.ai/echo-reverb-remover/

https://manual.audacityteam.org/man/label_sounds.html

Published in LLMs
