
Refining AI TTS voices

I wanted to set up some real-time TTS, so I couldn’t use the RVC method to refine the voice, since that would be too slow.

So, I ended up using the following steps to help improve the TTS a bit:

  1. The audio I had contained an echo, so I cleaned it up with lalal.ai.
  2. I went through and removed all the breath sounds from the audio with Audacity.
  3. Also in Audacity, I used Label Sounds to split the audio on silence into smaller WAV files, and used that folder as the audio source.
  4. I went with these settings: 20 epochs, batch size 6, 10 gradient accumulation steps, and a max audio length of 11 seconds.
  5. I also made a custom “reference audio” after the training was finished. The more monotone the reference, the better the voice, but also the more boring. So it takes a bit of work to balance it out.
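
For anyone who would rather script step 3 than click through Audacity, here is a rough sketch of the same idea in plain Python. This is not what Audacity's Label Sounds does internally, just a simple amplitude-threshold approximation. The function names, the threshold of 500, and the 0.3 second minimum silence are all made-up values for illustration, and the sketch assumes 16-bit mono WAV input.

```python
import os
import wave
from array import array
from typing import List, Sequence, Tuple


def find_sound_regions(
    samples: Sequence[int],
    sample_rate: int,
    threshold: int = 500,        # assumed amplitude cutoff for "silence"
    min_silence_s: float = 0.3,  # assumed minimum gap that ends a clip
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of the non-silent regions.

    A region is closed once at least `min_silence_s` seconds have passed
    since the last sample at or above `threshold`.
    """
    min_silence = int(min_silence_s * sample_rate)
    regions = []
    start = None      # start index of the region currently being built
    last_loud = None  # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_silence:
            regions.append((start, last_loud + 1))
            start = None
    if start is not None:
        regions.append((start, last_loud + 1))
    return regions


def split_wav(src_path: str, out_dir: str,
              threshold: int = 500, min_silence_s: float = 0.3) -> List[str]:
    """Write each non-silent region of a 16-bit mono WAV to its own file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = src.getframerate()
        samples = array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    regions = find_sound_regions(samples, rate, threshold, min_silence_s)
    for n, (start, end) in enumerate(regions):
        path = os.path.join(out_dir, f"clip_{n:04d}.wav")
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(samples[start:end].tobytes())
        paths.append(path)
    return paths
```

Calling something like `split_wav("cleaned.wav", "clips")` (both names are placeholders) should leave a folder of short clips that can be used as the audio source. The threshold and silence length will likely need tuning per recording, and clips longer than the max audio setting would still need to be trimmed or dropped by hand.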

 

Sources

https://www.reddit.com/r/Oobabooga/comments/1c09ank/so_you_want_to_finetune_an_xtts_model_let_me_help/

https://www.lalal.ai/echo-reverb-remover/

https://manual.audacityteam.org/man/label_sounds.html

Published in LLMs
