I’m a bit stunned that this repo worked out of the box.
What I did was:
pip install f5-tts-mlx
python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."
And that actually worked. It was a bit shocking.
Voice Cloning
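Just record a short voice clip of five or so seconds. Be sure to have a transcript, and don’t include any periods, so just say one sentence and you are good. You need to convert the clip to a 24 kHz sampling rate. The ffmpeg command I used also makes it mono and 16-bit, and trims it to at most 10 seconds: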
ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
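If you want to double-check the converted clip before cloning, here is a minimal sketch using only Python's standard `wave` module. The function name `check_reference_clip` and its defaults are my own for illustration and are not part of f5-tts-mlx. It just mirrors the ffmpeg flags above: mono, 24000 Hz, 16-bit, at most 10 seconds.

```python
import wave


def check_reference_clip(path, expected_rate=24000, max_seconds=10.0):
    """Return a list of problems found in a WAV clip (empty list if none).

    Hypothetical helper: the checks mirror the ffmpeg flags
    -ac 1 -ar 24000 -sample_fmt s16 -t 10.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        sample_width = wav.getsampwidth()  # bytes per sample
        frames = wav.getnframes()

    duration = frames / rate if rate else 0.0
    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if duration > max_seconds:
        problems.append(f"clip is {duration:.1f}s, longer than {max_seconds}s")
    return problems
```

Calling `check_reference_clip("/path/to/output_audio.wav")` should give back an empty list when the clip matches what the ffmpeg command produces.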
And then you run the generator with your clip and its transcript, like I did:
python -m f5_tts_mlx.generate --text "Thanksgiving day was unusually productive as it let me finally create the new design dot co site." --ref-audio maedashort.wav --ref-text "Where his family owned and operated the star tofu manufacturing company." --output blurb.wav
And this blurb.wav magically appears … spooky.
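If you'd rather drive this from a script, here is a rough sketch that wraps the same command line with `subprocess`. It uses only the flags shown above (`--text`, `--ref-audio`, `--ref-text`, `--output`). The helper names `build_generate_command` and `clone_voice` are made up for this example, and I'm assuming the package is installed in the same Python environment that runs the script.

```python
import subprocess
import sys


def build_generate_command(text, ref_audio, ref_text, output):
    """Build the argument list for the f5_tts_mlx.generate command.

    Hypothetical helper: it only reproduces the command line used above.
    """
    return [
        sys.executable, "-m", "f5_tts_mlx.generate",
        "--text", text,
        "--ref-audio", ref_audio,
        "--ref-text", ref_text,
        "--output", output,
    ]


def clone_voice(text, ref_audio, ref_text, output):
    """Run the generator; raises CalledProcessError if it exits non-zero."""
    subprocess.run(
        build_generate_command(text, ref_audio, ref_text, output),
        check=True,
    )
```

Passing the arguments as a list means the quotes and spaces in the text don't need any shell escaping.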