Fast, Easy Voice Cloning on MLX Really Worked

I’m a bit stunned that this repo worked out of the box.

What I did was:

pip install f5-tts-mlx
python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."

And that actually worked. It was a bit shocking.

Voice Cloning

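Just record a short voice clip, five seconds or so. Be sure to have a transcript of it, and don’t include any periods: say a single sentence and you’re good. The clip needs to be converted to 24 kHz sampling. The command below does that, and also makes it mono, 16-bit, and at most 10 seconds long.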

ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
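
If you want to double-check the converted clip before cloning, here is a minimal sketch using only Python's standard wave module. The function names (describe_wav, reference_clip_problems) and the 10-second cap are my own assumptions, chosen to mirror the ffmpeg flags above. They are not part of f5-tts-mlx.

```python
import wave


def describe_wav(path):
    """Read basic format facts from a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "seconds": wav.getnframes() / rate,
        }


def reference_clip_problems(path, max_seconds=10.0):
    """Return a list of mismatches against the ffmpeg settings above.

    Assumed targets: mono (-ac 1), 24000 Hz (-ar 24000),
    16-bit (-sample_fmt s16), at most 10 seconds (-t 10).
    """
    info = describe_wav(path)
    problems = []
    if info["channels"] != 1:
        problems.append(f"expected mono, got {info['channels']} channels")
    if info["sample_rate"] != 24000:
        problems.append(f"expected 24000 Hz, got {info['sample_rate']} Hz")
    if info["bits_per_sample"] != 16:
        problems.append(f"expected 16-bit, got {info['bits_per_sample']}-bit")
    if info["seconds"] > max_seconds:
        problems.append(f"clip is {info['seconds']:.1f} s, longer than {max_seconds} s")
    return problems
```

Calling reference_clip_problems("/path/to/output_audio.wav") should give back an empty list if the ffmpeg conversion did what it was asked to.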

And then you run this, like I did:

python -m f5_tts_mlx.generate --text "Thanksgiving day was unusually productive as it let me finally create the new design dot co site." --ref-audio maedashort.wav --ref-text "Where his family owned and operated the star tofu manufacturing company." --output blurb.wav
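
To clone more than one sentence with the same reference clip, a small wrapper around that command might look like the sketch below. It only uses the four flags shown above (--text, --ref-audio, --ref-text, --output). The helper names, the output file naming, and the idea of running the CLI once per line are my own assumptions. Each call starts a fresh process, so this is probably not the fastest route.

```python
import subprocess
import sys


def build_generate_command(text, ref_audio, ref_text, output):
    """Build the argument list for one f5_tts_mlx.generate run.

    Only the flags that appear in the command above are used.
    """
    return [
        sys.executable, "-m", "f5_tts_mlx.generate",
        "--text", text,
        "--ref-audio", ref_audio,
        "--ref-text", ref_text,
        "--output", output,
    ]


def clone_lines(lines, ref_audio, ref_text, prefix="blurb"):
    """Run the CLI once per line, writing prefix_1.wav, prefix_2.wav, ...

    Hypothetical helper: needs f5-tts-mlx installed to actually run.
    """
    outputs = []
    for i, line in enumerate(lines, start=1):
        out = f"{prefix}_{i}.wav"
        cmd = build_generate_command(line, ref_audio, ref_text, out)
        subprocess.run(cmd, check=True)  # raises if the CLI exits with an error
        outputs.append(out)
    return outputs
```

Passing the arguments as a list instead of one shell string means quotes and apostrophes in the text don't need escaping.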

And this blurb.wav magically appears … spooky.