On new capabilities in voice generation

Recently, I’ve tried out ElevenLabs and Sesame, two tools that generate dialogue and do it pretty well.

Now, there are new developments worth sharing:

First, a project called MLX-Audio has appeared on GitHub. It builds on Apple’s MLX framework, which I’ve been following for quite some time and which makes it easier to run AI models on Apple silicon (M1, M2, M3, etc.).

Using MLX-Audio together with the Dia-1.6B-6bit model, I generated a solid dialogue snippet about this channel on my M1 Max Mac; you can listen to it in the attachment.

Here’s the command I used to generate it:

uvx -p python3.12 --from mlx-audio mlx_audio.tts.generate \
  --model mlx-community/Dia-1.6B-6bit --text "Text here" --speed 0.7

For some reason, Python 3.13 and 3.10 did not work for me, but 3.12 worked great, and uv’s uvx tool manages that Python version automatically.
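To generate a multi-speaker snippet without hand-writing one long --text string, the command above can be wrapped in a small Bash script. This is a minimal sketch, not part of MLX-Audio: the helper names build_dialogue and generate_dialogue and the DRY_RUN switch are hypothetical, and the [S1]/[S2] speaker tags are an assumption based on how Dia's model card describes dialogue input, so check the current model card before relying on them. The uvx invocation and its flags are copied unchanged from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the uvx command from this post.
set -euo pipefail

# build_dialogue: joins its arguments into one string, alternating
# [S1]/[S2] speaker tags (assumed Dia-style dialogue format).
build_dialogue() {
  local out="" speaker=1 line
  for line in "$@"; do
    out+="[S${speaker}] ${line} "
    speaker=$(( speaker == 1 ? 2 : 1 ))   # switch speakers each line
  done
  printf '%s' "${out% }"                  # drop the trailing space
}

# generate_dialogue: runs the same uvx command shown above.
# Set DRY_RUN=1 to print the quoted command instead of running it.
generate_dialogue() {
  local text
  text=$(build_dialogue "$@")
  local cmd=(uvx -p python3.12 --from mlx-audio mlx_audio.tts.generate
             --model mlx-community/Dia-1.6B-6bit --text "$text" --speed 0.7)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}
```

For example, generate_dialogue "Welcome to the channel." "Today we talk about voice generation." passes both lines to the model as one tagged dialogue. Keeping the command in a Bash array avoids quoting problems when the text contains spaces or punctuation.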

Secondly, the top spot on the AI Text-to-Speech Leaderboard is now held by a new model, Speech-02-HD from MiniMax. Try it here.