Qwen3-TTS, an Open Source, local ElevenLabs alternative

shrtjsrtj · January 23

Demo: https://huggingface.co/spaces/Qwen/Qwen3-TTS

I'm struggling to get it running locally, but there are videos up on YouTube and the demo above if you want to check it out. It looks pretty interesting.

-Uses maybe 6 gigs of RAM

-Can create custom voices from text

-Can add emotion, change tone, speaking rate, etc.

Godaux · January 26

Greetings. Sounds very interesting! How's your work with Qwen3 going? Have you managed to get it working in Skyrim on systems like CHIM, Mantella, or MinAI? I'm looking for a way to generate non-robotic audio with a TTS system. So far, I've used CHIM with XTTS to generate voices, but the audio is very mechanical, monotonous, and lacks emotion or intonation. I use good online LLMs that try to capture emotions, but the TTS fails to convey them. I've asked characters to speak panting and moaning, but XTTS doesn't seem capable of that (although I'm currently testing it without knowing how to modify its parameters).

plutocene · January 26

I've seen a video about it. A local setup would be quite interesting. Let us know how it goes.

Godaux · January 26

2 hours ago, plutocene said:

I've seen a video about it. A local setup would be quite interesting. Let us know how it goes.

Could you post the link to that video? Is it a video about using it in Skyrim?

plutocene · January 26

It's just a quick intro to this new app being developed.

Edited January 26 by plutocene

shrtjsrtj · January 27

7 hours ago, Godaux said:

Greetings. Sounds very interesting! How's your work with Qwen3 going?

I'll be honest, this is my first time tooling around with local AI and it's been years since I've done any coding/scripting, so I'm in way over my head with this. The reason I posted is that there are a lot of smart people here who might be interested.

I did get it going, but since everything is so new there isn't much I can do with it. I'm going to have to wait until some tooling is built around it and people much smarter than me do cool stuff with it.

Next step might be following this video: https://www.youtube.com/watch?v=PMzO7N8sIHY to try to finetune locally. I'm curious whether I can get Skyrim voices with emotion and how well that would work. Just throwing a .wav in and getting it to say lines yields worse results than what I've seen from ElevenLabs voice mods.

As for using it with Mantella and the like, I have absolutely no idea how any of that works. I've been debating getting a second GPU just to check that out, but prices are going in the wrong direction. I did see someone here on GitHub is playing with realtime streaming; they had to put in a 3 second buffer before the first bit of audio because they had stability issues. No idea if that's too much compared to other options, or if it can be improved. Qwen3-TTS only came out a few days ago, so it's early days.
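
For anyone wondering what a 3 second buffer before the first audio means in practice, here's a rough Python sketch of the general idea. It is not taken from that GitHub project or from Qwen3-TTS itself. The function name `prebuffered`, the 24 kHz sample rate, and the chunk format (plain lists of samples) are all assumptions made up for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def prebuffered(
    chunks: Iterable[Sequence[float]],
    sample_rate: int,
    prebuffer_seconds: float = 3.0,
) -> Iterator[Sequence[float]]:
    """Hold back audio chunks until `prebuffer_seconds` of audio is queued,
    then release the queue and pass later chunks straight through.

    Hypothetical helper: `chunks` stands in for whatever a streaming TTS
    backend produces; each chunk is assumed to be a sequence of samples.
    """
    pending = deque()
    buffered_samples = 0
    needed_samples = int(prebuffer_seconds * sample_rate)
    started = False

    for chunk in chunks:
        if started:
            # Playback has begun: forward chunks as they arrive.
            yield chunk
            continue

        pending.append(chunk)
        buffered_samples += len(chunk)

        if buffered_samples >= needed_samples:
            # Enough audio is queued to ride out slow generation steps.
            started = True
            while pending:
                yield pending.popleft()

    # The stream ended before the buffer filled (a short line): flush it.
    while pending:
        yield pending.popleft()


def fake_tts(n_chunks: int, chunk_samples: int, log: list) -> Iterator[list]:
    """Stand-in generator for a TTS stream; records each chunk it produces."""
    for i in range(n_chunks):
        log.append(i)
        yield [0.0] * chunk_samples


if __name__ == "__main__":
    produced = []
    # Assumed 24 kHz audio in half-second chunks.
    stream = prebuffered(fake_tts(10, 12000, produced), sample_rate=24000)
    next(stream)
    print(f"chunks generated before playback started: {len(produced)}")
```

The tradeoff is that the delay before the first sound is at least the buffer length, so a 3 second buffer means every line starts roughly 3 seconds late in exchange for fewer dropouts. Whether that's acceptable for something like in-game dialogue is a separate question.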
