Qwen3-TTS, an Open Source, local ElevenLabs alternative


Posted

Greetings. Sounds very interesting! How's your work with Qwen3 going? Have you managed to get it working in Skyrim with systems like CHIM, Mantella, or MinAI? I'm looking for a way to generate non-robotic audio with a TTS system. So far I've used CHIM with XTTS to generate voices, but the audio is very mechanical and monotonous, with no emotion or intonation. I use good online LLMs that try to capture emotions, but the TTS fails to convey them. I've asked characters to pant and moan while speaking, but XTTS doesn't seem capable of that (although I'm still testing it and don't yet know how to modify its parameters).

Posted
2 hours ago, plutocene said:

I've seen a video about it, a local set up would be quite interesting. Let us know how it goes.

Could you post the link to that video? Do you mean a video of it being used in Skyrim?

Posted
7 hours ago, Godaux said:

Greetings. Sounds very interesting! How's your work with Qwen3 going? 

I'll be honest, this is my first time tooling around with local AI, and it's been years since I've done any coding/scripting, so I'm in way over my head with this. The reason I posted is that there are a lot of smart people here who might be interested.

 

I did get it going, but since everything is so new there isn't much I can do with it yet. I'm going to have to wait until some tooling is built around it and people much smarter than me do cool stuff with it.

 

Next step might be following this: https://www.youtube.com/watch?v=PMzO7N8sIHY to try to finetune locally. I'm curious to see if I can get Skyrim voices with emotion and how well that'd work. Just throwing a .wav in and getting it to say lines yields worse results than what I've seen from ElevenLabs voice mods.

 

As for using it with Mantella and the like, I have absolutely no idea how any of that works. I've been debating getting a second GPU just to check that out, but prices are going in the wrong direction. I did see someone here on GitHub playing with realtime streaming; they had to put in a 3-second buffer before the first bit of audio because they had stability issues. No idea if that's too much compared to other options, or if it can be improved. Qwen3-TTS only came out a few days ago, so it's early days.
