AI voice generation with ElevenLabs


Dl'd a few voice packs before; typically they range from barely passable to OK. Was recently playing Project AHO and was reminded how much worse off the AI voice packs were. Well, this one actually seems to generate natural-sounding voices. Not affiliated with them at all and only found it through some Skyrim mod Reddit post, but it will essentially generate voice lines for you based on whatever sound bites you give it for training, with a certain level of tuning. Accounts start from free, with a 10,000-character monthly limit and room for 5 trained voices, though hopefully if you do end up using it properly you'll support them, since the voice quality is pretty mind-blowing. It can do other languages, but I think the English probably sounds a bit more natural. It can also take sound bites in one language and output in another.

 

https://beta.elevenlabs.io/

Tuning:

-Stability: affects how close the output stays to the training voice and how much articulation it uses. Lower stability results in more natural-sounding voices, though going too low will add extra pauses, speed up certain sections, cause weird glitches, and generally deviate from the original training. The default of 75% is OK, but some lines may sound a little flat. Dropping down to 50% is typically fine and even desirable in some cases. If you need more articulation on the lines, you might need to drop it to 25% or lower.

-Clarity: not actually 100% sure what it does; I think it tries to keep the output sounding human and close to the original trained voice. If you drop the stability, you might want to raise this value.
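If you'd rather think of the two sliders as numbers in a script, here's a minimal sketch. It only converts slider percentages to fractions and makes no network calls. The key names `stability` and `similarity_boost` are an assumption about how an API request might label the Stability and Clarity sliders, the preset names are made up, and the 75% clarity value is just a placeholder, so check the official docs before relying on any of it.

```python
def slider_settings(stability_pct, clarity_pct):
    """Turn two slider percentages (0-100) into 0.0-1.0 fractions."""
    for name, value in (("stability", stability_pct), ("clarity", clarity_pct)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    # Key names are assumed for illustration, not confirmed by this thread.
    return {
        "stability": stability_pct / 100,
        "similarity_boost": clarity_pct / 100,
    }


# Hypothetical presets built from the stability values discussed above.
PRESETS = {
    "default": slider_settings(75, 75),     # OK, but can sound a little flat
    "natural": slider_settings(50, 75),     # typically fine
    "expressive": slider_settings(25, 75),  # more articulation, more glitches
}
```

Keeping the presets in one place makes it easier to rerun the same line at a couple of stability levels and compare.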

 

At any rate, without an account you get 330 characters and something like a 5-retry generation limit, using only the default voice selection list (without tuning), so it doesn't really hurt to try even if you don't create an account.

Also highly suggest you do lines in short-ish blocks. Connecting sentences together can insert some random human-sounding phonetics when switching between sentences, but at the moment the voice degrades over the length of a generation even with high stability settings, so the back portion of what's generated may turn into an unsalvageable mess.
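The "short-ish blocks" advice can be sketched as a small helper that groups whole sentences into blocks under a character cap. The function name and the 300-character default are arbitrary choices for illustration, not anything the site requires, and the sentence splitting is a simple regex that will stumble on abbreviations.

```python
import re


def split_into_blocks(script, max_chars=300):
    """Group whole sentences into blocks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own block.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    blocks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            blocks.append(current)  # current block is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        blocks.append(current)
    return blocks
```

Each returned block can then be pasted in as its own generation, so a bad tail only costs you one block.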

 

Decent example of how stability can change the dialogue:

40% stability

 

5% stability

 

 

 

Edited by sinph
Link to comment

Burned through my 10,000 characters on the first day, though 10,000 characters really isn't much tbh, so I can't generate any new ones at the moment. Really suggest creating an account so you get access to using your own voice samples; it's decently impressive how little of a sample you need to get it working. I've gotten decent voices out of as little as three 30s sound bites, and it also works outputting from one source language to another. Drop the stability if you think the voice sounds too robotic; this will increase the articulation but may also introduce some weird glitches. Generate one sentence at a time if you're looking for a specific tone: the quota is counted per character generated, so it's better to only have to regenerate one sentence instead of a whole paragraph. Dropping the stability can also shift the pitch partway through a paragraph, so it's really better to do it sentence by sentence. Greater voice control is coming down the line, I think, but for now the sample files you feed it will heavily determine the tone of the output. So if you're looking for a specific tone like happy, angry, stern, or whatnot, feed it only lines using that tone.
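To put rough numbers on the "regenerate one sentence instead of a whole paragraph" point, here's a small sketch. It assumes every attempt is billed at the full character count of the text submitted, which is how the per-character quota is described above; the sentence lengths and attempt counts are made up.

```python
def chars_spent(lengths, attempts):
    """Total characters billed when piece i is generated attempts[i] times."""
    if len(lengths) != len(attempts):
        raise ValueError("lengths and attempts must have the same number of items")
    return sum(length * tries for length, tries in zip(lengths, attempts))


# Made-up paragraph of three sentences; only the middle one needs three tries.
sentence_lengths = [80, 120, 100]

# Whole paragraph redone three times to fix that one sentence.
whole_paragraph = chars_spent([sum(sentence_lengths)], [3])

# Sentence by sentence, retrying only the one that came out wrong.
per_sentence = chars_spent(sentence_lengths, [1, 3, 1])
```

With these made-up numbers the paragraph approach spends 900 characters and the sentence approach spends 540.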

 

But here's a couple samples fed with some voice lines from Jinx, most were just generated from the first try:

 

Gettysburg address

 

Navy Seal Copy Pasta 

 

 

Link to comment

 

 

Yea, I'd assume part of it is just 4chan being 4chan and ElevenLabs probably cracking down on that; their ToS has been updated within the past few days to include wording against what 4chan was using it for.

 

Some of the dialogue's cringy as hell, but here's a brief exchange generated from a ChatGPT script. Each spoken dialogue block took at most 3 tries to generate an acceptable take; most took 2, and I think one of them only took 1. Regardless of whether you have character quota to burn or not, I think you should generate the full response block once when starting. If needed you can then redo either the block or just a sentence; just note that if you redo a single sentence, the tone might not match what you've already generated.

 

 

Jinx: "Ha! Looks like the Sheriff's finally come to town. And she thinks she's got what it takes to take me down. I've got news for you, Caitlyn. I'm the deadliest gal in all of Piltover. I've got an arsenal full of bombs and bullets, and I'm not afraid to use 'em. So, you better watch your back, because I'm gonna give you the run of your life!"

Caitlyn: "Jinx, I've faced tougher opponents than you. You may have a reputation for causing chaos, but I've got the skills and technology to bring you to justice. I won't let you terrorize this city any longer. It's time for you to pay for your crimes."

Jinx: "Pay for my crimes? Please, Caitlyn. I'm the life of the party. I bring excitement wherever I go. And you're just trying to spoil all the fun. But, you know what? I'm feeling generous today. I'll give you a chance to show me what you've got. Let's see if you've got the nerve to keep up with the Jinx."

Caitlyn: "You're making a big mistake, Jinx. I won't be underestimated. I'll take you down with precision and efficiency. And when it's all over, you'll be begging for mercy."

Jinx: "Mercy? From you? I don't think so. I'll take you on with a smile on my face and a rocket in my hand. Let's see if you can handle the madness of the Jinx."

 

And another chatgpt gen with narration.

  Edited by sinph
Link to comment
5 hours ago, Whatiswrongwiththisthing said:

Here's a Serana. Thoughts?

https://vocaroo.com/1mi3KHMZAtzk


That sounds amazing, you can tell it is Serana instantly. Even the sassy attitude is there. I saw some videos about this 11labs thing in the past week, and it's just mind blowing what applications it has across gaming and modding. We can pretty much recreate any character's voice & get them to say anything we want... 

I really liked the longer examples Sinph made as well, I assume they are League of Legends characters though I'm not super familiar with that game. From what little I do remember they sound spot on though. It's crazy how we went so suddenly from robotic as heck text to speech to suddenly nigh perfect voice imitation software. I guess it's following similar trends as AI image generation & language generation developments. AI is mad. 

Link to comment
7 minutes ago, penguintennis said:


That sounds amazing, you can tell it is Serana instantly. Even the sassy attitude is there. I saw some videos about this 11labs thing in the past week, and it's just mind blowing what applications it has across gaming and modding. We can pretty much recreate any character's voice & get them to say anything we want... 

I really liked the longer examples Sinph made as well, I assume they are League of Legends characters though I'm not super familiar with that game. From what little I do remember they sound spot on though. It's crazy how we went so suddenly from robotic as heck text to speech to suddenly nigh perfect voice imitation software. I guess it's following similar trends as AI image generation & language generation developments. AI is mad. 

Welcome to the future. I went looking some more and I found this absolutely incredible video using the same software.


 

All of this audio is fake, and these NPCs have never actually met each other to my knowledge. It sounds so fucking real.

Link to comment

Just from listening to it and not trying it myself, it seems really good for emotive voices and short texts, but for more monotone voices, especially with longer texts, it still sounds off. It seems like the emphasis doesn't change between sentences, so with longer dialogues things can get a bit samey sentence to sentence. No idea if that's a problem with the ML or with whoever made those clips, though.

 

E.g.: https://www.youtube.com/watch?v=9Xqw11NPC40

 

 

Link to comment

Hadn't planned on doing Skyrim dialogue, partially since 30k characters a month probably doesn't cover it (that's maybe 30 minutes of audio, provided all runs turn out well), and also because I haven't felt like it. As for monotone, that's really just a matter of dropping the stability, which will also give you unpredictable results. Got around to getting a BSA extractor (man, does Serana have a lot of lines). Only used 14 sentences as a sample, generated a script with ChatGPT, and generated each section as a single run.
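The "30k characters is maybe 30 minutes" estimate works out to roughly 1,000 characters per minute of audio. Here's a quick sketch of that arithmetic with retries factored in; the 1,000 chars/minute rate is just backed out of the estimate above, and the average-attempts figure is a guess you'd replace with your own experience.

```python
def usable_minutes(quota_chars, chars_per_minute=1000, avg_attempts=1.0):
    """Estimate minutes of keepable audio from a monthly character quota.

    chars_per_minute and avg_attempts are rough assumptions, not measured values.
    """
    if chars_per_minute <= 0 or avg_attempts <= 0:
        raise ValueError("chars_per_minute and avg_attempts must be positive")
    return quota_chars / (chars_per_minute * avg_attempts)


best_case = usable_minutes(30_000)                    # every run usable first try
with_retries = usable_minutes(30_000, avg_attempts=2) # two tries per line on average
```

So if lines take about two tries each, the same quota covers about half as much finished dialogue.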

 

First one is unedited. The second gen is actually maybe the 4th or 5th attempt after playing with some settings, and the voice started cutting out as it went on, so I had to renormalize the volume in an editor. The 3rd one dropped stability down a bit (always fun) but had the same issue: the volume cuts out for some reason. Might also need to increase my source sample a little bit.
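For anyone curious what "renormalizing" a clip that fades out amounts to, here's a very rough sketch of per-window peak normalization on a list of float samples. It's not what any particular audio editor does, the function name and window size are made up, and a real tool would smooth the gain between windows to avoid audible jumps.

```python
def normalize_windows(samples, window=1024, target_peak=1.0):
    """Scale each fixed-size window so its loudest sample hits target_peak.

    samples: floats in the range -1.0..1.0. Silent windows are left unchanged.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        peak = max(abs(s) for s in chunk)
        gain = target_peak / peak if peak > 0 else 1.0
        out.extend(s * gain for s in chunk)
    return out


# Tiny made-up clip whose second half is much quieter than the first.
fading = [0.5, -0.25, 0.125, -0.0625]
evened_out = normalize_windows(fading, window=2)
```

After this, both halves of the tiny example peak at the same level, which is the effect you'd be after when the tail of a generation gets quiet.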

 

Punctuation matters for spacing and articulation: ... for pauses, ! for slightly more enthusiasm, and ? for some question inflection. Not sure if there's a proper way to induce a specific tone, though.

 

ChatGPT does Skyrim.

v1 moderate stability

v2 low stability, slight edit to the script for punctuation

v3 lower stability - front 1/2 is bad, back 1/2 turned out ok (32 second mark)

v4 - half length, 100% stability (most faithful to the source voice, but also the flattest sounding typically) as a reference

v5 - half length, 100% stability, increase number of sample lines to 49, not sure if there's a difference

Edited by sinph
Link to comment

Messed around with it some more. It's promising but not really useful to voice a character. It's extremely difficult to get it to sound exactly the same each time. Like... the voice changes for each generation. It takes several tries to get a line that's usable. It's very interesting, but I don't think I'll be buying into it until it has a few more features.

 

Link to comment
22 hours ago, chajapa said:

Messed around with it some more. It's promising but not really useful to voice a character. It's extremely difficult to get it to sound exactly the same each time. Like... the voice changes for each generation. It takes several tries to get a line that's usable. It's very interesting, but I don't think I'll be buying into it until it has a few more features.

 

High stability will generate almost the same voice each time (see the 100% examples above, even with additional voice samples), but the results also typically come out flat and robotic. Dropping stability allows for inflections, pauses, different tones, and generally more human traits, at the cost of deviating from the source voice and, of course, plain randomness, which can tank your voice quality and prevent consistency. But yea, without a free "regenerate this exact line" option, it's going to be hard to get a proper set of voice lines without spending tonnes of character quota. The lack of tone tuning also makes generating the voice in the proper tone a matter of RNG. As for pauses, see above as well: ellipses work really well, exclamation and question marks not so much. Periods work for the most part, though with low stability the AI may decide to skip over them.

Edited by sinph
Link to comment

I'm so excited for this, it will open the door for mods. ChatGPT can also let modders write way better dialogue, since the current dialogue can often range from cringy to "literally gets the modder cancelled".

 

Future games might even let the player freely talk to NPCs using their own words, with a small ChatGPT-like model that responds to them. ChatGPT is capable of rating an argument a player makes, so you could have a guilty-looking NPC with a DC set for quest info. If the player beats a DC 7 with their argument, the NPC will tell them something relevant.
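The DC idea could be sketched roughly like this. Everything here is hypothetical: `keyword_rater` is a crude stand-in for whatever language model would actually rate the player's argument on a 0-10 scale, and treating "beats a DC" as "score at or above the DC" is my own assumption.

```python
def keyword_rater(argument):
    """Placeholder scorer: 2 points per persuasive keyword found, capped at 10.

    A real game would ask a language model for this rating instead.
    """
    keywords = ("evidence", "witness", "guard", "reward", "jarl")
    hits = sum(1 for word in keywords if word in argument.lower())
    return min(10, 2 * hits)


def npc_reply(argument, difficulty, rate_argument, secret, refusal):
    """Return the NPC's secret if the rated argument meets the DC, else a refusal."""
    score = rate_argument(argument)
    return secret if score >= difficulty else refusal


reply = npc_reply(
    "I have evidence and a witness, and the Jarl offers a reward.",
    difficulty=7,
    rate_argument=keyword_rater,
    secret="Fine. The shipment left through the east gate.",
    refusal="I don't know what you're talking about.",
)
```

Passing the rater in as a function keeps the NPC logic the same whether the score comes from a keyword check or a model call.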

 


Link to comment
On 2/7/2023 at 7:28 AM, chajapa said:

Messed around with it some more. It's promising but not really useful to voice a character. It's extremely difficult to get it to sound exactly the same each time. Like... the voice changes for each generation. It takes several tries to get a line that's usable. It's very interesting, but I don't think I'll be buying into it until it has a few more features.

 

From what I've seen, it can voice characters pretty well. You might need to splice different voice samples together to have it sound better, but if you do the results can be almost indistinguishable from the actual voice actor. Here's some good examples I've found with Dagoth Ur:


 

 

Link to comment

Near the end of the month and have about 10,000 characters left that I won't be using. If anyone needs some lines run for mods or curiosity (nothing terrible), feel free to post the line(s) and which voice pack to use from the BSA (list it verbatim, e.g. dlc1femaleuniquefura, because idk what "nord voice" refers to). Just note 10,000 characters isn't much, and lines may take multiple generations, so it won't even be 10,000 characters of final output. This is a paid account, so no attribution is required.

Link to comment

Someone made some new guard lines using ElevenLabs. As much as I am loath to link to TikTok, it's where he posts most of his stuff, and it's very, very well done. It's a good example of what AI voice lines could look like in-game, since it would only take a second to get the voice files from the website.

 

 

Edited by Whatiswrongwiththisthing
Additional Video
Link to comment

There's 9,000-some left, which resets at the end of the month (in like 2 days); I'll also have 30k after that if anyone needs it. After that I'm probably canceling because I don't really have a need for it; the first month was free, so at the very least I felt like I should pay them for a month's sub.

 

Some "Before the Storm" dialogue samples:

Full Dialogue, slight edit for pauses, volume normalized

 

Partial

Balgruuf: "What do you say now, Proventus? Shall we continue to trust in the strength of our walls? Against a dragon?"

Irileth: "My lord, we should send troops to Riverwood at once. It's in the most immediate danger, if that dragon is lurking in the mountains..."

Proventus: "The Jarl of Falkreath will view that as a provocation! He'll assume we're preparing to join Ulfric's side and attack him. We should not..."

 

Balgruuf: "Enough! I'll not stand idly by while a dragon burns my hold and slaughters my people! Irileth, send a detachment to Riverwood at once."

Irileth: "Yes, my Jarl."

Proventus: "If you'll excuse me, I'll return to my duties."

Balgruuf: "That would be best."

Balgruuf: "Well done. You sought me out, on your own initiative. You've done Whiterun a service, and I won't forget it. Here, take this as a small token of my esteem." (He hands you a piece of leveled armor, then continues.) "There is another thing you could do for me. Suitable for someone of your particular talents, perhaps. Come, let's go find Farengar, my court wizard. He's been looking into a matter related to these dragons and... rumors of dragons."

 

Insert media is a mess to edit: some of the files disappear and I can't get rid of some others, so w/e, there are a few missing lines. Ignore the stuff below, they're bugs.

Edited by sinph
edit sucks
Link to comment
  • 4 months later...
  • 8 months later...

Well guys, I have been banned from ElevenLabs because the voice lines I generated were too violent, harmful, discriminatory, and sexist.

I just made voice add-ons for mods here on LoversLab, so I guess ElevenLabs doesn't like that.

So please keep in mind when using ElevenLabs: NSFW mods are not allowed.

Talked to support and they seemed pretty upset about it. So yeah, I guess I will have to find another way to do it.

Link to comment
