Project Echo - Voice Packs - Deep Learning Voice Synthesis

Roggvir · May 29, 2022

40 minutes ago, petronius said:

can't afford the time to optimise and filter redundant voiced lines

Ok, that answers all my questions.
That explains those insanely huge download file sizes... of course, if you include 70% of audio files the game will NEVER use...
It also explains the horrid quality, because with that amount, you also cannot be bothered to check and tweak even the worst sounding lines.

You do you, but i have to say - what a waste.
Imagine you would put that energy into making just ONE voicepack, but making it contain only what it needs to contain (ie. file size measured in megabytes instead of gigabytes), and making it sound good.
And then somebody else would make a similar quality voicepack for another mod, and somebody else for another mod, etc., until eventually all mods would be covered with really nice quality voice packs.
But yeah, i know, unfortunately, that is not how reality works, especially not on this modding scene.
But it is a shame.

petronius · May 29, 2022

4 hours ago, Roggvir said:

Ok, that answers all my questions.
That explains those insanely huge download file sizes... of course, if you include 70% of audio files the game will NEVER use...
It also explains the horrid quality, because with that amount, you also cannot be bothered to check and tweak even the worst sounding lines.

You do you, but i have to say - what a waste.
Imagine you would put that energy into making just ONE voicepack, but making it contain only what it needs to contain (ie. file size measured in megabytes instead of gigabytes), and making it sound good.
And then somebody else would make a similar quality voicepack for another mod, and somebody else for another mod, etc., until eventually all mods would be covered with really nice quality voice packs.
But yeah, i know, unfortunately, that is not how reality works, especially not on this modding scene.
But it is a shame.

I see where you're coming from but no, unfortunately that's not how it works, especially not when mods are frequently updated and require maintaining those voiced lines, and not when you're a player and not a dedicated modder, who uses over 100 originally unvoiced mods in their game. It still takes me a full day to process a mod like ToH; imagine that multiplied by 100 (as an extreme maximum, because, granted, most mods don't have as much dialogue as ToH). So, when someone like Executaball shares their conversion, I call it a blessing as it gains me time to start playing sooner rather than later.

The batch generated sound quality may not be ideal and certainly is beneath the possibilities of xVASynth, but it's still better than a muted text line. My last run kept me busy for 1030 hours and over a year, I just got used to the machine voice and didn't pay that much attention to the quality (or lack thereof). As I said, silent dialogue is much more distracting to me. Still, I hope you're wrong and it's not reality but the fact that xVASynth is fairly recent, maybe in a few months you'll start to see interesting stuff coming out, just like bodyslide conversions can be absolute filth made along general guidelines, with automatically transferred weights and all sorts of unresolved issues, or carefully catered ones with manual painting, custom physics xmls and the like. It's great when you get the latter, but the former are more common. Still, with time, we've been getting more of the better ones, so maybe voiced mods will follow the same trend.

applesandmayo · May 29, 2022

Can we get BSAs for BaboDialogue? I keep trying to pack them myself via CAO, and they never work unfortunately.

Roggvir · May 30, 2022

1 hour ago, petronius said:

I see where you're coming from but no, unfortunately that's not how it works, especially not when mods are frequently updated and require maintaining those voiced lines

Updating a voice pack should be rather easy (unless the mod author changed topic info form IDs or replaced them with new infos having new form IDs).

Just do a dialogue export from the new version and compare it with a dialogue export from the old version - you get to know which topic info's text changed, if any, which topic infos were removed and if any new ones got added.

You may need to sort the lines in both CSVs by the filename or full path, so that you can use just about any diff/text compare tool, but having some script that just does everything and spits out the results would of course be better.

You reuse the previous pack you made, synthesize only the new or changed lines, and throw away anything that was removed.
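The compare step described above could be scripted roughly like this. It is only a sketch: the column names `path` and `text` are made up for illustration, and a real dialogue export will have its own layout and delimiter. The idea is to key every line by its voice file path and then classify each one as added, removed, or changed.

```python
import csv
import io

def load_export(text, key="path", value="text"):
    """Map each voice file path to its dialogue text (column names are assumed)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row[value] for row in reader}

def diff_exports(old, new):
    """Classify lines as added, removed, or changed between two exports."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed

# Tiny made-up exports standing in for the old and new mod versions.
old_csv = "path,text\nFemaleNord/a.fuz,Hello there.\nFemaleNord/b.fuz,Follow me.\n"
new_csv = "path,text\nFemaleNord/a.fuz,Hello there!\nFemaleNord/c.fuz,Wait here.\n"

added, removed, changed = diff_exports(load_export(old_csv), load_export(new_csv))
print("synthesize:", added + changed)  # new or reworded lines
print("delete:", removed)              # lines the new version no longer has
```

Keying on the path rather than on the line text means a reworded line shows up as "changed" instead of as one removal plus one addition. It also means the caveat above still applies: if the mod author changed form IDs, the paths will no longer match.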

Regarding the quality of batch generated lines, and especially whether it is better than a muted text line...
Most of the time i'd agree (except for voices like Serana, which always sound terrible no matter what the line and no matter what you do with it), but sometimes it gets so silly it even makes you laugh (but one can think of it as a good thing too).

I noticed it mentioned on the VA discord, that Dan Ruta is working on a new version of voice models, so there is a chance the general quality will also get a bit better soon (at the very least, the fast voice glitch that some voices like Serana suffer from, should be gone, hopefully).

Executaball · May 30, 2022

10 hours ago, Roggvir said:

Ok, that answers all my questions.
That explains those insanely huge download file sizes... of course, if you include 70% of audio files the game will NEVER use...
It also explains the horrid quality, because with that amount, you also cannot be bothered to check and tweak even the worst sounding lines.

You do you, but i have to say - what a waste.
Imagine you would put that energy into making just ONE voicepack, but making it contain only what it needs to contain (ie. file size measured in megabytes instead of gigabytes), and making it sound good.
And then somebody else would make a similar quality voicepack for another mod, and somebody else for another mod, etc., until eventually all mods would be covered with really nice quality voice packs.
But yeah, i know, unfortunately, that is not how reality works, especially not on this modding scene.
But it is a shame.

Well, it's not just every voice line repeated for every voice type. We try to do as much dynamic checking as possible to reduce duplicates, but sometimes they can't be avoided because of how the mod is implemented. For example, a lot of mods add lines to all faction groups or followers; those have no predefined actors or voice type, so the only option is to cover all voice types. Imagine something like Submissive Lola, which adds follower dialogue that has to be repeated for every possible follower, which is essentially every voice type.

In the case of Troubles of Heroine, for example, xTranslator is showing 11,146 ILStrings. If generated for every voice type (there are 60 in total as of my last update), that would amount to 685,560 lines, or in other words more than 720% of the current line count of 94,375. In that case the final file size would be closer to 19 GB.
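A quick check of that arithmetic, using only the figures quoted in the post. With 11,146 strings and 60 voice types the product comes out at 668,760 rather than 685,560; the larger figure would correspond to 11,426 strings, so one of the two numbers may be a typo. Either way the total is roughly seven times the current line count.

```python
strings = 11_146        # ILStrings reported by xTranslator (from the post)
voice_types = 60        # voice types as of the last update (from the post)
current_lines = 94_375  # current line count (from the post)

all_voices = strings * voice_types
ratio = round(all_voices / current_lines, 2)
implied_strings = 685_560 // voice_types  # string count the quoted total implies

print(all_voices)       # 668760
print(ratio)            # 7.09
print(implied_strings)  # 11426
```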

The current voice packs contain very few repeated lines, if any. The ones that are possibly repeated come down to mod implementations and dynamic assignments where it is not possible to infer a static voice type. DoubleCheeseBurger has done a lot of work in that regard in identifying runtime voice types from .esps, and it is quite efficient in the current state.

The only reason the voice packs are this large in size is simply due to the fact that these mods have a lot of voice lines. You have to consider that even Skyrim itself only has 60,000 voiced lines of dialogue, that's less than just one of the many quest-based mods made here.

Edited May 30, 2022 by Executaball

Executaball · May 30, 2022

4 hours ago, Roggvir said:

Updating a voice pack should be rather easy (unless the mod author changed topic info form IDs or replaced them with new infos having new form IDs).

Just do a dialogue export from the new version and compare it with a dialogue export from the old version - you get to know which topic info's text changed, if any, which topic infos were removed and if any new ones got added.

You may need to sort the lines in both CSV by the filename or full path, so then you can use just about any diff/text compare tool, but having some script that just does everything and spits out the results would be of course better.

You reuse the previous pack you made, synthesize only the new or changed lines, and throw away anything that was removed.

Regarding the quality of batch generated lines, and especially whether it is better than a muted text line...
Most of the time i'd agree (except for voices like Serana, which always sound terrible no matter what the line and no matter what you do with it), but sometimes it gets so silly it even makes you laugh (but one can think of it as a good thing too).

I noticed it mentioned on the VA discord, that Dan Ruta is working on a new version of voice models, so there is a chance the general quality will also get a bit better soon (at the very least, the fast voice glitch that some voices like Serana suffer from, should be gone, hopefully).

v3 models should be coming soon for xVASynth, though it'll be a while before anything is trained for Skyrim. We've been experimenting with the phoneme based approach used here at the inference level and it seems to have a positive impact so far. DanRuta is also working on some multilanguage support. I'm looking into 48kHz supersampling with a separate model, since the current voices are semi-limited by the 20.5kHz pipeline size. I'm not sure if that'll make it into v3 but at the very least we should have the context-based pronunciation (heteronyms) demonstrated here and better unseen vocabulary inferencing overall.

I have also noticed the fast voice glitch you mentioned, I suspect it's down to the alignment for V2, since we switched from tacotron to fastpitch's built-in alignment. Even Nvidia noticed that their alignment process was causing issues I think, since they released https://nv-adlr.github.io/RADTTS shortly after (which was actually pulled due to some other issues so it's not even public as of yet)

But yeah, if alignment can be fixed, along with the phoneme character set in training, and *maybe* super-sampling... It could come out to be a good improvement.

We've definitely not come anywhere close to the limitations of what machine learning for voice synthesis can do, which is a good thing. There is a lot of room for improvement. Skyrim voice lines really aren't great training wise, and comparatively we don't have that many samples of a good phoneme distribution. Commercial TTS voice actors like Siri and Google Assistant are usually asked to provide thousands of individual phonemes and words to cover all pronunciation possibilities, we don't really have that luxury with Skyrim. So in essence we are kind of pushing the extreme ends for training data size. This is not a common issue for the neural TTS community though, a lot of work is being done in multi-speaker TTS which can produce accurate synthesis when given in some cases only 15 minutes of speaker audio.

Edited May 30, 2022 by Executaball

Executaball · May 30, 2022

5 hours ago, applesandmayo said:

Can we get BSA's for BaboDialogue? I keep trying to pack them myself via CAO, and they never work unfortunately.

Are you exceeding the 2GB file size limit? It applies to both LE and SE. You need to split them into multiple BSAs if it goes over.
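To illustrate what splitting means here, below is a rough sketch of greedily grouping files so that no group goes over a size limit. This is not how CAO or any other packing tool does it internally; the function name and the file list are made up for the example.

```python
def split_into_archives(files, limit_bytes):
    """Greedily group (name, size) pairs so each group totals at most limit_bytes."""
    groups, current, used = [], [], 0
    for name, size in files:
        if size > limit_bytes:
            raise ValueError(f"{name} alone exceeds the limit")
        # Start a new group when adding this file would push past the limit.
        if current and used + size > limit_bytes:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        groups.append(current)
    return groups

GB = 1024 ** 3
# Hypothetical file sizes, far larger than real voice files, to keep the demo short.
files = [("a.fuz", 1 * GB), ("b.fuz", GB // 2), ("c.fuz", 1 * GB), ("d.fuz", GB // 4)]
groups = split_into_archives(files, limit_bytes=2 * GB)
print(groups)
```

Each resulting group would then be packed as its own BSA. Setting the limit a little below 2GB leaves some headroom for archive overhead.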

Edited May 30, 2022 by Executaball

LynErso666 · May 30, 2022

Amazing work, peeps! Really really amazing. Thank you so much

Roggvir · May 30, 2022

9 hours ago, Executaball said:

Imagine something like Submissive Lola which adds follower dialogue that has to be repeated for every possible follower, which is essentially every voice type.

That is actually wrong. I made a voice pack for it myself, so i know SLTR well in this regard.
The number of VANILLA+DLC follower voices is limited.
The dialogues in SLTR can be easily split into two categories:

what the owner says
what random npcs say

The owner can only be a follower.

You get yourself a list of all possible followers here, and once you get that list, you can easily put together a list of voices - except, you don't even need a list of voices, a formlist with all followers is all you need.
You are not making a voice pack for some random follower mods, you are making a voicepack for SLTR, so these voices are ALL YOU NEED for the possible owners.

If somebody wants to support his random follower mod, made to use a voice that no vanilla+dlc followers use (which makes no sense, because then that follower will be missing ALL the standard follower dialogue), then they/somebody should make a patch or additional voice pack with the required voice files, but it should NEVER be included in the main voice package for SLTR.
Apart from that, you have a few custom voiced followers, which you can ignore completely, because you don't have the appropriate voice models for xVASynth anyway.
So, you open CK, load SLTR plugin in it, create a new formlist, put all possible followers in that formlist, and then you use this formlist in quest/alias/scene/dialogue conditions, limiting which voices can use that.

As for the non-owners, the "random npcs", it is similarly easy.

This time, you create a formlist into which you put voice types, and you fill it with all voice types that a vanilla NPC can have - so no creatures, no gods, no special dummy voices, or voices for the Augur, etc.
Then you use this formlist in conditions on appropriate non-owner quests/aliases/scenes/dialogue lines to limit the voices for those voice lines.

And then you do the dialogue export of these quests from CK, and you have the data you need - and it's clean, including only the voices that should be included.
It takes no more than 30 minutes, and i can understand that even 30 minutes can be a lot, but in my opinion, this is how it should be done the right way.

And this kind of "voice filtering logic" can be applied to ANY mod.
It is just a matter of spending a few seconds to think about it, and then spending those 20-30 minutes to do it.
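The approach above does the filtering inside CK, before the export. As a rough illustration of the same idea applied after the fact, here is a sketch that drops rows from an already exported CSV when the voice type is not on an allow-list. The column names and the allow-list are assumptions for the example; the real list would come from the follower formlist.

```python
import csv
import io

# Hypothetical allow-list of owner (follower) voice types.
OWNER_VOICES = {"FemaleNord", "FemaleDarkElf", "FemaleOrc", "FemaleSultry"}

def keep_allowed(export_text, allowed, voice_column="voice"):
    """Return only the export rows whose voice type is in the allowed set."""
    reader = csv.DictReader(io.StringIO(export_text))
    return [row for row in reader if row[voice_column] in allowed]

# Made-up export: the same owner line exported for three voice types.
export = (
    "voice,path,text\n"
    "FemaleNord,owner_01.fuz,Come here.\n"
    "FemaleShrill,owner_01.fuz,Come here.\n"
    "FemaleOrc,owner_01.fuz,Come here.\n"
)
kept = keep_allowed(export, OWNER_VOICES)
kept_voices = [row["voice"] for row in kept]
print(kept_voices)
```

Filtering after the export only trims what gets synthesized. Putting the conditions into the plugin itself, as described above, also keeps the export clean for every later run.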

Edited May 30, 2022 by Roggvir

Executaball · May 30, 2022

1 hour ago, Roggvir said:

That is actually wrong. I made a voice pack for it myself, so i know SLTR well in this regard.
The number of VANILLA+DLC follower voices is limited.
The dialogues in SLTR can be easily split into two categories:

what the owner says

what random npcs say

The owner can only be a follower.

You get yourself a list of all possible followers here, and once you get that list, you can easily put together a list of voices - except, you don't even need a list of voices, a formlist with all followers is all you need.
You are not making a voice pack for some random follower mods, you are making a voicepack for SLTR, so these voices are ALL YOU NEED for the possible owners.

If somebody wants to support his random follower mod, made to use a voice that no vanilla+dlc followers use (which makes no sense, because then that follower will be missing ALL the standard follower dialogue), then they/somebody should make a patch or additional voice pack with the required voice files, but it should NEVER be included in the main voice package for SLTR.
Apart from that, you have a few custom voiced followers, which you can ignore completely, because you don't have the appropriate voice models for xVASynth anyway.
So, you open CK, load SLTR plugin in it, create a new formlist, put all possible followers in that formlist, and then you use this formlist in quest/alias/scene/dialogue conditions, limiting which voices can use that.

As for the non-owners, the "random npcs", it is similarly easy.

This time, you create a formlist into which you put voice types, and you fill it with all voice types that a vanilla NPC can have - so no creatures, no gods, no special dummy voices, or voices for the Augur, etc.
Then you use this formlist in conditions on appropriate non-owner quests/aliases/scenes/dialogue lines to limit the voices for those voice lines.

And then you do the dialogue export of these quests from CK, and you have the data you need - and it's clean, including only the voices that should be included.
It takes no more than 30 minutes, and i can understand that even 30 minutes can be a lot, but in my opinion, this is how it should be done the right way.

And this kind of "voice filtering logic" can be applied to ANY mod.
It is just a matter of spending a few seconds to think about it, and then spending those 20-30 minutes to do it.

Oh. Interesting. I'd definitely be down for that process if it means I can have the line matching be accurate.

In your case how many lines did you end up with for submissive Lola? Do you have the csv file you could share with me to take a look?

applesandmayo · May 30, 2022

10 hours ago, Executaball said:

Are you exceeding the 2GB file size limit? It applies to both LE and SE. You need to split them into multiple BSAs if it goes over.

Ahhh, so that's why that is happening. Any resources available so I can figure out how to do that?

Executaball · May 30, 2022

7 minutes ago, applesandmayo said:

Ahhh, so that's why that is happening. Any resources available so I can figure out how to do that?

If you are using CAO it should automatically split it into 2GB limits? Try using the newest version from nexus.

ecobotstar · May 30, 2022

Do you think we'll get the ancient profession anytime soon?

*Herowynne* · May 30, 2022

3 hours ago, Roggvir said:

You get yourself a list of all possible followers here

I don’t want to be limited by the game’s original list of possible followers.

A mod like RDO adds dialogue to the vanilla voice types, which enables additional NPCs to become followers.

applesandmayo · May 30, 2022

4 hours ago, Executaball said:

If you are using CAO it should automatically split it into 2GB limits? Try using the newest version from nexus.

I was unaware there was a size limit. Setting the limit to 1.97 gigs per BSA worked perfectly, thank you!

Roggvir · May 30, 2022

3 hours ago, Herowynne said:

I don’t want to be limited by the game’s original list of possible followers.

You want something non-standard, something most ppl very likely do not need.

So, make your own patch (or ask somebody to make it for you) which will add those voice files.
It would be insanely unreasonable to force EVERYBODY to download and install gigabytes of data that only YOU want.

smereka · May 30, 2022

Executaball, can you give direct links to mega? I can't get through exc.one page, says site unreachable.

Roggvir · May 30, 2022

5 hours ago, Executaball said:

Oh. Interesting. I'd definitely be down for that process if it means I can have the line matching be accurate.

I will send you all relevant files via PM.

5 hours ago, Executaball said:

In your case how many lines did you end up with for submissive Lola? Do you have the csv file you could share with me to take a look?

My export still contains some redundant data, because i was still too lazy to add all the necessary conditions to make it 100% clean, but it should be significantly less than what you are probably getting in your export.

Current export contains 69,597 lines in total, spread amongst 71 voice types (not uniformly of course, some voices use more lines, some less).

For example, FemaleDarkElf has 2533 lines, FemaleNord 2559, FemaleOrc 2565, FemaleSultry 2554 - those are examples of voices used by possible owners (followers) AND other random NPCs, so these voices have the highest amount of lines.

The non-owner voices have way less lines - like FemaleShrill 309, FemaleUniqueDelphine 271, MaleBandit 272, MaleGuard 275, and so on, most not exceeding 300.
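Per-voice totals like these can be tallied from an export with a few lines of script. This is only a sketch with made-up rows; a real export would be read from the CSV instead.

```python
from collections import Counter

def lines_per_voice(rows):
    """Count how many exported lines each voice type has."""
    return Counter(voice for voice, _path in rows)

# Made-up (voice type, file name) pairs standing in for export rows.
rows = [
    ("FemaleNord", "a.fuz"),
    ("FemaleNord", "b.fuz"),
    ("MaleGuard", "a.fuz"),
]
counts = lines_per_voice(rows)
for voice, n in counts.most_common():
    print(voice, n)
```

Sorting by count puts the voices shared by owners and random NPCs at the top, which makes it easy to spot a voice type that has far more lines than it should.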

*Herowynne* · May 30, 2022

2 hours ago, Roggvir said:

You want something non-standard, something most ppl very likely do not need.

So, make your own patch (or ask somebody to make it for you) which will add those voice files.
It would be insanely unreasonable to force EVERYBODY to download and install gigabytes of data that only YOU want.

@Roggvir No need for hostility. I am simply pointing out there is another side to this question.

Many people use RDO. Many mods that add followers have a soft dependency on RDO.

Allowing more voice types for followers is a good thing. It adds fun, increases variety, and helps keep the game fresh.

At the end of the day, we all just want to have fun when we play this game. There is not just one single "right way" to play the game.

Roggvir · May 30, 2022

19 minutes ago, Herowynne said:

@Roggvir No need for hostility. I am simply pointing out there is another side to this question.

Many people use RDO. Many mods that add followers have a soft dependency on RDO.

Allowing more voice types for followers is a good thing. It adds fun, increases variety, and helps keep the game fresh.

At the end of the day, we all just want to have fun when we play this game. There is not just one single "right way" to play the game.

I am not hostile, i am just telling you how i see it.
Don't you see how selfish you are? you wanting EVERYBODY to download and install stuff they don't need, just because YOU are too lazy to download additional patch?
Do you want every voice pack to include every single line for every single voice just because some mod out there adds some dialogue to that voice?

Such things belong in a separate patch to be downloaded and used only by the person who wants to use it.

Allowing more voice types for followers does NOTHING, unless you use a follower with that voice.
If you do use a follower with a non-standard voice, download a patch with additional voice files.

Of course, in the end it is up to the creator of any given voice pack, but this is how i see it.

Edited May 30, 2022 by Roggvir

Executaball · May 30, 2022

2 hours ago, smereka said:

Executaball, can you give direct links to mega? I can't get through exc.one page, says site unreachable.

Which file are you looking for? I change the mega links around sometimes but the redirect link should always point to the correct file.

smereka · May 30, 2022

6 minutes ago, Executaball said:

Which file are you looking for? I change the mega links around sometimes but the redirect link should always point to the correct file.

these two, for SE.

9. Fill Her Up Baka Edition v1.71

10. Troubles of Heroine v2.4.1

Executaball · May 30, 2022

1 minute ago, smereka said:

these two, for SE.

9. Fill Her Up Baka Edition v1.71

10. Troubles of Heroine v2.4.1

Hm.. Link seems to be fine on my end. Here are the mega links though:

https://mega.nz/file/YEolgLLQ#z3gSY6Osw1xpMzdmrEtPJJ9O5HTAIwZjgljSC6cDom8

https://mega.nz/file/gcJQGJTb#0q-lSXmRKh4RYzLG9SVAN681FKJo_Hwi9B65Rnal_a4

Antiope_Apollonia · May 31, 2022

11 hours ago, Roggvir said:

The dialogues in SLTR can be easily split into two categories:

what the owner says

what random npcs say

The owner can only be a follower.

You get yourself a list of all possible followers here, and once you get that list, you can easily put together a list of voices - except, you don't even need a list of voices, a formlist with all followers is all you need.

You could do one better and split the followers into male and female, then modularise the voice pack. Most users are only going to be interested in playing with one or the other—Masters or Mistresses—so you could cut down on the download size quite a bit by allowing people to pick and choose which elements to install. And people who want to play with both genders would still have the option to install both.

My guess would be that a number of these mods would admit some modularity in the voice packs, but I don't have experience with most of them to say for sure. SLTR, though, I can say from experience would be a good candidate.
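As a rough sketch of that split: the voice types named elsewhere in the thread start with Female or Male, so a pack could be partitioned by prefix. The "other" bucket and the function name are assumptions for the example.

```python
def split_by_gender(voice_types):
    """Partition voice type names into installable modules by name prefix."""
    modules = {"female": [], "male": [], "other": []}
    for voice in voice_types:
        if voice.startswith("Female"):
            modules["female"].append(voice)
        elif voice.startswith("Male"):
            modules["male"].append(voice)
        else:
            modules["other"].append(voice)  # anything not following the naming pattern
    return modules

modules = split_by_gender(["FemaleNord", "MaleGuard", "FemaleSultry", "MaleBandit"])
print(modules["female"])
print(modules["male"])
```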

Edited May 31, 2022 by Antiope_Apollonia

Executaball · May 31, 2022

1 hour ago, Antiope_Apollonia said:

You could do one better and split the followers into male and female, then modularise the voice pack. Most users are only going to be interested in playing with one or the other—Masters or Mistresses—so you could cut down on the download size quite a bit by allowing people to pick and choose which elements to install. And people who want to play with both genders would still have the option to install both.

My guess would be that a number of these mods would admit some modularity in the voice packs, but I don't have experience with most of them to say for sure. SLTR, though, I can say from experience would be a good candidate.

Well, the only thing is I would end up with multiple BSAs and ESPs, each with a small size. Which is not *that* bad with SE ESLs, but it'll eat into LE .esp limits if you're not using MO1's managed archives (like Vortex or MO2 users).
