Project Echo - Voice Packs - Deep Learning Voice Synthesis


Posted
10 hours ago, Executaball said:

 

Currently they don't, no. It's just the base mod. I'll look into adding the voices for the addons soon.

 

I'm not sure about the merged version though, due to the naming scheme: the mod and folder structures within the merged version were changed from 'troublesofheroine.esp' to 'TroublesOfHeroineSE.esp' for some reason...

 

Also the zip unpacks to over 3.5 GB? Wasn't the original mod only 77 MB? What exactly is in the merged version ?

 

Edit: Nvm I think the merged version actually included the loose files from the previous version of my ToH voice pack. 

Yeah, I would forget about the merged version. I'd recommend to DSHV that, when sharing an SSE port, they share the separately ported modules instead of a merged version and let the user decide whether or not to merge them. zEdit will handle the voice files well enough too. Not wanting to be nosy, but if I were you I'd drop support for the SSE version: the sound files are exactly the same, and the renamed plugin, plus the renumbered records in the merged version, will make it a heck of a lot of trouble for you to support both versions.

 

Meanwhile, I generated voice files for the current version of the add-ons. I think they're consistent with your main mod work (same voices, exported dialogue with xEdit script, batch generated .fuz in xVASynth). You can post the link on the main page if you want; if for any reason my link dies or I need to take it down, I'll let you know.

https://mega.nz/file/8lpAgZzB#L2Tbks2PrjtlswQUdo_EApvjchHLZTNbs-eNLcyt_8c

Posted
1 hour ago, petronius said:

exported dialogue with xEdit script

Curious... would you share that xEdit script?
How does it filter the voices when generating export? What are its advantages, compared to using CK to generate the dialogue export CSV?

Posted
3 minutes ago, Roggvir said:

Curious... would you share that xEdit script?
How does it filter the voices when generating export? What are its advantages, compared to using CK to generate the dialogue export CSV?

 

I use DoubleCheeseBurger's script that you can find here: 

The main advantage to me is speed - I've generated voice files for dozens of muted mods and can't afford the time to optimise and filter redundant voiced lines for all of them... I mod to play :P The script above will generate dialogue for a list of voices (you can tweak those in the script itself). A good number of those lines will never be spoken by many of the voices you're generating them for, but this is a sweep job where you want to cover as close as possible to all the lines that you'll encounter in game.

 

For a large mod like ToH this is A LOT of voice lines, 90K+ (add another 25K with the DLC add-ons). I recommend packing those in an uncompressed BSA, especially if you do this for a lot of mods; it will definitely benefit your load times. Once packed, all those tiny .fuz files won't be much of a bother.

 

 

Posted
40 minutes ago, petronius said:

can't afford the time to optimise and filter redundant voiced lines

Ok, that answers all my questions.
That explains those insanely huge download file sizes... of course, if you include 70% of audio files the game will NEVER use...
It also explains the horrid quality, because with that amount, you also cannot be bothered to check and tweak even the worst sounding lines.

 

You do you, but i have to say - what a waste.
Imagine you would put that energy into making just ONE voicepack, but making it contain only what it needs to contain (ie. file size measured in megabytes instead of gigabytes), and making it sound good.
And then somebody else would make a similar quality voicepack for another mod, and somebody else for another mod, etc., until eventually all mods would be covered with really nice quality voice packs.
But yeah, i know, unfortunately, that is not how reality works, especially not on this modding scene.
But it is a shame.

Posted
4 hours ago, Roggvir said:

Ok, that answers all my questions.
That explains those insanely huge download file sizes... of course, if you include 70% of audio files the game will NEVER use...
It also explains the horrid quality, because with that amount, you also cannot be bothered to check and tweak even the worst sounding lines.

 

You do you, but i have to say - what a waste.
Imagine you would put that energy into making just ONE voicepack, but making it contain only what it needs to contain (ie. file size measured in megabytes instead of gigabytes), and making it sound good.
And then somebody else would make a similar quality voicepack for another mod, and somebody else for another mod, etc., until eventually all mods would be covered with really nice quality voice packs.
But yeah, i know, unfortunately, that is not how reality works, especially not on this modding scene.
But it is a shame.

 

I see where you're coming from but no, unfortunately that's not how it works, especially not when mods are frequently updated and require maintaining those voiced lines, and not when you're a player rather than a dedicated modder, one who uses over 100 originally unvoiced mods in their game. It still takes me a full day to process a mod like ToH; imagine that multiplied by 100 (an extreme upper bound since, granted, most mods don't have as much dialogue as ToH). So, when someone like Executaball shares their conversion, I call it a blessing, as it lets me start playing sooner rather than later.

 

The batch-generated sound quality may not be ideal and is certainly beneath what xVASynth is capable of, but it's still better than a muted text line. My last run kept me busy for 1030 hours over more than a year; I just got used to the machine voice and didn't pay that much attention to the quality (or lack thereof). As I said, silent dialogue is much more distracting to me. Still, I hope you're wrong and that it's not how reality works but simply that xVASynth is fairly recent. Maybe in a few months you'll start to see interesting stuff coming out. It's like BodySlide conversions: they can be absolute filth made along general guidelines, with automatically transferred weights and all sorts of unresolved issues, or carefully crafted ones with manual painting, custom physics XMLs and the like. It's great when you get the latter, but the former are more common. Still, with time, we've been getting more of the better ones, so maybe voiced mods will follow the same trend.

Posted
1 hour ago, petronius said:

I see where you're coming from but no, unfortunately that's not how it works, especially not when mods are frequently updated and require maintaining those voiced lines

 Updating a voice pack should be rather easy (unless the mod author changed topic info form IDs or replaced them with new infos having new form IDs).

Just do a dialogue export from the new version and compare it with a dialogue export from the old version - you get to know which topic info's text changed, if any, which topic infos were removed and if any new ones got added.

You may need to sort the lines in both CSV by the filename or full path, so then you can use just about any diff/text compare tool, but having some script that just does everything and spits out the results would be of course better.

You reuse the previous pack you made, synthesize only the new or changed lines, and throw away anything that was removed.
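The comparison described above can be sketched in a few lines of Python. This is only a rough illustration, not an existing tool: it assumes each export has already been reduced to rows with a voice file path and the response text. The keys `path` and `text`, the file names, and the function name `diff_exports` are all made up for the example; a real CK or xEdit export will use its own column names.

```python
def diff_exports(old_rows, new_rows):
    """Compare two dialogue exports keyed by voice file path.

    Each row is a dict with the hypothetical keys "path" and "text".
    Returns three sorted lists of paths: added, removed, changed.
    """
    old = {row["path"]: row["text"] for row in old_rows}
    new = {row["path"]: row["text"] for row in new_rows}

    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, changed


# Made-up sample data standing in for the old and new exports.
old_export = [
    {"path": "voiceA/line_001.fuz", "text": "Hello there."},
    {"path": "voiceA/line_002.fuz", "text": "Follow me."},
    {"path": "voiceA/line_003.fuz", "text": "Goodbye."},
]
new_export = [
    {"path": "voiceA/line_001.fuz", "text": "Hello there."},
    {"path": "voiceA/line_002.fuz", "text": "Follow me, quickly."},
    {"path": "voiceA/line_004.fuz", "text": "Wait here."},
]

added, removed, changed = diff_exports(old_export, new_export)
print("synthesize:", added + changed)  # new or reworded lines
print("delete:", removed)              # lines no longer in the mod
```

Keying on the path means no sorting is needed before comparing. If the mod author replaced topic infos with new form IDs, the paths change as well, so those lines show up as one removal plus one addition rather than as a change.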

 

Regarding the quality of batch generated lines, and especially whether it is better than a muted text line...
Most of the time i'd agree (except for voices like Serana, which always sound terrible no matter what the line and no matter what you do with it), but sometimes it gets so silly it even makes you laugh (but one can think of it as a good thing too).

 

I noticed it mentioned on the VA discord, that Dan Ruta is working on a new version of voice models, so there is a chance the general quality will also get a bit better soon (at the very least, the fast voice glitch that some voices like Serana suffer from, should be gone, hopefully).

 

Posted (edited)
10 hours ago, Roggvir said:

Ok, that answers all my questions.
That explains those insanely huge download file sizes... of course, if you include 70% of audio files the game will NEVER use...
It also explains the horrid quality, because with that amount, you also cannot be bothered to check and tweak even the worst sounding lines.

 

You do you, but i have to say - what a waste.
Imagine you would put that energy into making just ONE voicepack, but making it contain only what it needs to contain (ie. file size measured in megabytes instead of gigabytes), and making it sound good.
And then somebody else would make a similar quality voicepack for another mod, and somebody else for another mod, etc., until eventually all mods would be covered with really nice quality voice packs.
But yeah, i know, unfortunately, that is not how reality works, especially not on this modding scene.
But it is a shame.

 

Well, it's not just every voice line repeated for every voice type. We try to do as much dynamic checking as possible to reduce duplicates, but sometimes they cannot be avoided due to the mod's implementation. For example, a lot of mods add lines to all faction groups or followers; these have no predefined actors or voice type, so the only option is to cover all voice types. Imagine something like Submissive Lola which adds follower dialogue that has to be repeated for every possible follower, which is essentially every voice type.

 

In the case of Troubles of Heroine, for example, xTranslator is showing 11,146 ILStrings. If generated for every voice type (there were 60 in total as of my last update), that would amount to 685,560 lines, or in other words more than 720% of the current line count of 94,375. In that case the final file size would be closer to 19 GB.
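As a quick sanity check on the scale involved, the ratio can be worked out from the two totals quoted above. The snippet only uses figures from this post (685,560 lines, 94,375 lines, 60 voice types) and works backwards from them; the variable names are just for illustration.

```python
all_voices_total = 685_560  # every string generated for every voice type
current_total = 94_375      # line count of the current voice pack
voice_types = 60            # voice types covered in the last update

# Strings per voice type implied by the all-voices total.
strings_per_voice = all_voices_total // voice_types

# How many times larger the brute-force pack would be.
ratio = all_voices_total / current_total

print(f"{strings_per_voice:,} strings per voice type")
print(f"{ratio:.2f}x the current line count")
```

Working backwards this way, the all-voices total corresponds to 11,426 strings per voice type and to roughly 7.26 times the current line count.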

The current voice lines have very few repeated lines, if any. The ones that are possibly repeated come down to mod implementations and dynamic assignments where it is not possible to infer a static voice type. DoubleCheeseBurger has done a lot of work in that regard in identifying runtime voice types from .esps and it is quite efficient in the current state.

The only reason the voice packs are this large in size is simply due to the fact that these mods have a lot of voice lines. You have to consider that even Skyrim itself only has 60,000 voiced lines of dialogue, that's less than just one of the many quest-based mods made here.

Edited by Executaball
Posted (edited)
4 hours ago, Roggvir said:

 Updating a voice pack should be rather easy (unless the mod author changed topic info form IDs or replaced them with new infos having new form IDs).

Just do a dialogue export from the new version and compare it with a dialogue export from the old version - you get to know which topic info's text changed, if any, which topic infos were removed and if any new ones got added.

You may need to sort the lines in both CSV by the filename or full path, so then you can use just about any diff/text compare tool, but having some script that just does everything and spits out the results would be of course better.

You reuse the previous pack you made, synthesize only the new or changed lines, and throw away anything that was removed.

 

Regarding the quality of batch generated lines, and especially whether it is better than a muted text line...
Most of the time i'd agree (except for voices like Serana, which always sound terrible no matter what the line and no matter what you do with it), but sometimes it gets so silly it even makes you laugh (but one can think of it as a good thing too).

 

I noticed it mentioned on the VA discord, that Dan Ruta is working on a new version of voice models, so there is a chance the general quality will also get a bit better soon (at the very least, the fast voice glitch that some voices like Serana suffer from, should be gone, hopefully).

 

 

v3 models should be coming soon for xVASynth, though it'll be a while before anything is trained for Skyrim. We've been experimenting with the phoneme based approach used here at the inference level and it seems to have a positive impact so far. DanRuta is also working on some multilanguage support. I'm looking into 48kHz supersampling with a separate model, since the current voices are semi-limited by the 20.5kHz pipeline size. I'm not sure if that'll make it into v3 but at the very least we should have the context-based pronunciation (heteronyms) demonstrated here and better unseen vocabulary inferencing overall.

 

I have also noticed the fast voice glitch you mentioned. I suspect it's down to the alignment for v2, since we switched from Tacotron to FastPitch's built-in alignment. Even Nvidia noticed that their alignment process was causing issues, I think, since they released https://nv-adlr.github.io/RADTTS shortly after (which was actually pulled due to some other issues, so it's not even public as of yet).

But yeah, if alignment can be fixed, along with the phoneme character set in training, and *maybe* super-sampling... It could come out to be a good improvement.

 

We've definitely not come anywhere close to the limitations of what machine learning for voice synthesis can do, which is a good thing. There is a lot of room for improvement. Skyrim voice lines really aren't great training wise, and comparatively we don't have that many samples of a good phoneme distribution. Commercial TTS voice actors like Siri and Google Assistant are usually asked to provide thousands of individual phonemes and words to cover all pronunciation possibilities, we don't really have that luxury with Skyrim. So in essence we are kind of pushing the extreme ends for training data size. This is not a common issue for the neural TTS community though, a lot of work is being done in multi-speaker TTS which can produce accurate synthesis when given in some cases only 15 minutes of speaker audio.

Edited by Executaball
Posted (edited)
5 hours ago, applesandmayo said:

Can we get BSA's for BaboDialogue? I keep trying to pack them myself via CAO, and they never work unfortunately.

 

Are you exceeding the 2GB file size limit? It applies to both LE and SE. You need to split them into multiple BSAs if it goes over.
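To illustrate what splitting means in practice, here is a minimal Python sketch that groups files so each group stays under a size budget. It is not how CAO or any other packer actually does it; the function name `split_into_archives`, the file names, and the tiny byte sizes are made up for the example. A real budget should sit a little below 2 GB, since the archive itself adds some overhead on top of the raw file sizes.

```python
def split_into_archives(files, limit_bytes):
    """Greedily group (name, size_in_bytes) pairs under a size budget.

    Files are taken in order; a new group starts whenever adding the
    next file would push the current group over limit_bytes.
    """
    groups, current, current_size = [], [], 0
    for name, size in files:
        if size > limit_bytes:
            raise ValueError(f"{name} alone exceeds the limit")
        if current and current_size + size > limit_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Toy numbers: a budget of 100 bytes instead of roughly 2 GB.
sample_files = [("a.fuz", 40), ("b.fuz", 50), ("c.fuz", 30), ("d.fuz", 60)]
archives = split_into_archives(sample_files, limit_bytes=100)
print(archives)  # each inner list would become one BSA
```

With the toy numbers above, the four files end up in two groups of two, each of which would be packed as its own BSA.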

Edited by Executaball
Posted (edited)
9 hours ago, Executaball said:

Imagine something like Submissive Lola which adds follower dialogue that has to be repeated for every possible follower, which is essentially every voice type.

That is actually wrong. I made a voice pack for it myself, so i know SLTR well in this regard.
The number of VANILLA+DLC follower voices is limited.
The dialogues in SLTR can be easily split into two categories:

  1. what the owner says
  2. what random npcs say

The owner can only be a follower.

You get yourself a list of all possible followers here, and once you get that list, you can easily put together a list of voices - except, you don't even need a list of voices, a formlist with all followers is all you need.
You are not making a voice pack for some random follower mods, you are making a voicepack for SLTR, so these voices are ALL YOU NEED for the possible owners.

If somebody wants to support his random follower mod, made to use a voice that no vanilla+dlc followers use (which makes no sense, because then that follower will be missing ALL the standard follower dialogue), then they/somebody should make a patch or additional voice pack with the required voice files, but it should NEVER be included in the main voice package for SLTR.
Apart from that, you have a few custom voiced followers, which you can ignore completely, because you don't have the appropriate voice models for xVASynth anyway.

So, you open CK, load SLTR plugin in it, create a new formlist, put all possible followers in that formlist, and then you use this formlist in quest/alias/scene/dialogue conditions, limiting which voices can use that.

 

As for the non-owners, the "random npcs", it is similarly easy.

This time, you create a formlist into which you put voice types, and you fill it with all voice types that a vanilla NPC can have - so no creatures, no gods, no special dummy voices, or voices for the Augur, etc.
Then you use this formlist in conditions on appropriate non-owner quests/aliases/scenes/dialogue lines to limit the voices for those voice lines.

 

And then you do the dialogue export of these quests from CK, and you have the data you need - and it's clean, including only the voices that should be included.
It takes no more than 30 minutes, and i can understand that even 30 minutes can be a lot, but in my opinion, this is how it should be done the right way.

 

And this kind of "voice filtering logic" can be applied to ANY mod.
It is just a matter of spending a few seconds thinking about it, and then spending those 20-30 minutes doing it.
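The same two-list idea can also be shown outside the CK, as a filter applied to rows of an existing export. To be clear, this is a different technique from adding conditions in the plugin: it only trims an export after the fact, and it assumes each row has already been tagged as owner or random NPC dialogue. The keys `category`, `voice` and `text`, the function name `filter_rows`, the sample lines, and the placeholder `SomeCreatureVoice` are all made up for the sketch.

```python
def filter_rows(rows, owner_voices, npc_voices):
    """Keep a row only if its voice type is allowed for its category.

    Rows are dicts with the hypothetical keys "category" (either
    "owner" or "npc"), "voice" and "text".
    """
    allowed = {"owner": set(owner_voices), "npc": set(npc_voices)}
    return [row for row in rows if row["voice"] in allowed[row["category"]]]


# Short illustrative lists; real ones would cover every eligible voice.
owner_voices = ["FemaleNord", "FemaleSultry"]
npc_voices = ["FemaleNord", "FemaleSultry", "MaleGuard", "FemaleShrill"]

rows = [
    {"category": "owner", "voice": "FemaleNord", "text": "Kneel."},
    {"category": "owner", "voice": "MaleGuard", "text": "Kneel."},
    {"category": "npc", "voice": "MaleGuard", "text": "Move along."},
    {"category": "npc", "voice": "SomeCreatureVoice", "text": "Move along."},
]

kept = filter_rows(rows, owner_voices, npc_voices)
print(len(kept), "of", len(rows), "rows kept")
```

Doing the filtering with conditions in the plugin, as described above, has the advantage that the export comes out clean in the first place, so no post-processing step has to be maintained.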

Edited by Roggvir
Posted
1 hour ago, Roggvir said:

That is actually wrong. I made a voice pack for it myself, so i know SLTR well in this regard.
The number of VANILLA+DLC follower voices is limited.
The dialogues in SLTR can be easily split into two categories:

  1. what the owner says
  2. what random npcs say

The owner can only be a follower.

You get yourself a list of all possible followers here, and once you get that list, you can easily put together a list of voices - except, you don't even need a list of voices, a formlist with all followers is all you need.
You are not making a voice pack for some random follower mods, you are making a voicepack for SLTR, so these voices are ALL YOU NEED for the possible owners.

If somebody wants to support his random follower mod, made to use a voice that no vanilla+dlc followers use (which makes no sense, because then that follower will be missing ALL the standard follower dialogue), then they/somebody should make a patch or additional voice pack with the required voice files, but it should NEVER be included in the main voice package for SLTR.
Apart from that, you have a few custom voiced followers, which you can ignore completely, because you don't have the appropriate voice models for xVASynth anyway.

So, you open CK, load SLTR plugin in it, create a new formlist, put all possible followers in that formlist, and then you use this formlist in quest/alias/scene/dialogue conditions, limiting which voices can use that.

 

As for the non-owners, the "random npcs", it is similarly easy.

This time, you create a formlist into which you put voice types, and you fill it with all voice types that a vanilla NPC can have - so no creatures, no gods, no special dummy voices, or voices for the Augur, etc.
Then you use this formlist in conditions on appropriate non-owner quests/aliases/scenes/dialogue lines to limit the voices for those voice lines.

 

And then you do the dialogue export of these quests from CK, and you have the data you need - and it's clean, including only the voices that should be included.
It takes no more than 30 minutes, and i can understand that even 30 minutes can be a lot, but in my opinion, this is how it should be done the right way.

 

And this kind of "voice filtering logic" can be applied to ANY mod.
It is just a matter of spending a few seconds thinking about it, and then spending those 20-30 minutes doing it.

 

Oh. Interesting. I'd definitely be down for that process if it means I can have the line matching be accurate.

 

In your case how many lines did you end up with for submissive Lola? Do you have the csv file you could share with me to take a look?

Posted
10 hours ago, Executaball said:

 

Are you exceeding the 2GB file size limit? It applies to both LE and SE. You need to split them into multiple BSAs if it goes over.

Ahhh, so that's why that is happening. Any resources available so I can figure out how to do that?

Posted
7 minutes ago, applesandmayo said:

Ahhh, so that's why that is happening. Any resources available so I can figure out how to do that?

 

If you are using CAO it should automatically split it into 2GB limits? Try using the newest version from nexus.

Posted
4 hours ago, Executaball said:

 

If you are using CAO it should automatically split it into 2GB limits? Try using the newest version from nexus.

I was unaware there was a size limit. Setting the limit to 1.97 gigs per BSA worked perfectly, thank you!

Posted
3 hours ago, Herowynne said:

I don’t want to be limited by the game’s original list of possible followers.

You want something non-standard, something most ppl very likely do not need.

So, make your own patch (or ask somebody to make it for you) which will add those voice files.
It would be insanely unreasonable to force EVERYBODY to download and install gigabytes of data that only YOU want.

Posted
5 hours ago, Executaball said:

Oh. Interesting. I'd definitely be down for that process if it means I can have the line matching be accurate.

I will send you all relevant files via PM.

 

5 hours ago, Executaball said:

In your case how many lines did you end up with for submissive Lola? Do you have the csv file you could share with me to take a look?

My export still contains some redundant data, because i was still too lazy to add all the necessary conditions to make it 100% clean, but it should be significantly less than what you are probably getting in your export.

Current export contains 69,597 lines in total, spread amongst 71 voice types (not uniformly of course, some voices use more lines, some less).

 

For example, FemaleDarkElf has 2533 lines, FemaleNord 2559, FemaleOrc 2565, FemaleSultry 2554 - those are examples of voices used by possible owners (followers) AND other random NPCs, so these voices have the highest amount of lines.

The non-owner voices have far fewer lines - like FemaleShrill 309, FemaleUniqueDelphine 271, MaleBandit 272, MaleGuard 275, and so on, most not exceeding 300.
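A per-voice breakdown like the one above is easy to produce from any export once it is loaded as rows. The sketch below is only an illustration with made-up sample rows; the key `voice` and the function name `lines_per_voice` are assumptions, not the column names of a real export.

```python
from collections import Counter


def lines_per_voice(rows):
    """Count how many exported lines each voice type has."""
    return Counter(row["voice"] for row in rows)


# Made-up rows; a real export would have tens of thousands.
sample_rows = [
    {"voice": "FemaleNord", "text": "Line one."},
    {"voice": "FemaleNord", "text": "Line two."},
    {"voice": "MaleGuard", "text": "Line one."},
]

counts = lines_per_voice(sample_rows)
for voice, n in counts.most_common():
    print(voice, n)
print("total:", sum(counts.values()), "lines,", len(counts), "voice types")
```

Voices with unexpectedly high counts are a quick hint at where conditions are still missing.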

Posted
2 hours ago, Roggvir said:

You want something non-standard, something most ppl very likely do not need.

So, make your own patch (or ask somebody to make it for you) which will add those voice files.
It would be insanely unreasonable to force EVERYBODY to download and install gigabytes of data that only YOU want.

 

@Roggvir No need for hostility. I am simply pointing out there is another side to this question.

 

Many people use RDO. Many mods that add followers have a soft dependency on RDO.

 

Allowing more voice types for followers is a good thing. It adds fun, increases variety, and helps keep the game fresh.

 

At the end of the day, we all just want to have fun when we play this game. There is not just one single "right way" to play the game.

Posted (edited)
19 minutes ago, Herowynne said:

 

@Roggvir No need for hostility. I am simply pointing out there is another side to this question.

 

Many people use RDO. Many mods that add followers have a soft dependency on RDO.

 

Allowing more voice types for followers is a good thing. It adds fun, increases variety, and helps keep the game fresh.

 

At the end of the day, we all just want to have fun when we play this game. There is not just one single "right way" to play the game.

I am not hostile, i am just telling you how i see it.
Don't you see how selfish you are? You want EVERYBODY to download and install stuff they don't need, just because YOU are too lazy to download an additional patch?
Do you want every voice pack to include every single line for every single voice just because some mod out there adds some dialogue to that voice?

Such things belong in a separate patch, to be downloaded and used only by the person who wants to use it.

 

Allowing more voice types for followers does NOTHING, unless you use a follower with that voice.
If you do use a follower with a non-standard voice, download a patch with additional voice files.

 

Of course, in the end it is up to the creator of any given voice pack, but this is how i see it.

Edited by Roggvir
Posted
2 hours ago, smereka said:

Executaball, can you give direct links to mega? I can't get through exc.one page, says site unreachable.

Which file are you looking for? I change the mega links around sometimes but the redirect link should always point to the correct file.

Posted
6 minutes ago, Executaball said:

Which file are you looking for? I change the mega links around sometimes but the redirect link should always point to the correct file.

these two, for SE.

9. Fill Her Up Baka Edition v1.71

10. Troubles of Heroine v2.4.1
