
Project Echo - Voice Packs - Deep Learning Voice Synthesis



Posted

Also, an update for Submissive Lola: The Resubmission v. 2.0.47 would be very welcome! There have been quite a few changes to the dialogue. Pretty please?

Posted
On 12/17/2021 at 7:33 AM, Nymra said:


I think this might be due to the fact that most mod authors never set the voices in the ESP correctly. Which character is it? Maybe I can find it in the ESP and see what is wrong (what is the name)

The character with the wrong voice is the player character herself. It seems at least Breton and Nord races are affected, but I guess it's global.

Posted

@Executaball Hi, I'm generating some voice packs for some of the mods on my modlist and got curious to check out the voice files you included for Dovahkiin's Infamy. I noticed there is a folder for maleuniqueghost. I can't seem to find this voice model anywhere; it's not listed on XVASynth's modpage. Did you train it yourself and are you able to share it?

Posted
4 hours ago, QuantumWraith said:

@Executaball Hi, I'm generating some voice packs for some of the mods on my modlist and got curious to check out the voice files you included for Dovahkiin's Infamy. I noticed there is a folder for maleuniqueghost. I can't seem to find this voice model anywhere; it's not listed on XVASynth's modpage. Did you train it yourself and are you able to share it?

@Executaball If there is a way to train voices ourselves, I am interested to know! There are a few follower voices that could use the treatment.

Posted
4 hours ago, TrollAutokill said:

@Executaball If there is a way to train voices ourselves, I am interested to know! There are a few follower voices that could use the treatment.

Yeah just tie them to a post and beat them till u get the right pitch. LOL.

Posted (edited)

The voice pack for Simple Slavery++ needs an update as the modification it is for is now at version 6.3.14, so can anyone do this pretty please?

The voice pack though currently is for version 6.3.12 of Simple Slavery++, there's been dialogue additions and alterations.

Edited by Leoosp
  • 4 weeks later...
Posted (edited)
Quote

14. Thief BSA Voice Pack V1.0.0.8-BME

 

It looks like the SE Split Version 1 zip files are corrupted.

 

 

Spoiler

[SE] (Gen 2.1) Thief v1.0.8-BME EVP v1.z01: Dogma - Thief_Voices3.bsa - Cannot read archive data
An error occurred

Edited by masterchief24
Posted (edited)
22 hours ago, masterchief24 said:

It looks like the SE Split Version 1 zip files are corrupted.

 

I could download and open Part 0, but Part 1 was big enough that they want me to pay, and I'm not doing that. I just wanted to test it for myself to see if I can open the archive. It's currently downloading, we'll see how it goes.

I will say that in the past when I got errors downloading, sometimes just going to a different browser worked.

 

Update: I was able to download Part 1. Once I changed the file extension from .z01 to .zip I was able to open it.
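
For anyone who would rather script the extension change than do it by hand, here is a minimal sketch in Python. The function name copy_part_as_zip is made up for illustration, and it copies the file instead of renaming it so the original .z01 part stays untouched. It only changes the extension; it does not repair or validate the archive, and whether the result opens will depend on your archive tool.

```python
import shutil
from pathlib import Path


def copy_part_as_zip(part_path):
    """Copy a split-archive part (e.g. 'pack.z01') to a sibling file ending in '.zip'.

    Hypothetical helper: it only changes the extension on a copy and does
    not check that the result is a readable archive.
    """
    src = Path(part_path)
    dst = src.with_suffix(".zip")  # swaps only the final suffix, e.g. '.z01' -> '.zip'
    if dst.exists():
        # Refuse to clobber a file that is already there (such as the real last part).
        raise FileExistsError(f"{dst} already exists; not overwriting")
    shutil.copy2(src, dst)
    return dst
```

Call it with the path to the downloaded part, then try opening the returned .zip path in your archive tool.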

 

Edited by Seeker999
update
Posted

Still waiting for the voice pack for Sexy Bandit Captives SSE v0.971 AIO. 

 

If anyone wants to make the voice pack but doesn't have the mod, send me a PM.

Posted

So, I've seen where people have asked about xVASynth for Apropos. I'd like to see this happen as well, but as stated a few pages back, Apropos doesn't use Skyrim's native dialogue to show its text, and I believe xVASynth requires that.

 

I do wonder, however, since there is a patch for Fuz Ro Bork that reads books for you, whether that could somehow be reverse engineered to find a way to patch Apropos into Fuz Ro Bork, and whether that would be enough to get xVASynth to play audio for Apropos.

 

Maybe a script for Apropos that hooks into the audiobook version of Fuz Ro Bork. The messages would unfortunately have to load as a book UI, which may be annoying for the sake of immersion. This, however, would make it ?incompatible? with any mod that allows Skyrim to proceed in the background with a menu or book open.

 

This isn't intended as a request, more just a rambling of an idea that may be a viable workaround, however this is coming from someone with a rudimentary understanding of making mods. 

  • 1 month later...
Posted

This looks absolutely amazing. Great work! Could I please put in a vote for this to be done for Maria Eden SE? It would take the immersion of that mod to the next level.

Posted (edited)
On 2/19/2022 at 11:21 PM, devildx said:

Still waiting for the voice pack for Sexy Bandit Captives SSE v0.971 AIO. 

 

If anyone wants to make the voice pack but doesn't have the mod, send me a PM.

I have Sexy_Bandit_Captives_SSE_v0.971_D_AIO_Plus_MCM

I guess it should do, I will see what I can do.

 

EDIT: There is a new version, 0.98d. It seems SBC was recently revived!

 

 

Edited by TrollAutokill
Posted (edited)
1 hour ago, TrollAutokill said:

I have Sexy_Bandit_Captives_SSE_v0.971_D_AIO_Plus_MCM

I guess it should do, I will see what I can do.

 

EDIT: There is a new version, 0.98d. It seems SBC was recently revived!

 

 

No need anymore. I already did it for myself. I play with the Skyfem mod, so my version has only female voices, including female voices for the male dialogue. You can just delete the male folders if you want only the female voices.

 

https://www.loverslab.com/topic/111415-skyfem-all-npcs-now-female-special-edition/page/31/#comment-3696506

 

Great to see that the mod is up again.

 

EDIT: Seems like the new version is voiced so only use my file for the specific version 0.971 AIO.

Edited by devildx
Posted
1 minute ago, devildx said:

No need anymore. I already did it for myself. I play with the Skyfem mod, so my version has only female voices, including female voices for the male dialogue. You can just delete the male folders if you want only the female voices.

 

https://www.loverslab.com/topic/111415-skyfem-all-npcs-now-female-special-edition/page/31/#comment-3696506

 

 

Great. Is that 0.98 or 0.97? Maybe it doesn't make any difference.

Posted
38 minutes ago, TrollAutokill said:

Great. Is that 0.98 or 0.97? Maybe it doesn't make any difference.

It is for the old version 0.971. The new 0.98 seems to be a completely new mod, and the description says it is already voiced.

 

I will test the new version when I start a new game. Just waiting for some other mods to update before I do.

Posted (edited)

Hey all. Sorry for the extended absence, but I have been working on the voice generation workflow in the meantime. For the past few months I have been working with some other researchers on an end-to-end model for phoneme and pitch-energy modulation to complement the current FastPitch model by Nvidia. Quality is improved by a notable margin, and the biggest issue with current AI-generated voices (awkward pronunciations) is fixed for the most part.

 

It turns out English is actually quite hard to pronounce: some words have multiple pronunciations despite having the same spelling, depending on the context they are used in. English has over 950 words whose pronunciation or intonation changes based on context or usage.
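
To make the problem concrete, here is a toy sketch in Python. This is not the neural model described in this post; it is a hand-written lookup with a crude "look at the previous word" rule, just to show why the same spelling needs context. The table, the cue word lists and the function names are all made up for illustration, and the phonemes are written in ARPAbet as found in the CMU Pronouncing Dictionary.

```python
# Toy homograph table: (word, part of speech) -> ARPAbet phonemes.
# Only two words are covered; a real system would need hundreds.
HOMOGRAPHS = {
    ("record", "noun"): "R EH1 K ER0 D",
    ("record", "verb"): "R IH0 K AO1 R D",
    ("object", "noun"): "AA1 B JH EH0 K T",
    ("object", "verb"): "AH0 B JH EH1 K T",
}

# Crude context cues (an assumption of this sketch, not a real tagger).
DETERMINERS = {"a", "an", "the", "this", "that", "my", "your"}
VERB_CUES = {"to", "will", "can", "must", "i", "you", "we", "they"}


def guess_pos(prev_word):
    """Guess noun/verb from the previous word alone; None if there is no cue."""
    if prev_word in DETERMINERS:
        return "noun"
    if prev_word in VERB_CUES:
        return "verb"
    return None


def pronounce(sentence):
    """Replace known homographs with phonemes; leave every other word as text."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    result = []
    for i, word in enumerate(words):
        prev_word = words[i - 1] if i > 0 else ""
        pos = guess_pos(prev_word)
        result.append(HOMOGRAPHS.get((word, pos), word))
    return result


print(pronounce("I will record the record."))
# ['i', 'will', 'R IH0 K AO1 R D', 'the', 'R EH1 K ER0 D']
```

A rule this simple breaks as soon as the cue word is not directly in front of the homograph, which is the kind of case a model that sees the whole sentence is meant to handle.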

 

So in the current pipeline I've employed separate end-to-end neural network models: one for deducing pronunciation, the new FastPitch models, and another for energy modulation for emotion. This architecture is partly based on FastSpeech and the proposed RAD-TTS structure, which was released last year by Nvidia at Interspeech.

 

FastSpeech: New text-to-speech model improves on speed, accuracy, and controllability - Microsoft Research

 

Additionally there is now a residual convolutional neural network added for lexical stress detection. This is where, in human speech, we would put stress on certain syllables of words depending on the sentence structure:

 

[Attached image: image.png]
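
As a rough illustration of what stress labels look like in data (not the network mentioned above), ARPAbet marks every vowel with a digit: 1 for primary stress, 2 for secondary and 0 for unstressed. The small Python sketch below, with function names made up for illustration, reads those digits out of a phoneme string. A stress detector would have to predict labels like these from the sentence instead of looking them up.

```python
def stress_pattern(arpabet):
    """Return the stress digit of each vowel phoneme, in order.

    In ARPAbet only vowels end in a digit (0, 1 or 2), so consonants are skipped.
    """
    return [int(p[-1]) for p in arpabet.split() if p[-1].isdigit()]


def primary_stress_syllable(arpabet):
    """Return the 0-based index of the syllable with primary stress, or None."""
    pattern = stress_pattern(arpabet)
    return pattern.index(1) if 1 in pattern else None


print(stress_pattern("R EH1 K ER0 D"))    # noun "record": [1, 0]
print(stress_pattern("R IH0 K AO1 R D"))  # verb "record": [0, 1]
```

The two pronunciations of "record" differ in which syllable carries the 1, which is exactly the distinction the sentence context has to resolve.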

 

It is now possible for neural network based voice models to know the context of the sentence and use that in predicting how to pronounce a word. Here's an example of what we currently have to deal with:

 

 

Here is an example of a sentence using our new model pipeline with feed-forward neural network based contextual recognition:

 

 

I can't really demonstrate the energy differences, but you can try some of the examples below to get a feel for the minor improvements in the model's expression of emotion and sentence stress. I still have some stuff to fix; with neural networks nothing is really constant and there will still be issues moving forward, but... if all goes well I might start going over the current voice packs and updating them to the new types and mod versions in the next few weeks.


Here are some example comparisons in the meantime between the current pipeline and my new one. 

 

 

And here are some more exported voice lines using Submissive Lola as an example. Remember that these all come straight from the model itself, with no individual tweaking or adjustment, which is important since covering voices for LoversLab-type mods can easily go up to hundreds of thousands of lines:

 

 

 

 

Edited by Executaball
Posted
On 4/8/2022 at 8:23 AM, devildx said:

No need anymore. I already did it for myself. I play with the Skyfem mod, so my version has only female voices, including female voices for the male dialogue. You can just delete the male folders if you want only the female voices.

 

https://www.loverslab.com/topic/111415-skyfem-all-npcs-now-female-special-edition/page/31/#comment-3696506

 

Great to see that the mod is up again.

 

EDIT: Seems like the new version is voiced so only use my file for the specific version 0.971 AIO.

Hey, just curious, but you wouldn't happen to have genderswapped voices for vanilla dialogue, would you?

Posted
11 hours ago, Executaball said:
Spoiler

Holy shit, that's amazing! How long does each sentence take to synthesize?
