
xVASynth / xVATrainer / xVADict



xVATrainer and xVADict are covered further below on this page.

 

xVASynth

An AI Voice Acting Synthesis Tool

 

xVASynth is an AI-based speech synthesis app that can generate high quality voice acting lines using existing voices from video games. The tool should enable a variety of new content, such as adding voice acting to game quest mods, creating machinima, adding voice acting to unvoiced areas of the games, and expanding existing dialogue with new vocabulary.

The tool and the 400+ voices across 35+ games can be publicly downloaded from the respective pages on the Nexus, or from the author's Patreon.

 

xVASynth is an AI app that generates voice acting lines using specific voices from video games. It can do text-to-speech (TTS) from text input, or speech-to-speech (S2S) from audio input. The app uses FastPitch [1,2] models, which give users artistic control over pitch, duration, and energy values for every letter in the audio. They also allow generating audio with explicitly defined pronunciation via ARPAbet [3] notation.

 

Additional contents for xVASynth in the below spoiler.

 

Spoiler

 

 

 

 

With this tool, game developers can recreate the voices of Nords, Bretons and certain custom characters within their Elder Scrolls games; Morrowind, Oblivion and Skyrim are supported. Fallout 3, Fallout New Vegas and Fallout 4 each have their own set of voices, and one can even acquire voices such as Triss from The Witcher 3 and D.Va from Overwatch.

The tool does not re-distribute any game assets, nor does it interact with them in any way. Game assets are used only during voice training as a reference, to guide the algorithm to a point where it can create voices that sound similar enough to the examples. Think of it as an automated digital impersonator. Regardless, avoid using the tool in an offensive/explicit manner. Where you can, make it obvious in descriptions that the voice samples are generated and are not from real human voice actors. Any issues you cause with this are on you.

Version 2.1 was released on the 2nd of January, 2021, and the matching 2.1.1 (s2s hotfix) patch followed the same day. Version 3.0 then appeared on the 23rd of May, 2023, followed by the 3.0.2 patch on the 16th of July, 2023. xVASynth may be found within the Skyrim page at Nexus Mods.

Other games are supported as well, though for now these are found only at his Patreon page (Final Fantasy, Borderlands, Bioshock, GTA 4, GTA 5, GTA San Andreas, Resident Evil, Red Dead Redemption 2, Command and Conquer, and others).

 And for those wondering about quality, I created a little sample using the Oblivion:Argonian voice set.

 Your momma must be proud whore.wav

 
or... a sample produced with only generated voices.  No actual human actor or voice was used

 

INCLUDED:

 

Voice Conversion (v3+)
The app can also do voice conversion, rather than text-to-speech. In this mode, you can provide a reference audio dialogue file, and the app will re-generate it but with the voice of the v3 model you select. You can provide a reference audio line by recording with your microphone (by clicking the icon), or you can drag+drop an audio file onto the icon. If needed (unlikely), you can control the voice conversion strength in the settings.

ARPAbet pronunciation (v2+)
You can specify exact pronunciation for words by using ARPAbet notation between { } brackets in the input, or by managing words in your own (or other people's) dictionaries. Included is CMUdict with 135k words with American-English pronunciations. NOTE: v3 introduces several new ARPAbet symbols, for a custom extended version of the ARPAbet spec which includes sounds more typically found in other languages.
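As a rough illustration of the { } bracket notation described above, here is a minimal Python sketch that rewrites a line of text so that chosen words are replaced by a bracketed ARPAbet spelling before the text is pasted into the app. The helper name `apply_arpabet` and the sample pronunciation are made up for this example; the phoneme string is not taken from CMUdict or xVADict, and whether xVASynth expects stress digits on vowels is not covered here.

```python
import re

# Hypothetical pronunciation for illustration only; not taken from
# CMUdict or from any xVADict dictionary.
PRONUNCIATIONS = {
    "dovahkiin": "D OW V AH K IY N",
}


def apply_arpabet(line, pronunciations):
    """Replace known words with their ARPAbet spelling wrapped in { } brackets."""
    def swap(match):
        word = match.group(0)
        phonemes = pronunciations.get(word.lower())
        # Words without an entry are passed through unchanged.
        return "{" + phonemes + "}" if phonemes else word

    return re.sub(r"[A-Za-z']+", swap, line)


print(apply_arpabet("The Dovahkiin has arrived.", PRONUNCIATIONS))
```

In normal use the app's own dictionary menu does this replacement for you; the sketch only shows what the bracketed input ends up looking like.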

 


OTHER LINKS:

 

xVADict community project - Elder Scrolls edition: https://www.nexusmods.com/skyrimspecialedition/mods/56778
xVADict is a community project to create ARPAbet pronunciation dictionaries, for use in xVASynth. This page contains the dictionary for the unique words found across all Elder Scrolls games.

xVADict - Alphabet Pronunciation: https://www.nexusmods.com/skyrimspecialedition/mods/57439
Adds the English alphabet pronunciation to xVASynth.

 

Voiced Player - xVASynth Fuz Ro Bork plugin: https://www.nexusmods.com/skyrimspecialedition/mods/62944
A plugin to connect xVASynth up to Fuz Ro Bork, enabling xVASynth voices to be used in the Fuz Ro Bork mod.


.lip and .fuz plugin for xVASynth v2:    https://www.nexusmods.com/skyrimspecialedition/mods/55605
A plugin to create .lip and (optionally) .fuz files automatically from audio lines generated with xVASynth, in either normal mode or batch mode, with or without multi-threading. DOES NOT NEED THE CK. Works for Skyrim, Fallout 4, Fallout 3, and Fallout New Vegas.

xVASynth plugin - Romanian Language:    https://www.nexusmods.com/skyrimspecialedition/mods/50878 
A demo plugin for v1.4.0+ of xVASynth, where third party plugins are now supported. This plugin changes the app front-end, swapping the UI language to Romanian. Full developer reference: https://github.com/DanRuta/xVA-Synth/wiki/Plugins.


If you are a developer and are interested in developing a plugin, check out the documentation here: https://github.com/DanRuta/xVA-Synth/wiki/Plugins

 

 

 

Below, you will find individual links to the Nexus Mods pages for the game of your choice.

 

 

And apparently, Starfield (SFVASynth) is coming soon

 

 

♦       ♦       ♦       ♦       ♦

 

xVATrainer

 

xVATrainer is the companion app to xVASynth, the AI text-to-speech app using video game voices. xVATrainer is used for creating the voice models for xVASynth, and for curating and pre-processing the datasets used for training these models. With this tool, you can provide new voices for mod authors to use in their projects.

 

(LINK AVAILABLE IN NEXUS MODS SKYRIM BOARD)

 

Additional contents for xVATrainer in the below spoiler.

Spoiler


There are three main components to xVATrainer:

  • Dataset annotation - where you can adjust the text transcripts of existing/finished datasets, or record new data for it over your microphone
  • Data preparation/pre-processing tools - Used for creating datasets of the correct format, from whatever audio data you may have
  • Model training - The bit where the models actually train on the datasets

 



The main screen of xVATrainer contains a dataset explorer, which gives you an easy way to view, analyse, and adjust the data samples in your dataset. It further provides recording capabilities, if you need to record a dataset of your own voice, straight through the app, into the correct format.

 

Tools


There are several data pre-processing tools included in xVATrainer, to help you with almost any data preparation work you may need to do, to prepare your datasets for training. There is no step-by-step order that they need to be operated in, so long as your datasets end up as 22050Hz mono wav files of clean speech audio, up to about 10 seconds in length, with an associated transcript file with each audio file's transcript. Depending on what sources your data is from, you can pick which tools you need to use, to prepare your dataset to match that format. The included tools are:
 

  • Audio formatting - a tool to convert from most audio formats into the required 22050Hz mono .wav format
  • AI speaker diarization - an AI model that automatically extracts short slices of speech audio from otherwise longer audio samples (including feature length movie sized audio clips). The audio slices are additionally separated automatically into different individual speakers
  • AI source separation - an AI model that can remove background noise, music, and echo from an audio clip of speech
  • Audio Normalization - a tool which normalizes (EBU R128) audio to standard loudness 
  • WEM to OGG - a tool to convert from a common audio format found in game files, to a playable .ogg format. Use the "Audio formatting" tool to convert this to the required .wav format
  • Cluster speakers - a tool which uses an AI model to encode audio files, and then clusters them into a known or unknown number of clusters, either separating multiple speakers, or single-speaker audio styles
  • Speaker similarity search - a tool which encodes some query files and a larger corpus of audio files, and then re-orders the larger corpus according to each file's similarity to all the query files
  • Speaker cluster similarity search -  the same as the "Speaker similarity search" tool, but using clusters calculated via the "Cluster speakers" tool as data points in the corpus to sort
  • Transcribe - an AI model which automatically generates a text transcript for audio files
  • WER transcript evaluation - a tool which examines your dataset's transcript against one auto-generated via the "Transcribe" tool to check for quality. Useful when supplying your own transcript, and checking if there are any transcription errors. 
  • Remove background noise - a more traditional noise removal tool, which uses a clip of just noise as reference to remove from a larger corpus of audio which consistently has matching background noise
  • Silence Split - A simple tool which splits long audio clips based on configurable silence detection
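For anyone who wants to double-check a finished dataset outside the app, the target format described above (22050Hz, mono, .wav, up to about 10 seconds) can be verified with a few lines of Python. This is only a sketch using the standard-library `wave` module; the function name `check_wav` is made up for this example, the 10 second figure is treated as a soft limit, and it is not part of xVATrainer.

```python
import sys
import wave

TARGET_RATE = 22050    # Hz, as stated in the format description
TARGET_CHANNELS = 1    # mono
MAX_SECONDS = 10.0     # "up to about 10 seconds", treated as a soft limit


def check_wav(path):
    """Return a list of problems found in one .wav file (empty list = looks fine)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    seconds = frames / rate if rate else 0.0
    if rate != TARGET_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE} Hz")
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if seconds > MAX_SECONDS:
        problems.append(f"clip is {seconds:.1f} s, longer than about {MAX_SECONDS:.0f} s")
    return problems


if __name__ == "__main__":
    # Usage: python check_wavs.py clip1.wav clip2.wav ...
    for wav_path in sys.argv[1:]:
        issues = check_wav(wav_path)
        print(wav_path, "OK" if not issues else "; ".join(issues))
```

Files that fail the check can be run through the "Audio formatting" or "Silence Split" tools listed above.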

 

Trainer


xVATrainer contains AI model training, for the FastPitch1.1 (with a custom modified training set-up), and HiFi-GAN models (the xVASynth "v2" models). The training follows a multi-stage approach especially optimized for maximum transfer learning (fine-tuning) quality. The generated models are exported into the correct format required by xVASynth, ready to use for generating audio with.

Batch training is also supported, allowing you to queue up any number of datasets to train, with cross-session persistence. The training panel shows a cmd-like textual log of the training progress, a tensorboard-like visual graph for the most relevant metrics, and a task manager-like set of system resources graphs.

You don't need any programming or machine learning experience. The only required input is to start/pause/stop the training sessions, and everything within is automated.

The time to train a voice will vary, and depend mostly on your hardware, and a bit on dataset size/difficulty. Some rough timings that have been reported to me:
- RTX 3090 - somewhere between one and two days
- GTX 1080 - about 5 to 6 days
- A100 - somewhere between 12 hours and a day




Publishing voices

Once trained, you can do whatever you want with your models, within the limits of the license of the training data (eg, don't go using a model commercially, when the training data cannot be used in the same commercial setting).

If you are adding voices for a game not currently supported in xVASynth, you need to make some accompanying asset files for xVASynth to use. Check the ./resources/app/assets folder in xVASynth to check what these asset files look like. Each game has a background image, and a .json file, both with a base filename of the respective game ID. A new game will need a new game ID. Try to follow the same naming conventions for everything, so that your models/game don't stick out like a sore thumb!


Nexus integration for your game 

To add nexus integration for a new game you add, you need to add the nexus game ID number into the .json asset file, in the "nexusGamePageIDs" array. You can check the other files for examples. To get the code(s) (you can add multiple games to your json - eg skyrim le, skyrim se), you can (a) ask me for help, or (b), open xVASynth, press Ctrl+Shift+I, and enter the following into the Console tab: getAllNexusGameIDs("fallout").then(console.log)  - where "fallout" is an example search query, and will return a list of all the nexus pages for games with fallout in their title. Expand that list that gets printed into the console, to see all the ID numbers for all the nexus game pages you want to bind to your game/series.
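If you would rather script the edit than do it by hand, the step of adding ID numbers to the "nexusGamePageIDs" array could look roughly like the Python sketch below. Only the key name comes from the description above; the rest of the asset file's layout is not described here, so the sketch leaves every other key untouched. The function name `add_nexus_ids`, the file path and the ID values are placeholders, and storing the IDs as plain numbers is an assumption, so compare against one of the existing asset files first.

```python
import json


def add_nexus_ids(asset_path, new_ids):
    """Append Nexus game page IDs to the "nexusGamePageIDs" array of an asset .json file."""
    with open(asset_path, "r", encoding="utf-8") as f:
        asset = json.load(f)

    # Create the array if the file does not have one yet (assumption:
    # a missing key is equivalent to an empty list).
    ids = asset.setdefault("nexusGamePageIDs", [])
    for game_id in new_ids:
        if game_id not in ids:
            ids.append(game_id)

    with open(asset_path, "w", encoding="utf-8") as f:
        json.dump(asset, f, indent=4)
    return ids


# Example call with placeholder values (hypothetical file name and IDs):
# add_nexus_ids("./resources/app/assets/mygame.json", [1234, 5678])
```

Keep a backup copy of the .json file before running anything like this against it.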



FUTURE PLANS: This isn't finished yet, but given that both xVASynth and xVATrainer are available on Steam, something on my TODO list is Steam Workshop integration. Once that is up and running, you will hopefully be able to upload your model(s) to the workshop straight from xVATrainer, and people will be able to download them straight into xVASynth.


Ethical considerations

xVATrainer and xVASynth together function as audio generating/editing software, just as something like Photoshop is for the image domain. In the same style, you need to consider what is/isn't right, to use the software for.

Be considerate of people's wishes, when creating voice models, especially if you're looking to train voice models for aspiring voice actors. Check the license/re-distribution rights for whatever training data you are using. If one is not available, you should check with the owner/voice actor first, and decide on one if you are given permission. This is easier than you'd think. It's mostly just a choice between "CC BY" for no rules, or "CC BY-NC" for non-commercial use. You can check here for more on creative commons licenses, though you can, of course, use whichever license works best. If the data is from a game, it will most likely be non-commercial.

Check Microsoft's take on how to responsibly handle people's voices in artificial synthesis models. They eloquently cover the basics quite well, and you should use this as a good, general starting point, before you start discussing things with your voice actor.

 

 

 

 

♦       ♦       ♦       ♦       ♦

 

xVADict

Community Project

Elder Scrolls Edition

 

 

xVADict is a community project to create ARPAbet pronunciation dictionaries, for use in xVASynth. This page contains the dictionary for the unique words found across all Elder Scrolls games.

 

(LINK AVAILABLE IN NEXUS MODS SKYRIM BOARD)

 

 

Additional contents for xVADict in the below spoiler.

Spoiler

 

Elder Scrolls contains many unique words, which may be difficult for a text-to-speech model to pronounce correctly. ARPAbet is a special notation for how words should be pronounced. ARPAbet dictionaries are a feature supported in xVASynth v2.0+, to make voices automatically use the correct pronunciation for the words in the dictionary.

There are a couple thousand unique words to cover for Elder Scrolls that are not already in CMUdict, an English pronunciation dictionary with ~135k words already included with xVASynth. Not all words in this list are actually important (most will get skipped). The number of words finished and packaged up will grow over time, as more people contribute.

There are 360 words finished right now, as of v1.0.4.



Installation

The dictionary is a simple .json file, and you can install it by placing it in the <xVASynth.exe>/resources/app/arpabet/<here> folder. 
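The install is just a file copy, so it can be scripted if you manage several dictionaries. Below is a minimal Python sketch, reading the path above as "the arpabet folder under resources/app, next to xVASynth.exe". The function name `install_dictionary` and the example paths are placeholders for illustration, not anything shipped with xVASynth.

```python
import shutil
from pathlib import Path


def install_dictionary(dict_file, xvasynth_dir):
    """Copy a dictionary .json into <xvasynth_dir>/resources/app/arpabet/."""
    target_dir = Path(xvasynth_dir) / "resources" / "app" / "arpabet"
    if not target_dir.is_dir():
        # Refuse to guess: the folder should already exist in a real install.
        raise FileNotFoundError(f"arpabet folder not found: {target_dir}")
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(dict_file, target_dir))


# Example call with placeholder paths (hypothetical file and install location):
# install_dictionary("elderscrolls.json", r"C:\Games\xVASynth")
```

After copying, the dictionary is managed from the ARPAbet menu inside xVASynth.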


How to use

The words can be enabled/disabled/edited from the ARPAbet menu, in xVASynth. You can access this via the ae symbol at the top-right.


 

 

All words that are ticked on will automatically get replaced by the app, when synthesizing a line with that word in it. 

 

 

 

Contributors, and how to help

Check out the xVASynth Discord server for instructions on how to help contribute to the project. Alternatively, if you notice an incorrect pronunciation, do let me know!

The contributors (apart from myself), in order of number of lines contributed are:


Fa'Rihr
youbetterwork
Rachel
anonymous

 

 

 

♦       ♦       ♦       ♦       ♦

 

 

LOVERSLAB DEVELOPMENT LINKS:

DoubleCheeseburger:    xVASynth-Based Mod Voicification

 

 

ALTERNATE NEXUS MODS DEVELOPMENT LINKS:

aragonit:    xVASynth - AI Voice Imperial  - Oblivion

cupcakeninja64:    Glenmoril and Unslaad xVASynth Voiced  - Skyrim SE

Sturmgewehrrrrrrrrrrr:    Glenmoril Voiced by xVASynth  - Skyrim SE

MaxMusti:  Child Voices for SKVA (English German French) - Skyrim SE

Tessory:  Russian Language Voice Models - Skyrim SE

GFirestarfan12 Morika Vuk40:  Onean Voice Package - Skyrim SE

Stanseas: Protectron Voice Model - Fallout 3

LzJackson:    xVASynth - Cito Voice  - Fallout 4

F4llfield:  Nuka World's Dixie Voice - Fallout 4

MaxMusti:  Shawn Child Voice (NPCMShaun10) - Fallout 4

radbeetle:    More 76 Voices for xVASynth  - Fallout 76 (Multiple downloads)

aedenthorn: Pelican xVASynth - Stardew Valley

 

GoogleDocs:   Spreadsheet list of available voices including direct links

 

 

HIS LINKS:

DISCORD:  https://discord.gg/nv7c6E2TzV

PATREON: https://www.patreon.com/xvasynth (no longer giving information)

TWITTER:  @dan_ruta

 

Edited by LongDukDong
OTHER xVASynth resource maker links

ADDITIONAL NOTES OF INTEREST
 

ODD SOUNDING WORDS
While xVASynth is fairly sophisticated, some words may not generate proper spoken dialog.

 

The dialog may sound off or the program may be confused by the spelling. In some cases, a substitute may be required.  The word 'Rough' for example sounds more accurate if the word entered within the system is instead 'Ruff'. And words like 'Bastard' and 'Impostor' may sound more accurate if spelled 'Basterd' and 'Imposter'. There are other words to consider like Breast (Brest), Why (Whyy), Penis (Peenis), Muscles (mussles), their (theyr), and Baby (Baybee).

 

Abbreviations may need to be converted to full spellings.  Typically seen abbreviations like 'Mr.' and 'Mrs.' would of course need to be spelled out as Mister and Misses. And the less obvious 'ok' typically works only if spelled as okay.
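If you are preparing a lot of lines, the respellings and abbreviation swaps above can be applied automatically before the text goes into xVASynth. The Python sketch below uses a handful of the word pairs mentioned in this post; the table is only a starting point to extend by ear, and the function name `respell` is made up for this example.

```python
import re

# Abbreviations are swapped first, since they contain a period.
ABBREVIATIONS = {"Mr.": "Mister", "Mrs.": "Misses"}

# Phonetic respellings taken from the examples in this post.
RESPELLINGS = {
    "rough": "ruff",
    "bastard": "basterd",
    "impostor": "imposter",
    "muscles": "mussles",
    "their": "theyr",
    "ok": "okay",
}


def respell(line):
    """Swap abbreviations and troublesome words for spellings the voices handle better."""
    for abbreviation, full in ABBREVIATIONS.items():
        line = line.replace(abbreviation, full)

    def swap(match):
        word = match.group(0)
        new = RESPELLINGS.get(word.lower())
        if new is None:
            return word
        # Keep a leading capital so sentence starts still look right.
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", swap, line)


print(respell("Ok, Mr. Smith had a rough day."))
```

Whether a given respelling helps depends on the voice model, so listen to the result before committing to it.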

 

Generating extremely short phrases, let alone single words, does not fare well. It may require that you generate a sentence or two, the desired word being a latter sentence in itself. As an example, use something akin to "Am I really here?  Ah.  I see." (with adequate spacing set in the editor) in order to generate "Ah." itself... after trimming the excess, undesired dialog.

 

 

 

OBLIVION

While the app outputs files in .wav format, the following .MP3 settings are recommended:

Sample Rate:    44100 Hz

Format:             32-Bit Float

Bit Rate:            64 kbps constant speed

Min Length:       0.5 seconds or greater (required if generating .lip files)
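One possible way to do the .wav to .MP3 conversion in bulk is to call ffmpeg from a small Python script. The sketch below only covers the sample rate and bit rate from the list above, using ffmpeg's standard `-ar` and `-b:a` options. The "32-Bit Float" entry is left out, since it is unclear how it maps to an encoder option, and the minimum length has to be checked separately. It assumes ffmpeg is installed and on your PATH; the function names and file names are placeholders.

```python
import subprocess


def build_ffmpeg_command(wav_path, mp3_path):
    """Build an ffmpeg command line for a 44100 Hz, 64 kbps MP3."""
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it already exists
        "-i", wav_path,   # input .wav generated by xVASynth
        "-ar", "44100",   # output sample rate in Hz
        "-b:a", "64k",    # audio bit rate
        mp3_path,
    ]


def convert(wav_path, mp3_path):
    """Run the conversion; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_ffmpeg_command(wav_path, mp3_path), check=True)


# Example call with placeholder file names:
# convert("line.wav", "line.mp3")
```

Any audio editor that can export MP3 with these settings will do the same job.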

 

The filenames are already generated by Oblivion at the time of dialog creation. Merely find the filename in the editor and rename the .MP3 formatted audio file to match. Example below:

Within the Crime Topic, the ServeSentence / "Hope you rot, criminal scum." text has placement for two voices, male Imperial and female Imperial.  Both share the same filename, though within their own respective gender-based folders. (That's sexist!).

 

The pre-defined filename for each one's spoken/audible dialog is defined as "Crime_ServeSentence_000281B8_1" within the Voice Filename window in the Edit Response screen. I.e., each has its own "Crime_ServeSentence_000281B8_1.mp3" file and a subsequent "Crime_ServeSentence_000281B8_1.lip" file, if you can generate them.

 

I cannot say about other systems, unless someone PMs the information to me. *HINT HINT*


 

Edited by LongDukDong
Added some odd phonetic respelling examples.

SYSTEM PATCH UPDATE

 

As of Saturday, November 6, 2021, Dan Ruta released a patch to upgrade xVASynth to version 2.0.5.  It should be available within all of the available Nexus Mod threads, whether Oblivion, Skyrim, or even Fallout 76.  Oddly, the post in his patreon said it was 2.0.4.  Apparently, this was a mistype.

 

There are no patch notes. And those that already used the 2.0.2 patch need not worry about doing anything special. Just install the new patch in place.

  • 2 months later...

Said content is that of Dan Ruta, the tool's creator...

 

 

1 Year Anniversary of version 1.0

Spoiler

Well, it's now been an entire year since I first posted the initial full release of the app, on 11th Jan 2021!

 

Things have come a long way for xVASynth, from its early days in 2018 (YouTube showcase of v0.1). Since this time last year, we've reached:

  • Over 450 voices trained (not including re-trained models)
  • Support for 39 games and game series
  • Almost 2000 members on Discord 
  • Over 55 downstream mods made with xVASynth in the showcase list (the actual number is much higher, given I only add things people send me, and not mods from other platforms)
  • Countless YouTube videos with xVASynth voice acting
  • 271,711 total downloads on Nexus, 478,275 views, and 2591 endorsements
  • 4 small spin-off mods/plugins, with 2 more on the way
  • 148732 total YouTube views on xVASynth videos (v2.0 showcase video)

 

The app has had a bunch of changes and new features added, including:

  • Much better UI, internationalization, and keyboard navigation
  • Audio post-processing via ffmpeg, with controls for pitch, tempo, silence padding, format, and amplitude
  • HiFi-GAN vocoders, replacing big slow WaveGlow vocoders with near instant, high quality audio generation
  • Batch mode, with super-optimized multi-processed, batched mass voice generation
  • A third party plugins system, with developer reference for anyone to build with
  • Support for a third per-letter control vector for models: energy (intensity)
  • Added support for explicit pronunciation control between { } brackets via ARPAbet, and dictionary management for automated replacements 
  • A brand new, much improved upon pitch/durations/energy editor, along with much better audio player
  • A 3D voice embeddings visualiser for search and discoverability of voice similarity
  • Nexus integration, to enable automated search, download, and installation of voice models
  • Plugin to automatically generate .lip/.fuz files for Bethesda games
  • (in progress still, but basically finished) Plugin to enable real-time xVASynth based TTS in Skyrim via the Fuz ro Bork mod 
  • A speech-to-speech mode (which finally works...) for generating audio for a voice A, in the style of voice A or B in a reference audio file
  • Support for multiple voice variants to choose between
  • Many other, and smaller things

 

Alongside all this, I've optimized the crap out of the training scripts, and began incorporating it into a companion app, xVATrainer, which currently has the following:

  • A main menu to create/adjust transcripts for speech datasets, with recording capabilities 
  • [Tool] Audio formatting, to convert any audio to the format required for the deep learning models
  • [Tool] AI speaker diarization, which automatically extracts speech audio from a super long audio file (eg movie), and extracts it into speaker-separated folders of short audio clips
  • [Tool] AI source separation, which, given a noisy audio file (eg background music, sfx), outputs just the clean speech audio, to make it usable for training
  • [Tool] Audio normalization, to harmonize audio from multiple sources to the same consistent levels  
  • [Tool] WEM to OGG, to convert from a common format of game file to a previewable audio format
  • [Tool] Cluster speakers, to take a folder of tens/hundreds/thousands of audio files from multiple speakers, and automatically sort them into folders assigned to each speaker - or if used with only files from one speaker, separate into different "emotions" (speaking styles)
  • [Tool] Speaker similarity search, to sort a large corpus of audio files by similarity to a query set of audio files
  • [Tool] Speaker cluster similarity search - same as normal search, but where the corpus is cluster folders
  • [Tool] Automatically generate transcripts for speech audio files
  • [Tool] Remove background noise, to get studio-like silence (no background humming, hissing, fan noise, etc)

 

But more importantly, I am currently working on the training menu for xVATrainer, which given a dataset fully processed with some/all the above tools, can train up models for you. This bit is quite an undertaking of work, but it is coming!

1.webp

(The above contains test data, apart from the system resources graphs)

 

 

 

xVASynth on Steam

Spoiler

So let's get to that headline link. I've been mentioning for ages that I'm working on a better way to distribute the app and models, since even before the v2 release. That thing is xVASynth being listed on Steam! This has been an insanely slow process, that I started in September (!), with lots of back-and-forth with Valve, over whether they can legally do it, with lots of U-turn decisions, and convincing.

 

There are actually a lot of benefits to also hosting the app on Steam (it's not going away from the Nexus):

  • Much better distribution speeds, for downloading the main app (which is getting quite big now, at ~5GB), and updates. No more slow Nexus download speeds!
  • Much easier installation, less probable causes of issues
  • Automated installation of the Microsoft Redistributable C++ requirement, which does catch people out
  • Automated installations for updates - always up-to-date with the latest changes
  • Workshop support! And this is a big one (thank you radbeetle for the idea!). As people start training voices with xVATrainer, it might be hard to keep track of any uploads to the Nexus, or wherever, despite the Repo management menu in xVASynth. The Steam Workshop can be somewhere that people upload models to, complete with user ratings. This should also work for other things too, like ARPAbet dictionaries, game themes, and plugins. 
  • Probably some other steam community features that I haven't explored yet - I must admit I've never used the Steam community features

There's actually a lot more work that needs doing for the workshop integration, and I am still waiting for the final go-ahead for the Steam release - subject to some small tweaks I've already made around me having even mentioned that I have a Patreon, in the app ?. The upload process will likely go in xVATrainer anyway.

 

You can already search for it (and a few people are already using it), but due to Steam's rules, I have to leave it in for a minimum of 2 weeks from posting as "Coming soon" before I can actually make it publicly downloadable to everyone. So perhaps the current release date will change to 2 weeks later, when they reply, and if they're not fussy about any more details.

 

On that note, Steam made sure to stress to me to let people know about the release, so they add it to their wishlist - apparently this is something they use for their algorithms.

 

The last quirk to mention around this is that I couldn't include the asset files (the game art) into the build, so the app will download with no images in there. Everything still works the same, but just without the images, as is. We'll see if they have a problem with this, but I've hosted the images on a google drive folder here which I will add to, as game support expands. There's a link in the app to this, which you'll have to go to and download the images from and manually place them in the ./resources/app/assets folder yourself, as a one-time installation thing.

 

 

Future plans

Spoiler

Right now, my main priority is getting xVATrainer finished, so that people can train their own voice models. Of course, things will significantly change once this happens, and I will most likely be free to switch to mainly just a research/development role - though I will of course continue to also train voices.

 

The discord server/channels design will have to adapt to that, as it becomes much more decentralized/community driven. I have some plans for new stuff to add to it too, which should be fun, once I upgrade my raspberry pi server to something more capable.

 

Speaking of the research/development stuff, I've actually also made some great progress with v3 models (which I also had to prioritize, for non-xVASynth related reasons), which I'm very excited about. I will post updates on this when I'm closer to the final design/feature spec.

---

All-in-all, thank you for the insane support over the last year - I never expected this tool to get used as much as it has been so far! I really appreciate all the messages, the support here, the funny videos, the mods, and the community we've built.

Now let's see what the following year brings...

 

 

Edited by LongDukDong
  • 2 weeks later...

Said content is that of Dan Ruta, the tool's creator...

 

 

Interim batch of 12 voices re-trained to v2

Post contents within the below spoiler:

Spoiler

While the poll was going, I spent some gpu time re-training some of the older v1 voices over to v2, in addition to the ones mentioned in the last post.  With this update, we're now at about 20% of the voices having been retrained to v2.  The training scripts are getting faster and faster, with every optimization and tweak.  Seems like the weeks spent on that are actually amounting to something!


Anyway, the full list of updated voices:

 

Fallout 4: Nate
Fallout 4: Nora
Cyberpunk: V (Female)
Skyrim: Karliah
Skyrim: MaleCommander
Skyrim: MaleDunmer
Skyrim: MaleOldGrumpy
Skyrim: MaleOldKindly
Skyrim: MaleOrc
Witcher: Yennefer
Persona: Kasumi
Persona: Ren

(Links at his Patreon website.)

 

 

Edited by LongDukDong
23 minutes ago, fishburger67 said:

Any idea when xVATrainer will be released, even a beta version?

DanRuta has yet to announce even an ETA. I have been following its progress from updates he posts on Discord, and he has made a lot of progress. It "seems" close. My estimation would be between next month and June of this year.

  • 2 weeks later...

Said content is that of Dan Ruta, the tool's creator...

 

 

Batch #34 - 6 (5+1 bonus) new voices;

xVASynth now public on Steam

Post contents within the below spoiler:

Spoiler

The next voices are here!

 

Had a slight set-back. While integrating the training script into xVATrainer, I found some ways to optimize the training scripts, which also affects the quality (I think for the better - let me know!). This meant a few days of (necessary) re-calibration, but the scripts are now faster, which should hopefully mean future voices should come out a little bit more often.

 

Anyway, here's the voices, including a request:

 

(Links at his Patreon website.)

 

If you've missed it, the final Steam approvals came through, and xVASynth is now public on Steam! 

 

Going forward, this is the recommended way to download/use the app, as this allows automatic updates, with fast downloads. Later down the line, the added Steam workshop support should also make life even easier, with the voice hosting too.

Edited by LongDukDong

Said content is that of Dan Ruta, the tool's creator...

 

 

Fuz Ro Bork integration; Mega-poll #2

Post contents within the below spoiler:

Spoiler

 

This one's been a long time coming, but it's finally here (and finished). 

 

This is an xVASynth plugin to integrate xVASynth voices into the Fuz Ro Bork mod which adds real-time TTS to the player in Skyrim. This has been a joint effort with the FuzRoBork author to get it to work. Check the video out to see it in action.

 

The non-existent latency was done through the use of pre-cached files for several v2 voices, which ship with this initial release, though more could come in the future of course.

 

You can get the plugin from here: ( CLICK )

 

I have some requests I'm still working through, but I want to throw the next move (in terms of voice training) up to a poll. I can either bring on the next poll of brand new voices, or I can use the equivalent time of that batch to train up more of the existing v1 voices over to v2. The latter would help with the FuzRoBork mod, and another upcoming (smaller) mod that I'm working on. It would also mean less bad quality voices, but it would mean no new voices for a week or two.

 

Patreon doesn't let me do both a video and a poll in the same post, so let's use strawpoll for this one: ( CLICK )

 

 

  • 2 weeks later...

Said content is that of Dan Ruta, the tool's creator...

 

 

Interim voices (5); First community voice actor model

Post contents within the below spoiler:

Spoiler

So the strawpoll in the last post apparently doesn't work (fab), but I got another poll going on the Discord server, and the vote was overwhelmingly (17 for, 6 against) in favour of doing more v1->v2 voice re-training, rather than a new set of voices. The re-training is underway for a good number of voices, but a couple are already done and ready to share, and there were also some novel requests.

 

The voices included are:
- GTA Vice City: Steve
- GTA Vice City: Victor
- Skyrim: MaleDarkElfCynical
- Morrowind: MaleDunmer

(Links from his Patreon website.)

 

I went to re-train the mw dunmer voice, but when I put the files in the app, it seems I hadn't actually trained this voice before, so this is actually a new one!

 

- - -

 

Also in this batch is the first voice model trained on the voice of a community modder/voice actor. I spoke with them first, of course, to ensure all permissions were resolved. The model is released with a CC BY-NC license (this will show up in the app in a future update), meaning it can't be used for commercial stuff - just keep it for modding/machinima/etc, to keep things fair.

 

The voice actor goes by the name HadToRegister, and the voice they recorded sounds like Arnold when he played the Terminator. As we begin including voices from the community (another is most likely on the way), I ask now that we ensure their uses are fair towards the voice actors. It is up to us as a community to not mess this up, and keep everyone happy, or else we will lose this kind of privilege.

 

- First Community Member Voice: HadToRegister

(Link from his Patreon website.)

 
  • 2 weeks later...

Said content is that of Dan Ruta, the tool's creator...

 

 

Interim batch (14 Voices); Second community voice actor model; xVATrainer nearing completion; Hardware upgrade

Post contents within the below spoiler:

Spoiler

A fair bit to update on! Let's get into it.

 

First of all, I've been going through more v1->v2 voice re-training. The 13 re-trained voices are (all Skyrim):

 

(Links from his Patreon website.)

 

I will use these to update the FuzRoBork integration plugin with pre-cache files for these voices, in the next few days. 

 

I'll finish off a few more voices, and then I'll get the next poll going, for some brand new voices.

 

Next, we have our second community voice model, from a voice actress named Ellie Mars. As mentioned last time, please keep in mind fair use of the voice, to avoid any trouble. The voice model is licensed as CC BY-NC, which just means don't use it for commercial products.

 

  (Link from his Patreon website.)

 

--

 

If you've been active on the Discord server, you may be aware that I've recently made a lot of progress on xVATrainer, and it is now nearing completion. All the initial v1.0 features and components are in, save for some smaller non-critical bits and bobs that need polishing/finalizing. The app is usable, and any teething issues apart, pretty much good to go. In fact, the first ever voice trained through xVATrainer is done - the second community voice!

 

The first round of early testing and feedback has started, and I will be spending a bit more time polishing things up, along with implementing any feedback that I get. I'll post more about this soon, after which a slightly wider beta test period will start.

 

One annoying issue with the app, compared to the scripts, is that HiFi-GAN training is currently stuck on num_workers=0, because any higher value makes it inexplicably quit() without errors after a deterministic number of training iterations. This means that training is running about half as fast as it could be, for that small stage of the training. If you're experienced with PyTorch and would want to help me find a fix for this, let me know!

 

--

 

Finally, I have some good news on the hardware side of things. I've been saving up the donations from Patreon, proceeds from a temporary second job (other than the PhD), and some money of my own, and I recently significantly upgraded the hardware in my workstation, boosting CPU and GPU compute further, along with a crap-ton more RAM. Voice training has been noticeably faster since I finished setting things up!

 

This would not have been possible were it not for the amazing support I've had from everyone here, and I can't thank everyone enough! Not only does the upgrade mean faster voice training, but it also means faster research, as I'll begin to set my sights on v3 models.


In other updates, we're at roughly 4500 Steam activations and 1000 unique users!

 

 

 

  • 2 weeks later...

Said content is that of Dan Ruta, the tool's creator...

 

Interim batch of 12 re-trained voices

xVATrainer now in beta

Post contents within the below spoiler:

Spoiler

Some more voices are now ready, after further v2 re-training. The list:

 

- Fallout 4: Curie
- Fallout 4: FemaleEvenToned
- Fallout 4: FemaleOld
- Fallout 4: Piper
- GTA San Andreas: Carl (CJ)
- Skyrim: Delphine
- Skyrim: FemaleCommoner
- Skyrim: FemaleElfHaughty
- Skyrim: MaleBandit
- Skyrim: MaleCommoner
- Skyrim: MaleNordCommander
- Skyrim: Maven

(Links from his Patreon website.)

 

The priority is currently to cover the most popular voices, and the super old voices from pre-tacotron days, such that they're ancient history. The poll voices are already underway.

 

As a quick update on the xVATrainer front, the app is now finished! *
(* in terms of features I wanted in the initial v1.0 release)

The Steam page for the app is now mostly finished, and under review by Valve. I've set the release date tentatively as April 8th to give them plenty of time to review (as moving the release date is not the easiest process). 

 

However, now that the initial set of features has been finished, and I've spent some time polishing things, it's now down to squashing any bugs that might surface, mostly from having the app run on a PC other than mine. To start with, the beta is running with just a handful of people, but closer to the actual release, I will open up the beta a bit more widely.

 

In the meantime, I will work on the showcase video (which is required before the Steam page can go live and go through the minimum 2 week "coming soon" period before it can be launched).

 


 

Don't forget that if you're planning on using xVATrainer, there is a hard requirement for CUDA, which also means an NVIDIA card. If you need to upgrade, aim to maximize the VRAM - the more, the better for training speeds and to some small degree, quality.

 

 

 


Said content is that of Dan Ruta, the tool's creator...

 

 


 

 

Batch #25 - 5 new voices; 1st community-trained voice!; xVATrainer at last approval stage

Post contents within the below spoiler:

Spoiler

One covid isolation later, away from the computer, the batch is done.

 

The voices from the poll:

- Cyberpunk 2077: Dakota
- Cyberpunk 2077: Gilean
- Cyberpunk 2077: Wakako
- Fallout New Vegas: Doc Mitchell
- Fallout New Vegas: Victor

(Links from his Patreon website.)

 

Thanks to Bungles on Discord, we also have our first community-trained voice! Straight to nexus, we have:

 

- Fallout New Vegas: The King

 

For this particular voice, we spoke and decided to have it up alongside the rest of the voices on the xva nexus page (credited ofc), but note that you can of course do whatever you want with the voices you train.

 

---

 

On the Trainer side of things, the app itself has been getting a steady supply of patches, polishing things up, fixing issues, and adjusting things based on feedback. The Steam store page approval has passed, and now that I've just finished the (quite long) showcase video for it, the app build approval process has started.

 

I know of several models having been trained by people on Discord already, so I'm excited to see what the community will make when it's all fully ready to go.

 

 

  • 2 weeks later...

Said content is that of Dan Ruta, the tool's creator...

 

 

Interim batch - 10 new voices; 7 re-trained voices; 100+ Fallout 4 voices!

Post contents within the below spoiler:

Spoiler

A pretty large interim batch this time, with a bunch of voices from requests, and some from v1->v2 re-training. 

 

The new voices:

 

- GTA Vice City: Phil
- Fallout 4: Proctor Teagan
- Fallout 4: Proctor Quinlan
- Fallout 4: Initiate Clark
- Fallout 4: Paladin Brandis
- Fallout 4: Knight Lucia
- Fallout 4: Scribe Neriah
- Fallout 4: Knight Captain Cade
- Fallout 4: Knight Rhys
- Fallout 4: Knight Gavil

(Links from his Patreon website.)

 

The re-trained voices:

 

- Fallout 4: Cait
- Fallout 4: FemaleRough
- Oblivion: FemaleImperials
- Oblivion: FemaleArgonians
- Oblivion: MaleBreton
- Fallout 4: FemaleBoston
- Fallout 4: Gen1Synth01

(Links from his Patreon website.)

 

---

 

Some of these had slightly fewer lines than my comfortable minimum, but the rest came out alright!

 

With this update, we have safely crossed the 100 voices mark for Fallout 4 (and there's the poll voices coming, still)! 

 

(From LongDukDong:   Snagging the three retrained OBLIVION voices NOW!  Always had some issues with them.)



Batch #26 - 6 new voices;

New tool for xVATrainer

Post contents within the below spoiler:

Spoiler

The next batch is ready! The winning voices this time are:

- Fallout 4: Tina de luca
- Fallout 4: Minutemen Radio
- Fallout 4: Ironsides
- Fallout 4: PAM
- Cyberpunk 2077: Hanako Arasaka
- Cyberpunk 2077: Stanley

(Links from his Patreon website.)


This is the last batch before the trainer is out, this Friday! If you haven't already, check out the previous post, where there's a few Steam keys still going, for early access.

 

Recent patches have added, among other things, a new tool: Silence splitting

 

This is similar to the splitting that happens in the speaker diarization tool, but it's based on a configurable detection of silences, rather than on detection of speech activity. If you have, e.g., an audio book where sentences are always, say, 0.5s of silence apart, this is a great tool for getting those exact split points, and much more quickly than with the diarization tool.
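
To make the idea concrete, here is a minimal sketch of silence-based splitting. It is not the code used in xVATrainer; the function names, the amplitude threshold, and the choice to cut at the midpoint of each silent run are all assumptions made for illustration.

```python
from typing import List


def find_silence_splits(samples: List[float], sample_rate: int,
                        silence_threshold: float = 0.01,
                        min_silence_sec: float = 0.5) -> List[int]:
    """Return sample indices in the middle of each long-enough silent run.

    Hypothetical helper: a sample counts as "silent" when its absolute
    amplitude is below silence_threshold (an assumed, configurable value).
    Trailing silence at the end of the clip does not produce a split.
    """
    min_run = int(min_silence_sec * sample_rate)
    splits = []
    run_start = None  # index where the current silent run began, if any
    for i, s in enumerate(samples):
        if abs(s) < silence_threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                splits.append((run_start + i) // 2)
            run_start = None
    return splits


def split_audio(samples: List[float], split_points: List[int]) -> List[List[float]]:
    """Cut the sample list into consecutive clips at the given indices."""
    bounds = [0] + list(split_points) + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy signal at 10 samples/sec: 1s of "speech", 0.5s of silence, 1s of "speech".
toy = [0.5] * 10 + [0.0] * 5 + [0.5] * 10
points = find_silence_splits(toy, sample_rate=10)
clips = split_audio(toy, points)
```

Because this only scans amplitudes, it needs no speech model at all, which is one plausible reason a silence-based splitter can be much quicker than diarization.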

 

--

 

In the meantime, I've got some more v1 -> v2 voice re-trainings running.

 

(From LongDukDong: Why no new Oblivion vocals yet???)

Main page now includes both xVASynth and xVATrainer content.

 

Said content is that of Dan Ruta, the tool's creator...

 

xVATrainer v1.0 release!; 500+ voices milestone; The future

Post contents within the below spoiler:
Spoiler

The time has come!

 

After months of development and testing, xVATrainer is finally ready, and publicly available!

 

If you missed the previous post, you should check the showcase video on YouTube over here, for instructions/an overview:

https://www.youtube.com/watch?v=PXv_SeTWk2M 

 

The app can be downloaded from Steam, and this is the installation I recommend, as Steam can do automatic updates. You can also download (and/or hype up if you wish) the app from the Nexus over here, too: https://www.nexusmods.com/skyrimspecialedition/mods/65022 

 

---

 

In a good bit of timing, we've also very recently crossed the 500 voices mark! Five hundred voices!

 

With this many voices, I've covered the majority of "main" voices from the games I'm familiar with, and although there are still a bunch left, we are indeed now starting to hit those smaller, less known/important voices.

 

Now that the trainer app is finished, and out, I'm sure that the range of games that can get voice model coverage can organically grow beyond what I myself am familiar with, which is very exciting! I will happily add game support into the app, whenever I'm made aware of any newly covered game, though game support can also be added by anyone, by including newly created asset files themselves (check the instructions on the nexus page, or ask me/in the discord). Do let me know if there's a game I should add native support for myself!

 

---

 

However, with the trainer now out, this does mean I am now freed up somewhat, to focus less on training voices for the app, and more on the research and development for the xVASynth and xVATrainer apps. I will still be training voices, and upgrading older voices, but going forward, I can definitely focus more on the rest of the work.

 

I mentioned on the Discord server every now and then that I am loosely working on plans for v3 models (and thus v3 of the xVASynth app). It is still super early days. Having been mainly busy with xVATrainer over the last couple/few months, I've only done some data collections, and brief research into this.

 

However, I do have it mostly mapped out in my head, for what a v3 model could look like in terms of design, and what it would be able to do, if successful. I don't want to commit to any of this, but the following are some of the things I'd like the v3 models to be able to have/do, and what I will be focusing my r+d efforts on:

 

 

  • Multi-lingual support

    So far, only the English language is supported in the current models. Adding support for other languages is not so much a research effort as an engineering effort. The models are designed around (and hardcoded against) a set number of symbols that represent language. These symbols are each letter in the alphabet, as well as each of the ARPAbet phonemes. Adding support for a new language would mostly involve adding each unique letter in that language's alphabet into the list of symbols. However, to continue to support ARPAbet notation, the same thing has to happen for the phonemes of the language. English does not use every phoneme, so other languages contain phonemes not included in ARPAbet (which is for English). So the next item...

  • Extended ARPAbet phoneme dictionary: xVA-ARPAbet

    To allow ARPAbet notation for other languages, something I've already started exploring is the creation/spec of a new notation dictionary for phonemes. There's no point re-inventing the wheel here, so ARPAbet is a great starting point for this. But as mentioned, other languages contain phonemes not present in English, so I've been working on including these extra sounds into an extended version of ARPAbet, which I'll just call xVA-ARPAbet. I've not looked at many languages so far, but from the first few, I've accumulated maybe like 6 or 7 new phonemes. Once I've looked at enough languages, this new dictionary will be used to create CMUdict-like pronunciation dictionaries for as many languages as I can. This whole step will be quite a lot of work, and will take some time - any help with this from linguistic experts would be incredibly helpful+appreciated! But I need to do this first, and finish it for all languages I will add support for, because the symbols representation in the models will change with any change to the total number of symbols. I want to avoid this, 1) to avoid handling too many symbols representations in the app, and 2) because every time a change is made, we start from zero, with training models, because no fine-tuning can happen, due to incompatibilities between representations.

  • Accent control

    In a similar vein, something I'd love to add is accent control. This is different from just outright language support. Through code splicing, it is possible for a TTS model to vary the control over accent, at per-letter granularity. So what this means, is that you can control the % of an accent for a particular letter/word/clause, just like with the pitch/energy/duration values - thus adding a fourth control vector over your line. As an example, you could use an English-base voice, and then change a particular word, to give it, say, 70% Spanish accent. Or generate a German-base voice, and put the first few words at 50% Italian accent, and the last few words at 25% Mandarin accent. In theory, through something like this, you could train F4:Curie's voice as 50%English, 50%French (or something) as base accent, and then re-generate lines with 100% English accent to remove her French accent. Or change it to Italian, or Russian, etc.

  • Emotion control

    This would be another cool thing to have. But this one will be a bit more difficult. I have two ideas for how to achieve this, but I am leaning towards one where I will need to first create a dataset of a few different voice actors reading out a few sentences in a set number of different emotions (e.g. Angry, Tired, Sad, Happy, etc). If everything works out ok with the design I am imagining, this would be a fifth control vector, thus also allowing % control upon each letter (and thus also each word/phoneme). I will explore this further, and I will contact volunteers/voice actors if some initial experiments turn out promising.

  • "Proper", full voice conversion

    The Speech-to-Speech feature currently in the app is some hacky engineering implementation, based on speech decomposition, rather than conversion. The reason I went with that approach is that it meant the voice conversion could be EDITED, in the pitch/energy/durations/text editor. More traditional voice conversion systems can't do that. What you put in, is what you get out - if you wanted to change anything, you have to re-record your line. However, the quality is much more reliable in this more traditional voice conversion setup, compared to the de-composition method I came up with. I've already experimented with some models which can do it quite well, and adding in some "proper" voice conversion should in theory be quite easy, through a specific model add-on. Depending on how the multi-lingual support turns out, this may be an easy thing to add.

  • Much faster, "compiled" models

    Right now, the models I've posted are all just basic checkpoints, straight from the training scripts. I've "compressed" them a tiny bit, and removed bits from them that aren't used outside of training, but that's it. More "proper" ways of model publishing can include actual "compilation" of models, where their file size can be reduced further, but more importantly, their execution (inference) time can be drastically improved. There are a few different ways this can be done, and not all methods are compatible with all hardware, so how exactly this will be done is up in the air still, but it's definitely something I'd like to attempt. Though, this might mean trouble for plugins that hook to events kicked off halfway through the model execution.

  • Real-time voice conversion

    Fully dependent on both the model compilation, and the "proper" voice conversion points, something cool I'd like to try to implement is using these models to augment your microphone feed in real-time. The research for how to implement this is super limited, but I have an idea for how to do it (mostly just engineering), and if the stars align, I will add it in.

  • Better quality!

    First, through discussions on the Discord server, we discovered that the v2 models can be less expressive than the v1 models. From some brief experiments, I've brought back SOME of the expressivity by lessening the guidance strength during training, to the alignment component of the models, but the lower the strength is, the more unstable the models are. However, this does show that the loss of expressivity is caused by the alignment module. There are other alignment methods that can be used, however, and I have my eyes on a different one, which would also reduce the strain on the CPU, during training (Stage 1, if you've already used the trainer). Swapping out these modules should in theory improve the quality further. Second, another change I'd like to make which will help with the quality, is the fusion of the generative model, and the vocoder. In v2 model terms, the fusion of the FastPitch and HiFi-GAN models. Right now, some sources of error can be introduced by the disjoint representations between the two models. Kinda like in a game of Chinese whispers, where some information is not properly transferred between the two components. Fine-tuning them together could help avoid a lot of the tinny/metallic/robotic noises/artefacts that you sometimes get.

  • Maybe more

    Research progresses all the time. There may be other things I'll add to this list, as the papers come out! Keep an eye out for update posts!
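
As a rough illustration of the symbol-set point from the multi-lingual and xVA-ARPAbet items above, here is a minimal sketch of a letters-plus-phonemes symbol table. It is not the app's real symbol list: the braces around phonemes, the helper names, and the extra Spanish letters are assumptions, and the "XVA_1"/"XVA_2" entries are placeholders, not actual xVA-ARPAbet phonemes. The 39 phonemes are the standard CMUdict ARPAbet set without stress markers.

```python
import string

# English letters plus the 39 ARPAbet phonemes used by CMUdict (no stress digits).
LETTERS = list(string.ascii_lowercase)
ARPABET = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]


def build_symbol_table(letters, phonemes):
    """Map every symbol to an integer id (hypothetical helper).

    Phonemes are wrapped in braces so that, for example, the phoneme "B"
    never collides with the letter "b".
    """
    symbols = list(letters) + ["{" + p + "}" for p in phonemes]
    return {sym: idx for idx, sym in enumerate(symbols)}


def is_compatible(checkpoint_num_symbols, table):
    """A model trained against N symbols only fits a table with exactly N entries."""
    return checkpoint_num_symbols == len(table)


english = build_symbol_table(LETTERS, ARPABET)

# Assumed extension: two extra letters and two placeholder phoneme names.
extended = build_symbol_table(LETTERS + ["ñ", "á"], ARPABET + ["XVA_1", "XVA_2"])

old_checkpoint_size = len(english)
can_fine_tune = is_compatible(old_checkpoint_size, extended)
```

In this toy setup the table grows as soon as a letter or phoneme is added, so a checkpoint sized for the old table no longer lines up with the new one. That is the incompatibility described above, and the reason for settling the full symbol set before training starts.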

 

 

 

This isn't a TODO list, it's just a list of wishes. The model I've loosely designed should be able to hit all of these points, but I will focus on one thing at a time, and not all things may work out. If not, maybe some of the things will go into a v4 model - time will tell.

 

Something which is certain, is that all this will be an insane amount of work. And the research process will include hundreds of experiments, all taking up my (and my computer's) time. It will not be a quick process, but now that xVATrainer is out, I can mostly direct the compute time to research, rather than voice training - the more compute I can throw at the experiments, the faster they will go.

 

I am hyped though! Any number of these points would be cool to have, I think, and so I'm very excited to get started!

 

---

 

On the note of compute resources, the Patreon donations have been critical to success so far, with lots of hardware upgrades. The community support has been amazing, and I can't thank you enough! We'd still be on the initial crappy v1.0 models otherwise!

 

Of course, the trainer changes a few things, and there aren't really that many things I can offer in return for support any more. I've made a few tweaks to the tiers, to update things a bit - however, outside of those small things, I can mostly just thank you from the bottom of my heart for the support, and making all this possible!

 

Everything is now in the community's hands. I can't wait to see what people will create!

 

 
 
  • LongDukDong changed the title to xVASynth / xVATrainer

Said content is that of Dan Ruta, the tool's creator...

 

Research Update #1

Post contents within the below spoiler:

Spoiler

xVATrainer has been out for about 2 weeks now, and I've seen quite a few people on the Discord server using and training voices with it. I've not kept track of how many community voices have been trained so far, but I know that I've created new asset files for quite a few new games/game series/other categories. Hopefully we'll see more and more models coming out, as time goes on, released publicly online.

 

I am starting this new series of posts, to document and give an insight into what I'm working on, for the future of xVASynth, mostly in terms of the v3 models described in the previous post. These posts will be a bit more spread out, as research takes time - somewhat resembling the progress updates I was posting for xVATrainer, on the server, but a bit more detailed.

 

I've also attached to this post a few new voice models I trained while working on the v1.0.5 xVATrainer and v2.2.0 xVASynth updates:

 

- Skyrim: Serana (re-trained with much better data, there were some transcript mistakes)
- Oblivion: Female Redguards (v1->v2)
- Borderlands: Zero (new, requested voice)

(Links from his Patreon website.)

 

A new poll will be out, soon.

 

 

 

For the research update, I will start with the first point on the list (check the last post, for reference), namely: Multi-lingual support. This work is pretty much jointly done with the next two points, "Extended ARPAbet phoneme dictionary: xVA-ARPAbet", and "Accent control", so I'll be referring to all 3 points, when I talk about multi-lingual support.

 

I've been working on data collection, on and off, for the last few months, to get ahead of this very slow process, and I've still got a fair bit of work to do, but I can now at least settle on an initial draft for the list of languages I think I can add support for. This is not necessarily final, I may add/remove languages, but currently, the list is as follows:

  • Arabic
  • Chinese (Mandarin)
  • Dutch
  • English (done!)
  • Finnish
  • French
  • German
  • Greek
  • Hindi
  • Hungarian
  • Italian
  • Japanese
  • Korean
  • Latin
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Spanish
  • Swedish
  • Turkish
  • Ukrainian

 

If I can count, that's 22 languages, including English. More languages are of course possible, but this is what I've settled on, for now. If all goes well, all these languages will be trainable in xVATrainer, and usable in EVERY model in xVASynth. By "every", I mean that every model will have been pre-trained on all languages' data, in multi-speaker, multi-language mode, and the information will still be present through the per-letter accent control - and/or by setting the base language, maintaining identity. We'll see how things work out.

 

Every language will need datasets for both male and female, as well as a pronunciation dictionary (covering as much of the language as possible), which I can convert into my newly designed xVA-ARPAbet (also in progress, as part of this process). I also would like all languages supported to have an accompanying ASR model available in xVATrainer, for auto-transcriptions. 

 

I will provide another update on things, when I'm close to being finished with the initial data collection process, though I've made lots of progress on it.

 

If anyone reading this has access to any transcribed clean speech data for any of these languages (except English), and would be willing to share it, please do let me know! I need as much data as I can possibly get my hands on. I already got all the easily findable, proper TTS datasets (and created a few of my own), so the best source for these is probably the actual games' data. So if you play a game in one of these languages, and are able and willing to extract audio/text, that I can use for research+development, let me know - it'll definitely help a lot with the quality of that respective language - especially if the language is more rarely supported in games. 

 

 
  • LongDukDong changed the title to xVASynth / xVATrainer / xVADict

PAGE TITLE AND CONTENT BUMP

 

I just discovered one more tool that would make life easier when making voice files for games, and it was made with xVASynth in mind.  So... I decided to post it in the main page.

 

Oh, and I put most of the content in spoilers to make it easier on the eyes and easier to navigate.  I may do that with the older bumps containing Dan Ruta's updates.

 

The new tool is called xVADict.  It is a pronunciation dictionary that can be loaded into xVASynth and is based on Elder Scrolls vocabulary. For this, the dictionary uses ARPAbet, a special notation for how words may be pronounced, which is supported and used by xVASynth 2.0+.  Not every word is in the dictionary, but it intends to cover most words and will grow over time.
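
For anyone curious how a pronunciation dictionary like this can be applied to a line of text, here is a minimal sketch. It does not reflect xVADict's actual file format or xVASynth's internals: the function name, the curly-brace notation for inline ARPAbet, and the single example entry are all assumptions for illustration (the entry is made up, not taken from xVADict).

```python
import re


def apply_dictionary(line, pron_dict):
    """Replace every word found in pron_dict with its ARPAbet spelling in braces.

    Hypothetical helper: lookups are case-insensitive, and words that are
    not in the dictionary are left untouched.
    """
    def swap(match):
        word = match.group(0)
        arpabet = pron_dict.get(word.lower())
        return "{" + arpabet + "}" if arpabet else word

    return re.sub(r"[A-Za-z']+", swap, line)


# Illustrative entry only; the real dictionary's entries may differ.
example_dict = {"skyrim": "S K AY1 R IH0 M"}
converted = apply_dictionary("Welcome to Skyrim, traveller.", example_dict)
```

Words missing from the dictionary simply pass through as plain text, which matches the note above that not every word is covered yet.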

 

It is also a community project if anyone wishes to participate.


Said content is that of Dan Ruta, the tool's creator...

Batch #37 - 5 new voices: Say My Name

 

Post contents within the below spoiler:

Spoiler

 

The next batch of voices is here! The winners were quite clear, and they are:

 

- Cyberpunk 2077: Lizzy Wizzy
- Cyberpunk 2077: Meredith Stout
- Fallout 4: Madison Li
- Fallout 4: Virgil
- Mass Effect: Mordin

 

I also updated the following two voices:

- GTA San Andreas: CJ
- Red Dead Redemption 2: Arthur

(Links from his Patreon website.)

 

I've updated the public doc we've got going now, with the list of publicly available voices from both myself, and the community (via xVATrainer): https://docs.google.com/spreadsheets/d/1FeclOeLjVJkmyJnq4HKZZO9hDG7C3dydRG7v9gjBT4Y/edit?usp=sharing 

 

It was a pretty good break from the research, to go back to training voices again, for a little bit. Things are coming along well, with the research. I'm working on several of the research points in parallel, so the first few updates may take a bit, but I've nearly got a number of good things to update on. 

 

Things are a bit slower for a little while now, as I'm going through a real tough time at work currently, aiming for a really stressful research conference deadline (NeurIPS), which will be quite an important milestone for my PhD. I'm putting in a lot of overtime to finish things in time for the deadline, just over 2 weeks from now, so my bandwidth/energy for additional research has been reduced somewhat, while this has been going on. Nevertheless, I'll have some good news in the next research update.

 

 

Also since the last update, I released a tiny mod called "Say my name" for Skyrim, that I actually made back in October, but never found the time to just make the release video for. But I took a few hours away from work, and decided to just do it, and to my surprise, it really did get picked up on, by a lot of people. There was quite a number of articles written on it (+ a million translations and duplicates). And a Tik Tok?

 

The mod itself is quite simple, it's a batch file with all the dialogue lines in Skyrim where people call you "Dragonborn", with that word replaced with ___, such that people can download the batch file, find+replace the ___ with a custom name, batch synth it, and place the output files into the game folder to have NPCs call you by your own custom name.
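
The find+replace step can also be scripted. Below is a minimal sketch, assuming the batch file is a CSV with a "text" column; that column name, the helper name, and the sample row are assumptions for illustration, not taken from the mod itself.

```python
import csv
import io

PLACEHOLDER = "___"


def personalise_batch(csv_text, name, text_column="text"):
    """Return the batch CSV with the placeholder replaced in the text column.

    Hypothetical helper: only the column holding the dialogue line is
    edited, so file names or ids in other columns stay untouched.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[text_column] = row[text_column].replace(PLACEHOLDER, name)
        writer.writerow(row)
    return out.getvalue()


# Made-up sample row, only to show the shape of the data.
sample = "voice_id,text\nexample_voice,Welcome back ___.\n"
personalised = personalise_batch(sample, "Mara")
```

The resulting text can then be saved as a new batch file and batch-synthesised as described above.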

 

There's a demo video for it here:

https://www.youtube.com/watch?v=W9LS8QiV5fw

and, by popular demand, an installation video: https://www.youtube.com/watch?v=JrU6cA2lk6Y

Check it out!

 

 

 

 

 

♦       ♦       ♦

 

From LongDukDong,

The "SayMyName" Skyrim mod that DanRuta is referencing can be found at NexusMods (>HERE<).

Edited by LongDukDong
  • 1 month later...

Said content is that of Dan Ruta, the tool's creator...

Batch #38 - 7 (5+2) new voices

 

Post contents within the below spoiler:

Spoiler

 

The recent poll is now ready! This one took some time, as I also had 11 privately requested voices in the pipeline. But we also discovered that I had 2 additional voices that I never released - not sure what happened there, but I had two v1 voice models I somehow forgot about. Anyway, the poll voices are as follows:

 

- Cyberpunk 2077: Maiko
- Cyberpunk 2077: Rachel
- Fallout 3: Moriarty
- Fallout 3: Colonel Autumn
- Fallout 76: Paige

 

And the two additional voices are:
 

- The Witcher: Dandelion (no HiFi however)
- The Witcher: Regis

(Links from his Patreon website.)

 

We're currently sitting at about 44 community voices in the doc! Exciting to see the voices coming out! Always cool to see what people are working on. 

 

Keep an eye out for the next research update post in the next few days! It will contain the first small "milestone" for v3 models.

 

 

 

Edited by LongDukDong

I've been using Breton male voices for this instance. Or you can use the Morrowind Imperials for now.

 

The MBP setup assumes that most non-vanilla voices (i.e. no Argonians, Altmer, etc.) are either the Ainmhi race's voice or the Chocolate Elves', neither of which fully exists. I do know that the Ainmhi voice exists insofar as attack sounds within MBP, those voices coming from Morrowind's Wood Elves... which have not yet been generated for xVASynth. For the time being, I'm using D.Va from Overwatch for the female Ainmhi.

 

Lord knows, I'm itchin' to see a Sheogorath Wes Johnson voice. We got Haskill after all.


Said content is that of Dan Ruta, the tool's creator...

Research Update #4 - 2 more new languages supported!

 

(Not the original... I had to play, record, convert...)

 

 

Post contents within the below spoiler:

Spoiler

 

A quick mini-update. Things are coming along well for language support! Following the addition of Chinese, in the last post, I've now added in two more languages!

 

The new languages now supported are: Italian, and Romanian.

 

In the audio preview above, you can first hear in Italian: "Questa è una frase in italiano, generata da xVASynth" (This is a sentence in Italian, generated by xVASynth), then you can hear in Romanian: "și aceasta este o propoziție în limba română, generată de xVASynth" (and this is a sentence in Romanian, generated by xVASynth).

 

The Romanian voice is from a Romanian TTS dataset; the Italian voice is the Italian Fallout 4: Piper. Both were generated without a HiFi-GAN model, so they are equivalent to the output right after Stage 4 of xVATrainer.

 

These languages didn't require any custom phonemes or special handling of the text, so they were fairly easy to add into the codebase I wrote. I'm hoping the majority of the remaining languages will be the same, though a couple may still need some bespoke work.

 

There's also a layer of number->words pre-processing for these languages (e.g. Romanian 500 -> cinci sute), which future languages will also have.
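
To illustrate what such a number->words layer does, here is a minimal sketch. It is not xVASynth's actual code: the `expand_numbers` function and the lookup table are hypothetical, and the table holds only the one example given in the post. A real converter would compute the words for any number in the target language.

```python
import re
from typing import Callable, Dict, Optional

# Toy table containing only the example from the post (assumption: a real
# converter would generate words for arbitrary numbers, not look them up).
ROMANIAN_EXAMPLES: Dict[int, str] = {500: "cinci sute"}


def expand_numbers(text: str, to_words: Callable[[int], Optional[str]]) -> str:
    """Replace each run of ASCII digits with words before synthesis.

    to_words returns the spelled-out number, or None when it cannot
    convert it, in which case the digits are left untouched.
    """
    def replace(match: "re.Match[str]") -> str:
        words = to_words(int(match.group()))
        return match.group() if words is None else words

    return re.sub(r"[0-9]+", replace, text)


if __name__ == "__main__":
    print(expand_numbers("500", ROMANIAN_EXAMPLES.get))  # cinci sute
```

Passing the converter in as a function keeps the text-scanning step the same for every language, so only the per-language converter would need to change.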

 

 

Edited by LongDukDong