MSM_Alice Posted July 21, 2025 (edited)
Hello! I have gotten to the point in my mod creation journey where I have started generating lip files and adding voices. Much to my dismay, despite doing everything by the book (I think), the lip motion on my actors does not match the audio (the lip motion seems about 2x slower, or something along those lines). Everything else works perfectly: audio plays as it should, when it should. It is just that the fuz files generated from my wav + lip files seem to have this latency/imprecision. Has anyone come across this? Where should I look, and what are the common pitfalls?
I am using:
Suno Bark to generate the actual wav files.
FaceFXWrapper 0.4-20061-0-4-1631087053.zip to generate lip files.
Batch Lip Generator GUI-71357-1-0-3-1688230687.zip (together with the plugin from xVASynth v3.0.0) to batch generate lip files.
xVASynth v3.0.0 Main app-44184-3-0-0-1684850032.zip, and the lip_fuz plugin files for lip generation (but for now I am actually generating all my source wav files externally, mono, 44100 Hz).
YakitoriAudioConverter, to take the wav + lip files and make them into a fuz.
Thank you for your attention and your help!
Edited July 22, 2025 by MSM_Alice
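One cheap thing to rule out is a mismatch between what the WAV headers actually contain and the format stated in the post (mono, 44100 Hz). Whether such a mismatch is the cause of the slow lip motion is not established anywhere in this thread, so treat the following as a diagnostic sketch only. It uses Python's standard `wave` module; the function names and the folder argument are made up for illustration.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, duration_s) for a PCM WAV."""
    with wave.open(str(path), "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        bits = wf.getsampwidth() * 8
        duration = wf.getnframes() / rate
    return channels, rate, bits, duration


def find_mismatched_wavs(folder, expected_channels=1, expected_rate=44100):
    """List WAVs whose header differs from the expected mono / 44100 Hz layout.

    The expected values come from the post; they are not a documented
    requirement of any of the tools mentioned.
    """
    problems = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() != ".wav":
            continue
        try:
            channels, rate, bits, duration = describe_wav(path)
        except (wave.Error, EOFError) as exc:
            # The stdlib reader expects uncompressed PCM; other encodings land here.
            problems.append((path.name, f"unreadable as PCM: {exc}"))
            continue
        if channels != expected_channels or rate != expected_rate:
            problems.append(
                (path.name, f"{channels} ch, {rate} Hz, {bits}-bit, {duration:.2f} s")
            )
    return problems
```

Calling `find_mismatched_wavs` on the folder that holds the source WAVs returns an empty list when every file reports mono at 44100 Hz, and otherwise one entry per file that deviates or cannot be read.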
JB. Posted July 21, 2025
I used a similar system for a couple of years and the same thing happened: the lip files don't turn out well. I had no choice but to use xVASynth + this Batch Guide.
Basically, you export the dialogue file with the CK as usual (you can manually delete rows that don't contain dialogue), use the guide to convert it to another format (CSV), and then load it into xVASynth. I set the options so that it does not output fuz or wav files. I just want the lip files.
It's more cumbersome, yes, but it's a joy to see those lips move just right.
In this context, Batch Lip Generator is no longer necessary. I still keep it for short or specific dialogues where I don't need perfect lip-syncing, since it's so fast.
By the way, I use Unfuzer.exe to package WAVs into FUZ files. It's faster. I have Yakitori too, but only for the exceptional cases where I need to extract WAVs from fuz files. Unfuzer is good for packaging, but not so good for extracting: the WAVs often come out with artifacts.
MSM_Alice Posted July 22, 2025 Author
Thank you so much! Will try that.
MSM_Alice Posted July 22, 2025 Author (edited)
If I am reading this correctly, this path only works for wav files generated within xVASynth. Is there any way to bring external wav files into this pipeline? Or is it essential to the whole process that they are generated within xVASynth?
Edited July 22, 2025 by MSM_Alice
JB. Posted July 22, 2025 (edited)
2 hours ago, MSM_Alice said: If I am reading this correctly, this path only works for wav files generated within XVASynth.
No. I just need the CSV file.
First, I export Lindstrom's dialogue from the Commonwealth Slavers plugin using the CK. I recently made about 30 new lines for Lindstrom, so I deleted all the older lines and kept only the newest ones. To find them, remember to sort by TimeStamp, with the newest lines at the top.
Then I open xVASynth, and in the options section, I go to this part and click convert. I have a folder where I throw these CK files; I point xVASynth at it (see "Files Directory"), and it converts them.
Conversions go to xVASynth\resources\app\plugins\f4_batch_prep
I go to xVASynth and drop in the converted file. Oh, right, I always use Kellogg's voice type to get the lip files. It's sort of universal. It works for both girls and boys. You need to download it from the xVASynth page.
Now, synthesize the batch. Check my settings. I only want lip files.
Once that is done, open the containing folder with "Open Output".
That's it. Put the lip files in the same folder as your WAVs and let Unfuzer package them.
I hope that's clear, I suck at explaining.
Edited July 22, 2025 by JB.
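The last step above (lip files placed next to the WAVs before packaging) is easy to get wrong when there are dozens of lines. A minimal sketch for checking the folder before running the packer follows. It assumes that a WAV and its lip file are meant to share the same base filename, which the thread implies but does not spell out; the function name is hypothetical.

```python
from pathlib import Path


def pair_wav_and_lip(folder):
    """Group files in `folder` by base name and report unmatched ones.

    Returns (paired, wav_only, lip_only) as sorted lists of lower-cased
    base names. Matching by base name is an assumption of this sketch.
    """
    wavs, lips = {}, {}
    for path in Path(folder).iterdir():
        suffix = path.suffix.lower()
        if suffix == ".wav":
            wavs[path.stem.lower()] = path
        elif suffix == ".lip":
            lips[path.stem.lower()] = path
    paired = sorted(wavs.keys() & lips.keys())
    wav_only = sorted(wavs.keys() - lips.keys())
    lip_only = sorted(lips.keys() - wavs.keys())
    return paired, wav_only, lip_only
```

Any name that shows up in `wav_only` or `lip_only` is a line that would be packaged without its counterpart, so it is worth fixing those before packaging.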
MSM_Alice Posted July 22, 2025 Author (edited)
Thank you so much for putting it together. That is a perfect (and invaluable) step-by-step!
I am just surprised that (if I understand correctly) the lip files are generated by processes that have no real connection to the actual WAV files, only to the text line (or at most to a wav file generated from that text).
For instance, the lip animation for a "Hey, who are you?" line would be quite different time-wise depending on whether the pause between the "hey" and the "who are you" parts is longer or shorter. The actual length of that pause can't be figured out by looking at the text alone, or from a wav file generated from that text alone. That information is only really in the original wav file.
Unless the implication is that all wav files, even if added separately, would still be generated with xVASynth, and because xVASynth produces a similar cadence across the board when the input is identical, the cadence just fits, no matter which voice was used.
OR
Unless the lip files do not contain any time keys per se, no embedded synchronization, only the sequence of mouth shapes, and the timing is generated exclusively at the moment the .lip file is merged with the wav file into the fuz file?
Is that how things work?
Edited July 22, 2025 by MSM_Alice
JB. Posted July 22, 2025
I was as surprised as you. It's all text-based. I notice that it usually respects an ellipsis (not always). But if you add an artificial pause... it's a mess.
One of my characters speaks quite quickly. Unfortunately, this process doesn't work so well with him. His lips don't always keep up. But, oh well, it's better than nothing.
If you manage to perfect it, let me know. I also thought the WAV played an important role. Surely it must.
MSM_Alice Posted July 22, 2025 Author
A good clarification. Indeed, it would seem that, at least in this particular process, lip motion cadence is "hard-linked" to a preset cadence (probably the default xVASynth generation cadence) rather than to anything inferred from the audio in the wav file itself.
I have seen that in xVASynth one can manually elongate certain vowels and such. I will test whether the lip files end up different and take those elongations into account correctly.
If yes, then there could be a way to make external wavs fit the generated lips: use those manual elongations/compressions in the xVASynth UI to edit the generated xVASynth wav until it matches the external wav time-wise.
MSM_Alice Posted July 23, 2025 Author (edited)
I have reached my conclusions on voices.
As a fully automated solution, xVASynth is usable, but it gives "meh" results if everything is left on auto. The lip motions, if generated exclusively from text, are on point, yes, but the sound itself often breaks immersion. With a few notable exceptions (like the player voice, which generally turns out rather okay-ish), it gives the sensation that you are in a world of faulty synths with slightly defective voice boxes and a forced voice cadence. At times, it really takes you out of the immersion.
I think the way to get high quality is one of the following:
A. Spend a lot of time on each line in xVASynth, fine-tuning every aspect of the pronunciation and regenerating until it sounds passable. Then hope the text-based lip file still fits, and that your changes didn't make the two diverge too much.
B. For externally generated wav files, generate xVASynth equivalents from the same input text AND, line by line, manually adjust the xVASynth text input, padding and tweaking it so that in the end the tempo and cadence of the xVASynth wav file (of which we will not use the waveform, only the lip file) match 1-to-1 the cadence and timing of the externally generated wav file. Then hope that the text-based lip file can be extracted and packed with the externally generated wav into a fuz, and pray it is okay.
Having high quality (non-synth-sounding) voices with good lipsync will be a slow and tedious process for each line of dialogue. The most talkative characters will probably have to be bots, or people wearing helmets/masks/mouth covers. The fact that the player's voice itself works decently with xVASynth is a saving grace.
I didn't test whether piping the external wav files into the CK via a virtual microphone device would make the CK's internal lip generation any better than the one done with the external tools.
Edited July 23, 2025 by MSM_Alice
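For option B, a rough first pass is to compare the total length of each external WAV with its xVASynth counterpart and only hand-tune the lines that differ noticeably. The sketch below assumes the two sets of files live in separate folders and share file names; the folder arguments, the function names, and the 0.15 s tolerance are arbitrary placeholders. Matching total duration does not guarantee that pauses inside the line match, so this only narrows down which lines need attention.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Length of a PCM WAV in seconds, read from its header."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def compare_durations(external_dir, xva_dir, tolerance_s=0.15):
    """Return (name, external_s, xva_s, diff_s) for same-named WAVs whose
    lengths differ by more than `tolerance_s` (an arbitrary threshold)."""
    rows = []
    for ext_path in sorted(Path(external_dir).glob("*.wav")):
        xva_path = Path(xva_dir) / ext_path.name
        if not xva_path.exists():
            continue  # no xVASynth counterpart for this line
        ext_s = wav_duration_seconds(ext_path)
        xva_s = wav_duration_seconds(xva_path)
        diff = ext_s - xva_s
        if abs(diff) > tolerance_s:
            rows.append(
                (ext_path.name, round(ext_s, 2), round(xva_s, 2), round(diff, 2))
            )
    return rows
```

A positive difference means the external recording is longer than the xVASynth version, which suggests padding or elongating the xVASynth line; a negative one suggests the opposite.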
Reginald_001 Posted July 23, 2025
To create lip files, you download the 32-bit CK (still available through Steam) and use the basic command line function to create lips for all your recorded WAVs, whether they were batch generated or not.
If your ESM is not compatible because you're developing in the NG version, you can use this to downgrade the version number temporarily so that it loads in the 32-bit CK: https://www.nexusmods.com/fallout4/mods/42397 (You can also use FO4Edit/xEdit to downgrade the version manually.)
Your target shortcut would be:
"X:\Your steamfolder\steamapps\common\Fallout 4\CreationKit32.exe" -GenerateLips:YourEspFile.esp
And that's it.
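If a script is more convenient than a Windows shortcut, the same call can be assembled as an argument list. This is only a sketch built from the shortcut target quoted in the post: the executable path and plugin name are the post's placeholders, the function names are made up, and nothing here has been checked against a real Creation Kit install.

```python
import subprocess


def build_generate_lips_command(ck_exe, plugin_name):
    """Build the argument list for the -GenerateLips call quoted in the post."""
    return [str(ck_exe), f"-GenerateLips:{plugin_name}"]


def run_generate_lips(ck_exe, plugin_name):
    """Launch the Creation Kit and wait for it to exit; returns its exit code."""
    completed = subprocess.run(build_generate_lips_command(ck_exe, plugin_name))
    return completed.returncode
```

Passing the arguments as a list avoids having to hand-quote the space in "Fallout 4", which is the kind of quoting slip that is easy to make in a shortcut target.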