
Mantella + Skyrim can be quite an experience


If you have a powerful enough machine, or can afford a few bucks for a service such as Runpod.io, Mantella is a must-have mod. You can find it here: https://www.nexusmods.com/skyrimspecialedition/mods/98631. In addition you need either XTTS or xVASynth; the latter seems to be more convenient and faster.


The idea is that it generates speech based on the character's bio and the current situation. Many characters, including added followers, already have a bio in a CSV file, alongside the character's name and voice model.
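For anyone who hasn't opened that file yet, here is a minimal sketch of appending a character row with Python's csv module. The three column names (name, voice_model, bio), the sample Lydia row, and the helper name append_character are assumptions for illustration only; check the header row of the CSV that ships with your Mantella version before writing to it.

```python
import csv
import io

# Hypothetical column layout; the real Mantella CSV may have more columns.
FIELDNAMES = ["name", "voice_model", "bio"]

def append_character(csv_text: str, name: str, voice_model: str, bio: str) -> str:
    """Return csv_text with one extra character row appended."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.append({"name": name, "voice_model": voice_model, "bio": bio})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # multi-line bios are quoted automatically
    return out.getvalue()

# Illustrative data only, not copied from the real file.
existing = "name,voice_model,bio\nLydia,FemaleEvenToned,Lydia is a housecarl.\n"
updated = append_character(
    existing,
    "Ciri",
    "Cirilla",
    "Ciri is a skilled fighter from a distant land.\nShe is tough and witty.",
)
print(updated)
```

Going through the csv module rather than pasting text by hand means commas, quotes, and line breaks inside a bio get escaped properly.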


It gets even better when you:

1. Change the model to a more... exquisite one, like https://huggingface.co/TheDrummer/Moistral-11B-v3-GGUF/resolve/main/Moistral-11B-v3-Q5_K_M.gguf

2. Add some select texts to the initial prompt (in the config -file), like add a sentence: "The player is a sexy young woman with perfect breasts and perfect ass. You will randomly flirt or coerce her with lewd sexual talk."

3. Change the bio of the characters or add your own. For example, I added this bio for the Ciri follower (https://www.nexusmods.com/skyrimspecialedition/mods/5949):

Ciri,Cirilla," Ciri, a warrior woman from distant land is tough and witty but also sexually totally perverted and deranged. Ciri will always suggest even the kinkiest acts.
Ciri is interested in anal play and would like to violate the Dragonborn with some bizarre tools.
Ciri is a skilled fighter and possesses unique magical powers as well"


I also copied the Ciri voice model from Witcher 3 to Skyrim, named it sk_cirilla and boom! Ciri is quite a slut in the game. 


On a technical note, I couldn't get the setup running with Runpod.io's Oobabooga, but it works well with Kobold.

Link to comment

I am not criticizing you or the mod you're promoting, this is just a comment about AI.


When AI understands irony and sarcasm, I'll be satisfied with AI. AI will never, in my opinion, understand irony and sarcasm (if there are people who don't understand irony and sarcasm, what chance do computers have).


That said, this kind of thing is a start and enhances the illusion of not playing alone. So here's hoping it just keeps getting easier and easier to use.



Link to comment

Installed yesterday, and I've had several CTDs during conversations, even in "quiet" areas like a tavern. I'll try again in a really quiet area like a player home, and add in the "no player greetings" to see if that helps.

While I search for a more stable setup with fewer CTDs, I'd suggest using "end conversation" every 2-3 responses, just so that the conversation progress is saved. That does turn it into more of a cycle: have a conversation, end the conversation, go on an adventure, restart the conversation.

Despite having a 4070 graphics card, running STT and kobold seems to put enough strain on the GPU that the game de-rezzed some of the improved textures I had downloaded; wandering around Whiterun, I suddenly had some low-res walls and roads. Switching to text input helped, and the resolutions returned to normal. So other users may face a choice: voice input with Skyrim's default textures, or text input with upgraded textures. That could explain why the install guides suggest a "clean install" of Skyrim; I'm going to guess that scripts make the system potentially more unstable.

For kobold language models, Erebus https://huggingface.co/KoboldAI/GPT-NeoX-20B-Erebus-GGML includes additional text sources, but takes much longer to process and also seems to cause even more CTDs. Given that it's a 15 GB LLM combined with TTS combined with modded Skyrim, that is hardly surprising.


Going for a quiet room plus no greetings plus a complex LLM was game-breaking for CTDs. I'll now reverse course and go for a simpler LLM, or consider offloading the kobold workload onto a cloud system.

Link to comment
On 5/5/2024 at 10:17 PM, LordDragon said:

If you have a powerful enough machine, or can afford a few bucks for a service such as Runpod.io, Mantella is a must-have mod. You can find it here: https://www.nexusmods.com/skyrimspecialedition/mods/98631. In addition you need either XTTS or xVASynth; the latter seems to be more convenient and faster.



How did you get NSFW to work for it?


ChatGPT doesn't allow it for me, is there a work around?

Link to comment
16 minutes ago, 83ilotzent said:

How did you get NSFW to work for it?


ChatGPT doesn't allow it for me, is there a work around?

He said it in the topic: he changed the model from ChatGPT to Moistral, using the Hugging Face link he posted. I can't explain how to do that since I don't have the mod myself, but yeah, I use Moistral too in other programs.

Link to comment

Go with the moistral model. The Erebus model seems to be overtrained.


From the Mantella install guide, there's the link to kobold. 


Local Models

  1. Install koboldcpp’s latest release from here. If you want to run koboldcpp on your CPU or otherwise do not have an NVIDIA GPU, download koboldcpp_nocuda.exe under “Assets”. If you have an NVIDIA GPU with CUDA support, download koboldcpp.exe under “Assets”.

  2. Download a local model, such as toppy-m-7b.Q4_K_S.gguf from here.

  3. Run koboldcpp.exe. When presented with the launch window, drag the “Context Size” slider to 4096. Click the “Browse” button next to the “Model” field and select the model you downloaded. Click “Launch” in the bottom right corner.


Under the “Presets” drop down at the top, choose either Use CLBlast, or Use CuBlas (if using Cuda). You will then see a field for GPU Layers. If you want to use CPU only leave it at 0. If you want to use your GPU, you can experiment with how many “layers” to offload to your GPU based on your system.


Make sure koboldcpp is running when Mantella is running!
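A quick way to confirm that before launching the game is to poke the server from a script. This is only a sketch: it assumes koboldcpp is listening on its usual default port 5001 and exposes the KoboldAI-style /api/v1/model endpoint, and the helper names are made up here. Adjust host and port to whatever your launch window shows.

```python
import urllib.error
import urllib.request

def kobold_model_url(host: str = "127.0.0.1", port: int = 5001) -> str:
    # Assumption: koboldcpp's default port and its KoboldAI-style endpoint.
    return f"http://{host}:{port}/api/v1/model"

def is_kobold_up(url: str, timeout: float = 2.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the server answers with HTTP 200, False on any connection error."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling is_kobold_up(kobold_model_url()) should return False while koboldcpp is closed or still loading the model, and True once it is serving requests.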

Link to comment

Running Kobold on a local machine with text input, it takes about 2 minutes for kobold to produce a response to my player.

Try to keep conversations to quiet areas, minimum NPCs, and don't move. My GPU is hitting 90%, the CPU spiking to 75%, and that's without doing speech to text.

Plan that you do some adventuring, and then have a conversation with an NPC. Save the game after the conversation has ended, mostly for continuity of any inventory/status changes, and then load the game and go adventuring again. If I've had a conversation and just try playing on to another dungeon, then it can feel stuttery.


Definitely follow LordDragon's advice about adding some text about yourself, and plan that your first half hour is probably setting up background stories for the NPC and for yourself. Consider setting the timescale inside Skyrim to something like 2 rather than the default; otherwise a conversation will take an in-game day.


Check the Mantella/data/Skyrim/conversations folder, and open up the text files. I've seen kobold get confused between talking as the NPC, talking as the profession (mage, thief, warrior) of the NPC, and talking as an AI assistant. As kobold/Mantella use the conversation .txt files as the NPC's memory, if those get confused then the conversation may go strange. Or you can edit in specific ... details ... of interactions.
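If there are a lot of those files, a small script can flag the ones where the model seems to have slipped out of character. This is a rough sketch, not part of Mantella: the phrase list is a guess at what "talking as an AI assistant" tends to look like, and the folder layout beyond the path mentioned above is assumed.

```python
from pathlib import Path

# Hypothetical phrases that suggest the model answered as an assistant, not as the NPC.
SUSPECT_PHRASES = ("as an ai", "language model", "i cannot assist")

def find_confused_files(folder):
    """Map each .txt file (relative path) to the suspect phrases found in it."""
    root = Path(folder)
    hits = {}
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        found = [phrase for phrase in SUSPECT_PHRASES if phrase in text]
        if found:
            hits[str(path.relative_to(root))] = found
    return hits
```

Run it as find_confused_files("Mantella/data/Skyrim/conversations") and open whatever it reports in a text editor; the script only reads files and never changes them.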


Mantella does record equipping items and combat, but I've yet to see Mantella do anything with that info.

Link to comment

A good suggestion is to run koboldcpp on a second computer. I'm running 8 GB models this way on an RTX 4070 Ti Super. My response time in koboldcpp is near instant. But because of xVASynth it takes 2-3 seconds in game before they reply, which is okay.

Link to comment
14 hours ago, Thor2000 said:

A good suggestion is to run koboldcpp on a second computer. I'm running 8gb models with this method using a rtx 4070 ti super. My response time in koboldcpp is near instant. But because of xVAsynth it takes 2-3 seconds in game before they reply, which is okay.

Out of interest, does the second PC just have to be on the same (home) network? Or are there specific steps to get Mantella to link to kobold on a different PC within the home network?

Not that I happen to have a second computer available. I did find a second monitor so I can see kobold's progress while standing around in Skyrim.


11 hours ago, 83ilotzent said:

holy shit looks like I wont be able to run this with a 1060 6gb lol



anyway, for future reference for other people, this is a really good post. Thanks guys!

You may be able to run kobold on CPU, and I think Skyrim is mostly single-threaded. I'd really avoid speech to text, and keep to text input. Even with the 2-minute lag I've had, a four-hour "chat" was still interesting. Just don't try to have a chat in the wilds along the lines of "look, there's some bandits, sneak attack"; it'll be more of a reminiscence in a tavern. Also, once you've set up an initial conversation, you'll get a summary text that you could paste into a kobold session outside of Mantella.


From the 4-hour chat, the other advice is to use the *event happened* form rather than "player describing event", and to use names (NPC name and PC name) to keep it clear who is doing what.

Link to comment
9 hours ago, JimUpdating said:

out of interest, the second pc just has to be on the same (home) network? or are there specific steps to get mantella to link to kobold on a different pc within the home network?

not that I happen to have a second computer available. I did find a second monitor so I can see kobold's progress while standing around in skyrim.

I've got both computers on the same network using cable. To get it to work you just have to change the IP address part in the Mantella config.ini file to match the second computer.
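For anyone who would rather script that change than edit by hand, here is a sketch using Python's configparser. The section name LanguageModel, the key alternative_openai_api_base, and the address 192.168.1.50 are assumptions for illustration; look in your own config.ini for the line that currently contains 127.0.0.1 and use its real section and key.

```python
import configparser
import io
from urllib.parse import urlsplit, urlunsplit

def point_at_host(ini_text, new_host,
                  section="LanguageModel",            # hypothetical section name
                  key="alternative_openai_api_base"):  # hypothetical key name
    """Return ini_text with the host in the given URL setting replaced by new_host."""
    # interpolation=None so braces and percent signs in prompts are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    parts = urlsplit(parser[section][key])
    port = f":{parts.port}" if parts.port else ""
    parser[section][key] = urlunsplit(parts._replace(netloc=f"{new_host}{port}"))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()

sample = "[LanguageModel]\nalternative_openai_api_base = http://127.0.0.1:5001/v1\n"
print(point_at_host(sample, "192.168.1.50"))
```

Note that configparser drops comments when it writes a file back out, so on a heavily commented config.ini it is probably safer to use this only to work out the new value and then paste it in manually. The second computer also has to accept connections from the network (firewall, and koboldcpp bound to a reachable address).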


I guess this could be a cheap option since buying a second computer to host the game is pretty cheap these days. I'm using a gtx 1080 for the Skyrim game computer.

Link to comment
10 hours ago, JimUpdating said:

you may be able to run kobold on cpu, and I think skyrim is mostly single thread. I'd really avoid speech to text, and keep to text input.

Yeah even my CPU wouldn't be able to hold it (Ryzen 3 2200g),

and besides I'm playing in VR - so speech to text was the whole reason for getting Mantella in the first place.


Thanks for the help tho,

I guess I have to save up for a better rig

Link to comment

Referring to the discussion of NSFW, I need to elaborate a bit. You actually need two things:

  •  A language model like Moistral which does not run like a saint and, in the case of Moistral, is very tuned towards lewd text
  • In addition, you should alter the prompt given to "initialize" the context where the text generation works. For example for Skyrim SE/AE, in the config.ini -file, there is a skyrim_prompt -section, which starts like "skyrim_prompt = You are {name}, and you live in Skyrim. This is your background:\n\n{bio}\n\n". This prompt is given to the language model each time a discussion is initialized. What I did was to edit this description. I added text: "You often add some lewd content to your speech." and "The player is a sexy young woman with perfect breasts and perfect ass. You will randomly flirt or coerce her with lewd sexual talk.". 
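To make the placeholder mechanics concrete, here is a small sketch of how a template like that gets filled. The template string is the opening of skyrim_prompt as quoted above; how Mantella actually performs the substitution is an assumption here, and build_prompt plus the sample bio and extra sentence are made up for illustration.

```python
# Opening of the skyrim_prompt setting as quoted in this post.
SKYRIM_PROMPT = (
    "You are {name}, and you live in Skyrim. "
    "This is your background:\n\n{bio}\n\n"
)

def build_prompt(name: str, bio: str, extra: str = "") -> str:
    """Fill the {name} and {bio} placeholders, then append any extra instructions."""
    return SKYRIM_PROMPT.format(name=name, bio=bio) + extra

prompt = build_prompt(
    "Lydia",
    "Lydia is a housecarl sworn to the Dragonborn.",
    extra="You speak bluntly and keep replies short.",
)
print(prompt)
```

Whatever you add to the prompt applies to every NPC, while the bio only applies to that one character, so global tone goes in the prompt and character quirks go in the CSV.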


As a bonus, as mentioned, you can alter the bio of the characters and make them masochists or animal lovers or what ever you like. With this setting and a selected model, it is going to be a wild ride. 


I actually created a small patch for Apropos 2 to output the descriptions to the "world events"-file used by Mantella. This way your lewd acts can also be a source of discussion, furthering the depravity. 


At some point and with some effort there could be triggers for Sexlab or e.g. for Devious Devices (as the concept exists with Aggro already).

Link to comment

I was going to suggest that apropos2 would make an interesting text feed. Otherwise I haven't seen much of world events filter through, other than location (shop, city, tavern).


Would it be possible to get a copy of the patch?

Link to comment
On 5/8/2024 at 7:57 AM, Thor2000 said:

I'm running 8gb models with this method using a rtx 4070 ti super.


That escalated quickly... given that your original plan was something like a 3060 🙃


4 hours ago, LordDragon said:

At some point and with some effort there could be triggers for Sexlab or e.g. for Devious Devices


There already is a fork called Pantella which is using an internal behavior system to trigger certain game actions depending on keywords in the response of the AI.

Getting it to trigger basically any papyrus function is quite easy from the code perspective.

Unfortunately its performance is quite poor, likely in part because you have to run it from source, and the AI doesn't respond with the action keywords often enough to be satisfying.
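To illustrate the general idea of keyword-triggered actions (this is not Pantella's code, and the keyword table and action names below are invented), a minimal sketch could look like this:

```python
import re

# Hypothetical keyword-to-action table; none of these names come from Pantella.
ACTIONS = {
    "follow": "StartFollowing",
    "wait": "WaitHere",
    "trade": "OpenBarter",
}

def detect_actions(response: str) -> list:
    """Return the actions whose keyword appears as a whole word in the response."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    return [action for keyword, action in ACTIONS.items() if keyword in words]

print(detect_actions("Very well, I will follow you."))
print(detect_actions("Waiting around is dull."))
```

The second call finds nothing because "waiting" is not the exact keyword, which is a small-scale version of the problem described above: the model has to phrase its reply just right for a trigger to fire.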

Link to comment
23 hours ago, Vader666 said:


That escalated quickly... given that your original plan was something like a 3060 🙃

I found out I was worth it 😆


No, seriously, I decided I wanted a 16 GB GPU. Second-hand 3080s were going for 70% of the retail price of a new 4070 Ti Super, so the choice was easy.

Link to comment
  • 2 weeks later...

 Was wondering when something like this would show up.

Big question is: what's the performance like? I run SE on a high-end laptop, but from this topic I gather that you might need two powerhouse desktops to run it.

So, is there a way to limit its resource consumption to the point where it's feasible on laptop hardware?

Link to comment
31 minutes ago, nilead said:

Big question is - whats with performance?


Depends on whether you want to run the AI locally or use cloud services.

When running locally you should have a GPU with 16 GB of VRAM; 12 GB is about the minimum for a 7B LLM plus the game on a single PC.
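As a rough back-of-the-envelope check on those numbers, the weights of a quantized model take about (parameters x bits per weight / 8) bytes. The sketch below assumes roughly 4.5 bits per weight for a 4-bit GGUF quant, which is an approximation rather than an exact figure, and it counts the weights only.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# 7B model at an assumed ~4.5 bits per weight
print(round(weights_gb(7e9, 4.5), 2))
```

On top of the weights you still need room for the context cache, the TTS model, and Skyrim itself with whatever texture mods are loaded, which is why a card that fits the model on paper can still feel tight.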


Link to comment

Try a free cloud + SFW version first, and see if that adds enough variation to be of interest. The loss of Fuz Ro D'oh could break some other quests, or push you to install large numbers of voice packs (which are mostly in English, so may still break gameplay for some players).


If running locally, once you've said or typed something, hit ESC and go to Skyrim's system menu for 10 seconds; this freezes the game, frees up resources, and lets kobold process your text quicker. Then switch back to get the voiced response. Adjust the 10 seconds: if you start missing their first sentence, congrats, you have a fast system and should shorten the time.


You could set up Mantella's config file to use a cloud service for faster SFW conversations to build rapport with the NPC, and then comment out the cloud service and change to a local kobold NSFW conversation. You can edit the Mantella intro and character descriptions between Mantella sessions, so you could do the switch from SFW to NSFW that way. However, as the intro, character description, and conversation summary are shared with the LLM when a conversation restarts, you'd want to be careful not to switch back to the cloud service and send the NSFW text to a monitored cloud system.


If what you want is voice to NSFW LLM to voice, then doing it with the computational overhead of (heavily modded) Skyrim, which doesn't respond to the conversation ("Lydia, attack the bandits!" isn't going to do anything on screen), is a rather heavy route. An alternative may be to have local kobold generate inputs for local Stable Diffusion, to produce pictures as the conversation progresses. Wait two years and we'll be getting 30s looped animations linked to the NSFW chat.



Link to comment
1 hour ago, JimUpdating said:

Wait two years and we'll be getting 30s looped animations linked to the NSFW chat.


Why wait when you could hook into the text response, do some filtering in papyrus, and run a SL scene from that soon™ ?

Link to comment
6 hours ago, Vader666 said:


Depends on wether you want to run the AI locally or use cloud services.

When running locally u should have a 16GB VRam GPU, 12 is kind of minimum for 7B LLM + Game on a single PC.


 I knew skimping on a 4090 was going to bite me, but that was faster than expected.

That was about as concise and yet complete an answer as I could hope for. Thanks a bunch.

Link to comment
1 hour ago, nilead said:

I knew skimping on 4090 was going to bite me


Unless you have a need for a 4090 besides LLM inference while running a game, I'd advise against it.

For that money you could get a 16 GB 4060/4070 and about 2 years (depending on usage) of cloud computing running a more "powerful" LLM than the 4090 could handle.

Link to comment
4 minutes ago, Vader666 said:


Unless you have a need for a 4090 besides LLM inference while running a game, i'd advise against it.

For that money you could get a 4060/4070 16GB and about 2 years (depending on usage) of running a more "powerfull" LLM than the 4090 could take via cloud computing.

 That's the thing, cloud services are mostly not an option for me atm. And for laptops, the only way you get 16 GB is a 4090. It felt like a waste to pay over a grand extra for that 15-20% performance over a 4080.

Link to comment
41 minutes ago, nilead said:

Felt like a waste to get that 15-20% performance over 4080


So you basically got a 4070 (desktop wise), which should do good enough unless you run a lot of 2k textures.

But there is only one way to find out, worst thing that could happen would be too much response delay.

Link to comment
