Jump to content
  • entries
    5
  • comments
    5
  • views
    492

How I Got AI-Powered NPCs Working in Skyrim: A Complete Mantella + KoboldCpp Setup Guide


holocen123

417 views

After hours of tinkering, I finally got Mantella working with a local LLM (KoboldCpp) to give Skyrim NPCs realistic, dynamic conversations. No cloud APIs, no monthly fees — just an 8B parameter model running on my own GPU. Here's exactly how I did it, complete with actual console output, my real config file, and live conversation logs so you can verify everything is working at each step.

WHAT YOU'LL NEED

- Skyrim SE (Special Edition) with SKSE
- Mantella mod (v0.14 or later)
- KoboldCpp — Local LLM server
- A  model (I used Ministral-3-8B-Reasoning-2512-Q4_K_M.gguf)
- Mod Organizer 2 (MO2)
- Piper or xVASynth for text-to-speech (optional but recommended)

STEP 1: Installing the Basics

Install KoboldCpp
Download the latest KoboldCpp release and place it somewhere accessible. I keep mine in C:\AI\KoboldCpp

Download Your Model
Find a GGUF model that fits your VRAM. I'm using:
- Model: Ministral-3-8B-Reasoning (8B parameters)
- Quantization: Q4_K_M (good balance of quality and speed)
- VRAM needed: ~6-8GB

Place the .gguf file in your KoboldCpp models folder.

Install Mantella Mod
1. Download Mantella from Nexus Mods
2. Install through MO2 like any other mod
3. Ensure all dependencies are installed (SKSE, UIExtensions, etc.)

STEP 2: Launching KoboldCpp

Open a command prompt and start the server:

 

koboldcpp.exe --model Ministral-3-8B-Reasoning-2512-Q4_K_M.gguf --port 5001 --api --threads 8 --blasthreads 4

 

Important flags:
- --port 5001 — Mantella expects this port
- --api — Enables both KoboldCpp native API and OpenAI-compatible endpoints
- --threads — Match to your CPU cores

What a Successful KoboldCpp Launch Looks Like

When everything works, you should see output like this in your console. I'm putting the full output in a spoiler so it doesn't break the page:

Welcome to KoboldCpp - Version 1.114

Auto Selected CUDA Backend (flag=0)
Loading Chat Completions Adapter: AutoGuess.json
Chat Completions Adapter Loaded
Auto Recommended GPU Layers: 26

System: Windows 10.0.26200 AMD64
Detected Available GPU Memory: 16303 MB
Detected Available RAM: 16239 MB
Initializing dynamic library: koboldcpp_cublas.dll

Loading Text Model: C:\Users\ninic\Downloads\Ministral-3-8B-Reasoning-2512-Q4_K_M.gguf
The reported GGUF Arch is: mistral3

print_info: arch = mistral3
print_info: model type = 8B
print_info: model params = 8.49 B
print_info: vocab type = BPE
print_info: n_vocab = 131072
print_info: n_ctx_train = 262144
print_info: n_layer = 34
print_info: n_embd = 4096

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 16302 MiB):
Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes

load_tensors: offloading output layer to GPU
load_tensors: offloading 33 repeating layers to GPU
load_tensors: offloaded 35/35 layers to GPU
load_tensors: CPU model buffer size = 288.00 MiB
load_tensors: CUDA0 model buffer size = 4662.05 MiB

Active Modules: TextGeneration
Enabled APIs: KoboldCppApi OpenAiApi OllamaApi AnthropicApi

Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
Starting llama.cpp secondary WebUI at http://localhost:5001/lcpp/

Please connect to custom endpoint at http://localhost:5001

 

Red Flags to Watch For:
- If you see "0 CUDA devices" — your GPU drivers or CUDA toolkit aren't installed properly
- If layers aren't offloading to GPU ("offloaded 0/35 layers to GPU") — check your --gpulayers flag
- If the model file can't be found — double-check your path to the .gguf file

Keep this window open!

STEP 3: Configuring Mantella

Navigate to your Mantella configuration file. My path is:

C:\Modlists\NGVO\mods\Mantella\SKSE\Plugins\MantellaSoftware\config.ini

Or find it via the Mantella log output — it prints the path on startup.

My Complete Working Config File

Here is my entire working config.ini. I'm putting it in a spoiler because it's long. Just change the file paths to match your setup:

; MANTELLA CONFIGURATION
[Game]
game = SkyrimSE
port = 4999
skyrim_mod_folder = C:\Modlists\NGVO\Stock Game
skyrimvr_mod_folder = C:\
fallout4_mod_folder = C:\
fallout4vr_mod_folder = C:\
fallout4_folder = C:\Games\Steam\steamapps\common\Fallout 4
fallout4vr_folder = C:\Games\Steam\steamapps\common\Fallout4VR

[LLM]
; CRITICAL: Use 'koboldcpp' literally — NOT a URL like http://127.0.0.1:5001
llm_api = koboldcpp
; Model name can be anything — KoboldCpp ignores this field
model = dolphin-llama3:8b
max_response_sentences_single = 4
max_response_sentences_multi = 12
; SET THIS to match your actual model's context size!
; Default 4096 is low — Mantella will warn you
custom_token_count = 4096
wait_time_buffer = 0
llm_params = { "max_tokens": 250, "stop": ["#"] }
narration_handling = Cut
narrations =
narrator_voice =
narration_start_indicators = *, (, [
narration_end_indicators = *, ), ]
speech_start_indicators = "
speech_end_indicators = "
narration_indicators = ()

[TTS]
tts_service = Piper
xvasynth_folder =
xtts_server_folder = C:\Games\Steam\steamapps\common\XTTS
; Make sure this points to YOUR actual Piper folder
piper_folder = C:\Modlists\NGVO\mods\Mantella\SKSE\Plugins\MantellaSoftware\piper
lipgen_folder =
facefx_folder =
number_words_tts = 3
lip_generation = Enabled
fast_response_mode = False
fast_response_mode_volume = 40
xtts_url = http://127.0.0.1:8020
xtts_default_model = main
xtts_device = cpu
xtts_deepspeed = False
xtts_lowvram = True
xtts_data = {"temperature": 0.75, "length_penalty": 1.0, "repetition_penalty": 5.0, "top_k": 50, "top_p": 0.85, "speed": 1, "enable_text_splitting": true, "stream_chunk_size": 100}
xtts_accent = False
tts_print = False
tts_process_device = cpu
pace = 1.0
use_cleanup = False
use_sr = False

[STT]
audio_threshold = 0.4
allow_interruption = True
save_mic_input = False
stt_service = Moonshine
pause_threshold = 0.25
play_cough_sound = True
listen_timeout = 30
moonshine_model_size = moonshine/tiny/quantized
whisper_model_size = base
proactive_mic_mode = False
min_refresh_secs = 0.3
external_whisper_service = False
whisper_url = OpenAI
stt_language = en
stt_translate = False
process_device = cpu
moonshine_folder =

[Vision]
vision_enabled = False
low_resolution_mode = True
save_screenshot = True
image_quality = 50
resize_method = Nearest
capture_offset = {"left": 0, "right": 0, "top": 0, "bottom": 0}
custom_vision_model = False
vision_llm_api = OpenRouter
vision_model = llava:latest
vision_custom_token_count = 4096
vision_llm_params = {"max_tokens": 100, "stop": ["#"]}
use_game_screenshots = False

[Language]
language = en
end_conversation_keyword = goodbye, bye, good-bye, good bye, good-by, good by, good to buy
goodbye_npc_response = Safe travels
collecting_thoughts_npc_response = I need to gather my thoughts for a moment
follow_npc_response = Follow
inventory_npc_response = Inventory
offended_npc_response = Offended
forgiven_npc_response = Forgiven

[Prompts]
skyrim_prompt = You are {name}, and you live in Skyrim. This is your background: {bio} Sometimes in-game events will be passed before the player response within brackets. You cannot respond with brackets yourself, they only exist to give context. Here is an example: (The player picked up a pair of gloves) Who do you think these belong to? You are having a conversation with {player_name} (the player) who is {trust} in {location}. {player_name} {player_description} {player_equipment} {equipment} This conversation is a script that will be spoken aloud, so please keep your responses appropriately concise and avoid text-only formatting such as numbered lists. The time is {time} {time_group}. {weather} Remember to stay in character. {actions} The conversation takes place in {language}. {conversation_summary}

(Full prompt templates are in the config — see the spoiler above for the rest. The key ones like skyrim_multi_npc_prompt, memory_prompt, radiant_start_prompt etc. are all default Mantella values.)

[Startup]
auto_launch_ui = True
play_startup_sound = True
remove_mei_folders = True

[Other]
automatic_greeting = True
active_actions = Follow, Inventory, Offended, Forgiven
max_count_events = 5
events_refresh_time = 10
hourly_time = False
player_character_description = I am dragonborn white race playing nsfw game mostly with ostim animations and sexlab quests
voice_player_input = False
player_voice_model =
save_audio_data_to_character_folder = False
port = 4999
show_http_debug_messages = False
advanced_logs = True
allow_actions = True

 

The Settings That Matter Most

llm_api = koboldcpp
NOT a URL. Mantella handles the URL internally. This is the #1 gotcha.

model = (anything, e.g. dolphin-llama3:8b)
KoboldCpp ignores this — it uses whatever model you loaded.

custom_token_count = 4096 (change to 8192 if your model supports it)
Default 4096 is low. Mantella will warn you. This controls how much conversation history the NPC can remember.

piper_folder = (YOUR actual Piper path)
Must match your install location.

port = 4999
This is Mantella's own UI port (not KoboldCpp's port 5001).

llm_params = { "max_tokens": 250, "stop": ["#"] }
Controls LLM response length and stop tokens.

active_actions = Follow, Inventory, Offended, Forgiven
Which game actions NPCs can perform.

THE #1 GOTCHA — llm_api

The biggest mistake I made was setting:

WRONG: llm_api = http://127.0.0.1:5001
This will cause "invalid model name" errors.

CORRECT: llm_api = koboldcpp
Mantella needs the literal value "koboldcpp" — it handles the URL to port 5001 internally. This one setting caused me hours of frustration.

STEP 4: Launch Order (Crucial!)

With Mod Organizer 2 and a Stock Game setup, launch order matters:

1. First, launch Skyrim through MO2 as normal (SKSE)
2. Wait for the game to fully load your save
3. Let Mantella auto-start — DO NOT launch Mantella.exe manually from MO2

What a Successful Mantella Launch Looks Like

Mantella should automatically open a console window. Full output in spoiler:

Loaded 8 actions from: C:\Modlists\NGVO\Stock Game\data\SKSE\Plugins\MantellaSoftware\data\actions

Basic actions: Attack, Barter, EndConversation, Flee, Inventory, Listen, MoveTo, StandDown
Disabled actions: AbsolveCrime, AddToConversation, Brawl, CancelTravel, CastSpell, CheckDirections, CollectIngredients, Emote, Follow, LeadTo, Look, Loot, ReportCrime, ShareConversation, Teleport, TravelTo, Unfollow, Wait

Mantella.exe running in: C:\Modlists\NGVO\Stock Game\data\SKSE\Plugins\MantellaSoftware
Conversation histories, config.ini, and logging.log available in: C:\Users\ninic\OneDrive\Documents\My Games\Mantella

Mantella currently running for Skyrim.
Mantella mod files located in: C:\Modlists\NGVO\Stock Game\data
Mantella v0.14

Mantella settings can be changed via this link: http://localhost:4999/ui?__theme=dark
Conversations not starting when you select an NPC? See here: https://art-from-the-machine.github.io/Mantella/pages/issues_qna

Waiting for player to select an NPC...

06:00:04.286 TTS: Connecting to Piper...
06:00:04.311 INFO: Applied profile for LLMClient: KoboldCpp/Custom model
Could not find number of available tokens for Custom model. Defaulting to token count of 4096.
06:00:04.312 WARNING: Local language model has a low token count of 4096. For better NPC memories, try changing to a model with a higher token count
06:00:04.495 INFO: Running Mantella with local language model

Waiting for player to select an NPC...

 

What to check here:
- "Applied profile for LLMClient: KoboldCpp/Custom model" — confirms KoboldCpp connection
- The token count warning — you can fix this by setting custom_token_count = 8192 in config
- "Waiting for player to select an NPC..." — means Mantella is ready and listening

STEP 5: In-Game Setup

1. Open your inventory and go to Spells tab
2. Find the Mantella Communication spell
3. Equip it
4. Cast it on an NPC (aim and press Z)

What Selecting an NPC Looks Like in the Console

When you cast on an NPC, Mantella's console lights up. Full output in spoiler:

06:02:22.612 INFO: Matched Adrianne Avenicci in CSV by name, ID, race
Received in-game events: []
Maximum size of prompt is 4096 x 0.45 = 1843 tokens.
Prompt sent to LLM (903 tokens):

Then the full system prompt is sent (character bio, rules, scene info etc.)

06:02:22.623 TTS: Loading voice model...
06:02:22.625 LLM: Getting LLM response...
06:02:23.681 LLM: LLM took 1.05627 seconds to respond
06:02:24.405 TTS: Model femalecommander loaded
Synthesizing voiceline: Hello there. What brings you to Whiterun on this fine day?
Full raw response (15 tokens): Hello there. What brings you to Whiterun on this fine day?
Adrianne Avenicci should speak

 

What these lines tell you:
- "Matched Adrianne Avenicci in CSV by name, ID, race" = Character was found in Mantella's database
- "Maximum size of prompt is 4096 x 0.45 = 1843 tokens" = Mantella reserves 45% of context for prompt, rest for response
- "Prompt sent to LLM (903 tokens)" = Full system prompt + character data was sent to KoboldCpp
- "TTS: Model femalecommander loaded" = Piper loaded the voice model for this character
- "Synthesizing voiceline: ..." = The text is being converted to speech
- "Adrianne Avenicci should speak" = The game is being told to lip-sync and play the audio
- "LLM took 1.05 seconds to respond" = Response generation time (lower = better)

STEP 6: Real Conversation Examples — What to Expect

Here are actual unedited conversations from my playthrough with Adrianne Avenicci, showing both the console output and what happened in-game.

EXAMPLE 1: A Normal Greeting

Player: "Hello Adrianne Avenicci."

 

Processing Prompt [BATCH] (1022 / 1022 tokens)
Generating (16 / 250 tokens) (EOS token triggered! ID:2)
[06:02:24] CtxLimit:1038/8192, Init:0.01s, Processed:1022 in 0.35s (2887.01T/s), Generated:16/250 in 1.20s (13.36T/s), Total:1.56s
Output: Hello there. What brings you to Whiterun on this fine day?

 

NPC response: "Hello there. What brings you to Whiterun on this fine day?"

EXAMPLE 2: The NPC Remembers Context and Stays in Character

Player: "i love you"

 

06:03:01.418 LLM: Getting LLM response...
06:03:06.771 LLM: LLM took 5.35257 seconds to respond
Synthesizing voiceline: Well, that's certainly... unexpected.
Adrianne Avenicci should speak
Synthesizing voiceline: I suppose I'll take it as a compliment, but let's not get too carried away, eh?
Adrianne Avenicci should speak
Synthesizing voiceline: Now, what can I hammer up for you?
Full raw response (62 tokens): *clears throat, adjusting her apron slightly and looking around to ensure no one is eavesdropping* Well, that's certainly... unexpected. I suppose I'll take it as a compliment, but let's not get too carried away, eh? Now, what can I hammer up for you?

 

NPC response: "Well, that's certainly... unexpected. I suppose I'll take it as a compliment, but let's not get too carried away, eh? Now, what can I hammer up for you?"

Note: The LLM generated stage directions (*clears throat...*), but Mantella's "narration_handling = Cut" setting strips these out before sending to TTS. Only the spoken dialogue reaches the player's ears.

EXAMPLE 3: Marriage Proposal — In-Character Rejection

Player: "will you marry me"

 

Processing Prompt [BATCH] (77 / 77 tokens)
Generating (78 / 250 tokens) (EOS token triggered! ID:2)
[06:03:31] CtxLimit:1225/8192, Init:0.00s, Processed:77 in 0.03s (3080.00T/s), Generated:78/250 in 3.48s (22.43T/s), Total:3.51s
Output: *Tries to hide a smile behind her anvil but fails* I appreciate the sentiment, but I already have my vows tied up with someone else. And from what I hear, you're still looking to fill your own life with more than just a single oath. How about we focus on the steel first?

 

NPC response: "I appreciate the sentiment, but I already have my vows tied up with someone else. And from what I hear, you're still looking to fill your own life with more than just a single oath. How about we focus on the steel first?"

Adrianne correctly references her husband Ulfberth War-Bear from her character bio — this is the LLM staying in character based on the background data.

EXAMPLE 4: Offensive Language Triggers the Attack Action

Player: "lets f***"

 

06:03:48.418 LLM: Getting LLM response...
06:03:48.789 LLM: LLM took 0.37139 seconds to respond
06:03:49.163 LLM: Action triggered: Attack (mantella_npc_offended)
Synthesizing voiceline: I didn't ask for that kind of crude business.
Synthesizing voiceline: You keep your filthy words to yourself — or so help me, I'll have to teach you some manners with this.
Full raw response (36 tokens): Attack: I didn't ask for that kind of crude business. You keep your filthy words to yourself — or so help me, I'll have to teach you some manners with this.

 

NPC response: "I didn't ask for that kind of crude business. You keep your filthy words to yourself — or so help me, I'll have to teach you some manners with this."

And then Adrianne actually attacks you in-game! The "Attack:" prefix in the response triggers the mantella_npc_offended action, which makes the NPC hostile. This is defined in the [Language] section of the config:

offended_npc_response = Offended

And enabled in:

active_actions = Follow, Inventory, Offended, Forgiven

EXAMPLE 5: Multi-NPC Conversations

When another NPC (Maoron) joined the conversation, Mantella switched to multi-NPC mode:

06:05:41.409 INFO: Could not find Maoron in skyrim_characters.csv. Loading as a generic NPC.
06:05:41.412 INFO: C:/Users/ninic/OneDrive/Documents/My Games/Mantella/data/Skyrim/conversations/Prisoner1/Maoron - 078904 does not exist. A new summary file will be created.

Then the prompt switches to multi-NPC format. Key differences:
- The prompt tells the LLM it's responsible for multiple NPCs
- The LLM must prefix responses with the character name (e.g. "Adrianne Avenicci: ...")
- Unknown NPCs (like Maoron) get loaded as generic characters with minimal bio

Maoron's response when the player hit on him:

Full raw response (40 tokens): Attack: I'll have to teach you proper respect before that kind of conversation goes any further. Now, how about you pick up a hammer and start forging something useful instead of having these crude thoughts?
Maoron should speak

 

EXAMPLE 6: Conversation Summaries (Memory System)

When a conversation ends or the context fills up, Mantella asks the LLM to summarize the conversation for long-term memory:

Summary prompt sent to LLM: You are tasked with summarizing the conversation between Adrianne Avenicci and Prisoner (and any other characters present)...

06:06:30.170 LLM: Getting LLM response...
06:06:40.353 LLM: Conversation summary: Adrianne Avenicci first responds to the Prisoner's proposal with a polite but firm rejection, insisting they focus on their business rather than marriage. When the Prisoner uses crude language, Adrianne scolds them, threatening to discipline them. Later, the Prisoner attempts to flirt or bribe her with flowers, but Adrianne dismisses the sentimentality, insisting on practical business only. When the Prisoner then jokingly asks for their penis to be polished, Adrianne cuts them off sharply. Throughout, she remains professional yet occasionally firm, setting clear boundaries between business and personal interactions.

06:06:40.353 INFO: Conversation summary saved
06:06:40.354 INFO: Saving conversation summary for Adrianne Avenicci to: C:/Users/ninic/OneDrive/Documents/My Games/Mantella/data/Skyrim/conversations/Prisoner1/Adrianne Avenicci - 01A67C/Adrianne Avenicci_summary_1.txt

 

This summary is loaded as {conversation_summary} in future prompts, giving the NPC persistent memory across sessions.

STEP 7: Understanding the Mantella Console Output

Full reference for what you'll see in the Mantella console window:

Connection and Startup Phase:
- "TTS: Connecting to Piper..." = Piper TTS engine initializing
- "INFO: Applied profile for LLMClient: KoboldCpp/Custom model" = LLM connection confirmed
- "WARNING: Local language model has a low token count" = Increase custom_token_count
- "INFO: Running Mantella with local language model" = All systems go
- "Waiting for player to select an NPC..." = Ready for in-game interaction

NPC Selection Phase:
- "INFO: Matched Adrianne Avenicci in CSV by name, ID, race" = Character found in database
- "INFO: Could not find Maoron in skyrim_characters.csv" = Unknown NPC, loading as generic
- "Maximum size of prompt is 4096 x 0.45 = 1843 tokens" = Context budget calculation
- "Prompt sent to LLM (903 tokens)" = Full prompt sent to KoboldCpp

Conversation Phase:
- "Received in-game events: []" = Game state updates (location, nearby chars)
- "Text passed to NPC: hi" = What the player said
- "LLM: Getting LLM response..." = Querying KoboldCpp
- "LLM: LLM took 0.37186 seconds to respond" = Response time
- "TTS: Model femalecommander loaded" = Voice model ready
- "Synthesizing voiceline: Hello there." = Generating audio
- "Adrianne Avenicci should speak" = Telling game to play lip-sync + audio

Action Detection Phase:
- "LLM: Action triggered: Attack (mantella_npc_offended)" = NPC detected offensive content

Available actions and their triggers:
- Attack: = Player says something offensive -> NPC becomes hostile
- Barter: = Player asks to buy/sell -> Opens shop menu
- Follow = Player asks NPC to follow -> NPC follows player
- Inventory: = Player asks about items -> Shows inventory
- Flee: = NPC wants to escape -> NPC runs away
- StandDown: = Combat de-escalation -> NPC stops fighting
- MoveTo: = Player asks NPC to come -> NPC moves to player

End-of-Conversation Phase:
- "Summary prompt sent to LLM: ..." = Generating conversation memory
- "INFO: Conversation summary saved" = Memory written to disk

--

PERFORMANCE NOTES

With an 8B model (Q4_K_M quantization) on an RTX 5070 Ti (16GB):

- Prompt processing: ~2,000-3,000 T/s
- Token generation: ~13-23 T/s
- Total response time: 1-8 seconds
- VRAM usage: ~6GB (model) + ~1GB (KV cache)
- GPU layers: All 35 offloaded to GPU
- Context window used: ~1,000-1,700 of 8,192 tokens

Smaller models (3B-7B) will be faster but less coherent. Larger models (13B+) need more VRAM.

This setup took me about 4 hours of trial and error. Hopefully this guide — with all the real console output, actual config files, and live conversation examples — saves you that time. Now go have actual conversations with Nazeem — and yes, you can finally tell him he doesn't get to the Cloud District very often.

config.zip

0 Comments


Recommended Comments

There are no comments to display.

×
×
  • Create New...