Any tips for how to get the bot to make shorter messages?

Sleevesies · June 20

I've been trying to figure this out on my own for a while now. I use SillyTavern and koboldcpp to run a bot locally, and I've used a variety of GGUFs; this issue persisted through all of them

when I want my character to speak with short, basic, and concise messages, it just doesn't. it always rambles and writes multiple paragraphs, even though I've made it very clear that I don't want this. I even tried telling the bot to type shorter messages in the actual conversation itself, and it couldn't figure that out

for context, this bot is supposed to behave similar to an AI assistant, so it's specifically only using raw text, it doesn't need to use italics or "quotation" text, so because of this lack of action description it should be easier to make shorter messages, but that doesn't seem to be the case

I'm at my wits' end here. I've gone so far as to put "{{char}} will not ever, for any reason, under any circumstances, write a message longer than 100 words, no matter what." in both the character description and the system prompt, and it still throws multiple paragraphs, often with more than 100 words each!

and you may be wondering why I don't just lower the response tokens. that's because using that to shorten messages feels like it causes too much cutoff: a lot of the time with shorter response tokens the character doesn't get to finish a thought, or just stops in the middle of a word, so it's not a natural conclusion to the sentence. having a higher than desired response token limit but putting the limits in the prompts seems more useful, since that'll make for a more natural conclusion. it's sort of like how most roads that say the limit is 80mph actually just mean you should stay at 75mph-85mph; it's not absurdly strict, just a "this speed-ish" guide

I want my bot to do one small-medium sized paragraph each message
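
A minimal sketch of one way around the cutoff problem described above, assuming the reply can be post-processed as plain text before it is shown: keep the higher response token limit, then trim the result back to its last complete sentence and, optionally, to its first paragraph. The function names are hypothetical and are not part of SillyTavern or koboldcpp.

```python
import re

# Sentence-ending punctuation, optionally followed by closing quotes/brackets.
# Assumption: '.', '!' and '?' end sentences. Abbreviations and decimals
# such as "e.g." or "2.5" will be treated as sentence ends too.
_SENTENCE_END = re.compile(r"[.!?][\"')\]]*")


def trim_to_last_sentence(text: str) -> str:
    """Drop any trailing fragment after the last complete sentence."""
    text = text.rstrip()
    matches = list(_SENTENCE_END.finditer(text))
    if not matches:
        return text  # no sentence end found, so leave the text untouched
    return text[: matches[-1].end()]


def keep_first_paragraph(text: str) -> str:
    """Keep only the first non-empty paragraph (split on blank lines)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return paragraphs[0] if paragraphs else ""


reply = "Sure, I can help with that. The next step is to ope"
print(trim_to_last_sentence(reply))  # prints: Sure, I can help with that.
```

This only cleans up the ending. It does not make the model write less, so it works best alongside the prompt instructions rather than instead of them.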

Wonders of Eros · June 21

Yeah, I definitely agree with your sentiment. Most of the decensored models that people tend to host through SillyTavern will give some overly long responses from time to time, at least initially. Thankfully, many of them do adapt over time, so if you continue a conversation long enough they will eventually adjust their response length. As for making it a consistent pattern that works right from the onset, I don't think there's much we can do right now, at least to my knowledge. As you pointed out, shortening the response tokens often leads to unsatisfying cut-offs, so you have to manually edit the end of many messages just to make things flow well. One of the few things I still miss about FlowGPT was that the bots kept messages fairly concise - rarely more than one short paragraph - but the service was terrible and you couldn't have anything remotely close to the same context length (token count) as you can with some of the models that are regularly hosted through SillyTavern.

I used to use Behemoth 123B, which will generally give you short and fairly unrepetitive responses compared to most decensored models, but Behemoth struggles to maintain lore consistency and it forgets things more easily, even when you give it a very high token count to work with. These days, I mostly just use decensored versions of Gemma (such as Gemma 4 31B it Heretic) with Behemoth as an occasional backup. Yes, Gemma models can also give some rather long initial responses, but they're highly adaptable, VERY good at remembering context and lore, and many of them can handle more than 20'000 tokens if you manually adjust the token limit in SillyTavern.

100 SillyTavern tokens is around 366~375 symbols, so let's say you've written around 20'000 symbols worth of text as background context for your scenario. That comes out to around 60 messages' worth (roughly 5'500 tokens), so if you have a maximum context length of 20'000 tokens, you could still fit around 140 messages of about 366 symbols each. For context, that's about the length of this paragraph.
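
The arithmetic above can be sketched as a quick calculation. This is a rough estimate only: the 366 symbols per 100 tokens figure is taken from the post and will vary by model and tokenizer, and the function name is made up for illustration.

```python
# Rough figure quoted in the post; the real ratio depends on the tokenizer.
CHARS_PER_100_TOKENS = 366


def remaining_messages(context_tokens: int,
                       background_chars: int,
                       message_tokens: int = 100) -> int:
    """Estimate how many messages still fit after the background context."""
    # Convert the background text from symbols to an approximate token count.
    background_tokens = background_chars * 100 / CHARS_PER_100_TOKENS
    free_tokens = context_tokens - background_tokens
    return int(free_tokens // message_tokens)


# 20'000 symbols of background in a 20'000-token context window
print(remaining_messages(20_000, 20_000))  # prints: 145
```

That lands close to the "around 140 messages" estimate. The small difference comes from rounding the background up to 60 messages' worth.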

You should try Gemma 4 31B it Heretic. It might at least make for a decent compromise while we wait for better models to appear.

Sleevesies · June 21

darn, guess it's just one of those limitations then. perhaps if I lower the temp value? I had it at 1-1.25 for the few most recent tests, but I should try using a lower temp, even doing a huge underswing and going somewhere stupidly low like 0.1, just to see if using less creativity will make it follow orders better? that's one theory for the next time I decide to mess with it at least...

thanks for the recommendations, I've been using Qwen2.5-7B-instruct-Uncensored.Q8_0 and Valkyrie-49B-v1a-Q2_K mostly, however I do want to find a version of the Maid Grand Horror series of GGUFs that's been floating around, I've tried a couple but they're all so unstable, even with the recommended text completion settings, always having their text devolve into loops. always having their text devolve into loops. always having their text devolve into loops.

like that, just looping until it hits the context limit or I manually stop it, but when it works it's great! I do love the darker, more vulgar and oftentimes rude way that these models depict the characters and world, since most other models feel like they're always trying to be nice? even to other characters or just to the world itself? it feels monotone in its politeness.

though that monotone politeness is good for the current project I mentioned in the original post, a bot that is aware it's a bot and is more like an AI assistant, so the lack of a grittier model isn't a big deal right now. but as long as you're recommending models, I thought I'd ask: which Maid Grand Horror do you recommend?
