Long guide on getting started with portrait generation using AI on your local computer

XavierMace · November 19, 2024

This is going to be an EXCEEDINGLY long and (hopefully) in-depth post. I'll also preface this by saying that if you don't have a relatively recent CPU and a GPU with at least 8GB of VRAM (16GB+ preferred), your speed is going to suffer greatly, if you can get it to work at all. While a good chunk of it is SFW, later in the guide I do shift to NSFW images, so you've been warned. I am a big believer in explaining WHY you're doing something rather than just telling you what to do, so this post reflects that. Keep in mind there isn't one "right" way of doing things when it comes to AI. There's a large amount of personal preference. I welcome input from the others on here already making portrait sets via AI, but I'll be writing primarily based on what works for me, as it's fairly easy to get running with.

I have zero artistic skills whatsoever so my goal from the beginning was to come up with a process where I don't have to manually touch images at all. I also wanted to be able to generate pictures in bulk. My general target is 50 images per gender for a single portrait set and I generally want multiple sets. My personal preference was also for image styles that lean more towards realistic rather than cartoon which affects what checkpoints I use. With a 4070ti, and the relatively higher resolution settings I walk through below, generating 10 images takes me about 3 minutes. Removing the background and resizing those images only takes a couple more minutes.

I personally don't do ANY inpainting (I'll explain what it is later). Background removal is 100% automated at this point. I've been doing this for about 2 years now and I can say with 100% confidence that my process now is both faster and produces more consistently high quality images than when I started. That doesn't mean it couldn't be improved further; I was going for a balance as I'm not looking to make this into a job.

The first decision you have to make when getting started with local AI image generation is which UI you want to use. Generally you'll see people talking about either ComfyUI or Automatic1111 (A1111). While I started out using A1111, I moved to Forge (https://github.com/lllyasviel/stable-diffusion-webui-forge?tab=readme-ov-file) which is visually and functionally VERY similar to A1111 but most find it to be much more responsive. This guide will therefore be focused on Forge, although 99% of it will work with A1111 as well. The general concepts would be the same with ComfyUI, but I can't really give you specific instructions for it. A while back however, they (Forge) did a major update to support a new base model (Flux) which hosed functionality for older models. Therefore, you'll want to use this link to follow my guide: https://github.com/lllyasviel/stable-diffusion-webui-forge/releases/download/latest/webui_forge_cu121_torch21.7z. You will also want to avoid using their update batch file as that will upgrade you to the newer branch that will break everything you're doing here. A lot of people swear by ComfyUI as it's more "powerful" with the way you can create workflows. While that may well be true, I personally found it far more difficult to work with. But this is 100% preference. There's nothing stopping you from downloading all 3 and pointing them at the same directories for checkpoints and LoRA's so you can try them all out.

Your second decision is what base model and checkpoint you're going to work with. You can download them from https://civitai.com/models. While nothing is directly preventing you from downloading different types of base models, LoRA's are base model specific which makes jumping between different base models a bit more problematic. Think of SD1.5 as the original in many regards. Because it's older than the other options I'm going to mention, there's a FAR larger selection of LoRA's available for it. LoRA's are used to "help" AI generate a specific detail or feature. IE a specific style of clothing, or specific body feature. The downside to SD1.5 is a lower base resolution and "natural language" understanding is basically non-existent. This brings us to SDXL and its branch known as Pony. If you don't have more than 8GB of VRAM, you'll probably want to stick with SD1.5. If you have the VRAM, the only reason, IMO, to use SD1.5 is if you're trying to generate very specific images that you just can't seem to get Pony to do but there's an SD1.5 LoRA for. Personally, I no longer have any SD1.5 checkpoints/LoRA's installed.

Pony/PonyXL is based on SDXL so some SDXL LoRA's will even work with it. However it's gotten so popular that at this point a lot of people treat it as a separate base model, so stick with Pony specific LoRA's. The primary reason it's gotten so popular is that, compared to SD1.5 and vanilla SDXL, it's EXTREMELY good at understanding natural language, especially for adult content. SDXL/Pony is double the base resolution of SD1.5 so the downside is slower generation and a higher VRAM requirement. Flux is the latest new thing. I'm not going to be touching on it for one simple reason: as it's new, it doesn't have the LoRA support needed to do what I personally want to do with it. We'll touch more on that when we get to LoRA's as well as why the resolution is important even though we're ultimately going to be creating very small images.

Basically as long as you've got the GPU power and VRAM available (I'm running on an RTX 4070Ti), I'd highly recommend just using Pony. Once you've decided what base model you want to use, you have to find/pick at least one checkpoint. As long as you're sticking with the same base model (IE SD1.5 or Pony), you can switch between checkpoints at will. A checkpoint, put simply, is the base image data that AI will be working from to generate your images. If you pick an anime/cartoon focused checkpoint, it's going to be harder to generate a realistic looking image. If you pick a SFW focused checkpoint, it will be harder to generate NSFW images. Note I say harder, not impossible. Personally as mentioned above, I'm going for a more realistic art style. I'm currently using Cyber Realistic and Real Dream (both Pony). These are both available from the above mentioned Civitai. You'll only be using one at a time, but you can generate some images with one checkpoint, then select a different checkpoint and generate some more. They're large files (generally 6GB+ for Pony) but beyond that, don't be afraid to try others out.

Strictly speaking you now have everything you need to get started. You just have to provide a prompt and specify your image dimensions. Let's talk about the image dimensions first. Recommended size for Stellaris portraits is 475x380. Rooms are 952x340. For reference events are 450x150 and origins are 220x115 but I haven't personally done anything for those. You could just pop those values into Stable Diffusion, however you'll find that creates problems.

There's a few things AI is known for struggling with. It has a hard time with eyes, especially SD1.5, and has a hard time with creating the appropriate number of fingers and toes. The smaller your image, the more it struggles to generate a decent looking output. In my experience you get far better results if you generate at a higher resolution and then resize it afterwards. There's two things to keep in mind when you do that. First, a larger image means more VRAM usage and longer generation time. Second, your base model resolution. SD1.5 is 512x512. SDXL/Pony is 1024x1024. Therefore if you want to avoid using upscaling (which further increases generation time) you'll want to stay pretty close to those dimensions.

The dimensions have one other effect. As the base model is square (512x512 or 1024x1024), when you generate non-square images, you sometimes start fighting a problem where your image looks "stretched" in order to fill the image. IE people with unrealistically long torsos, or part of a second person is added to the image. Pony is again far better about this than SD1.5 but if you generate using the Stellaris aspect ratios (which I do), expect to occasionally have to throw out some images. With that said, I scale the Stellaris portrait size up for generation: I use 712x570 for SD1.5 and 1424x1140 for Pony, which works out to roughly 1.5x and 3x the portrait dimensions on each side. If you don't mind figuring out a different process for resizing the images afterwards, generating at 1024x1024 will produce fewer "bad" images than generating at the Stellaris aspect ratios. Personally I don't think it's worth the extra effort.
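To make the arithmetic concrete, here's a minimal Python sketch that scales the Stellaris sizes quoted above up to generation sizes while keeping the aspect ratio. The function names and the "round down to an even number" step are assumptions made for this example (they happen to reproduce the 712x570 and 1424x1140 sizes), not something Forge requires.

```python
# Stellaris target sizes quoted in this guide, as (width, height).
STELLARIS_SIZES = {"portrait": (475, 380), "room": (952, 340)}


def generation_size(target, scale, step=2):
    """Scale a (width, height) target and round each side down to a multiple of `step`.

    `step=2` is an assumption for this sketch, chosen because it reproduces
    the sizes used in the guide.
    """
    return tuple(int(side * scale) // step * step for side in target)


def aspect_error(size, target):
    """Relative difference between two aspect ratios (0.0 means identical)."""
    ratio = size[0] / size[1]
    wanted = target[0] / target[1]
    return abs(ratio - wanted) / wanted


portrait = STELLARIS_SIZES["portrait"]
for label, scale in (("SD1.5", 1.5), ("Pony", 3)):
    size = generation_size(portrait, scale)
    print(f"{label}: {size[0]}x{size[1]}, aspect error {aspect_error(size, portrait):.3%}")
```

The aspect error is the thing to watch: if it stays well under a percent, shrinking the image back down to the Stellaris size afterwards won't visibly squash anything.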

The reason I generate at a higher resolution is that this is a significant factor in the aforementioned "AI is bad with eyes and fingers". Keep in mind AI doesn't REALLY understand anything. It's just trying to imitate, but that's not backed by physics/reality. Therefore the fewer pixels you give it to work with, the harder time it has getting the details right because AI pretty much goes "sure that seems pretty similar".

It's also worth mentioning at this point to keep in mind the "camera angle" you go for in your pictures. Trying to cram a full person, head to toes, in the image is going to result in more defects and less overall detail than just doing an upper body shot. Given we're on LL and presumably wanting lewds, I generally try to either stop just below the crotch or use a sitting/kneeling pose to give head and torso as much room as possible but still get the naughty bits in. But you'll notice a significant improvement in detail and consistency going with an upper body shot closer to Stellaris's default portrait.

The other major tool in combating defects is ControlNet. ControlNet is integrated with Forge, which means no extra steps to install it as you would have to with the other UI's. There's two different options with it, both of which are EXTREMELY useful given what we're trying to do, which is generate an image with a person 100% inside the bounds of the image. By default, depending on your prompt, you'll often end up with parts of the body out of frame. Off the bottom isn't a huge deal in game. Off the sides doesn't look so hot in game. So for example let's go with this pose:

image.png.ec6ad12a755ed64f331793145c1d2d23.png

As you can see I've provided it an image to use. Right now I have "OpenPose" selected. This basically tries to map the skeleton of the source image and match that for generation. With the "openpose_full" preprocessor (default selection) this includes fingers/toes and eyes. This is great if you want all your images to have a similar or matching pose (IE Stellaris's default portrait pose). The other option is "Canny". Canny tries to match all aspects of the source image in its generation. In both cases your Control Weight and the Control Mode will determine how strictly it follows that. For OpenPose, I generally keep it around 0.7 to 1.0 as I want it to match the pose pretty closely. For Canny, I'm generally just going for "kinda similar to this", so I'm usually using weights like 0.15-0.3. Personally I leave Control Mode on Balanced 99% of the time, and Resize Mode I leave on Resize and Fill. Keep in mind the resolution of your reference image has a large effect on the results. You want the reference image to be at least as large as your output image.

That takes us to prompting. Prompting is you describing details you want in the image to the AI. This is where you'll be spending most of your time. Up above I used the wording "natural language", I'll elaborate on that here. First let's look at a very basic prompt with the ControlNet options I used above.

image.png.1b8f571afcef808bdebac93f588e3462.png

We're going to discuss the prompt at length in a bit, but let's go over my other settings. There isn't a 100% "always the best" option for Sampling Method. The options available are going to vary depending on your base model. Some LoRA's will recommend a specific option but personally with Pony I haven't found a pressing need to change from the one I'm using above. But feel free to experiment. As you can see I've got my image set to a multiple of Stellaris' portrait resolution as I discussed above. CFG Scale relates to how strictly it tries to follow your prompt. As I'm trying to do bulk image generation, I'm not looking for super strict as I want some variation. Batch size is how many images it's rendering at the same time. Batch count is how many times it repeats the batch size this run. Your VRAM and image size are going to be your limiting factors on batch SIZE. Batch count is only limited by how long you want it running. With 16GB of VRAM and the above resolution, a batch SIZE of 2 is pretty much the limit for me. I am however going to use that batch size for all of this guide so you can get an idea of how much an image can vary with the same settings. This is the result when using the "Canny" option for ControlNet with a weight of 1.0:

image.png.76138b677bdbec3a747624e82a6055b1.png

That did a fairly decent job of replicating the original image and it's the proper aspect ratio for Stellaris. Same settings, except ControlNet weight dropped down to 0.15:

image.png.be6306331837bf628e0644c7fee64f1d.png

Keep in mind, we're still using a very basic prompt that doesn't specify many details, but we can see it's still captured the general idea of the original image. If we switch ControlNet to "OpenPose" with a weight of 1.0 we get:

image.png.d5056e67702eb76516aa911c82a1b6e4.png

As mentioned above, OpenPose is just trying to control the skeleton of the model. When using the "openpose_full" preprocessor, this includes eye detail (to an extent) and fingers/toes. But all other aspects of the image are basically thrown out. Also keep in mind OpenPose has limits. It's creating a 2D image of your skeleton, so poses with limbs overlapping each other or limbs behind the body will cause it to struggle. However, beyond the pose itself the other value this provides is control over how much of the body is going to be in the image. If we disable ControlNet and remove "walking" from the prompt, we get this:

image.png.f063a7a293dfe796332ad5a98dede60a.png

Those are still pretty similar, but that's because we haven't really provided many details. Let's add prompts for curly red hair, detailed eyes, and large breasts. Now we get:

image.png.0524c1507046f12e3baf231f36569c9f.png

OK, we can see those prompts took effect (more on the hair color in a bit) but now we've basically just got an upper body shot. Because all the details we described are in the upper body, it basically sees no reason to include anything else. If you include details like "sitting" or "walking", that will generally get it to at least include waist level in the shot. Describing the genitals will also generally do that. But OpenPose allows you to ensure that, largely regardless of your prompt. Before we demo that, let's talk about the hair color. When talking about the differences between base models (SD1.5 vs Pony) I mentioned Pony being far better about understanding natural language. This is a good example. With SD1.5 when you try to specify colors, it generally doesn't understand that you're trying to describe a certain piece of the image. Right now we're specifying a green background, a black body suit and "red" hair and it's doing that. With SD1.5, if you specify a green background right at the beginning of your prompt like we are here, you'll usually end up with at least partially green clothes and maybe green eyes. For hair colors, it helps to be less generic on the word choice. Same prompt but with "auburn" hair:

image.png.c01492f99282dd9ca2f493888c15f9be.png

Let's take a moment to talk about the background. As I mentioned above, my goal is Stellaris portraits and I don't want to have to manually touch any of the images if at all possible. That means I have to use some form of automation to remove the background. Therefore in order to facilitate that as much as possible, I'm trying to create a background that contrasts with the portion of the image I want to keep and doesn't contain a bunch of objects/details that I'll want removed anyways. If you're trying to create a character with green details, change your background color to something that contrasts as much as possible. Now, back to ControlNet. We're going to re-enable ControlNet using the updated prompt with hair/breast details. Using OpenPose with a weight of 1.0:

image.png.ce767fc5d91e2849beed56f180c408ed.png

Now we've got our pose/camera angle back but we've still got the details we specified. What if we switch to Canny? With a weight of 1.0:

image.png.45a5535ff3c53a0f1b9429e286c843d2.png

Of course the catch here is the details we describe aren't all that different from our reference image. So, let's change our prompt from curly auburn hair to straight blond hair:

image.png.f6bce33c1c88beb94a7682f82675bee9.png

Because our control weight is set to 1.0, that's largely overriding our updated prompt. If we drop the ControlNet weight down to 0.5:

image.png.d99f1c85f57bd4142857823fd18308d0.png

We're also getting a lot of background "noise" that's going to be problematic when time comes to remove the background. If we drop the weight down to 0.25 it's closer but still a bit of curls in the hair as ultimately we're fighting the base image we provided. But the background is pretty much clean now:

image.png.6a692498067038e5a4ec564bfc008d2b.png

If we drop the weight down to 0.1, ControlNet is largely being ignored at that point:

image.png.395ca3367d7e97d60ba02f0020682b55.png

This is where LoRA's come in. LoRA's are used to teach the AI about a specific detail. You can get these from Civitai just like the checkpoints but as mentioned above, you do generally need to use a LoRA whose base model matches what you're using. For example, if you're using a Pony based checkpoint like I am, you'd filter models on Civitai like this:

image.png.1633f7b56be5253e821e1d4ffe6f252b.png

Now, one warning I will give is not all LoRA's are equal. There's a ton of shitty LoRA's out there. If you're using Pony, there's also a lot of unnecessary LoRA's because Pony can do a lot of stuff (reliably) without LoRA's that SD1.5 can't. The most common problem is a LoRA impacting your overall image too much. Say for example you're using a LoRA for a particular style of clothing, IE a Star Trek uniform, but the LoRA ends up affecting other aspects of your image, like the face or skintone. Best you can do is look at feedback/comments on the LoRA and see if there's any complaints. The advantage of LoRA's is (if they were made properly) they allow you to replicate a fairly specific detail without affecting the rest of your image too much. After downloading a LoRA, place it in \webui\models\Lora inside your Stable Diffusion install directory. If Stable Diffusion was already running when you copied the LoRA into that folder, you'll have to refresh the list in the GUI (there's a refresh button on the far right in Forge). Assuming you copied it over correctly, you should see any compatible LoRA's in the GUI here:

image.png.53429a8755b9eab7ee86c45f3f25e237.png

Note I say compatible as, for example if you have a Pony checkpoint loaded, an SD1.5 LoRA is not going to show up in the list. When you click on the LoRA, it's going to add the LoRA to your prompt. Most LoRA's also use a "trigger" word. The download page SHOULD have told you what that is. If it doesn't have one, you can try to use it without. If that doesn't work, hover over the LoRA in the selection list above and there will be a little wrench icon in the corner. Click on that and it will give you some information which includes training dataset tags. Try including some of those in your prompt to "activate" the LoRA. We've shown above how weights affect ControlNet. However, you can specify weights in prompts too, both for LoRA's and regular tags/words. LoRA's get added with a weight of 1 by default which effectively means that LoRA is going to take precedence over the rest of your prompt and likely ControlNet too. For example I'm going to use a LoRA that's supposed to give your character a surprised/shocked facial expression. In theory, even at 1.0, it's a facial expression so the rest of the image SHOULD be largely unaffected. One item to note here is source material. I mentioned above that if you pick a checkpoint that's focused on a certain art style for example, getting it to generate a different style can be problematic. This is still somewhat of a factor with LoRA's, especially with higher weights. Generally speaking though you can make it work. Updated prompt with ControlNet still enabled, Canny, 0.5 weight on ControlNet:

image.png.e72072b043b8f2743a5cda761e319508.png

This results in:

image.png.37b9da269d58d92138889fb50bcd91ab.png

Mission successful. Facial expression has been altered but the rest of the image is still consistent with what we wanted. This particular LoRA has a mostly anime dataset, so let's try lowering the weight a bit. With the LoRA weight set to 0.3, we get:

image.png.964de3f3d6678acd2d600b0ecc51549f.png

Effect is a bit less pronounced but it's still notably different than without the LoRA. Generally you'll want to try to keep your LoRA weight as low as possible to minimize the overall effect it's having on the image. This is especially true if you're using multiple LoRA's. Let's add a LoRA to try to make Black Widow look like an Andorian from Star Trek and add "andorian" and "white hair" to the prompts. I've left the Shocked/Surprised LoRA at 0.3 and Andorian LoRA at the default 1.0 and white hair specified:

image.png.704a203af9c726da45eca4984fef18c1.png

Obviously two things jump out here. At 1.0, the blue "theme" of Andorians has basically wiped out my green background and the shocked expression is mostly gone. However, we still have Black Widow's outfit and general hair style. While we have both "straight blond hair" and "white hair" specified in the prompt, the weight on the Andorian LoRA trumps the "straight blond hair" despite the LoRA being specified at the end of the prompt. If we drop the Andorian LoRA down to 0.3 as well, we get:

image.png.a6d32e6d8ea4ed4ed2a4f82f56a26af5.png

Save for the white hair, the Andorian feel is pretty much gone. As we can see, even at 0.3, it's adding other clutter into the background so in general I'd classify this as a subpar LoRA for our purposes. However, what if we try to fine tune the prompt a bit more? As mentioned above, the order of your keywords affects their priority. However, you can also apply weighting to a keyword using parentheses and a colon with a weight value. For example if we update the prompt to this:

image.png.a343e6022810590d9bbf8e7f7d8dc0e9.png

That gives us this:

image.png.87296ca48e6afc26efb3306295785574.png

We still lost the antennae but overall not bad. Keeping the weight on the LoRA itself low let us (mostly) keep our green background but weighting andorian and blue skin higher allowed that to come through. Generally speaking, you're not going to be going above 1.0 on a weight for anything as that's generally going to have too much effect on your image overall. The one exception is "slider" LoRA's. If you're using SD1.5 I HIGHLY recommend looking into these given SD1.5 has far more difficulty handling details like skintone/ethnicity. Pony is far better at it, but sliders can still sometimes give you better control over a particular detail. IE Age, Weight, Breast/Penis size, etc. As an example, let's add a slider that's designed to control breast sag. The download page for the slider should specify the recommended range for the slider and which "direction" the slider goes. IE does a positive value make the breasts larger or smaller. Unfortunately it's sometimes counterintuitive. Sag slider set to 2 gives us:

image.png.c32a47253720c83c3174dcdaa053ead2.png

Being clothed is providing a level of "control" to the effect but you can see a difference there. If we set the slider to -2, we get:

image.png.d275daa62c92e85b90c8b280ddf50d66.png

Can definitely see the difference in the breasts but as we can see one of the two images is back to non-blue skin. This is the risk in combining LoRA's. If a LoRA isn't finetuned enough on what it's controlling, they will conflict with each other. Sometimes you'll just get inconsistent details like above, sometimes (more common when running overly high weights) you just get a flat out failed generation on one or more attempts like this:

image.png.80c92fccf2e32f18d64d193a7e910ae9.png
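As a quick recap of the weighting syntax covered above, here's a small Python sketch that assembles a prompt string with weighted keywords and LoRA tags. The `(keyword:weight)` and `<lora:name:weight>` forms are the ones A1111/Forge use in the prompt box. The helper names, the LoRA file names, and the specific weights of 1.3 and 1.2 are placeholders for illustration, not the exact values from the screenshots.

```python
def weighted(tag, weight=1.0):
    """Return a prompt tag, wrapped as (tag:weight) unless the weight is the default 1."""
    return tag if weight == 1.0 else f"({tag}:{weight:g})"


def lora(name, weight=1.0):
    """Return a <lora:name:weight> tag like the one the UI adds when you click a LoRA."""
    return f"<lora:{name}:{weight:g}>"


def build_prompt(parts):
    """Join prompt parts with commas, in priority order."""
    return ", ".join(parts)


prompt = build_prompt([
    "green background",
    weighted("andorian", 1.3),       # placeholder weight
    weighted("blue skin", 1.2),      # placeholder weight
    "white hair",
    lora("andorian_example", 0.3),   # hypothetical LoRA file name, kept at a low weight
    lora("sag_slider_example", -2),  # hypothetical slider LoRA, negative end of its range
])
print(prompt)
```

Keeping the LoRA weights in one place like this makes it easier to see at a glance which ones are running hot when two of them start fighting each other.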

Everything we've done so far has been about controlling details because, presumably, you're going to have some sort of theme/style in mind for your portrait set. However, all the above images also look exceedingly similar and as I mentioned, personally I want to be able to do it in bulk. Way back in the beginning I mentioned Inpainting. You can find this under the img2img tab. Basically you feed it an image and then highlight which parts of the image you want it to re-draw. While that's great for fixing small flaws with a single image, using that to try to create a bunch of variants of a master image is less great. So, to help us with this (and a later step), go to the Extensions tab in the UI and we're going to install a couple of extensions.

image.png.db06c6705ed6565358a288df94140285.png

One critical note here. If you're using Forge rather than vanilla A1111 (which you are if you're following my directions), the available extension list is pulling from A1111's extensions and not all of them play nice with Forge. However, we're really only concerned with a couple of them and they work fine. sd-dynamic-prompts we'll be using right now. Stable-diffusion-webui-rembg is what we'll be using later to remove the backgrounds. sd-webui-regional-prompter, I'll touch on a little but I'm not generally using it and would say you can skip it unless you're going to be doing rooms. It does come in handy for rooms as this allows you to control where in the image stuff appears. IE the person only on the left half of the image. Once you click Apply and Restart UI (which does wipe whatever prompt and settings you've applied so far), you'll have some new options in the UI down where ControlNet is:

image.png.ecd0860387566302ba14b9ba4220a313.png

For Dynamic Prompts, which is what we're going to play around with now, there's nothing you need to do down here. If you decided to play around with Regional Prompting, this is where you tell it how to break up your image into regions. Dynamic Prompting is the secret sauce that allows you to generate images of a specific style in bulk but still get different details. We're going to keep ControlNet enabled, using Canny, and our original source image, but set the weight to 0.3 as we only want to lightly adhere to that. This is also going to further demonstrate what happens when you're using Canny but prompting for things that conflict with the source image. I'm also bumping the batch count up to 5. That combined with the batch size of 2 means we're going to get a total of 10 images in one run. Here's our new prompt:

Dynamic Prompts allows us to use curly braces to create tag groups, with the individual values separated by pipes. For each image, one value from each group gets picked. That's of minimal value if you're only generating one or two images at a time, but if you're doing bulk, this resulted in:

image.png.5aea9fb7a460ead855e98253230ecf5d.png

Because we have a low ControlNet weight, combined with prompts defining things like nipples, you're generally going to get topless or "open" tops. At that point, unless you've prompted for clothes on the lower half, odds are pretty good you're going to get nude or pretty close to it. In other words, if you want clothes, make sure you prompt for them somehow. You can use dynamic prompts to control colors on the clothes too. But you can also see it's still trying to put some of the clothing details from the ControlNet image on the bare body.
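If the curly brace syntax is hard to picture, here's a rough Python sketch of what happens to a template for each image. This is NOT how the sd-dynamic-prompts extension is implemented, and the real extension supports more syntax than this; it only imitates the basic `{a|b|c}` pick-one behavior. The template text is made up for the example.

```python
import random
import re

# Matches an innermost {option|option|...} group (no nested braces inside it).
VARIANT = re.compile(r"\{([^{}]*)\}")


def expand(template, rng):
    """Replace every {a|b|c} group with one randomly chosen option."""
    def pick(match):
        return rng.choice(match.group(1).split("|")).strip()

    # Repeat so nested groups get resolved from the inside out.
    while VARIANT.search(template):
        template = VARIANT.sub(pick, template)
    return template


# Illustrative template, not the prompt used for the images above.
template = (
    "1girl, {curly|straight|wavy} {auburn|blond|black|white} hair, "
    "{smiling|serious|surprised}, black bodysuit, green background"
)

rng = random.Random(0)  # fixed seed so repeated runs give the same picks
for _ in range(3):
    print(expand(template, rng))
```

Each generated image gets its own roll of the dice, which is why a batch of 10 can come back with 10 different hair and expression combinations from a single prompt.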

But we can see here that in a single run we got clear differences in hair style/color and facial expression. You could up your batch count to 30 to get a solid 60 different images which gives you a decent single gender portrait set. Skintone/ethnicity variation is generally a bit harder to reliably get, if you're using LoRA's, as a lot of LoRA's (especially anime ones) are generally pretty heavily skewed towards white or asian datasets. We have two breast related LoRA's specified. While we're prompting for various breast sizes, those two LoRA's somewhat limit that even with the low weight. Now let's remove the nipples related prompts and both LoRA's. We're also going to add "black bodysuit" to the prompt. Now we get:

image.png.2cdbbab6c4237019472d27dbf8a5161d.png

There we go, now we've got some uniqueness to each image including skintone/ethnicity but still largely adhering to the overall "theme" we've been working with. As mentioned above on a 4070ti, those 10 images took about 3 minutes total. You could try upping the ControlNet weight a little bit if you wanted to try to keep the uniform a bit closer to authentic but you will fairly quickly run into the point where you're struggling to get variation in the other details. It's also worth noting you CAN use dynamic prompting on weights too. IE: if you set a slider LoRA's weight to {3|-3} that will give you images from both ends of the slider's spectrum. One other word of caution. You may have noticed the age slider in the above screenshots. While it's very handy to control age appearance, it's VERY easy to get images in the "Hello FBI" territory. Especially if you're using anime LoRA's combined with realistic prompts and checkpoints. Even if that wasn't your intent. AI's never going to ask you "Hey, are you sure about that". You have been warned. But using the age slider with appropriate values combined with tags like "mature" helps keep it out of the danger zone. Now let's talk about Regional Prompting briefly, primarily in case you want to do rooms. If we update our above prompt to also include a window with a view of a city, we get something like this:

image.png.16bcf60558384930f2ef49ae522d324f.png

It's exactly what we asked for, but not really what we need for a Stellaris Room picture. So, I'm going to change my image size to 1904x680 (exactly 4x the size Stellaris is expecting) and enable Regional Prompting. Note that ControlNet and Regional Prompting don't play terribly well together. So I'm disabling ControlNet and using these settings to basically split the image into two regions:

image.png.c5d48dc3e0ac98fa77c853b856da860b.png

I'm using the common prompt option which allows you to specify info that's common to all regions. So we have to update our prompt to describe the common part and then we use "BREAK" to define the regions. So our prompt now looks like this:

Which results in images like this:

image.png.c5263a786e65dfc54853de7f50110ddd.png

Since both the window and the woman are in the common prompt, there's some amount of overlap. But we can see the woman is mostly on the left half and the window is mostly on the right. That's all the more I'm going to talk about Regional Prompting as it doesn't provide a ton of value (IMO) to portrait sets but if you want more info: https://stable-diffusion-art.com/regional-prompter/. It's a little bit old and using vanilla A1111 instead of Forge, but it should give you the general idea.
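For reference, here's a minimal Python sketch of how a two-region prompt with a common part is laid out. It assumes the Regional Prompter convention that, with the common prompt option enabled, the first chunk before the first BREAK applies to every region and each following chunk maps to one region in order. The prompt text below is invented for illustration and is not the prompt used for the images above.

```python
def regional_prompt(common, regions):
    """Join a common prompt and one prompt per region using the BREAK keyword."""
    return " BREAK\n".join([common, *regions])


prompt = regional_prompt(
    common="1girl, black bodysuit, window with a view of a city",  # shared by all regions
    regions=[
        "woman standing, curly auburn hair",    # first region (left half)
        "large window, city skyline at night",  # second region (right half)
    ],
)
print(prompt)
```

The number of region chunks needs to match however many regions you defined in the Regional Prompter settings, so two regions means two BREAKs when a common prompt is in use.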

So now we've generated some images. Next step in making them ready for Stellaris is removing the background. Earlier when we looked at extensions, I mentioned we'd also be using stable-diffusion-webui-rembg. If you didn't already, install it now from the extensions tab. Then click on your extras tab. There should be a Remove Background section at the bottom and we're going to be using the Batch From Directory option. Input folder is where your original images are, I personally just leave them at the default generation location. Output is where you want the new images created. I personally use a dump folder and sort/separate from there if I generated multiple sets that day. I generally use isnet-general-use on the Remove Background settings and that generally works reasonably well IF, and I stress IF, your images have good clean and contrasting backgrounds like I mentioned above. So your settings will look something like this:

[Screenshot: Remove Background settings]

Note that everything we've done up until now was GPU-based work. This is CPU-based. On an i7-12700K, it runs at 15-50% CPU usage, so it doesn't necessarily stop you from doing other things. Removing the background on 400 images takes about 15 minutes on my computer. The actual image generation is likely using 100% of your GPU and its VRAM, so don't expect to do anything else while you're doing it.
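To put that in per-image terms and estimate your own batches, here's a small Python sketch. The function names are made up for illustration, and the numbers are just the ones from my machine above, so yours will differ:

```python
def seconds_per_image(total_minutes: float, image_count: int) -> float:
    """Average processing time per image, in seconds."""
    return total_minutes * 60 / image_count


def estimate_minutes(image_count: int, seconds_each: float) -> float:
    """Rough total time, in minutes, for a batch at a given per-image speed."""
    return image_count * seconds_each / 60


# 400 images in about 15 minutes (the figure quoted above).
per_image = seconds_per_image(15, 400)
print(per_image)
print(estimate_minutes(1000, per_image))
```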

The one catch with this is that the background removal doesn't always work as well as you'd like, especially if your images have "noisy" backgrounds. This is absolutely the most problematic part of my workflow. If you've used my portrait mods, you've probably seen some examples of this, especially on my older portrait sets. In those cases you'll often end up with either some parts of the background not being removed, or parts of the person being removed (i.e. legs and/or arms) if they blended in too much with the background.

In those same cases, if you want to remove the backgrounds manually via your image editor of choice, you're going to find the magic wand tool has the same problem, which forces you to spend a fair amount of time trying to clean it up. Personally I just toss those pictures in the trash and try again. I can generate 20+ images in the time it takes me to manually remove one background from a "noisy" picture. I guarantee you I'll get at least a couple of new images in that time which don't have an issue. But in any case, you'll want to take at least a quick look through the images after background removal to make sure they're good. Here's an example:

That's one of the Andorian images we did before that still had some background clutter in it. Beyond the bits of background that are still there, we can see part of her left arm has been removed and parts of her body are partially transparent in general. There's pretty much always going to be a bit of "noise" around the hair, but once we shrink this down to Stellaris portrait sizes, you really won't notice. If your whole set has problems, try using one of the other Remove Background options. Sometimes I have better luck with the u2net options. If you're better with image editing than I am, or just less lazy, you might be better off removing backgrounds manually. But, for example, right now the published versions of my mods add up to just shy of 8,000 images and I've got another 14k images generated waiting to be used. I'm not going to spend the time to manually touch up all of those.

Now we need to get those images ready for Stellaris. It's recommended to rename them, but not necessarily required. I can't speak for ModderWhoModsThings's generator (second link below), but Goregath's generator (first link below) can use folders to distinguish the necessary attributes so naming isn't strictly necessary. I'd still recommend it though if you're going to be generating more than one portrait set just so you can tell what's what. You can open a PowerShell prompt in your image directory and use this to bulk rename:

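# Counter for the sequential number in the new file names
$nr = 0
# Rename every item in the current folder; (ls) collects the full listing before any renames happen
(ls) | %{ Rename-Item $_.FullName -NewName ("01Aquatics_F{0}.png" -f $nr++) }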

That would rename everything in the folder to "01Aquatics_F" followed by a sequential number starting from 0. If the PowerShell prompt isn't currently in your image folder when you run this, bad things are probably going to happen, so make sure you're in the right place. Note that while textures for animated portraits have to be .dds images, there seems to be no need to convert these images to .dds files. They work just fine as PNGs, which saves you the extra step of converting.
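If the "bad things are probably going to happen" part makes you nervous, here's a rough Python sketch of the same bulk rename that only touches .png files and lets you preview the result first. This is not what I actually run; the function names and the dry-run idea are my own additions for illustration:

```python
from pathlib import Path


def plan_renames(folder: Path, prefix: str, start: int = 0) -> list[tuple[Path, Path]]:
    """Pair each .png in the folder (sorted by name) with its new sequential name."""
    pngs = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".png"
    )
    return [(src, folder / f"{prefix}{start + i}.png") for i, src in enumerate(pngs)]


def apply_renames(plan: list[tuple[Path, Path]], dry_run: bool = True) -> None:
    """Print the planned renames, or perform them when dry_run is False."""
    for src, dst in plan:
        if dry_run:
            print(f"{src.name} -> {dst.name}")
        elif src != dst:
            # Stop rather than overwrite a file that already has the target name.
            if dst.exists():
                raise FileExistsError(f"{dst} already exists")
            src.rename(dst)


# Example usage (the folder path is hypothetical):
# plan = plan_renames(Path(r"C:\portraits\dump"), "01Aquatics_F")
# apply_renames(plan)                 # preview only
# apply_renames(plan, dry_run=False)  # actually rename
```

Running it with the default dry_run=True first shows exactly which files would be renamed, which is a cheap way to confirm you're pointed at the right folder.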

That leaves resizing them. As you may recall, they're currently 4x the size they need to be. For this I use ImageMagick (https://imagemagick.org/script/download.php) as you can run it from the command line to bulk resize images. Once it's installed, from the same prompt you can run:

mogrify -resize 475x380 *.png

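That resizes all the PNGs in the directory to the 475x380 that Stellaris expects for portraits. Be aware that mogrify overwrites the files in place, so keep a copy of the full-size images if you want them. Depending on your ImageMagick version and install options, you may need to run it as "magick mogrify" instead. This gives you appropriately sized images but with much better detail than you'd get trying to just generate at that size.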
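One thing worth knowing is that a plain "-resize 475x380" keeps the aspect ratio and fits the image inside that box, so you only land on exactly 475x380 if your source images have the same proportions. Here's a minimal Python sketch of that fitting math. The function name is made up, and ImageMagick's exact rounding may differ by a pixel from what this computes:

```python
def fit_within(width: int, height: int, max_width: int, max_height: int) -> tuple[int, int]:
    """Scale (width, height) to fit inside the box while keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# An image at exactly 4x the portrait size lands on the portrait size.
print(fit_within(1900, 1520, 475, 380))
# A square image ends up limited by the height of the box instead.
print(fit_within(2000, 2000, 475, 380))
```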

Your final step is turning this pile of pictures into an actual mod. I'm not going to cover that here, but there are two people with programs/scripts to help you with that:

Not a slight to ModderWhoModsThings, but I have a strong personal dislike for Java so my preference is the first one. But if you want a GUI, you'll want the second option.

Regey · November 28, 2024

No one has said anything, so I have logged in just to say a huge thanks.

This guide is insane, and an amazing introduction to both AI gen and modding.
I'm sure that there are a lot of silent people who will make use of this. Thanks for your time!
