SYSTEM crashes, not just CTDs. How to document or diagnose?

VenomousOuroboros · December 24, 2025

Specs and Details Up Front:

SkyrimSE v1.5.97

Windows 11

Intel Core i9-14900K (3.20GHz)

192GB RAM

Nvidia RTX 4090

I'm having a recurring problem of Skyrim causing a full system crash (no CTD, not even BSOD, just an instant shutdown and reboot).

This occurs in interior and exterior cells, in both vanilla Skyrim areas and mod-created cells.

Some crashes can occur within 30 minutes of launching the game, others after ~2 hours.

The GPU usage never gets above 60%. I haven't been able to pay attention to the CPU usage or processor temperature.

As you might guess, there's no opportunity for a crashlog to be generated when this happens.

1) Could this be the result of a memory leak or stack/heap overflow? (no, I'm not the sharpest with the jargon)

2) Could this be as simple as loose PSU cables? (unlikely - I can play Elden Ring for 4-5 hours and not encounter any crash)

3) Is there any program/utility that can track and SAVE hardware usage data, so that I can see any major fluctuations that occurred at/before this kind of crash?

4) Do you have any other ideas, whims, or reciprocal questions to help narrow-down this mystery?

traison · December 24, 2025

37 minutes ago, VenomousOuroboros said:

Could this be the result of a memory leak or stack/heap overflow?

No.

37 minutes ago, VenomousOuroboros said:

Could this be as simple as loose PSU cables?

Doubt it. But lack of power in general, sure: an unstable power supply (i.e. wobbling voltages), or one that provides just enough to run but not enough for spikes.

37 minutes ago, VenomousOuroboros said:

I can play Elden Ring for 4-5 hours and not encounter any crash)

Irrelevant.

37 minutes ago, VenomousOuroboros said:

Skyrim causing a full system crash (no CTD, not even BSOD...

How did you determine it's not a BSOD, considering Windows has been set to restart automatically after a BSOD by default since Windows 7 (or Vista?)?

37 minutes ago, VenomousOuroboros said:

Do you have any other ideas...

Sure.

Enable BSODs if you want to see them when they happen:

Windows Key + R
sysdm.cpl
Advanced tab.
Under Startup and Recovery, click Settings.
System failure group.
Uncheck "Automatically restart".
Make sure "Write an event to the system log" is checked.
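If you'd rather check these two settings without clicking through the dialogs, here is a minimal Python sketch. It assumes the two checkboxes map to the `AutoReboot` and `LogEvent` DWORD values under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl` (my understanding of where Windows stores them, so verify on your own system). It only reads the values and changes nothing, and the registry part only runs on Windows.

```python
import sys

# Where the "System failure" options from sysdm.cpl are assumed to live.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def describe_crash_settings(auto_reboot: int, log_event: int) -> list[str]:
    """Translate the two DWORD values into plain-English notes."""
    notes = []
    if auto_reboot:
        notes.append("AutoReboot is on: a BSOD may flash by before the reboot.")
    else:
        notes.append("AutoReboot is off: a BSOD will stay on screen.")
    if log_event:
        notes.append("LogEvent is on: BugCheck events should be written.")
    else:
        notes.append("LogEvent is off: no BugCheck event will be written.")
    return notes


def read_crash_settings() -> tuple[int, int]:
    """Read AutoReboot and LogEvent from HKLM (read-only, Windows only)."""
    import winreg  # standard library, but only present on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        auto_reboot, _ = winreg.QueryValueEx(key, "AutoReboot")
        log_event, _ = winreg.QueryValueEx(key, "LogEvent")
    return auto_reboot, log_event


if __name__ == "__main__" and sys.platform == "win32":
    for note in describe_crash_settings(*read_crash_settings()):
        print(note)
```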

Check Event Viewer:

Windows Key + R
eventvwr.msc
Windows Logs -> System
Look for events from BugCheck timestamped either when the system crashed or when it rebooted.

Note that if you're running out of power there won't be a BugCheck event. There may, however, be some power-related event, but I'm not certain. Also keep in mind that if you do have a BugCheck event, what it reports may not be reliable if you have faulty RAM or some other similar hardware issue. It would make sense to gather a few BugCheck events and see whether the details change between them. For instance, if every BugCheck from a Skyrim crash blamed a different system driver, I'd be suspicious of RAM voltages, timings and the DIMMs themselves.
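The same Event Viewer search can be scripted with the built-in `wevtutil` tool, which makes it easier to pull several events side by side for comparison. This is a rough Python sketch, assuming Python 3 on Windows, that builds a `wevtutil qe` command with an XPath filter on provider and event ID. The example looks for Kernel-Power event 41. For BugCheck events, copy the exact provider name from the event's Details > XML View rather than trusting a guessed name.

```python
import subprocess
import sys


def build_event_query(provider: str, event_id: int, count: int = 10) -> list[str]:
    """Build a wevtutil command that lists the newest matching System-log events."""
    xpath = f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{count}",  # return at most this many events
        "/rd:true",     # newest first
        "/f:text",      # human-readable text instead of XML
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Kernel-Power 41 is logged on the boot after an unclean shutdown.
    cmd = build_event_query("Microsoft-Windows-Kernel-Power", 41, count=5)
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```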

Edited December 24, 2025 by traison

VenomousOuroboros · December 24, 2025

36 minutes ago, traison said:

Note that if you're running out of power there won't be a BugCheck event. There may however be some Power related event, but I'm not certain.

I'm a tad bit familiar with Event Viewer. The "Write an event..." option was already checked, so I filtered for Event sources: BugCheck, and there were no events. Each time the system crashes, it generates a Critical event. In each case, there aren't any Error events preceding the Critical.

Considering that you lightly nodded at the PSU hypothesis, how should I take a closer look at that?

(btw, it doesn't escape me that you've stepped up and helped out almost every time I had a question. I'm not kissing ass, just saying thanks in case I hadn't before.)

nopse0 · December 24, 2025

And what does the Critical event say: a power event, a hard disk failure, or something else? If you have ECC memory, it could perhaps be an unrecoverable RAM error. Also, as you said, overheating is a hot candidate; maybe your CPU, graphics card or motherboard coolers are defective.

traison · December 24, 2025

29 minutes ago, VenomousOuroboros said:

Considering that you lightly nodded at the PSU hypothesis, how should I take a closer look at that?

The only thing I know of is to monitor voltages with something like HWiNFO64 and make sure 3.3V stays at 3.3V and not 3.2V or 3.4V. Same with 5V and 12V. Put some minor stress on the system, something that wouldn't normally cause it to crash, and test again. But I wouldn't call this reliable. If you have lots of peripherals or internal devices (and yes, this includes USB sticks and Xbox controllers), disconnect them and test a case where you know the system always turns off, like perhaps Skyrim. Got several HDDs/SSDs? Rip out the ones you don't use for your tests. A DVD drive? Gone.
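To put numbers on "3.3V stays at 3.3V", here is a small Python sketch that flags rails outside the roughly ±5% window the ATX specification allows on the +3.3V, +5V and +12V rails. The rail names and readings are hypothetical. Motherboard voltage sensors are also approximate, so treat a flag as a hint rather than proof.

```python
# Nominal rail voltages; ATX allows roughly +/-5% on each of these.
NOMINAL_V = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def out_of_spec(readings: dict[str, float]) -> dict[str, float]:
    """Return only the rails whose reading is outside nominal +/- 5%."""
    flagged = {}
    for rail, value in readings.items():
        nominal = NOMINAL_V[rail]
        if abs(value - nominal) > nominal * TOLERANCE:
            flagged[rail] = value
    return flagged


if __name__ == "__main__":
    # Hypothetical readings copied by hand from HWiNFO64 during a light stress test.
    sample = {"+3.3V": 3.28, "+5V": 5.02, "+12V": 11.35}
    print(out_of_spec(sample))
```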

Other than that, the only reliable way of testing is to actually get a new/different PSU.

20 minutes ago, nopse0 said:

And what does the critical event say...

+1

Grey Cloud · December 24, 2025

3 hours ago, VenomousOuroboros said:

192GB RAM

?

traison · December 24, 2025

24 minutes ago, Grey Cloud said:

?

First time seeing a huge E-peen? 😄

Seriously though, maybe VenomousOuroboros runs LLMs. 192GB is in the low/mid range for that.

nopse0 · December 24, 2025

2 hours ago, nopse0 said:

And what does the Critical event say: a power event, a hard disk failure, or something else? If you have ECC memory, it could perhaps be an unrecoverable RAM error. Also, as you said, overheating is a hot candidate; maybe your CPU, graphics card or motherboard coolers are defective.

Another thing I wanted to add: if a computer randomly resets, or doesn't power on at all, I would check the power supply first. It's normal for a PSU to fail after 5 years or so; they aren't made to last forever.

VenomousOuroboros · December 24, 2025

What the Critical Event report says:

"The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."

Source: Kernel-Power

Event ID: 41

Level: Critical

User: SYSTEM

OpCode: Info

Task Category: (63)

Keywords: (70368744177664),(2)
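Event 41 also carries a few data fields that the General tab doesn't show (they appear under Details > XML View). My understanding of Microsoft's troubleshooting notes for this event is that a `BugcheckCode` of 0 means no stop error was recorded (consistent with power loss, a hard reset or a hang), while a nonzero `PowerButtonTimestamp` means the power button was held. The Python sketch below reads those fields from the event XML. The sample event is hypothetical, not copied from this system.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML (Details > XML View).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical Event 41 payload for illustration only.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><Provider Name="Microsoft-Windows-Kernel-Power"/><EventID>41</EventID></System>
  <EventData>
    <Data Name="BugcheckCode">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
  </EventData>
</Event>"""


def read_event_data(xml_text: str) -> dict[str, str]:
    """Collect <Data Name="...">value</Data> pairs from an event's EventData."""
    root = ET.fromstring(xml_text)
    return {
        d.get("Name", ""): (d.text or "").strip()
        for d in root.findall("e:EventData/e:Data", NS)
    }


def parse_number(text: str) -> int:
    """Accept decimal or 0x-prefixed hex, treating empty as 0."""
    text = text.strip().lower()
    if not text:
        return 0
    return int(text, 16) if text.startswith("0x") else int(text)


def classify_kernel_power_41(xml_text: str) -> str:
    """Rough reading of an Event 41 record based on its data fields."""
    data = read_event_data(xml_text)
    code = parse_number(data.get("BugcheckCode", ""))
    if code != 0:
        return f"stop error 0x{code:X} was recorded"
    if parse_number(data.get("PowerButtonTimestamp", "")) != 0:
        return "power button was held down"
    return "no stop code: power loss, hard reset or hang"


if __name__ == "__main__":
    print(classify_kernel_power_41(SAMPLE))
```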

Peripherals?

KB+M, 2x SSDs, the 4090, and a Sound BlasterX AE-5 Plus (250ohm headset).

No disk drives, no readers, no case lighting.

The PSU:

Thermaltake Toughpower GF3 1650W, purchased and installed just last year.

https://www.newegg.com/thermaltake-toughpower-gf3-series-ps-tpd-1650fnfagu-4-1650-w/p/N82E16817153434?Item=N82E16817153434&Tpk=N82E16817153434

Why so much fuggen RAM?

I was building a brand new PC from scratch. I decided to go big and not go home. Really liking that I made that choice before today's prices.

traison · December 24, 2025

46 minutes ago, VenomousOuroboros said:

"The system has rebooted...

So as the message says, could be it lost power or was just otherwise shut down unexpectedly. I wouldn't read that as "it's definitely a PSU problem", but it's not helping prove otherwise either. There isn't really anything else that can be done remotely except for the "have you replaced component X yet?" questions and I'm not going to bother with those. This may eventually degrade to that, but we're not there yet.

My advice to you would be to learn how to break it, instead of trying to fix it:

Don't focus on testing with Skyrim, try to find another place where it breaks.
Keep your eye out for anything out of the ordinary. I realize Windows itself is so unstable these days that it's difficult to tell what is normal, and what is Bill Gates farting in your face.
Any Windows application crashes (like the desktop or some other system process), or stuttering video playback (try different websites and different local players), could be a hint of memory issues.
Pull all but one DIMM out and test Skyrim again. Do this with every DIMM one at a time.
If you want to test your memory, get the good old memtest86. Don't ignore the advice that says you need to run it for 48 hours (probably more with 192GB). Also, don't assume the RAM is fine just because it passes every run. Edit: and don't get the Windows version if that's still a thing; get the bootable one.
If you can find an OS-independent test case, like perhaps FurMark and/or Prime95, live boot into Ubuntu and test again. Skyrim is also OS-independent these days but would require more setup.
Monitor temperatures of everything.
Did you use liquid metal under the CPU? Maybe there's a drop that starts moving around when the CPU heats up?

There's probably a thousand other things to try. Imagination is the limit here. Last time I fought with such a problem it took over 100 days, and I learned a lot during that time. Gave me mild PTSD that took half a year to fade out. The solution came to me on day 123 as I was eating soup. It was definitely worth it in the end.

46 minutes ago, VenomousOuroboros said:

Really liking that I made that choice before today's prices.

You could sell half your RAM and still have enough to run 3 instances of TES 6 AND get all your money back for the 4090. Turns out RAM was the best investment strategy of 2025.

Edited December 24, 2025 by traison

Mozny · December 24, 2025

I wanna add in on this because a long time ago I had a very similar issue.

When Half-Life: Alyx came out I played it, but during the campaign my PC crashed twice, exactly the way you describe. For me it was pretty simple: the PSU was in fact the problem. To be more exact, my GPU's power draw would spike, since it was running VR after all, and that would just shut the whole thing down without any warnings or messages.

Try running a benchmark so that your GPU sits at 99% and see if you get it to crash. But looking at your specs, I'm not so hopeful that might be it.

traison · December 24, 2025

8 minutes ago, Mozny said:

Try running a benchmark so that your GPU sits at 99% and see if you get it to crash. But looking at your specs, I'm not so hopeful that might be it.

The thing with GPU power draw, though, is that it's quite sensitive to FPS. You can do funny things to a GPU by rendering a black screen at 10,000 FPS. Obviously it helps to utilize all of the GPU, but my point is that FPS plays a bigger part than might be intuitive. Skyrim with frame limiters off could be worse than FurMark when it comes to power draw. That may also be why Elden Ring seems perfectly fine: on PC it runs capped at 60 FPS.

VenomousOuroboros · December 24, 2025

Another situation in which I can consistently recreate the system crash is a specific mission and event in Cyberpunk 2077. If you can break it down Barney-style and tell me how to track specific metrics in that situation, I'll go ahead and do it. (I'll watch the CPU temperature as I do it, for sure.)

No other apps (as of yet) seem to crash, stutter, or act buggy.

I didn't use any liquid metal, only run-of-the-mill thermal paste.

If it comes down to having to run a 48hr test, I might be screwed for a while. Maybe when work kicks up again after the new year and I don't have enough time to fret about missing out on some gaming (or writing reports from home), I could do it. By then, though, I might have given up on this. Your persistence puts me to shame when it comes to problems like this.

MY PRECIOUS DDR5s ARE MINESES... WE'LL NEVER PART WITH THEM... NO, NO....

traison · December 24, 2025

17 minutes ago, VenomousOuroboros said:

...how to track specific metrics in that situation, I'll go ahead and do it.

The issue is always going to be the same one that logging has in any generic application:

The last log message is going to be from the event before the one where the problem occurred, because typically you log the result of something rather than when you're about to do something.
Logging is usually flushed to file intermittently, meaning the last log message is probably at least a few seconds old.

To put that another way, you could be watching voltages and temperatures in HWiNFO64 without blinking, but you will still miss it if voltages drop 25% for a millisecond. This is why I suggested you try to find something that stresses the system but doesn't quite crash it. 3.3V can go to 3.2V or 3.4V and it's usually not a problem. I had a system where 12V was always 12.1V and this was fine. But if that 12.1V became 12.2V or 12.0V during a stress test, that could indicate something. A deviation of 0.2V would almost certainly be worth looking into. Temperatures can go up/down by about 30°C per second, perhaps more if there is a cooling issue (not relevant in your case).
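HWiNFO64 can also write its sensor readings to a CSV file while you play, which answers the "track and SAVE" question from the first post, even though it won't catch millisecond dips. A minimal Python sketch for scanning such a log for the largest deviation from a known-good baseline, using the 0.2V threshold mentioned above, might look like this. The column name `+12V [V]` and the sample rows are assumptions: check the header of your own log, and note that the decimal separator may follow your Windows locale.

```python
import csv
import io

THRESHOLD_V = 0.2  # deviation suggested in this thread as worth a closer look

# Hypothetical excerpt of a sensor log; real column names will differ.
SAMPLE_LOG = """Time,+12V [V]
12:00:01,12.10
12:00:03,12.09
12:00:05,11.88
"""


def max_deviation(csv_text: str, column: str, baseline: float) -> tuple[float, int]:
    """Return the largest |reading - baseline| in one column and its data-row index."""
    worst, worst_row = 0.0, -1
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # skip blank cells and repeated header rows
        if abs(value - baseline) > worst:
            worst, worst_row = abs(value - baseline), i
    return worst, worst_row


if __name__ == "__main__":
    dev, row = max_deviation(SAMPLE_LOG, "+12V [V]", baseline=12.1)
    verdict = "worth a closer look" if dev >= THRESHOLD_V else "within threshold"
    print(f"worst deviation {dev:.2f} V at row {row}: {verdict}")
```

For a real log, read the file's text first and pass it in; HWiNFO64 logs may not be UTF-8, so you might need to give `open()` an explicit `encoding`.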

Edit: I imagine a proper oscilloscope on the voltage rails would catch a power problem as it happens. Got any friends interested in or working with electronics?

Edited December 24, 2025 by traison

Grey Cloud · December 24, 2025

While we are here we could stick our head inside the case to see if the CPU/GPU/case fans are working or if the CPU heatsink is clogged with dog/cat hair or whatever.

My curiosity is piqued.

How many RAM sticks = 192GB? 4 X 48?

What do you run that can utilise 192GB?

plutocene · December 24, 2025

1 hour ago, VenomousOuroboros said:

No other apps (as of yet) seem to crash, stutter, or act buggy.

If only 2 games crash and nothing else, I would argue the issue is GPU driver related, but the information is by no means enough to draw any conclusion.

Have you updated your driver recently? Have you tried a different driver version?

As Traison stated, it can be a myriad of things, like memory failure, psu, mb, a faulty cable, a buggy driver.

Have you run a benchmark stress test, e.g. FurMark? I would stress test the GPU for 6 hours minimum at full load.

Prime95 is great for testing CPU and RAM. I see you're running Intel, 13th and 14th gen are reportedly degrading and dying, so I'm afraid it needs a thorough test as well.

The memory can be tested with memtest; you need to create a bootable USB and let it run overnight. 192GB will take forever.

PSU is not testable for a normal person afaik, and I doubt you have another 1600 watt one lying around. 4090s are reported to have frequent issues with cables, so check your cable (swap GPU power cable), that 600 watt draw can deform or even burn it in extreme cases.
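Why a connector that is only "almost connected" can burn comes down to simple arithmetic. The 16-pin 12VHPWR connector used on RTX 4090 cards carries its 12V current on six pins, so the Python sketch below divides the draw across those pins, assuming the current is shared evenly. If one pin makes poor contact, the remaining pins pick up its share. The wattages are illustrations, not measurements from this system; compare the results with the terminal rating in the connector's datasheet.

```python
def amps_per_pin(watts: float, volts: float = 12.0, pins: int = 6) -> float:
    """Current through each 12V pin, assuming the load is shared evenly."""
    return watts / volts / pins


if __name__ == "__main__":
    print(f"600 W over 6 pins: {amps_per_pin(600):.2f} A per pin")
    # One pin with poor contact: the other five carry the whole load.
    print(f"600 W over 5 pins: {amps_per_pin(600, pins=5):.2f} A per pin")
    print(f"450 W over 6 pins: {amps_per_pin(450):.2f} A per pin")
```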

Edited December 24, 2025 by plutocene

belegost · December 24, 2025

4 hours ago, VenomousOuroboros said:

I decided to go big and not go home.

Should've bought Ryzen Threadripper instead of measly i9 then.

And with current prices you could probably sell half that RAM and get yourself a brand new system with decent specs.

OwlEye · December 25, 2025

Curious whether capping FPS globally in the NVIDIA Control Panel would stop the GPU from spiking.

nopse0 · December 25, 2025

Apropos NVIDIA, it could be a physics-related Windows/NVIDIA driver problem. I would try without FSMP, just for testing, to see if you also get the system crashes without physics. Though normally, this kind of problem causes freezes, not reboots.

VenomousOuroboros · December 25, 2025

I want to be very plain and forthright about my apprehensions and some prior conceptions here, because I'm probably on the verge of irritating the shit out of some of you:

1) Opening the computer and manipulating anything scares the hell out of me. This thing costs more than my car, and I'm so far away from any decent professional that taking it to a service center is practically impossible. But I'm making excuses; I'm irrationally afraid of endangering this electronic brick of gold by fumbling around inside it. The bottom line is that I'm pretty feckless when it comes to building PCs; I've pretty much just been "lucky" that nothing ever went horribly wrong (like when I forgot to connect the main CPU cooler to the motherboard and ran a temp of 98C on my first bootup).

2) That said, the PSU cables and the GPU power specifically are the most likely fault points, if it is indeed something physical. I'm not sure how many other PSUs today are designed this way, but all cables are detachable at both ends with my model, cutting down on the number of unused cables sitting at the bottom of my case. That introduces another point at which something could come loose. Might I be on the right trail here?

3) I know about the faults that plagued the CPU when it was first released. It was affecting everyone, having something to do with how the processor managed voltage(?), and it resulted in class action lawsuits. I took advantage of the extended replacement agreement that resulted from all this negative attention, swapped my partially-burned-out processor with a fresh one, applied the microcode update to it to prevent the previous issue from occurring again, and it has appeared to remain stable and reliable ever since... Unless, of course, it's the culprit here.

4) I do want to know what kinds of risks there might be (damage? degradation? overheating?) when I run any kind of a stress test. Obviously temperature will go up. Obviously components will get pushed to their limit. If I have to sit next to my computer and watch it closely while it spends 6hrs sprinting through a marathon, I don't know that I could pay that much attention for that long. If I let it go unsupervised, am I risking something catastrophic?

---

@Grey Cloud

I have these RAM chips on this motherboard (MSI MEG Z790 ACE MAX) for the (probably smooth-brained) reason of "moar RAM = moar good". My overzealous priority was to future-proof my PC build so that I wouldn't have to crack it open and swap anything out for a few years.

Fat lot of good that did me, eh?

traison · December 25, 2025

2 hours ago, VenomousOuroboros said:

I'm so far away from any decent professional that taking it to a service center is practically impossible.

In my experience, no one actually repairs computers anymore these days. I jumped out of the IT sector where I've worked for something like 20 years, after Windows 10 was released and corporations were more inclined to replace the entire computer than blow the dust off the laser when a user's mouse stopped working. My experience with service centers is that they reinstall your OS, charge you $300 and recovering your files is your problem - something you can do at home with a USB stick and 25 minutes of time. You'd need to find someone who actually cares; someone like Louis Rossmann, or me I guess.

2 hours ago, VenomousOuroboros said:

That said, the PSU cables and the GPU power specifically are the most likely fault points, if it is indeed something physical. ... That introduces another point at which something could come loose. Might I be on the right trail here?

I would say no, because a cable has 3 states: connected, almost connected and disconnected. To put that another way: working, burning and not working. Your computer is working sometimes, and it's not on fire; thus cables are connected, no?

2 hours ago, VenomousOuroboros said:

I know about the faults that plagued the CPU when it was first released. ... Unless, of course, it's the culprit here.

Could be. I wouldn't rule it out unfortunately. Intel still hasn't recovered from Ryzen.

2 hours ago, VenomousOuroboros said:

I do want to know what kinds of risks there might be (damage? degradation? overheating?) when I run any kind of a stress test. ... If I let it go unsupervised, am I risking something catastrophic?

As far as I'm aware, all related components have overheating protection these days. Your CPU will slow down as it reaches maximum temperature, just like the GPU will. I would assume that if fans are insufficient there's an absolute upper limit where it will just turn off. Fans can also remain running after powering off the system to keep the heat from spiking at the end. Since you're (apparently) dealing with a power issue, I could foresee the worst-case scenario for you being extreme heat combined with a shutdown and no power supply to keep fans running. What are the chances of this? Quite low. If you were bypassing the thermal throttling features, I'd say quite high.

2 hours ago, VenomousOuroboros said:

Opening the computer and manipulating anything scares the hell out of me.

Doing the musical chairs thing with the RAM DIMMs I suggested in one of my previous posts shouldn't be rocket science. Installation and removal of these is no doubt covered in your motherboard manual. Make sure you have the same static charge as the computer before poking around in it: don't work on a furry rug, and keep one hand on a metal part of the case. Grab a can of compressed air (~$2) with which to blow out debris from the DIMM sockets.

If it's still not working properly, I would just place an order for a new ~850W PSU (don't need 1600W). Get a non-modular, 80 PLUS rated one for a minimum of ~$100 from a popular brand like Corsair.

If you need another excuse to have a 2nd power supply: you know when power supplies are likely to fail? On a Friday evening after a particularly exhausting work week, 5 minutes after the shops close. Being able to pull one out of storage and be back in Skyrim in 30 minutes is real nice at that point.

When I bought my current system, I bought 2x 800W PSUs. Turns out now that I have a spare, my in-use one has lasted waaaaay longer than any previous PSU did. It used to be that the GPU and PSU would alternate, with one dying every 1.5 years (so each part lasting about 3). This PSU has been going since Windows 7. Murphy's law perhaps?

Edited December 25, 2025 by traison

Grey Cloud · December 25, 2025

4 hours ago, VenomousOuroboros said:

This thing costs more than my car,

And probably any car I've ever owned. 😃

Motherboards usually come with some software which shows info about fans and temps, among other things.

traison · December 25, 2025

43 minutes ago, Grey Cloud said:

Motherboards usually come with some software which shows info about fans and temps, among other things.

Nitpicking perhaps, but installing vendor software is generally a bad idea. Automatic updates, generally badly made and bloated software, ads, broken features... Hardware manufacturers are usually horrible at making software. Take iCue as an example, or Logitech's equivalent.

HWiNFO64: Free, fast, no install, no bs and handles all the hardware you can think of.

Edited December 25, 2025 by traison

VenomousOuroboros · December 26, 2025

A half-update:
I'm going to run the HWiNFO64 test with the RAM chip slot-swap sometime in the next week or so.

If that doesn't yield anything (I'll post the results here for more informed interpretation!), I'll spring for a backup PSU and a spare GPU cable adapter (which I'll probably do once this ordeal is over anyway).

I just hope I can call upon you guys once I've got more actionable info. Thanks for helping as much as you have so far!

asdt123123 · December 26, 2025

95% of the time this sort of thing is caused by power (PSU). Usually it's the GPU specifically, either from not getting the juice it needs or from loose cables. Could also be overheating, but I'm sure you've already monitored for that?

What I do is just buy a new PSU/etc, test if it works, return it if I don't need it.

Edited December 26, 2025 by asdt123123
