Tweaking the HDT-SMP config to properly utilize OpenCL for big gains in performance and accuracy


goaway

Did you ever wonder why SMP physics in Skyrim is so performance intensive?  Surprisingly, it could be because it is not set up to take full advantage of your hardware.  OpenCL is extremely good at the kind of calculations a physics engine performs across dozens or hundreds of parallel entities, but the default HDT-SMP setup is not tuned to make use of it.

 

<configs>
	<opencl>
		<!--warning : not finish yet-->
		<!--warning : this can be slower because of the bad PCI-E transfer and bad schedule-->
		<enable>false</enable>
		<platformID>0</platformID>
		<numQueue>16</numQueue>
	</opencl>
	<solver>
		<numIterations>20</numIterations>
		<groupIterations>1</groupIterations>
		<groupEnableMLCP>true</groupEnableMLCP>
		<erp>0.2</erp>
		<min-fps>60</min-fps>
	</solver>
</configs>

This is the default config for HDT-SMP, found in your skse\hdtSkinnedMeshConfigs folder.  As you can see, the OpenCL section is commented with warnings.  After messing around with it in my spare time, I discovered a couple of things.

 

1. These numbers are nowhere near what OpenCL should be run at.

2. This is probably 99% of the reason why OpenCL seems to have bad performance.

 

So how can you tweak this to fully make use of OpenCL?  For starters, you are going to need a tool that lets you see the OpenCL profile for your CPU/GPU, called "GPU caps viewer".  You can find the link to it on this page

 

When you download and install it, open it up and you will see a screen that looks like this

 

Part1.png.f83926d88776a6c162807548f27ff491.png

 

Click over to the tab marked "OpenCL" and you will see a screen that looks like this

 

Part2.png.a73350999b01433958c27ffa1f2211ca.png

 

Depending on your CPU, GPU, motherboard, and BIOS, you may have multiple entries here.  For this example, you can see that I have my GPU listed, an Nvidia RTX 2080 Ti.

 

Part3.png.187c0567b6aff2eab9964f3e1dd98e38.png

 

I also have the "Intel HD Graphics" platform profile, which is essentially the on-board display portion of my CPU chipset.  A lot of Skylake/Coffee Lake or newer CPUs will have this.  You may want to come back and try this platform later, but for right now, let's stick with the GPU platform.

 

 

There are a few things we want to know from the profiler.  The first is the platformID.  Take a look at the default config again, and you should notice this line.

 

		<enable>false</enable>
		<platformID>0</platformID>

The platformID, 0, actually denotes NOT using an OpenCL capable platform on my device, and the OpenCL capable platforms are 1: GPU, and 2: CPU graphics.  If you enable OpenCL but don't select the correct platformID, you will almost certainly see a drop in performance.  This may explain the commented section.

 

Since my GPU is platform 1, I will change my config to look like this

 

		<enable>true</enable>
		<platformID>1</platformID>

 

Right away this gave me a pretty sizable performance boost.  It actually gets better though.

 

Next you want to determine the maximum number of queues you can scale up to.  One of the biggest bottlenecks for HDT-SMP is actually not how quickly it can perform the calculations, but the fact that physics engines are designed to calculate in real time!  If the physics calculations are not completed for whatever reason before the GPU completes a given frame, the frametime render may actually be throttled until the physics solver can keep up. 

 

This means that even if your system can easily handle the calculations, if the solver is not able to process enough calculations simultaneously, it will create a bottleneck!  And due to how world physics seems to work, this bottleneck may actually throttle your FPS down to keep the time-based physics calculations accurate. 

 

Note: This is my own speculation.  I am not 100% certain that the implementation of SMP works this way, but it certainly seems to be the case.

 

Back in GPU caps viewer on the OpenCL screen, you want to find the value for maximum parallel work-item/work-group sizes

 

Part4.png.788af7adff01f563ee04d96b72a88dea.png

 

As you can see from my GPU, it is capable of 1024 x 1024 x 64 array calculations with a max workgroup size of 1024.  I'm not exactly a programmer, nor particularly familiar with multi-threaded architecture, but I think this may be where some of the confusion originally came from. 

 

Intel describes the maximum workgroup per slice of 16, and the calculation to determine maximum necessary work groups as ( work items ) / (work items / work group).  However, this is per slice of CPU graphics, NOT the total, which Intel recommends as 256.  When I went back to look at my CPU HD graphics platform, this is exactly the max work group size listed, 256.

 

So going on that value, and using my GPU max work group size value of 1024, I changed my config to look like this now 

 

	<opencl>
		<!--warning : not finish yet-->
		<!--warning : this can be slower because of the bad PCI-E transfer and bad schedule-->
		<enable>true</enable>
		<platformID>1</platformID>
		<numQueue>1024</numQueue>
	</opencl>
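
For anyone who would rather script this change than edit the file by hand, here is a minimal Python sketch.  It assumes the file has the same layout as the default config shown at the top of this post; the function name set_opencl is made up for illustration, and only the standard library is used.  It changes the three values and nothing else, so whether the plugin actually reacts to them is a separate question.

```python
import xml.etree.ElementTree as ET


def set_opencl(xml_text, enable, platform_id, num_queue):
    """Return configs.xml text with the three <opencl> values replaced.

    Hypothetical helper: assumes <configs> is the root element and that
    <opencl> contains <enable>, <platformID> and <numQueue>.
    """
    # The default parser drops XML comments, so ask the tree builder to
    # keep them; the warning comments then survive the round trip.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    opencl = root.find("opencl")
    if opencl is None:
        raise ValueError("no <opencl> section found")

    opencl.find("enable").text = "true" if enable else "false"
    opencl.find("platformID").text = str(platform_id)
    opencl.find("numQueue").text = str(num_queue)
    return ET.tostring(root, encoding="unicode")
```

Read the file into a string, pass it through set_opencl, and write the result to a copy first so the original stays available as a backup.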

 

The first time I tried this I was sure this would cause a crash.  After all, the default value for queues is 16, and this is 64 times larger. 

 

Not only did it not crash, but it ran absolutely flawlessly!  I even stress-tested it a bit by loading up a huge number of NPCs in a single cell, all of which were wearing HDT-SMP outfits that required calculations.  Previously this would drop my framerate considerably, but not so anymore!

 

You may want to experiment with this a bit, but there are two things to keep in mind.  You shouldn't set your numQueue higher than your maximum workgroup size, and whatever value you do set needs to be a power of 2.  
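
The two rules above are easy to check before you start the game.  This is a small Python sketch with made-up function names; max_work_group_size is whatever GPU Caps Viewer reports for the platform you picked.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (a positive integer with a single bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def check_num_queue(num_queue, max_work_group_size):
    """Return a list of problems with a candidate numQueue value.

    An empty list means the value satisfies both rules of thumb from
    this guide; it says nothing about what the plugin does with it.
    """
    problems = []
    if not is_power_of_two(num_queue):
        problems.append("not a power of 2")
    if num_queue > max_work_group_size:
        problems.append("exceeds max work-group size")
    return problems
```

For example, check_num_queue(1024, 1024) returns an empty list, while check_num_queue(1025, 1024) reports both problems.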

 

Let's go back to the intel CPU platform for a second.

 

If you run a very demanding ENB, you might already have your GPU close to maxed out with just post-processing enhancements.  In that case, you might find more benefit in using your CPU to perform the calculations, since Skyrim LE almost never taxes modern CPUs anywhere close to what they are capable of.

 

If I wanted to use my CPU instead, I would go to the OpenCL tab back in GPU Caps viewer

 

Part5.png.96e6957ad159f21417a353a37528e621.png

 

Based on these values, I would then set my config like this

 

	<opencl>
		<!--warning : not finish yet-->
		<!--warning : this can be slower because of the bad PCI-E transfer and bad schedule-->
		<enable>true</enable>
		<platformID>2</platformID>
		<numQueue>256</numQueue>
	</opencl>

 

You may need to experiment a little to see which option gives you better performance.  In my case, my GPU is powerful enough to be able to handle both ENB and physics, but your results may differ.

 

With that set up, you can either be satisfied with the performance improvements, or you can try and tweak the solver calculations too.

 

The default values for these vary from what I have seen, but generally it looks like this

 

	<solver>
		<numIterations>20</numIterations>
		<groupIterations>1</groupIterations>
		<groupEnableMLCP>true</groupEnableMLCP>
		<erp>0.2</erp>
		<min-fps>60</min-fps>
	</solver>

 

There are a few things we can try changing here.  The first one is the ERP value.  What is ERP?  It is the error reduction parameter, and it represents the fraction of error that is corrected with each time step calculated by the physics solver.  When physics engines run a calculation on a joint connecting two bodies (bones), some divergence from the imposed constraints naturally occurs.  This is a good reference

 

It ranges from 0.1 to 1.0 (in theory), but most references I have found say 1.0 is impossible, and 0.9 is the limit.  It controls how quickly (closer to 1.0) or how smoothly (closer to 0.1) the error is corrected.  This is relevant for outfits and hair that have a long chain made out of multiple joints.  I have found that I prefer setting this a bit higher, so errors are corrected quicker, and I set mine to 0.4.  Setting it to 0.5 or higher introduced some instability that made my game start crashing, although you may get different results.
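
To get a feel for what the number means, here is a rough Python sketch of an idealized model: it assumes each solver step removes exactly the ERP fraction of whatever joint error is left, and that no new error is introduced in the meantime.  A real solver does not behave this cleanly, so treat it as intuition only.  The function names are made up.

```python
def residual_error(erp, steps, initial_error=1.0):
    """Error left after `steps` steps if each step removes a fraction `erp`."""
    return initial_error * (1.0 - erp) ** steps


def steps_to_reach(erp, target_fraction):
    """Smallest number of steps until the remaining error is at or below
    `target_fraction` of the starting error (idealized model)."""
    if not 0.0 < erp <= 1.0:
        raise ValueError("erp must be in (0, 1]")
    if target_fraction <= 0.0:
        raise ValueError("target_fraction must be positive")
    steps = 0
    remaining = 1.0
    while remaining > target_fraction:
        remaining *= 1.0 - erp
        steps += 1
    return steps
```

Under this model, getting the error down to 10% of its starting value takes 11 steps at the default 0.2 and 5 steps at 0.4.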

 

Next you can try increasing the iterations.  Most of the physics engine references I have found suggest that 20 is already on the high end of iterations, but what about the group iterations?  What I have found suggests these are extra iterations performed on each connected group (such as every joint in a strand of cloth, a tail, or hair), in addition to the total quick-step world iterations.

 

So, since using OpenCL allowed for a huge expansion of the number of calculations that can be performed, I changed it to match the total iterations (which may be overkill) and now my config looks like this

	<solver>
		<numIterations>20</numIterations>
		<groupIterations>20</groupIterations>
		<groupEnableMLCP>true</groupEnableMLCP>
		<erp>0.4</erp>
		<min-fps>60</min-fps>
	</solver>

 

This worked pretty well!  I decided to push the total iterations up a little until I started getting stability/performance issues, and ended up with final values of 32 for numIterations and 20 for groupIterations.

 

Adding the previous section, my overall configs.xml is now

 

	
<configs>
	<opencl>
		<!--warning : not finish yet-->
		<!--warning : this can be slower because of the bad PCI-E transfer and bad schedule-->
		<enable>true</enable>
		<platformID>1</platformID>
		<numQueue>1024</numQueue>
	</opencl>
	<solver>
		<numIterations>32</numIterations>
		<groupIterations>20</groupIterations>
		<groupEnableMLCP>true</groupEnableMLCP>
		<erp>0.4</erp>
		<min-fps>60</min-fps>
	</solver>
</configs>

 

This has not only given me a SIGNIFICANT performance boost, especially when surrounded by many actors all wearing HDT-SMP outfits, but it has also improved the accuracy (i.e. realism) of the physics quite a bit.

 

Give it a try and let me know what you think.  It may take some experimenting to find the ideal values if your system is already strained by other parts of your Skyrim build, but hopefully this will make SMP physics more appealing and minimize whatever performance impacts it may have on your system.

 

If you experience problems or have stability issues and need to undo any changes, the default settings are shown in the first box.

Link to comment

hey.

 

This looks very interesting, but it would be nice if you made a before/after comparison video so we can see the actual results and differences. Even a snapshot of the same scenes with the framerate visible would help, but video is the way to go. I highly doubt many people will try this based on a few words. Otherwise it seems very useful, and a comparison would be much appreciated.

 

Also have  you tried this  in Special Edition?

Link to comment

goaway, this actually works pretty well, with the same results as yours on a GTX 1060, be it clothes and body or a single body for testing, even with the HDT+SMP combo. I always knew there was something in the config file that hit FPS hard on calculations from just rigged bones and XML data. Good job on it!

Link to comment

I am sorry to dampen your excitement, but the opencl tag is not implemented in either the LE or the SSE binaries. I spent a lot of time trying to figure it out in the past (before the sources were released) and ended up disassembling the DLL, only to find out the tag is not implemented.

Now that the source code has been released, you can check for yourself: https://github.com/aers/hdtSMP64/blob/master/hdtSMP64/config.cpp

You can check the SMP log and you will see something like an unknown parameter warning in the first lines. If you comment out the opencl tag, the unknown parameter warning no longer appears in the log.

 

If the SMP code does not select an OpenCL platform/device, then it is up to the Bullet engine to choose one, and I bet it always defaults to the CPU if not told otherwise. Maybe someone else can find an answer or, even better, modify the sources to actually support OpenCL platform/device selection.

 

The solver section IS implemented and its settings can have an impact on performance, especially group iterations, which are used for group constraints and LERPs. The values you set are extremely high and, if used with complex mesh physics utilizing both group constraints and LERPs, will have a serious performance impact while making very little difference to accuracy.

 

Still, the biggest performance killer is collision calculations, for which the GPU might help.

Do you have proof to support your theory that SMP on your system is using the GPU?

Link to comment
On 3/27/2020 at 4:57 AM, agiz19 said:

hey.

 

this looks very  interesting, but it would be nice  if you made a comparison video before/after type for this  so we can see the actual results  and differences ,  at least a  snapshot of same scenes  with framerate visible would already help, but video is the way to go, I highly doubt many people will try this based on few words, otherwise it seems something very useful and really a comparison would  be very appreciated. 

This is very hard to quantify or explain, unfortunately.  My FPS is capped at 60 and doesn't really change except in cells with many NPCs.  The best way I know to see the performance impact of physics is to make a cell that brings me down to about 40-45 fps and then open the console.  When you do this, the graphics are still rendering full-power, but the physics engine stops calculating while the console is open, so my fps shoots back up to 60.
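
To put the console trick into numbers, you can convert both FPS readings to frame times and subtract.  This is only a rough Python sketch with made-up function names, and it assumes the whole FPS difference comes from physics.  Because the second reading sits at the 60 fps cap, the result is a lower bound on the physics cost rather than an exact figure.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def physics_cost_ms(fps_with_physics, fps_console_open):
    """Extra milliseconds per frame while physics is running.

    fps_with_physics: FPS during normal play in the heavy cell.
    fps_console_open: FPS in the same spot with the console open.
    """
    return frame_time_ms(fps_with_physics) - frame_time_ms(fps_console_open)
```

With the figures above, dropping from 60 to 45 fps works out to roughly 5.6 ms of extra work per frame.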

 

Since I set it up to use OpenCL, I haven't been able to drop my FPS with physics.  The only way it ever goes down is when I overload my GPU with supersampling resolutions, and that doesn't have anything to do with physics.

 

That said, I already went through a lot of trouble writing the guide.  It is fairly simple and does not take very long to test this yourself, so I don't really have any intention of making a youtube video to "prove it".  It's ok with me if you aren't interested, I just wanted to share it in case somebody else is.

 

13 hours ago, OrrieL said:

I am sorry to floor your excitement, but the opencl tag is not implemented in both LE and SSE binaries. I devoted a lot of time to figure it out in the past (before sources were released) and ended up disassembling the DLL only to find out the tag is not implemented.

Now when source codes were released you can check yourself https://github.com/aers/hdtSMP64/blob/master/hdtSMP64/config.cpp

You can check smp log and you will see something like unknown parameter in the first lines and if you comment out the opencl tag then there will be no more unknown parameter warning the log.

 

Still , the biggest performance killer are collision calculations for which GPU might help.

Do you have a proof to support your theory that SMP in your system is using GPU?

The fact that it is using the GPU is very easy to see.  Using MSI Afterburner and RTSS, I can track my GPU utilization and VRAM.  My GPU maxes out at about 80% in Skyrim using my ENB settings, and I can switch to the GPU platformID of OpenCL and see that rise to 95% without an increase in VRAM.  Changing the platformID is the only variable that is different.  I can't really think of an alternative explanation for that.

 

I should add that I do not have anything in my logs like unknown parameter.  Maybe it's different based on which version of the .dll you are running?

 

Quote

If SMP code does not select OpenCL platform/device then it is on bullet engine to choose one and I bet it always defaults to CPU if not specified otherwise. Maybe someone else can find an answer or even better modify the sources to actually support OpenCL platform/device selection.

I was pretty sure that it used ODE quick-step physics as a solver, but it sounds like you know a lot about this, so I will take your word for it. 

 

But... the whole point of setting the platformID is to specify the GPU version of OpenCL (which is OpenCL C 1.2, as opposed to the CPU graphics, which uses OpenCL 2.1).  The variables are part of the actual OpenCL code language, so maybe it doesn't really matter if SMP defines them in the source as long as it integrates the OpenCL libraries.  Your guess is as good as mine.

 

I'd also like to add that I verified the relationship between numQueue and max work group size by going over the limit.  When I set it to 1025, my game would bog down and gradually become unstable, and setting it any higher than 1025 caused an immediate crash.  I am uncertain how I could possibly explain this observation if it doesn't actually use OpenCL.

Quote

The solver section IS implemented and settings can have impact on the performance. Especially group iterations which are used for group constraints and LERPs. The values you set are extremely high and if used with complex mesh physics utilizing both group constraints and LERPs will have serious performance impact while having very little accuracy difference.

That's sort of the point.  If I used it with OpenCL= false or PlatformID=0, it would cause serious performance issues.  But when I set it up like this, it works just fine.

 

You are right, I probably overdid it with the iterations, but my methodology was to keep incrementing the values until I got stability or performance issues, and then scale back a few steps.  

 

The collisions are the aspect where the accuracy of the simulation is most noticeably improved.  Any deformation caused by one mesh interacting with another is significantly less, well... deformed looking.  Hand/breast interactions (this is loverslab, after all), for example, do not have the same unnatural looking stretching; the deforming mesh looks more stable and less jerky.  My assumption was that this is a result of the combination of increased total iterations plus the group iterations, since I use a bodymesh and SMP config that utilizes multiple breast bones.

Link to comment
9 hours ago, goaway said:

I should add that I do not have anything in my logs like unknown parameter.  Maybe it's different based on which version of the .dll you are running?

Interesting, are you on LE or SE? Can you please send me the DLL that you are using via msg? Thanks.

 

EDIT: It made me curious, so I asked competent sources about it. OpenCL in the SE version was never implemented; everything runs on the CPU. The LE sources were not released, so we can only speculate, but I guess the SE SMP code is just an evolution of the LE SMP code, and Hydrogen probably used most of the LE code as a base, so if it is not in the SE version then it is probably not in the LE version either. But I am still curious about the DLL you are using.

Link to comment
18 hours ago, OrrieL said:

Interesting, are you on LE or SE? Can you please send me the DLL that you are using via msg? Thanks.

 

EDIT: It made me curious and I asked competent sources about it. OpenCL in the SE version was never implemented, everything runs on the CPU. LE sources were not released so we can only speculate, but I guess SE SMP code is just evolution of LE SMP code and Hydrogen probably used most of the LE code as a base so if it is not in SE version then it is probably not even in LE version. But I am still curious about the DLL you are using.

There are older versions that use OCL.

 

You also want the target per frame Interpolation to be 64, not 60.

Link to comment
On 3/31/2020 at 1:30 AM, 27X said:

There are older versions that use OCL.

 

You also want the target per frame Interpolation to be 64, not 60.

 

I thought you might be right. I have used (and dug into) SMP for about 2 years and never tried the old DLLs, so I gave it a shot.

 

First we are talking about Skyrim LE, not SE.

I searched my drive and found an archive containing 5 different DLL versions from 2015-2018.

I disassembled every one of them and looked through the strings to see whether there is a condition to parse the <opencl> tag, and whether there is an "Unknown config" warning when an unknown parameter is found.

 

 

I modified the config with the same settings as OP is using.

Then I loaded a game and used F-F animation using the high poly CBBE body with custom XML to trigger triangle-triangle collisions.

 

I do not think that testing with NPCs just walking around is good enough, as physics calculations (i.e. movement, not collisions) are not that CPU intensive and even potato CPUs can handle them. Collisions are where things get ugly.

 

Here are the results:

 

This is my OpenCL system (info taken from hashcat)

OpenCL Info:

Platform ID #1
  Vendor  : NVIDIA Corporation
  Name    : NVIDIA CUDA
  Version : OpenCL 1.2 CUDA 10.2.131

  Device ID #1
    Type           : GPU
    Vendor ID      : 32
    Vendor         : NVIDIA Corporation
    Name           : GeForce GTX 1660 Ti
    Version        : OpenCL 1.2 CUDA
    Processor(s)   : 24
    Clock          : 1875
    Memory         : 1536/6144 MB allocatable
    OpenCL Version : OpenCL C 1.2
    Driver Version : 442.19

 

Name: hdtSkinnedMeshPhysics.dll

Date: 04/07/15
Size: 1243136 bytes (1214 KiB)
CRC32: 815976EF

Disassembled for strings check: no opencl condition for config parameter found, also no error string for unknown parameter found
result: no unknown warning in the log (as expected, as it's not even in the code to warn for unknown parameters), collisions destroy the CPU, game runs at 1 FPS during collisions, no visible change in GPU utilization

image.png.6556445d51d80d18c78f2a7d5b7103a0.png

 

Name: hdtSkinnedMeshPhysics.dll

Date: 08/14/16
Size: 823296 bytes (804 KiB)
CRC32: 229FB1A4

Disassembled for strings check: no opencl condition for config parameter found, also no error string for unknown parameter found
result: no unknown warning in the log (as expected, as it's not even in the code to warn for unknown parameters), way better collision performance, fps ~9, some small change in GPU utilization, mostly due to the CPU being able to pass it more work to process.

image.png.47b2b057c6107bcadd29854450422910.png

 

Name: hdtSkinnedMeshPhysics.dll

Date: 09/14/17
Size: 829440 bytes (810 KiB)
CRC32: 6B225921

Disassembled for strings check, no opencl condition for config parameter found, unknown parameter condition was found in this version
result: WARNING: Unknown config :  in the log as expected, around the same performance as previous version.

image.png.bc4449b5778b1dca4a0cf8ccb6eebb9e.png

 

Name: hdtSkinnedMeshPhysics.dll

Date: 10/11/17
Size: 834560 bytes (815 KiB)
CRC32: 182CFB1A

Disassembled for strings check, no opencl condition for config parameter found, unknown parameter condition was found in this version
result: WARNING: Unknown config :  in the log as expected, around the same performance as previous version.
 

image.png.f656280483ca07f015a9c1f190d4bd90.png

 

Name: hdtSkinnedMeshPhysics.dll

Date: 08/25/18
Size: 910336 bytes (889 KiB)
CRC32: 8ECBD2A2

Disassembled for strings check: no opencl condition for config parameter found, unknown parameter condition was found in this version
result: WARNING: Unknown config :  in the log as expected, way waaaay better performance than all previous versions, some change in GPU utilization, but that is mostly due to the fact that the CPU is actually free to give it some work

 

image.png.17958d0b27f9c683d3a63f7c38e010c0.png

 

The 2nd number on the GPU line in the screenshots is the GPU bus usage.

 

So as you can see, the early versions perform so poorly for collision calculations that they are basically unusable. Remember that what I created is basically an unoptimized XML that you would probably never want to have. The old DLL versions might still work OK, but the last one clearly gave the best performance.

 

So one would say, wow, this must be using the GPU and the opencl parameters must work. No, they don't. The binary ignores those parameters and flags them as unknown, so whatever value you set there has no effect.

 

So why does the most recent DLL perform so well? I can't tell for sure, but I think it is the result of some heavy code optimization.

 

On 3/29/2020 at 11:38 PM, goaway said:

I should add that I do not have anything in my logs like unknown parameter.  Maybe it's different based on which version of the .dll you are running?

 

Now if OP is saying he is not getting the Unknown config warning, then it means he is using one of the older versions, as the warning condition is only implemented in the newer ones.

I would really love to see what kind of performance boost goaway would gain if he used the newest DLL. PM me if you cannot find it on the web.
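
If you want to check which of the builds listed above you are running, a rough Python sketch like this compares the file's CRC32 against the five values from my archive.  It assumes the CRC32 values above are the standard CRC-32 that zlib computes (the kind most archive tools show); the function names are made up, and the table only covers the DLLs I tested.

```python
import zlib

# CRC32 -> build date, copied from the list of DLLs tested in this post.
KNOWN_BUILDS = {
    "815976EF": "04/07/15",
    "229FB1A4": "08/14/16",
    "6B225921": "09/14/17",
    "182CFB1A": "10/11/17",
    "8ECBD2A2": "08/25/18",
}


def crc32_hex(data):
    """Standard CRC-32 of a bytes object as 8 uppercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")


def identify_build(data):
    """Return the build date for known DLL contents, or 'unknown'."""
    return KNOWN_BUILDS.get(crc32_hex(data), "unknown")
```

Open hdtSkinnedMeshPhysics.dll in binary mode, read it fully, and pass the bytes to identify_build.  An "unknown" result only means the file is not one of these five.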

 

EDIT: Typos

Link to comment
8 hours ago, OrrieL said:

 

First we are talking about Skyrim LE, not SE.

I'm not talking about SE either, nor would I, as SMP on SE is an open faced unfinished garbage fire and will remain so unless aers decides to recompile it. I'm referring to collisions specifically, as any xml can be pared down to essentially nothing for jiggling.

 

The newest dll is also 18/09/18, though the CRC is the same as your exemplary one according to libcrc.

 

The likelihood is Hydro pulled the GPU utilization after buying a new laptop; I remember people complaining that SMP straight up stopped working after a late 2017 update and all of them had AMD equipment on both sides of the rendering cycle and in the same year there was a version that would not work period unless OCL was installed on your system.

 

Also as for parsing, the library parses about a third of the conditions and definitions that are available to it at runtime.

Link to comment
  • 2 weeks later...
41 minutes ago, cooldown1337 said:

Any idea what the <groupEnableMLCP>false</groupEnableMLCP> setting does? It's set to false in BHUNP configs.

From my own experience and from explanation in this thread, this option corrects collision errors on the fly, changing erp value from default 0.2 is not even required.

With this enabled, cloth becomes very stable in LE, almost as SSE plugin with aers fixes.

Guess, on SSE with groupEnableMLCP set to true it might be ultimate experience Skyrims can offer, kek

Link to comment
51 minutes ago, full_inu said:

From my own experience and from explanation in this thread, this option corrects collision errors on the fly, changing erp value from default 0.2 is not even required.

With this enabled, cloth becomes very stable in LE, almost as SSE plugin with aers fixes.

Guess, on SSE with groupEnableMLCP set to true it might be ultimate experience Skyrims can offer, kek

Hm, I wonder if it works on body collisions too - gotta test it sometime then. Hopefully I can achieve The Power of SE on my LE setup too lmao.

Link to comment
6 hours ago, cooldown1337 said:

Hm, I wonder if it works on body collisions too - gotta test it sometime then. Hopefully I can achieve The Power of SE on my LE setup too lmao.

SMP on SSE is an open garbage fire compared to LE, what are you even on about

Link to comment
15 hours ago, 27X said:

SMP on SSE is an open garbage fire compared to LE, what are you even on about

I'm just having a giggle here, he'll understand ;)

Besides, as for these settings: the fps increase from adding the opencl section is very minor and I can't confirm it's not just normal variance (1-2 fps).

<?xml version="1.0" encoding="utf-8"?>

<configs>
    <opencl>
    <!--warning : not finish yet-->
        <enable>true</enable>
        <platformID>1</platformID>
        <numQueue>1024</numQueue>
    </opencl>
    <solver>
        <numIterations>16</numIterations>
        <groupIterations>32</groupIterations>
        <groupEnableMLCP>true</groupEnableMLCP>
        <erp>0.2</erp>
        <min-fps>60</min-fps>
    </solver>
</configs>

Probably gonna need more testing.

 

Update:

I get

[Thu Apr 16 15:29:13 2020]WARNING: Unknown config : 

In my SMP log with <opencl> section added so it indeed probably does nothing. I'm using the d3d-whatever that comes with BHUNP 1.71 by the way. I suggest you check the logs too if you apply the fix.
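
If you'd rather not eyeball the log, here is a minimal Python sketch that pulls out the warning lines.  The function name is made up and the log location is not assumed; you pass in the lines yourself.  It only looks for the exact warning text shown above.

```python
def unknown_config_warnings(log_lines):
    """Return the log lines that contain the 'Unknown config' warning."""
    marker = "WARNING: Unknown config"
    return [line.rstrip("\r\n") for line in log_lines if marker in line]


if __name__ == "__main__":
    import sys

    # Usage: python check_log.py <path to your SMP log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        hits = unknown_config_warnings(handle)
    for hit in hits:
        print(hit)
    print(f"{len(hits)} unknown-config warning(s) found")
```

Any hits while the <opencl> section is present suggest your DLL is one of the builds that flags unknown parameters.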

Link to comment
On 4/16/2020 at 12:18 PM, cooldown1337 said:

Update:

I get


[Thu Apr 16 15:29:13 2020]WARNING: Unknown config : 

In my SMP log with <opencl> section added so it indeed probably does nothing. I'm using the d3d-whatever that comes with BHUNP 1.71 by the way. I suggest you check the logs too if you apply the fix.

My conclusion is the same as yours if you check my detailed breakdown several posts above. 

Yet there are people here in this thread claiming the config change had a significant impact on their performance. I would really love to see which SMP DLL version and what GPU (vendor) they are using.

 

None of the DLLs that I tried had any conditional logic to parse the opencl tag in configs.xml. The older versions do not even have a condition to give a warning message when an unknown parameter is parsed.

 

This does not mean that the DLL they are using is not utilizing the GPU for collision calculations, but there is no way to tell the engine which device to use (opencl tag not implemented).

Link to comment
  • 2 weeks later...

@goaway, you get 10,000 pts. Well written, comprehensible and effective. I would never have guessed what those parameters were.

This worked phenomenally well.  I have a NVIDIA GeForce GTX 1060 btw and I always suspected something was missing with its performance. It's far smoother and more realistic feeling--one would say immersive.

Link to comment
  • 4 weeks later...
20 hours ago, leighn said:

This topic is underrated and should be pinned. Thanks for the tweak! It works with Special Edition too.

except it doesn't work, except for one version of SMP which was never made for SSE.

Link to comment
4 hours ago, leighn said:

Not completely indeed, it just improves the accuracy of the physics for me

Except it doesn't.

 

What improves the accuracy of SMP is proper 1:1 mesh weighting and correctly written controller files, none of which exist in generic formats and all-in-one set ups purported in this thread or anyone's patreon, or any other public venue, nevermind the extant SSE issues or the simple fact only a third of the library's functionality is exposed and has to be configured per unique hardware instance.

Link to comment
  • 7 months later...

Okay, silly question here.

 

Where is the ini (or XML) file for HDT-SMP? And what is the filename? I can't find it in the SKSE folder. It's not in %randompath%\Skyrim Special Edition\Data\SKSE\Plugins\hdtSkinnedMeshConfigs

 

I cannot find it anywhere I looked all around. I am assuming it's an XML file?

Link to comment
  • 2 months later...

For others who will google it.

Bad news for us. Bullet3 runs 100% on CPU.

 

I built Bullet3 myself, and none of the demos use the GPU (I have an AMD card with OpenCL support).

 

 

image.png.79c8247ea5644916aae7573650009f5a.png

 

 

I can also add some thoughts on why HDT-SMP physics in Skyrim mods is so slow.

- Mods use per-polygon calculations, which is an insane idea. Game engines use capsules for all collisions, which is lightweight and fast.

- Also, Bullet3 does not seem very targeted at game usage. They are developing precision simulation for robots, sponsored by Google. GPU support was sponsored by AMD and was stopped in 2015 (6 years ago!)

 

You can compare how fast optimized modern PhysX runs in Unreal Engine vs the Bullet3 demos.

Maybe I'm wrong, but running CPU-based physics in 2021 is a failed idea by default.

 

 

Link to comment
