Wednesday, December 5, 2018

Fast distance calculations

I may as well put this into its own little post...

One of the things that I've been doing with scripts in Unity is finding ways to disable objects based on distance. Naturally LODGroups are meant for that, but I'm referring to things such as lights, particles, and other things that don't have a mesh renderer component.

As I mentioned in the previous post, while something like "Vector3.Distance(a, b)" is convenient, it also has to compute a square root, which makes it comparatively expensive. So when dealing with potentially hundreds of objects every frame, it can add quite a lot to the CPU overhead.

The solution is to use a fast distance calculation:
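public static float FastDist2D(Vector3 a, Vector3 b)
{
    Vector2 heading;

    // Work on the horizontal (XZ) plane only.
    heading.x = a.x - b.x;
    heading.y = a.z - b.z;

    // Note: this is the squared distance; no square root is taken.
    return heading.x * heading.x + heading.y * heading.y;
}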

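Since the action in most games takes place horizontally, it makes sense to simply use the X and Z coordinates. Note that the function returns the squared distance, which is how it avoids the square root, so compare the result against a squared threshold rather than a plain distance.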
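As a rough sketch of how this might be used to switch a light on and off by distance, something like the following could work. The class name "DistanceToggle" and its fields are my own placeholders for illustration, and the 2D helper is repeated so the snippet stands on its own.

```csharp
using UnityEngine;

// Hypothetical component: the name, fields and default values are illustrative only.
public class DistanceToggle : MonoBehaviour
{
    public Transform viewer;         // e.g. the player or the main camera
    public Behaviour target;         // e.g. a Light; Behaviour exposes "enabled"
    public float maxDistance = 50f;  // plain (non-squared) distance, for convenience

    float maxDistanceSqr;

    void Start()
    {
        // Square the threshold once instead of taking a square root every frame.
        maxDistanceSqr = maxDistance * maxDistance;
    }

    void Update()
    {
        if (viewer == null || target == null)
            return;

        float distSqr = FastDist2D(transform.position, viewer.position);
        target.enabled = distSqr < maxDistanceSqr;
    }

    // Squared distance on the XZ plane, same idea as the function above.
    static float FastDist2D(Vector3 a, Vector3 b)
    {
        float dx = a.x - b.x;
        float dz = a.z - b.z;
        return dx * dx + dz * dz;
    }
}
```

This toggles the component rather than the whole game object, so the script keeps running and can switch the light back on. A particle system is not a Behaviour, so for particles the game object itself would have to be toggled from a script that lives on a different object.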

And an all-encompassing binary distance check would look like this:
public static bool FastDistCheck2D(Vector3 a, Vector3 b, float d)
{
    Vector2 heading;

    heading.x = a.x - b.x;
    heading.y = a.z - b.z;

    float distSqr = heading.x * heading.x + heading.y * heading.y;

    // Compare squared values so no square root is needed.
    return distSqr < d * d;
}

Clean and simple!
Calculating in 3D is just as easy, of course: simply make "heading" a Vector3 and include the difference in "y" as well.
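For reference, the 3D variant described above might look roughly like this. The class and method names are placeholders of mine, not something from an existing library.

```csharp
using UnityEngine;

// Hypothetical helper class; the names are illustrative only.
public static class FastDistance
{
    // Squared 3D distance between a and b (no square root is taken).
    public static float FastDistSqr3D(Vector3 a, Vector3 b)
    {
        Vector3 heading;

        heading.x = a.x - b.x;
        heading.y = a.y - b.y;
        heading.z = a.z - b.z;

        return heading.x * heading.x + heading.y * heading.y + heading.z * heading.z;
    }

    // True when a and b are closer than d, compared in squared space.
    public static bool FastDistCheck3D(Vector3 a, Vector3 b, float d)
    {
        return FastDistSqr3D(a, b) < d * d;
    }
}
```

Unity's built-in "(a - b).sqrMagnitude" gives the same squared value, if you'd rather not write the subtraction out by hand.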

Wednesday, October 17, 2018

Lessons in art optimization

In the last several years, the platforms I use for Unity have ranged from slow, low-end mobile phones and laptops (like the venerable Samsung S4 and dual-core Intel PCs) to high-end consoles and desktops. And it's been very interesting to see how each generation of Unity performed across these different platforms.

While something like the Samsung S4 (from 2013) isn't fast by any means, it sometimes surprised me with how slow it is compared to an old dual-core laptop from 2011 and a relatively ancient Intel Q6600 (one of the first quad cores Intel released, in 2007). But still, having access to low-end platforms is excellent when looking at the nitty-gritty of performance optimization.

In general there are basically two flavours of optimization: Things that affect the CPU and things that affect the GPU. There is usually cross talk between both, but it's good to know what will affect one or the other.

CPU

A quick list of some of the CPU vampires
  • Physics (At least for now with PhysX 3.3, probably)
  • Scripts (old-skool Mono based)
  • Particles
  • Terrain
  • Object counts
Physics is naturally one of those things that can eat a CPU alive, and I won't go into the do's and don'ts of the basics, since I'm assuming everyone knows what is "Too Much" based on their own design. But of course there may be areas or things that are overlooked which can cause performance issues.
  • Make sure you're using primitive colliders where possible, that they're static, and you're telling Unity to bake down your physics at build time.
  • Make sure you're using Unity's layer system to isolate your systems as much as possible, especially the expensive ones. There's no need for everything to be tested against everything else.
  • If you have AI with location-based damage (arms, legs, head, torso, etc.), are you applying the previous rule? Or are all those individual colliders being tested every frame against everything?
  • Better yet, shouldn't all those dynamic colliders be turned off most of the time anyways?
  • Lots of AI moving around on a nav mesh, avoiding one another, can be a big hit.
  • And of course other stuff that has been documented online by Unity and forum members.
It was difficult to optimize physics in Unity 4.x because even though an object was labeled static, the collider wasn't. It was only in Unity 5.x that they fixed things and gave you the option to bake things down, vastly improving performance when a lot of colliders were around (like trees on terrain).
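To make the layer point from the list above a bit more concrete, here is a minimal sketch that stops one layer from being tested against another at runtime. The layer names "Hitbox" and "Scenery" are assumptions for the example and would need to exist in your project's layer list. The same thing can be set up in the editor through the Layer Collision Matrix in the physics settings.

```csharp
using UnityEngine;

// Hypothetical setup script; the layer names below are assumptions, not Unity defaults.
public class HitboxLayerSetup : MonoBehaviour
{
    public string hitboxLayerName = "Hitbox";    // e.g. per-limb damage colliders
    public string sceneryLayerName = "Scenery";  // e.g. static level geometry

    void Awake()
    {
        int hitbox = LayerMask.NameToLayer(hitboxLayerName);
        int scenery = LayerMask.NameToLayer(sceneryLayerName);

        // NameToLayer returns -1 when the layer does not exist.
        if (hitbox < 0 || scenery < 0)
        {
            Debug.LogWarning("HitboxLayerSetup: one of the layers is missing.");
            return;
        }

        // Limb colliders never need to be tested against scenery or each other.
        Physics.IgnoreLayerCollision(hitbox, scenery, true);
        Physics.IgnoreLayerCollision(hitbox, hitbox, true);
    }
}
```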

Scripting can also have obvious pitfalls. While the topic is as broad and deep as the seven seas, I just want to focus on a few specific points that I personally encountered.
  1. A script counts as an object in its own right, and depending on the platform and the scope of the game, this can cause issues. In an open world game, for example, it's important to keep an eye on how many active objects there are at any given time, or how many scripts these objects are using.

    Is there a map system that looks for in-game objects, and then uses their positions to draw an icon? How many objects are there? Dozens? Hundreds? Do they all need to be active all the time?
  2. If the scripts use the Update function, can each script register itself with a manager, and have the manager call the update manually?

    If there is a lot of logic branching in the Update function, have the most likely outcome happen first. It may seem like a minor optimization, but again, when dealing with thousands of scripts, it can add up!
  3. Cache those references at the beginning! If you know you're gonna be accessing a particular component or script all the time, stuff it into a variable during Awake() or Start().
  4. Also, depending on the platform or scenario, built-in math functions can be heavy.

    Vector3.Distance(a, b) is nice and convenient
    but...
    Vector3 heading = new Vector3(a.x - b.x, a.y - b.y, a.z - b.z);
    float distSqr = heading.x * heading.x + heading.y * heading.y + heading.z * heading.z;

    ... is probably faster, especially if you're calculating in 2D using only "x" and "z". Keep in mind that this gives the squared distance, so compare it against a squared threshold.

    Use manual squares, like the example above, instead of Pow(), and skip Sqrt() whenever a squared comparison will do.
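Points 2 and 3 in the list above could be sketched roughly as follows. "IManagedUpdate", "UpdateManager" and "ManagedBlinker" are names I made up for the example, and this is only one of several ways to set it up. In a real project each MonoBehaviour class would live in its own file.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical interface for scripts that want a manager-driven update.
public interface IManagedUpdate
{
    void ManagedUpdate(float deltaTime);
}

// One real Update() call drives every registered script.
public class UpdateManager : MonoBehaviour
{
    static readonly List<IManagedUpdate> items = new List<IManagedUpdate>();

    public static void Register(IManagedUpdate item)
    {
        if (!items.Contains(item))
            items.Add(item);
    }

    public static void Unregister(IManagedUpdate item)
    {
        items.Remove(item);
    }

    void Update()
    {
        float dt = Time.deltaTime;

        // Iterate backwards so an item can unregister itself during its own update.
        for (int i = items.Count - 1; i >= 0; i--)
            items[i].ManagedUpdate(dt);
    }
}

// Example client: no Update() of its own, and the Light reference is cached once.
public class ManagedBlinker : MonoBehaviour, IManagedUpdate
{
    Light cachedLight;

    void Awake()     { cachedLight = GetComponent<Light>(); }
    void OnEnable()  { UpdateManager.Register(this); }
    void OnDisable() { UpdateManager.Unregister(this); }

    public void ManagedUpdate(float deltaTime)
    {
        if (cachedLight != null)
            cachedLight.intensity = Mathf.PingPong(Time.time, 1f);
    }
}
```

One UpdateManager has to exist in the scene for anything to run. Registering in OnEnable and unregistering in OnDisable means disabled objects drop out of the loop automatically.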
Particles are a wonderful thing with newer versions of Unity! And they're also one of the few systems that are brilliantly multi-threaded too. Of course it's important to know the sub-components of particles because some are more costly than others. So what are some of the best ways to optimize them?
  • If you need lots of particles, consider splitting your emitters across however many cores you have. Need 10000 particles and you're using a 4-core CPU? Use 4 instances of the same emitter doing 2500 particles each.
  • Noise can make particles look great, but it's HEAVY, even on PCs. Don't bother with noise on mobile platforms unless you've got plenty of CPU to spare (meaning a super simple, almost bare-bones scene, or a low particle count). Low (1D) and Medium (2D) noise is ok for consoles, but save 3D noise for higher end desktops. Even with the multiple emitter trick, you'll be paying a high price.
  • Same with collision. If you need collision on particles on a low end platform, low quality collision and multiple emitters are about your only hope of keeping performance in check.
  • With trails, you'll be paying a rendering cost (vertices), not so much an outright computational cost like with collision and noise. But this only applies if you're rendering super smooth trails with verts really close together.
  • The more gradients are being used, the more processing time is being spent. Unity's documentation provides good breakdowns about this.
  • And finally, if a particle is not emitting, turn the game object OFF. It's surprising how much CPU time can be eaten up just by running through idle particle systems.
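The last point in the list above could be automated along these lines. This is only a sketch, assuming one-shot (non-looping) effects that something else reactivates when needed. "IdleParticleDisabler" is a made-up name, and the script is meant to sit on a separate manager object, not on the particle objects themselves.

```csharp
using UnityEngine;

// Hypothetical manager script; assign the particle systems in the inspector.
public class IdleParticleDisabler : MonoBehaviour
{
    public ParticleSystem[] systems;

    void Update()
    {
        if (systems == null)
            return;

        for (int i = 0; i < systems.Length; i++)
        {
            ParticleSystem ps = systems[i];

            if (ps == null || !ps.gameObject.activeSelf)
                continue;

            // IsAlive(true) also checks child systems; false means nothing
            // is emitting and no particles are left on screen.
            if (!ps.IsAlive(true))
                ps.gameObject.SetActive(false);
        }
    }
}
```

A looping system always counts as alive, so looping effects would still need to be switched off explicitly, for example with a distance check.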
Terrain. Unsurprisingly, the terrain system is in dire need of an overhaul since, as of this writing, the current terrain system in Unity 2018.2.11 is a leftover from the Unity 2.x days, with few updates in between. And while there are third-party solutions, there are still things to consider if you need to use the existing out-of-the-box system.
  • Make the terrain as smooth as possible in out-of-bounds areas, and when it's hidden under objects. This will prevent it from sub-dividing and causing more drawcalls than necessary. This can shave milliseconds off the CPU if you have large or multiple terrains.
  • Crank the "Detail resolution per patch" value as high as it'll go before you start running out of verts. Even if you have no details on your terrain, the extra overhead needed to compute all the detail patches will eat into the CPU, especially on low end systems.
  • Use the "Pixel Error" value to lower the terrain's "LOD" and mesh detail. The fewer chunks of the terrain you draw, the better. This may not be as big of a deal in the future, but for now, it helps a lot! The higher this value, the chunkier the terrain, but the reduction in draw calls could be worth it.
  • The texture system uses 4 textures per pass. 1 texture is as expensive to draw as 4 textures. If you need 5 textures you may as well go all the way up to 8, because either way you'll be rendering the terrain in 2 passes.
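The texture arithmetic in the last bullet is just a round-up division. A tiny helper (the names are mine, and the 4-per-pass figure is taken from the text above) makes the break points easy to see:

```csharp
// Hypothetical helper; the 4-textures-per-pass figure comes from the text above.
public static class TerrainPassEstimate
{
    const int TexturesPerPass = 4;

    // Number of terrain passes needed for a given texture count (rounds up).
    public static int PassCount(int textureCount)
    {
        return (textureCount + TexturesPerPass - 1) / TexturesPerPass;
    }
}

// PassCount(1) == 1, PassCount(4) == 1
// PassCount(5) == 2, PassCount(8) == 2
```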
While Unity is updating the terrain system internally, it's not always clear when the bulk of the changes will actually be implemented. Unity 2018.3 will have GPU-based terrain computation, which apparently speeds things up considerably. (At some point I will be examining the betas and doing some apples-to-apples comparisons.)

Object counts. This is something that might be overlooked in a lot of projects, especially ones with huge open-world-style levels. And when I say "objects", I'm not referring only to game objects.

If there is a large rock asset with 4 LODs, the game will count it as at least 9 objects in memory: the LOD Group component, plus a Mesh Filter and a Renderer on each of the 4 LODs (1 + 4 × 2 = 9). Now multiply this asset hundreds of times, and it quickly adds up. Add UI canvases and individual components, audio, and all those little helper scripts I mentioned earlier. It's death by a thousand cuts.

Even if the game has well optimized art and scripts, high total object counts could still pose a performance hit, depending on the platform. And unfortunately there isn't a quick and easy fix for this, other than explicitly turning off (or even unloading) anything that isn't being used or seen.

One option is cutting scenes up into smaller chunks: turn an area of high-fidelity art consisting of multiple objects into a low-fidelity "vista" version made up of a single mesh. This can be a simple art swap or, if you want to get really fancy, a high-detail scene the player interacts with paired with a low-detail vista scene.

Using scenes this way is a fairly advanced technique, but it can cut object counts and drawcalls by a LOT. Not to mention improving load times.
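As a very rough sketch of the detail/vista scene idea, assuming both scenes are added to the build settings: the scene names, the "VistaSwap" class and the distance value below are all placeholders. A production version would want hysteresis so the swap doesn't flicker at the boundary, and would probably wait for the new scene to finish loading before unloading the old one.

```csharp
using UnityEngine;
using UnityEngine.SceneManagement;

// Hypothetical swapper; place it at the centre of the area it controls.
public class VistaSwap : MonoBehaviour
{
    public string detailScene = "Area01_Detail";  // placeholder scene name
    public string vistaScene = "Area01_Vista";    // placeholder scene name
    public Transform player;
    public float swapDistance = 200f;

    bool detailActive;
    AsyncOperation pendingLoad;

    void Start()
    {
        // Start with the cheap single-mesh version loaded.
        pendingLoad = SceneManager.LoadSceneAsync(vistaScene, LoadSceneMode.Additive);
    }

    void Update()
    {
        if (player == null)
            return;

        // Don't start another swap while the last load is still running.
        if (pendingLoad != null && !pendingLoad.isDone)
            return;

        Vector3 offset = player.position - transform.position;
        bool wantDetail = offset.sqrMagnitude < swapDistance * swapDistance;

        if (wantDetail == detailActive)
            return;

        detailActive = wantDetail;
        string toLoad = wantDetail ? detailScene : vistaScene;
        string toUnload = wantDetail ? vistaScene : detailScene;

        pendingLoad = SceneManager.LoadSceneAsync(toLoad, LoadSceneMode.Additive);

        // Only unload a scene that is actually loaded.
        if (SceneManager.GetSceneByName(toUnload).isLoaded)
            SceneManager.UnloadSceneAsync(toUnload);
    }
}
```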

    Wednesday, October 10, 2018

    Been too long! New updates soon.

    It's been too long since I posted, but I've been working, taking care of a growing family, and most importantly learning the ins and outs of Unity, from my humble beginnings in a late release of Unity 3.x to the bleeding edge of the Unity 2018.3 beta.

    I'm hoping to post some new content about some of the things I've learned with regard to optimization and art creation from a practical standpoint: things beyond "Have you tried static batching?" and "How about occlusion culling? It helps!", and why those don't always actually help.

    Look for these articles and discussions soon, like within a week or two!