Let’s start off with a warm welcome and a thank you for joining us on the topic of optimization in Unity. Since you’re here, it’s safe to assume you are interested in how to approach optimizing a project within the Unity environment. We decided to share what we learned along the way and the various approaches you can use when breaking down your project to squeeze out every last drop of performance.
General Disclaimer: This information may or may not pertain to your project. The approaches and techniques used throughout our articles may not apply to all scenarios, but they should provide a solid foundation for understanding how to approach optimizing any project.
Please reach out to us if you have something to add, or if you think something could be worded differently to help others understand and use these approaches.
Special Note: While these articles refer to WebGL, the techniques and approaches for optimizing your project in Unity will generally apply to all project types: WebGL, mobile, desktop, etc.
Why WebGL? We asked ourselves this exact question, and after much debate we decided to go with WebGL for our new title.
The first and most prominent reason is that WebGL gives us a greater reach, since players can simply use their web browser to play.
Running in the browser also makes our product available to play online without the need to download and install anything. That was the biggest draw of WebGL: it works right in the browser.
On top of this, a WebGL product can be hosted not only on our own website but also on sites such as Kongregate and itch.io, which provide another platform for getting your game in front of people.
We understood that going down the WebGL route would not be the easiest choice. We knew that one of the most critical aspects of using this platform for our project was optimization.
Optimization is arguably something every developer should think about. Others will say to just focus on your framework and worry about optimizing later. While that approach may work well for certain projects, on others it can turn into a rabbit hole of lost time spent searching for ways to optimize your code. We did not let optimization take a back seat while we drove forward with coding our project.
WebGL is less forgiving to those who do not start their project thinking that way. As development continued, it quickly became apparent that our main issue would be memory. If we did not bring it down, we would lose part of our audience to the requirements needed to play.
WebGL becomes temperamental when memory usage runs constantly high. A lot of sources say you should try to keep your game at around 256-512 MB of RAM. This is something we are constantly up against, and every time it means finding a balance between optimization and decent visuals.
If you want more in-depth information on WebGL, see the linked page on Unity WebGL.
General Performance Optimization Methods
“Performance optimization is often looked upon as the least fun during development which leads to often being overlooked, however the performance is crucial and can easily make or break the entire user experience.” – AresTheDog
As devices become more and more powerful, the need for optimization seems to become less important. As mentioned above, sources are conflicted on when to start optimizing your project. Some say not to worry about it until later or until you need to, while others say you can save yourself a lot of hassle by planning and coding with best practices and optimization techniques from the start.
We had this in the back of our minds when we started to develop our ideas. However, something we initially overlooked was the impact everything would have on the project’s overall memory allocation.
WebGL content runs inside a browser, which in turn has its very own memory space. As our project progressed, we were consuming more and more memory.
First it was 128 MB, then 256 MB, and it kept growing until it reached 1024 MB. Was this too much? We started our research and found that web browsers are constantly evolving, and that most personal computers, laptops, netbooks, etc. tend to come standard with 8 GB of RAM.
So where this size may have been a problem in the past, both browsers and WebGL have matured a lot over the years, and more and more applications now run as 64-bit processes. The downside of this memory size is that not everyone will be able to play. This has turned into an eternal struggle, and bringing memory down continues to be a priority for us. Less is more in this circumstance.
Finding the right settings for your project builds will come with a lot of trial and error. We have collected the best approaches and techniques all in one place to share with others what could be done to optimize any project.
Once you have moved into the stage where you’re ready to start creating a build and testing your WebGL product, we hope you will find this information as useful as we did.
When starting your project build for a WebGL product, one of the first and easiest optimizations is to enable “Strip Engine Code”. With this enabled, engine code that your project does not use is left out of the build.
It’s also worth mentioning that you should not use the “Development Build” option, since the build will be huge and will not be compressed or minified. Sometimes you will need it, for example for Deep Profiling. More on profiling later, but it’s best not to check that option unless you need to.
Top tips from the Unity docs on building for WebGL that impact performance:
- Enable “Strip Engine Code” in Player Settings.
- Specify the “Crunch” texture compression format for all your compressed textures in the Texture Importer.
- Don’t deploy Development builds; they are not compressed or minified and so have much larger file sizes.
- Set Optimization Level to “Fastest”.
- Set “Enable Exceptions” in Player Settings to “None” if you don’t need exceptions in your build.
- Take care when using third-party managed dlls, as those may drag in a lot of dependencies and increase the emitted code size significantly.
Static & Dynamic Batching
This is another area where we achieved a significant performance increase, simply by using “Static Batching”.
What is “Static Batching”? Think of it like this: whenever you render a sprite it causes a draw call, so many sprites equal many draw calls. The same goes for objects: whenever an object is rendered it produces a “Draw Call”. Unity will often issue multiple “Draw Calls” for objects that are overlaid on top of each other, and each one causes CPU overhead.
This is where “Static Batching” comes in. It’s like using an atlas for your sprites. By turning on “Static Batching”, we can cut down on wasted “Draw Calls”. Even though “Draw Calls” are cheap on modern hardware, it’s good to keep in mind what makes them expensive.
What’s important to know about rendering in Unity is that the highest cost comes from changes to how the GPU is configured to render an object.
In any given project it’s normal for an object to have a material, and that material consists of a texture and a shader. The renderer may therefore have to change its state before drawing each object. This is how and when “Draw Calls” become expensive and take longer to execute, and each rendering pass counts as one “Draw Call”.
In the worst case that is three pieces of work per object: setting up the shader, binding the texture, and issuing the “Draw Call” itself. With a hundred objects, that is already around 300 operations per frame.
Armed with the knowledge of what a “Draw Call” is, and that objects can be rendered in batches, we can start optimizing our project.
The purpose of batching is that, rather than setting up and drawing each object separately, we group the calls together.
Batching objects together minimizes the number of state changes the renderer needs to draw each object. This leads to a big performance boost, since rendering multiple objects in each pass rather than one at a time reduces CPU overhead.
For this to work, the objects must share the same material, textures and shader so that they can be rendered in one pass rather than one pass per object. That is the main requirement for batching, and the payoff is fewer “Draw Calls” and lower CPU usage for rendering.
The hang-up is that objects which do not share those properties cannot be batched together.
- Dynamic batching: for small enough Meshes, this transforms their vertices on the CPU, groups many similar vertices together, and draws them all in one go.
- Static batching: combines static (not moving) GameObjects into big Meshes, and renders them in a faster way.
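To make the effect of shared materials concrete, here is a minimal sketch in plain Python (not Unity code). It assumes a deliberately simplified model in which every unbatched object costs one draw call and every group of objects sharing a material collapses into a single call. The scene contents and the `count_draw_calls` helper are hypothetical, and real batching has further limits that this ignores.

```python
from collections import defaultdict


def count_draw_calls(objects, batching=True):
    """Estimate draw calls for a list of (object_name, material_name) pairs.

    Simplified model (an assumption, not Unity's real behaviour):
    without batching, each object costs one draw call; with batching,
    all objects that share a material are drawn in one call.
    """
    if not batching:
        return len(objects)
    groups = defaultdict(list)
    for name, material in objects:
        groups[material].append(name)
    return len(groups)


# Hypothetical scene: fence pieces and walls share materials,
# while every lamp has its own unique material.
scene = [(f"fence_{i}", "wood") for i in range(40)]
scene += [(f"wall_{i}", "brick") for i in range(50)]
scene += [(f"lamp_{i}", f"lamp_mat_{i}") for i in range(10)]

print(count_draw_calls(scene, batching=False))  # 100
print(count_draw_calls(scene, batching=True))   # 12
```

The ten lamps with unique materials account for ten of the twelve remaining calls, which is why cutting down on unique materials matters so much.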
When you’re considering which type of batching to use, keep in mind that to get the biggest performance boost out of “Static Batching” you should have as few different materials as possible.
One way to achieve this is to combine the textures of all your materials into one big texture, so that objects can share a material. To start using “Static Batching”, set the “Static” flag on the object in the Inspector.
“Static Batching” essentially bakes your models into one huge mesh. So if you have a house with multiple walls, or a fence with multiple pieces, all using the same material, it is a prime candidate for “Static Batching”: everything can be combined and rendered as one object, which in turn reduces your “Draw Calls” and boosts performance.
It’s important to keep in mind that each different material will generate its own draw call, since it has to be rendered individually. It’s also worth mentioning that this process comes with its own performance overhead, since Unity will rebuild indices every time a renderer leaves or enters the frustum.
Reducing the number of unique materials and triangles will dramatically improve your rendering performance, but static batching in turn also increases memory usage.
To summarize Static Batching:
- Objects are not required to share the same material; however, less is more.
- Objects that use multiple materials require a texture atlas.
- Objects will be combined to create one mesh.
- Works best with careful planning of which objects will be combined and batched together.
- Uses a lot more memory, so it is not best suited to situations that need lower memory usage.
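The memory cost in the last point can be estimated with a rough sketch. It assumes that objects which shared one mesh before batching each get their own copy of the geometry inside the combined mesh, and it uses a hypothetical 32 bytes per vertex. Both the figures and the helper names are ours, for illustration only.

```python
def mesh_bytes(vertex_count, bytes_per_vertex=32):
    """Approximate vertex data size; 32 bytes per vertex is an assumed figure."""
    return vertex_count * bytes_per_vertex


def static_batch_memory(instance_count, vertices_per_mesh, bytes_per_vertex=32):
    """Return (shared_mesh_bytes, combined_mesh_bytes) for identical objects.

    Before batching, all instances can reference one shared mesh.
    After batching, the combined mesh holds one copy per instance.
    """
    shared = mesh_bytes(vertices_per_mesh, bytes_per_vertex)
    combined = mesh_bytes(vertices_per_mesh * instance_count, bytes_per_vertex)
    return shared, combined


# Hypothetical example: 200 identical fence posts of 500 vertices each.
shared, combined = static_batch_memory(200, 500)
print(shared)    # 16000
print(combined)  # 3200000
```

On a memory-constrained target such as WebGL, this is the trade-off to weigh against the draw calls saved.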
When your objects are movable or do not stay in one place, “Static Batching” is out of the question. This is where “Dynamic Batching” saves the day.
“Dynamic Batching” is best suited to objects that move, such as objects driven by physics with a Rigidbody.
A key point with “Dynamic Batching” is that it only applies to meshes containing fewer than 900 vertex attributes overall.
It’s also important to be careful when scaling objects, since scaled objects may not batch either.
Rule of thumb: Uniformly scaled objects will not batch with non-uniformly scaled objects.
Another thing to keep in mind is that even if an object has an animation, any parts of that object that never move can be marked as static without interfering with the animation.
“Dynamic Batching” is not without its own CPU overhead either. It can lead to performance issues when used in the wrong way or used too much.
To summarize Dynamic Batching:
- Objects are required to be the exact same scale to batch.
- Objects are required to share the exact same materials.
- Objects are required to be rendered together and in order.
- Objects are required to be below the vertex limit:
– 180 vertices for a shader using vertex position, normal, UV0, UV1 and tangents.
– 300 vertices for a shader using vertex position, normal and a single UV.
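The two vertex limits above both come from the same budget of 900 vertex attributes: the more attributes a shader reads per vertex, the fewer vertices fit. A small sketch of that arithmetic follows. The function names are ours, and we treat the budget as inclusive so that the 300 and 180 figures come out as quoted.

```python
VERTEX_ATTRIBUTE_BUDGET = 900  # total attribute limit quoted in the text


def max_batchable_vertices(attributes_per_vertex, budget=VERTEX_ATTRIBUTE_BUDGET):
    """Largest vertex count that stays within the attribute budget."""
    return budget // attributes_per_vertex


def fits_dynamic_batch(vertex_count, attributes, budget=VERTEX_ATTRIBUTE_BUDGET):
    """True if vertex_count * number of attributes stays within the budget."""
    return vertex_count * len(attributes) <= budget


simple = ["position", "normal", "uv0"]
full = ["position", "normal", "uv0", "uv1", "tangent"]

print(max_batchable_vertices(len(simple)))  # 300
print(max_batchable_vertices(len(full)))    # 180
print(fits_dynamic_batch(250, simple))      # True
print(fits_dynamic_batch(250, full))        # False
```

A 250-vertex mesh therefore batches with the simpler shader but not with the one that also reads a second UV set and tangents.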
Key Batching Points To Remember
- Batching dynamic GameObjects has certain overhead per vertex, so batching is applied only to Meshes containing fewer than 900 vertex attributes in total. If your Shader is using Vertex Position, Normal, and single UV, then you can batch up to 300 verts. If your Shader is using Vertex Position, Normal, UV0, UV1, and Tangent, then you can only batch 180 verts.
- GameObjects are not batched if they contain mirroring on the transform (for example GameObject A with +1 scale and GameObject B with -1 scale cannot be batched together).
- Using different Material instances causes GameObjects not to batch together, even if they are essentially the same. The exception is shadow caster rendering.
- GameObjects with lightmaps have additional renderer parameters: lightmap index and offset/scale into the lightmap. Generally, dynamic lightmapped GameObjects should point to exactly the same lightmap location to be batched.
- Multi-pass Shaders break batching.
- Almost all Unity Shaders support several Lights in forward rendering, effectively doing additional passes for them. The draw calls for “additional per-pixel lights” are not batched.
- The Legacy Deferred (light pre-pass) rendering path has dynamic batching disabled because it has to draw GameObjects twice.
- Not everything will be batched. Things like skinned Meshes, Cloth, and other types of rendering components are not batched.
Note: Even though “Draw Calls” can cause spikes and become a bottleneck, always remember that it’s the actual frame rate that really matters.
Note: If your overall frame rate is fine, then there really is no need to worry about draw calls, since there are no performance issues.
Note: The number of draw calls it takes to impact performance depends heavily on the hardware running your project, along with everything else going on in your scene. This is why knowing your target audience’s hardware is essential.
Do you want more information? Be sure to check this Unity blog post. It is an excellent article on troubleshooting when your batching breaks and what you can do about it, and it comes with a sample project and more in-depth information. Want to know more about rendering in Unity? Check this post.
Frustum culling is an easy way for Unity to improve your project’s performance automatically, and the best part is that Unity does this by default. The problem comes when you rely on this method alone, without any other culling: objects that are inside the camera’s view but hidden behind other objects will still be rendered even though you cannot see them.
You can see Unity doing “Frustum Culling” by opening the Profiler and watching the draw calls go down when objects are no longer in view and no longer need to be rendered.
To see an example of “Frustum Culling” in action, check out this GIF, which shows what frustum culling actually looks like.
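For readers who want to see the idea behind the test, here is a minimal sketch of a bounding-sphere check against a set of view planes. It is written in plain Python and is not how Unity implements it internally. For simplicity the view volume is a hypothetical box rather than a true perspective frustum, and the object names and sizes are made up. The test is also conservative: an object is only culled when it lies entirely behind at least one plane.

```python
def sphere_in_view(center, radius, planes):
    """Return False only if the sphere lies entirely behind some plane.

    Each plane is (nx, ny, nz, d) with a unit normal pointing into the
    view volume, so a point p is inside when n . p + d >= 0.
    """
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        signed_distance = nx * cx + ny * cy + nz * cz + d
        if signed_distance < -radius:
            return False  # entirely outside this plane, so it can be culled
    return True


# Hypothetical view volume: x and y in [-10, 10], z in [0.3, 100].
VIEW_BOX = [
    (1, 0, 0, 10), (-1, 0, 0, 10),    # left and right
    (0, 1, 0, 10), (0, -1, 0, 10),    # bottom and top
    (0, 0, 1, -0.3), (0, 0, -1, 100), # near and far
]

objects = [
    ("crate", (0, 0, 50), 1),
    ("tree", (20, 0, 50), 1),     # far off to the right
    ("rock", (10.5, 0, 50), 1),   # straddles the right edge
    ("behind", (0, 0, -5), 1),    # behind the camera
]

visible = [name for name, c, r in objects if sphere_in_view(c, r, VIEW_BOX)]
print(visible)  # ['crate', 'rock']
```

Note that the crate would pass this test even with a wall standing in front of it, which is exactly the gap that occlusion culling fills.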
From the Unity docs: “Occlusion Culling is a feature that disables rendering of objects when they are not currently seen by the camera because they are obscured (occluded) by other objects.”
When handling “Draw Calls”, “Occlusion Culling” can be a source of performance gain. You may have objects that are hidden behind other objects and cannot be seen by the camera, yet are still being rendered and therefore producing “Draw Calls”. This is where “Occlusion Culling” comes into play.
“Occlusion Culling” is different from “Frustum Culling”: “Frustum Culling” only disables rendering of objects that are outside the camera’s viewing area, and does not disable objects that are hidden from view by other objects. You can use both together.
To get started with “Occlusion Culling”, you first need to break up the geometry in your scene into sensibly sized pieces. Since each individual mesh is turned on or off based on the occlusion data, you would not want a single piece containing all the objects in a room, because then either all or none of those objects would be culled.
In cases like this it makes more sense to have each object as its own mesh, so each object can be culled individually based on the camera’s viewpoint.
A good place to start is wherever one game object blocks the view of other game objects. That is the best spot to tell Unity not to render hidden objects, using the parameters you specify in the Occlusion Culling window.
That way, Unity will only render the objects the camera has a direct line of sight to, as there is no reason to waste overhead on rendering objects that cannot be seen.
When you need to apply “Occlusion Culling” to moving objects, you will need to create an Occlusion Area and then modify its size to fit the space where the moving objects will be located.
Not sure where to start? In the Scene view, you can switch the draw mode to “Overdraw”. This essentially makes every renderer in the scene semi-transparent.
When looking for areas that require attention, look for the bright hotspots. They show where objects overlap, which lets you determine whether or not culling is required.
Quoted from Unity: “It is vital to understand that there is no one size fits all approach to improving rendering performance. Rendering performance is affected by many factors within our game and is also highly dependent on the hardware and operating system that our game runs on. The most important thing to remember is that we solve performance problems by investigating, experimenting and rigorously profiling the results of our experiments.”
NOTE: Keep in mind, Unity by default will apply occlusion culling to the whole scene if you do not create any occlusion areas.
Level Of Detail (LOD)
LODs are another common rendering optimization technique. Objects nearest to the player are rendered in full, using detailed meshes and textures.
Distant objects with an LOD use less detailed meshes and textures. This reduces the number of vertices the GPU has to render without noticeably affecting the visual quality of the game.
We use LODs on pretty much everything in the scene that could be in the player’s field of view. This provides a huge boost to performance but can take a while to set up.
More on LOD can be found in the Unity docs.
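As a rough illustration of how much work LODs can save, the sketch below picks a detail level per object and totals the triangles submitted. It is a simplified model: Unity’s LOD system switches levels based on how large an object appears on screen, whereas this sketch uses plain distance as a stand-in. The switch distances and triangle counts are hypothetical.

```python
LOD_TRIANGLES = [12000, 4000, 1000, 200]  # hypothetical triangle count per LOD
SWITCH_DISTANCES = [15.0, 40.0, 90.0]     # hypothetical switch distances


def select_lod(distance, switch_distances=SWITCH_DISTANCES):
    """Return the LOD index for a distance; 0 is the most detailed level."""
    for index, limit in enumerate(switch_distances):
        if distance < limit:
            return index
    return len(switch_distances)  # beyond the last limit: lowest detail


def triangles_submitted(distances):
    """Total triangles when each object uses the LOD chosen for its distance."""
    return sum(LOD_TRIANGLES[select_lod(d)] for d in distances)


distances = [5.0, 20.0, 60.0, 200.0]
print([select_lod(d) for d in distances])  # [0, 1, 2, 3]
print(triangles_submitted(distances))      # 17200
print(len(distances) * LOD_TRIANGLES[0])   # 48000 without LODs
```

In this made-up scene, four objects drop from 48,000 triangles to 17,200, and only the nearest one is drawn at full detail.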
Diving further down the rabbit hole of optimization and culling, we come across the CullingGroup API.
The CullingGroup API allows us to hook into Unity’s culling and LOD system to optimize our project. The Unity docs for the CullingGroup API contain several examples of how this might be used in our game. As ever, we should test, profile and find the right solution for our game.
As per the Unity docs, common uses include:
- Simulating a crowd of people, while only having full GameObjects for the characters that are actually visible right now
- Building a GPU particle system driven by Graphics.DrawProcedural, but skipping rendering particle systems that are behind a wall
- Tracking which spawn points are hidden from the camera in order to spawn enemies without the player seeing them ‘pop’ into view
- Switching characters from full-quality animation and AI calculations when close, to lower-quality cheaper behaviour at a distance
- Having 10,000 marker points in your scene and efficiently finding out when the player gets within 1m of any of them
Note: There are no components or visual tools for working with CullingGroups; they are purely accessible via script.
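To give a feel for the marker-point use case, here is a brute-force sketch of the distance-band idea in plain Python. It does not use the CullingGroup API itself, which tracks this efficiently inside Unity. The band limits, marker positions and helper names are all hypothetical.

```python
import math


def distance_band(distance, bands):
    """Index of the first band whose limit covers the distance.

    Returns len(bands) when the distance is beyond every limit.
    """
    for index, limit in enumerate(bands):
        if distance <= limit:
            return index
    return len(bands)


def markers_in_band(player, markers, bands, wanted_band=0):
    """Markers whose distance from the player falls into the wanted band."""
    return [
        marker for marker in markers
        if distance_band(math.dist(player, marker), bands) == wanted_band
    ]


bands = [1.0, 10.0]  # band 0: within 1 m, band 1: within 10 m
player = (0.0, 0.0, 0.0)
markers = [(0.5, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 30.0)]

print(markers_in_band(player, markers, bands))  # [(0.5, 0.0, 0.0)]
print([distance_band(math.dist(player, m), bands) for m in markers])  # [0, 1, 2]
```

A script would typically react only when a marker changes band, rather than rechecking every marker each frame as this sketch does.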
Animation instancing is a relatively new approach which can provide a significant performance increase.
Normally you cannot use instancing with the SkinnedMeshRenderer (which is generally used on player characters, NPCs and mobs).
When you have a lot of characters, each of them comes with its own SkinnedMeshRenderer. This results in a massive number of draw calls and animation calculations.
There is an interesting article on Animation Instancing which describes an approach that reduces the overall load on the CPU and takes advantage of GPU instancing within Unity. GPU instancing is becoming a new way to perform these tasks efficiently.
As stated in the article, be aware that this is a relatively new and experimental solution. The method definitely provides a boost for games that have many characters.
We have adapted this technique to let us have huge zombie hordes running in a scene without losing precious frame rate.
This concludes part 1 of the performance approaches we found most worth mentioning because of the impact they had on our development process.
To continue on with Optimization Techniques and Approaches, check out Part 2.