
VRscosity Dev Blog #5: Fail faster

In design of all forms, there is the beautiful mantra of “fail faster.” The idea is essentially that you will never get an idea right on the first try – ultimately, boiling something down to a high-level concept, trying it, and seeing what works and what doesn’t is the best way to learn.

And you should do this. Quickly.

Extra Credits (you really should follow these guys if you’re interested in game development) covers this very well, better than I can by far.

Tweaks and Balances

I’ve not written a dev blog in a long time simply because there hasn’t been much to comment on. After my last post about how I’m using raycasts to cull the number of chunks being rendered, my progress has been little more than tweaking. Specifically:

  • Moving the mesh generation around so that each chunk handles its own mesh, making it easier to update individual chunks as the simulation runs.
  • Adjusting material use so that I can rely on texture-atlas-based systems to avoid too many draw calls.
  • Building a really annoying set of lookup tables to ensure the simulation works correctly without too much mathematics involved.
  • General refactoring and tidy-up, including arranging the code into a decent set of namespaces.
  • Adding saving and loading functionality for when I eventually need it.
  • And of course, further testing in the Vive to ensure everything looks how it should.

But most importantly, I’ve been failing.

 

Slow Loads

Here is an example for you.

So the largest issue with my current culling system is that it relies on the automated Mesh Collider generation Unity offers. For situations where a model must have mesh-perfect collisions, it is a fast and effective way to generate a collider for that very purpose.

But fast is relative. Compared to manually calculating, modelling or otherwise generating a collision surface, it is very fast. In absolute terms, though? Generating a Mesh Collider is slow.

So very slow.

Generating colliders for 16³ chunks. This is miserable. Very miserable.

The result of this is that during stress tests, generating a 128³ Voxel grid in this way takes roughly 40 seconds. Generating a 256³ Voxel grid takes 400 seconds. This is, genuinely, the biggest limiter to the fidelity of the simulation – the load time of the scene.

The worst part of it is that these things are not thread-safe – none of Unity’s main engine features are – so the mesh for each of the 512 chunks in the 128³ grid has to be calculated one by one. All I can do to mitigate this is reduce the size of the chunks to 8³ instead of 16³, which achieves no gain in load time, but does prevent the scene from dropping to 15fps (each generated mesh is smaller, so takes less time within a frame) in return for four times as many batches drawn.
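For context on where the stall comes from: assigning a mesh to a MeshCollider is the point where Unity cooks the physics mesh, and it does so right there on the main thread. A coroutine along these lines – a rough sketch, not the project’s actual code – can at least spread that cost over several frames rather than freezing a single one:

  using System.Collections;
  using UnityEngine;

  public class ChunkColliderBuilder : MonoBehaviour
  {
    //Builds Mesh Colliders for a set of chunk meshes, a few per frame.
    //chunksPerFrame is an arbitrary example value.
    public IEnumerator BuildColliders(MeshFilter[] chunks, int chunksPerFrame = 4)
    {
      for (int i = 0; i < chunks.Length; i++)
      {
        MeshCollider collider = chunks[i].GetComponent<MeshCollider>();
        if (collider == null)
          collider = chunks[i].gameObject.AddComponent<MeshCollider>();

        collider.sharedMesh = chunks[i].sharedMesh; //physics mesh cooking happens here

        if ((i + 1) % chunksPerFrame == 0)
          yield return null; //hand control back to the engine for a frame
      }
    }
  }

  //Usage: StartCoroutine(GetComponent<ChunkColliderBuilder>().BuildColliders(chunkFilters));

That spreads the pain out, but it doesn’t reduce the total amount of work.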

So of course I am looking for a solution, preferably one that doesn’t require physics. Unfortunately, due to how Unity’s ray casting works, no physics means no raycasts… which is a problem.

 

The Fast Fail, The Pride Cube

Ultimately what we’re looking for in the culling algorithm is what the camera sees and nothing else. If you boil this down further, you’re talking about exactly what is drawn on screen when you look in a given direction. So, essentially, pixels and their colours.

Enter what I have dubbed The Pride Cube.

3D Representation of 16-bit RGB space.

The idea is simple: Assign each chunk its own RGB value as an ID in a sub-shader, create a secondary render target at a lower resolution, and output the result you see above to that render target. Iterate through the pixels and, if a colour is seen, mark that chunk as visible to the main render target. Good idea in theory…

…but in practice, not so simple. Multiple Render Targets have been supported in Unity for a while, but the supplied documentation is obtuse and borderline unusable. On top of that, if you do get the system working, you have to rely on black magic for it to behave properly – not yet reliable enough for VR applications, at least (though Unity does support up to 8 render targets on hardware that allows it).

And once you get past that problem, which is certainly doable, you have the issue of meshes. You can hide a mesh from view of one camera source, but it will still be stored in the buffer for the secondary render target, nullifying the point of culling in the first place. It’s a tricky one for dynamic meshes.
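For reference, the CPU side of the idea – reading the ID buffer back and flagging whichever chunks show up in it – is simple enough to sketch. This assumes a secondary camera has already rendered the chunk-ID pass into a low-resolution RenderTexture; the chunk lookup itself is a hypothetical helper:

  using UnityEngine;

  public class ChunkIdReadback : MonoBehaviour
  {
    public RenderTexture idTarget; //low-resolution target the ID camera renders into
    private Texture2D readback;

    void Start()
    {
      readback = new Texture2D(idTarget.width, idTarget.height, TextureFormat.RGB24, false);
    }

    void LateUpdate()
    {
      //Copy the GPU render target into a CPU-readable texture (not free either!).
      RenderTexture.active = idTarget;
      readback.ReadPixels(new Rect(0, 0, idTarget.width, idTarget.height), 0, 0);
      RenderTexture.active = null;

      //Any non-background colour in the buffer corresponds to a visible chunk ID.
      foreach (Color32 pixel in readback.GetPixels32())
      {
        if (pixel.r == 0 && pixel.g == 0 && pixel.b == 0)
          continue; //background

        int chunkId = (pixel.r << 16) | (pixel.g << 8) | pixel.b;
        SetChunkVisible(chunkId); //hypothetical: mark this chunk for the main camera
      }
    }

    void SetChunkVisible(int chunkId) { /* look the chunk up and enable its renderer */ }
  }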

This failure took me roughly 5 hours to prototype, test and scrap. It sounds like a lot of time – more than half of a nine-hour work day – but compared to the overall timescale of the project this was a fast – and needed – fail. Now I know this system won’t work, and I must consider other options.

But first I will be optimising the mesh generation to lower the number of polygons the physics collider has to handle wherever possible, and making sure the simulation actually works.

VRscosity Dev Blog #4: Lasers of culling

After a data structure is decided upon for a Voxel system, you run into the problems of rendering. As mentioned in a previous blog post exploring a screw up of mine, the crux of my solution is this:

“In brief, the final render system works similar to how other such systems do. A chunk is generated by building meshes by vertices and then once the desired size has been reached, a new chunk is started, repeat. Any Voxels that don’t touch an empty space aren’t rendered, essentially culling a lot of the work that the extra meshes would cause the graphics card. Chunking in itself also lowers the amount of draw calls made, resulting in (generally) better performance all around.”

Generating the mesh is only one portion of the overall challenge. The second is effectively optimising what is displayed so that the system runs as fast as possible. Why bother displaying chunks the user can’t see? You shouldn’t – and the technique for avoiding it is culling.

 

Frustum Culling

The fewer meshes the graphics card has to draw, the fewer overall draw calls made, the fewer vertices displayed and the faster everything runs. Unity’s rendering engine has the wonderful feature of camera frustum culling – if something isn’t within the camera’s field of view, it isn’t submitted to the graphics pipeline. In most situations, you can combine this with Unity’s own baked occlusion culling to save the draw calls spent on geometry the player cannot see from their position (usually because said geometry is behind something else). Great! But…

 

Unity is Not Enough!

The issue is that we’re dealing with procedural geometry that is generated upon the loading of the program. Unity’s occlusion system is solid, but as hinted before, is baked: It relies on the meshes being there already so that an occlusion map can be built and stored in the editor. When things are dynamically generated, this is impossible with the provided tools. So a workaround must be constructed.

 

Chunks to the Rescue

By rendering in chunks, you not only manage to avoid generating too large a mesh, but you also allow for culling of entire areas of the world that wouldn’t be visible to the camera. The issue is how to cull effectively and with minimal overhead.

 

Camera.ViewportPointToRay

The easiest solution by far is to draw lines between the camera and the chunks, and remove any chunks that cannot see the camera – or vice versa. After various attempts, the latter became the smarter option, hence Camera.ViewportPointToRay(), a wonderful function that allows you to draw a virtual line out from the camera through a given point in the viewport (best described as the camera’s field of view).
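In its simplest form – a minimal, assumed example rather than the project’s exact code – it looks like this:

  //Fire a ray from a normalised viewport position (0–1 on each axis) out into the scene.
  Ray ray = Camera.main.ViewportPointToRay(new Vector3(0.5f, 0.5f, 0f)); //centre of the view
  RaycastHit hit;

  if (Physics.Raycast(ray, out hit))
  {
    //hit.collider belongs to the first chunk this line of sight reaches
  }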

The concept is that any chunk a camera ray hits will be rendered; all others will be hidden. Rays won’t travel through chunks, thanks to the reliance on physics colliders, and the chunk’s mesh shape is unimportant so long as the automatically generated collision mesh doesn’t fill in any wanted holes. In a perfect world, the entire viewport would fire out rays every frame and cull anything invisible.

This is not a perfect world. The overhead of tracing rays (many thousands per frame) and handling their collisions in such a scenario is immense. It’s untenable in any situation, let alone a real-time application aiming for 90 frames a second!

Performance drops every 0.2s with that method, down to 15fps!

Scanlines!

In order to save this overhead, I took the approach of adding a “gaze” timer to each chunk and scanning across the viewport in vertical scanlines. Whenever a chunk is hit by a raycast, it is set as visible and starts to count down. If the chunk is not hit by another raycast within the time limit (currently 5 seconds in the prototype), it disappears; if it is, the timer resets. Scan from both sides in both directions and you end up with a laser butterfly of cheap culling.
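In rough terms, each sweep looks something like the sketch below – a simplified version in which the Chunk component, ray counts and sweep speed are stand-in assumptions rather than the prototype’s actual values:

  using UnityEngine;

  //Hypothetical chunk component carrying the "gaze" timer described above.
  public class Chunk : MonoBehaviour
  {
    public Renderer chunkRenderer;
    private float visibleFor; //seconds of visibility remaining

    public void MarkSeen(float seconds)
    {
      visibleFor = seconds;
      chunkRenderer.enabled = true;
    }

    void Update()
    {
      visibleFor -= Time.deltaTime;
      if (visibleFor <= 0f)
        chunkRenderer.enabled = false; //not gazed at recently – hide it
    }
  }

  public class ScanlineCuller : MonoBehaviour
  {
    public Camera targetCamera;
    public int raysPerColumn = 16;      //samples along each vertical scanline
    public float columnsPerSecond = 5f; //how fast the scanline sweeps across the viewport

    private float column; //current horizontal viewport position, 0..1

    void Update()
    {
      //Sweep one vertical line of rays per frame, wrapping back to the left edge.
      column = (column + columnsPerSecond * Time.deltaTime) % 1f;

      for (int i = 0; i < raysPerColumn; i++)
      {
        float v = (i + 0.5f) / raysPerColumn;
        Ray ray = targetCamera.ViewportPointToRay(new Vector3(column, v, 0f));
        RaycastHit hit;

        if (Physics.Raycast(ray, out hit))
        {
          Chunk chunk = hit.collider.GetComponent<Chunk>();
          if (chunk != null)
            chunk.MarkSeen(5f); //reset the chunk's gaze countdown
        }
      }
    }
  }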

By handling only a couple hundred points rather than several thousand, the overhead is vastly reduced. Even with two cameras, as is the standard for VR, we’re talking huge gains in performance:

Scanline approach is much faster – peaks of 4ms (250fps!)

Obviously the method will require further tweaking – cameras hooked to a VR headset aren’t exactly a stable scenario versus a still camera being dragged around. But it’s a start, and one that can be built upon.

Next step: Nailing down the simulation!


VRscosity Dev Blog #3: Mind your loops

In an upcoming DevBlog I’m going to be talking at length about the steps taken to generate the graphical side of VRscosity. It’s relatively boilerplate stuff – Voxel engines have effectively been long solved, so it’s simply a case of implementing something I’ve done a couple of times before.

…Which is why a particular prototyping bug I had drove me up the wall.

 

The Symptoms

In brief, the final render system works similar to how other such systems do. A chunk is generated by building meshes by vertices and then once the desired size has been reached, a new chunk is started, repeat. Any Voxels that don’t touch an empty space aren’t rendered, essentially culling a lot of the work that the extra meshes would cause the graphics card. Chunking in itself also lowers the amount of draw calls made, resulting in (generally) better performance all around.

The issue I was having was one of bloat. As each chunk was being generated, the number of vertices required was ballooning. By the time the final chunks were being drawn, they were trying to add too many vertices to the mesh (which, in Unity, has a hard 16-bit limit of 65,536) and failing. Some of the chunks were reporting vertex collections close to 100,000 strong, which shouldn’t be anywhere near the case in a system that relies on each side being no more than 1,024!

Holes in the mesh? This isn’t right.

The Source

I threw myself at this problem for more hours than I’d honestly like to admit, trying to work out a solution. Ultimately, after relying on console printouts to show me anything at all, I realised it was my own damn stupidity.

One of the issues with building 3D Voxel environments is that you essentially, somewhere, have to rely on six nested loops: three for the chunk coordinates (X, Y, Z), and three for the Voxel coordinates within each chunk. The mechanism is basically this:

//Setup chunkX/Y/Z here, and the expected chunkCount, then:
  while (chunkY < chunkCount) {
    while (chunkX < chunkCount) {
      while (chunkZ < chunkCount) {
        
        //Setup Voxel vertex factory, then do this:

        int y = Math.Max(chunkY * chunkEdgeSize, 0);
        int x = Math.Max(chunkX * chunkEdgeSize, 0);
        int z = Math.Max(chunkZ * chunkEdgeSize, 0);

        while (y < ((chunkY * chunkEdgeSize) + chunkEdgeSize)) {
          while (x < ((chunkX * chunkEdgeSize) + chunkEdgeSize)) {
            while (z < ((chunkZ * chunkEdgeSize) + chunkEdgeSize)) { 
  
              //Generate vertex here
              z++;
            }
              
            z = 0; //Oops! - Read below
            x++;
          }
          x = 0; //Oops! - Read below
          y++;
        }
        y = 0; //Oops! - Read below
        //Update chunk mesh and apply material
        chunkZ++;
      }
      chunkZ = 0;
      chunkX++;
    }
    chunkX = 0;
    chunkY++;
  }

This code is a mess, but it ran, and as a prototype that was the goal. It’s simple enough, yet for some reason it was causing overdraw and I had no idea why – especially when the internal mechanism for drawing the vertices was working perfectly.

It may be lack of sleep, but eventually it clicked, and I’m a moron.

 

The Cure

int y = Math.Max(chunkY * chunkEdgeSize, 0);
int x = Math.Max(chunkX * chunkEdgeSize, 0);
int z = Math.Max(chunkZ * chunkEdgeSize, 0);

So this section is designed to ensure that each chunk only starts drawing vertices at its own location in the grid. It runs at the start of each new chunk, the idea being that – assuming chunks are 8 Voxels across – the first Voxel considered for rendering in the second chunk would be the ninth.

But in a moment of stupidity that took me far too long to notice, I was zeroing the variables at every reset point instead of re-running the initialisation above! The earlier chunks would “work” fine, but every pass the Voxels would be drawn from the very first one, every time, rather than from the chunk’s location. The result was both overdraw (vertices being placed in the same location multiple times) and, at larger world sizes, massive numbers of vertices.
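Spelled out, the fix is simply re-running those initialisations at each reset point – a sketch mirroring the prototype loop above:

        while (y < ((chunkY * chunkEdgeSize) + chunkEdgeSize)) {
          while (x < ((chunkX * chunkEdgeSize) + chunkEdgeSize)) {
            while (z < ((chunkZ * chunkEdgeSize) + chunkEdgeSize)) {

              //Generate vertex here
              z++;
            }

            z = Math.Max(chunkZ * chunkEdgeSize, 0); //was: z = 0;
            x++;
          }
          x = Math.Max(chunkX * chunkEdgeSize, 0); //was: x = 0;
          y++;
        }
        y = Math.Max(chunkY * chunkEdgeSize, 0); //was: y = 0;

Each axis now resets to the start of the current chunk rather than to zero, so later rows never re-walk earlier chunks.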

Basically, the moral of the story is twofold: If you have a problem, come back the next day and work on something else in the meantime. And secondly: check your damn loop conditions.

More on Tetrino.com

VRscosity Dev Blog #2: 500 million bees

There is an old saying that a single bee sting hurts, but a thousand kills. If you ignore the reality of this idea (there are plenty of other factors that affect your reaction to bee stings) the premise is that enough of any small thing can eventually overwhelm.

In software, the developer is constantly wrestling with various factors. In game development, chief among them is performance. Memory and CPU usage has to be kept as low as possible for any given function and writing cheap, efficient code is paramount.

However, some things are simply unavoidable.

 

The Anatomy of a Voxel

There are many ways to build a Voxel system. Minecraft and many games of its ilk rely on chunk-based systems and trees to build their dynamic worlds: as the player loads in or expands the world, new areas are generated in blocks by an algorithm. I mentioned before how VRscosity is not Minecraft, but the latter is a well-optimised example of a Voxel system in action.

In Minecraft’s specific case, each chunk is 16x16x256 blocks – 65,536 (2¹⁶) blocks of registration per chunk. Each chunk is itself separated into 16 render zones, and is updated based on this separation. Hard numbers for how much memory a single block in Minecraft requires are hard to come by, but it’s somewhere between 1.5 and 3 bytes per block pre-compression.

So Minecraft’s maximum default loading configuration (which keeps 25×25 chunks in memory, storing the rest on disk) comes to roughly 120MB of RAM at 3 bytes per block. In practice it’s actually more, thanks to overhead from other concerns such as object referencing (8 bytes per reference on 64-bit systems) and entity loading, but as a pure system it’s reasonably compact.

 

The Issues of Exponential Growth

Unfortunately, any system that deals with squared numbers won’t stay small for long, and any addition to the world size dramatically increases the amount of memory used.

As with the example above, dealing just with block storage at 3 bytes each (so no referencing overhead or compression), loading a 33×33 section of chunks in Minecraft takes 204MB of RAM, while loading a 65×65 section of chunks (the maximum Minecraft allows) takes 792MB!
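For the curious, the back-of-envelope arithmetic behind those figures (my own check using binary megabytes – not taken from Mojang):

  //16 x 16 x 256 blocks per chunk, at 3 bytes per block.
  long bytesPerChunk = 16L * 16 * 256 * 3;                //196,608 bytes

  long mb25 = (25L * 25 * bytesPerChunk) / (1024 * 1024); //~117MB – the "roughly 120mb" default above
  long mb33 = (33L * 33 * bytesPerChunk) / (1024 * 1024); //~204MB
  long mb65 = (65L * 65 * bytesPerChunk) / (1024 * 1024); //~792MB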

Suddenly a single bee (block) becomes a deadly swarm of memory usage.

 

VRscosity Voxels

Taking the above problem to heart: I am planning for the maximum sandbox size in VRscosity to be 1024²×512 (1024×1024×512) Voxels, which makes the problems of handling so many objects plain. Like Minecraft’s top estimate, each Voxel in VRscosity currently takes 3 bytes. On paper, a 512³ box of Voxels should take around 384MB of RAM, whereas a 1024²×512 box takes 1,536MB!

 

Memory Issues

It is a large jump, but not actually the biggest problem to consider. Without any measures to mitigate loading, there is the issue that a full box of voxels is a collection of just shy of 537 million individual objects. Storing and accessing these objects is a complicated dance to learn!

The first large problem is referencing. In the worst case, which is a 64-bit system, any given memory reference requires 8 bytes of memory to hold in itself. If I were to reference each Voxel instance, I’d have to add another 8 bytes per Voxel – more than double the size of the Voxel data itself – to the memory cost calculation. This pushes the RAM usage for the given max size from 1,536MB straight to 5,632MB – a full 4GB used purely for references! Obviously, this isn’t tenable.
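The same quick arithmetic for the VRscosity numbers above (again, binary megabytes and rounded figures):

  long voxels  = 1024L * 1024 * 512;           //536,870,912 Voxels – just shy of 537 million
  long dataMB  = (voxels * 3) / (1024 * 1024); //1,536MB of raw Voxel data at 3 bytes each
  long refsMB  = (voxels * 8) / (1024 * 1024); //4,096MB of references alone at 8 bytes each
  long totalMB = dataMB + refsMB;              //5,632MB if every Voxel were a referenced object

  long savedMB = voxels / (1024 * 1024);       //512MB saved just by shaving one byte off each Voxel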

To really show the kind of scale I’m dealing with here, moving the Voxel from 4 bytes to 3 saved over half a gig of memory on its own…

One bee is fine. Millions are not!

 

Early Bird Solution

The solution I came up with was to scrap any pretence of building a Voxel as a class. While there are advantages to holding a reference to a class – mutability, if nothing else – when making so many repeated class references the penalty is too high.

Thankfully, C# supports Structs.

Structs, for those who are unaware, are essentially data structures – collections of variables with, ideally, no functionality other than access. They carry no per-instance reference overhead, so their footprint is essentially the size of their contents (a 3-byte Voxel is a 3-byte struct, but at least an 11-byte proposition as a referenced class), and creation (as well as access) is very fast.

The disadvantage is that Structs are value types rather than references. The same property that gives them their speed (no separate heap allocation to chase per instance) means they are copied rather than shared, so any changes I wish to make require the creation of a new Voxel and then overwriting the old one. Thankfully, this in itself is incredibly quick.

So for now, the model is literally a 3D Array of Structs. It’s not pretty, and can be further refined (there is no need to have an “air” object really, as functionally air is empty space). It is, however, the best interim solution while I hammer out other problems. With compression, it may not even be that bad at the end of the day.
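As a rough illustration of what that looks like – the field names here are my own placeholders, not the actual VRscosity layout:

  using System.Runtime.InteropServices;

  //A minimal sketch of a 3-byte Voxel as a struct.
  [StructLayout(LayoutKind.Sequential, Pack = 1)]
  public struct Voxel
  {
    public byte Material;    //which substance occupies this cell (0 could mean air/empty)
    public byte Temperature; //quantised temperature for the simulation
    public byte Flags;       //packed state bits
  }

  //The world is then just a flat 3D array of values, with no per-Voxel references:
  //  Voxel[,,] world = new Voxel[1024, 1024, 512];
  //  world[x, y, z] = new Voxel { Material = 2, Temperature = 20, Flags = 0 };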

That is a future blog though!

More on Tetrino.com

VRscosity Dev Blog #1: Not Minecraft

Many years ago, when the internet was only just maturing into what we know of today, I would spend many hours fluttering between various free games as they came up. Be it Runescape, some cheap thing from Kongregate or Albino Black Sheep, as was typical of youth I found ways to waste my time for as little money as possible.

One of the little toys I’d continually return to was of the Sandbox variety. Long before the age of Minecraft, there were plenty of Flash and Java games available that, at a quick glance, were little more than embedded versions of MS Paint. However, unlike said drawing program, these toys contained various different materials that could be placed and would interact with each other.

More advanced versions included acidity, electricity, temperature changes and air pressure – most notably Dan-Ball’s Powder Game, which long set the staple for these virtual 2D sandboxes. Eventually this was surpassed by The Powder Toy, which as a desktop client allowed further simulation, to the point of being able to set up microcomputers and sensor networks.

So I was left wondering: what if I took these principles and applied them to voxels?

 

Introducing: VRscosity!

A voxel based sandbox featuring fluid dynamics.

For the HTC Vive.

Because I don’t hate myself enough already.

VRscosity is to be exactly as tagged. Taking inspiration from the aforementioned sandbox games, I am building a voxel-based toy that lets the player pile, shape and sculpt various materials as they see fit, including liquids, solids and gases.

The idea is that there will be some rudimentary fluid dynamics simulation within the sandbox. The goal of this is not to be realistic, but rather to give some fundamental considerations to deal with – specifically, pressure, weight, mass and viscosity. A wall built with sand won’t hold back as much water as a wall built with stone, for example, before the pressure pushes it over.

Why am I building this for the Vive (and potentially the Rift)? Because I feel that this kind of toy works best with as direct a set of controls as possible. As evidenced by Google Tilt Brush and Quill, tracked 3D controllers allow for much finer control than you would usually achieve with a mouse and keyboard, especially in 3D space. For the experience to be as fluid as possible, it makes sense to use the control and interactivity that VR systems allow.

 

Not Minecraft. Honest.

The first thing that anyone these days leaps to when someone suggests they are making anything involving Voxels is “yet another Minecraft ripoff.”

It’s understandable really. After the runaway success of the game – a success that eventually netted its creator a cool $2.5 billion – everyone and their dog jumped on the bandwagon. The additional success of the original DayZ mod for ARMA 2 pushed the survival aspect into the spotlight equally. Thus, Steam is awash with creative survival titles hoping to hit those delicious big bucks that Minecraft pulled in (spoiler: they won’t).

I’m not going this route.

 

What now?

There will be plenty of development blogs on this project as it moves forward, with the intent of releasing it on Steam, when viable, at a cheap price (thinking $5 or less).