The Rendering of Jurassic World: Evolution

Jurassic World: Evolution is the kind of game many kids (and adult-kids) dreamed of for a long time. What’s not to like about a game that gives you the reins of a park where the main attractions are 65-million-year-old colossal beasts? This isn’t the first successful amusement park game by Frontier Developments, but it’s certainly not your typical one. Frontier proudly develops its own Cobra technology, which has been evolving since 1988; for JWE in particular, Cobra is a DX11 tiled forward renderer. For the analysis I used Renderdoc and turned on all the graphics bells and whistles. Welcome… to Jurassic Park.

The Frame

It’s hard to decide what to present as a frame for this game, because free navigation and a dynamic time of day mean you have limitless possibilities, from a bird’s eye view to an extreme closeup of the dinosaurs, a sunset, a bright day or a hurricane. I chose a moody, rainy intermediate view that captures the dark essence of the original movies, taking advantage of the Capture Mode introduced in version 1.7.

Compute Shaders

The first thing to notice about the frame is that it is very compute-heavy. In the absence of markers, Renderdoc splits rendering into passes when more than one Draw or Dispatch command targets the same output buffers. According to the capture there are 15 compute vs 18 color/depth passes, i.e. the frame is broadly split into half compute, half draw techniques. Compute can be more flexible than draw (and, if done correctly, faster), but a lot of time has to be spent fine-tuning and balancing performance. Frontier clearly spared no expense developing the technology to get there; however, this also means that analyzing a frame is a bit harder.

Grass Displacement

A big component of JWE is foliage and its interaction with cars, dinosaurs, wind, etc. To animate the grass, one of the very first processes populates a top-down texture that contains grass displacement information. This grass displacement texture is later read in the vertex shader of all the grass in the game, and the information is used to modify the position of the vertices of each blade of grass. The texture wraps around as the camera moves and fills in the new regions that appear at the edges. This means that the texture doesn’t necessarily look like a top-down snapshot of the scene, but will typically be split into 4 quadrants. The process involves these steps:

  1. Render dinosaurs and cars, and probably other objects such as the gyrospheres. This doesn’t need an accurate version of the geometry, e.g. cars only render their wheels and the part of the chassis that is in contact with grass. The result is a top-down depth buffer (leftmost image). If you squint you’ll see the profile of an ankylosaurus. The other dinosaurs aren’t rendered here; perhaps the engine knows they aren’t stepping on grass and optimizes them out.
  2. Take this depth buffer and a heightmap of the scene (center image), and output three quantities: a mask to tell whether the depth of the object was above/below the terrain, the difference in depth between them, and the actual depth, packed into a 3-channel texture (rightmost image), as sketched below
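
A minimal sketch of what that packing step could look like as a compute shader. The resource names and the direction of the depth comparison are my assumptions, not Frontier’s actual code:

    Texture2D<float>    ObjectTopDownDepth;   // step 1: top-down depth of dinosaurs, cars, etc.
    Texture2D<float>    TerrainHeightmap;     // heightmap of the scene
    RWTexture2D<float3> GrassDisplacement;    // mask, depth difference, object depth

    [numthreads(8, 8, 1)]
    void CSPackGrassDisplacement(uint2 id : SV_DispatchThreadID)
    {
        float objectDepth  = ObjectTopDownDepth[id];
        float terrainDepth = TerrainHeightmap[id];

        float mask  = objectDepth < terrainDepth ? 1.0 : 0.0; // was the object above or below the terrain?
        float delta = terrainDepth - objectDepth;              // how far it pushes into the grass
        GrassDisplacement[id] = float3(mask, delta, objectDepth);
    }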



An additional process simulates wind. In this particular scene there is a general breeze from the storm plus a helicopter, both producing currents that displace grass. This is a top-down texture similar to the previous one, containing 2D motion vectors. The motion for the wind is an undulating texture meant to mimic wind waves, which seems to have been computed on the CPU, and the influence of the helicopter is cleverly done by blending a stream of particles on top of the first texture. You can see it in the image as streams pulling outward. Dinosaur and car motion is also blended here. I’m not entirely sure what the purpose of the repeating texture is (you can see the same objects repeated multiple times).

Tiled Forward Lighting

One thing to explain before we get to the geometry prepass is tiled forward lighting. We talked about tiled lighting in Rise of the Tomb Raider, but there are some differences. Tiled lighting in JWE splits the screen into 8×8 pixel tiles extruded towards the far plane to create subfrustums. A compute shader is dispatched per tile, which reads a big buffer with data for all lights. Intersecting each light against the subfrustum of a tile gives you a list of lights for that tile. In the lighting pass, each pixel reads the tile it belongs to and processes each light. Reducing the number of lights per tile is very important in a forward engine, where calculations happen as geometry is rendered and decisions per object would be too coarse and greatly impact performance. It’s worth mentioning that tiled lighting is particularly effective when there are many small lights, rather than a few big ones. JWE is full of small lights, which makes it a suitable technique.
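
A rough sketch of that per-tile classification. The buffer layout is a placeholder and the frustum test is left as an assumed helper; JWE’s real shader is undoubtedly more sophisticated:

    #define MAX_LIGHTS_PER_TILE 64

    struct Light { float3 positionWS; float radius; };

    StructuredBuffer<Light>  AllLights;
    RWStructuredBuffer<uint> TileLightIndices;  // MAX_LIGHTS_PER_TILE slots per tile
    cbuffer CullingConstants { uint NumLights; uint NumTilesX; }

    // Assumed helper: tests the light's bounding sphere against the 4 side planes
    // of the tile's subfrustum and against its min/max depth range
    bool SphereIntersectsTileFrustum(Light light, uint2 tile);

    groupshared uint TileLightCount;

    [numthreads(8, 8, 1)]
    void CSCullLights(uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
    {
        if (groupIndex == 0) TileLightCount = 0;
        GroupMemoryBarrierWithGroupSync();

        // The 64 threads of the 8x8 tile walk the light list cooperatively
        for (uint i = groupIndex; i < NumLights; i += 64)
        {
            if (SphereIntersectsTileFrustum(AllLights[i], groupId.xy))
            {
                uint slot;
                InterlockedAdd(TileLightCount, 1, slot);
                if (slot < MAX_LIGHTS_PER_TILE)
                    TileLightIndices[(groupId.y * NumTilesX + groupId.x) * MAX_LIGHTS_PER_TILE + slot] = i;
            }
        }
    }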

An additional consideration is that tiles in screen space extend from the camera to the far plane. We know lights may be occluded by opaque objects, so JWE uses the information from the depth buffer to reduce the range to the regions it cares about. To that end it creates a min-max depth buffer that we’ll explain later. A lot has been written about this so I’ll throw a couple of extra links here and here.

Depth Prepass

As is customary in many games, a depth prepass is performed on the main geometry. All geometry is submitted at this point in the frame, in contrast to other engines where simpler geometry (or none at all) can be submitted. It also outputs stencil values, for many things in fact: dinosaurs, foliage, buildings, terrain; every type of object seems to have its own id. As it effectively submits geometry more than once for processing, there must be good reasons for it.


  1. Tiled rendering splits the screen into small equally-sized tiles and assigns lights to each tile. As well as splitting in screen space, tiles have a depth range (to avoid processing lights that cannot affect anything visible in the tile), and to obtain this range the depth buffer is used. Minimum and maximum values per tile are computed to determine the range
  2. Even though the previous point says that full depth information is necessary, in a deferred lighting engine that wouldn’t strictly be true. Tile classification could still happen after the GBuffer has been rendered, where all the depth content would be available. However, forward lighting happens when we render the object in the main lighting pass, and by that point we need to have done tile classification already
  3. A depth prepass helps with a GPU optimization called Early Z. For a forward engine this pass is actually quite important as the pixel cost of the main lighting pass is very high. Early Z helps the GPU avoid overdraw during the lighting pass by using the information from the pre-populated depth buffer to discard pixels behind other surfaces

One way to compute the minimum and maximum for a tile is what is called a Min-Max Depth Buffer. This is essentially two buffers, one containing the closest depth for a region of the screen, the other containing the furthest. A straightforward way of doing this is to compute mips in succession, i.e. Mip 0 -> Mip 1, then Mip 1 -> Mip 2, etc. It’s very interesting to see what this process does to the birdcage: the closest depth (top) makes the birdcage look solid, whereas the furthest depth (bottom) makes the birdcage disappear completely! Intuitively this makes sense, and you can now see what each tile represents. Note that JWE uses reverse depth, where the closest depth is represented by a 1, and the furthest by a 0. This distributes depth values better and helps avoid z-fighting at far distances.
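
A sketch of one downsampling step of that chain (JWE probably organizes this differently, so take the details with a grain of salt):

    Texture2D<float2>   PrevMip;  // x = closest depth, y = furthest depth
    RWTexture2D<float2> CurMip;   // half the resolution of PrevMip

    [numthreads(8, 8, 1)]
    void CSDownsampleMinMax(uint2 id : SV_DispatchThreadID)
    {
        uint2  base = id * 2;
        float2 d00 = PrevMip[base],               d10 = PrevMip[base + uint2(1, 0)];
        float2 d01 = PrevMip[base + uint2(0, 1)], d11 = PrevMip[base + uint2(1, 1)];

        // With reverse depth the closest surface has the largest value
        float closest  = max(max(d00.x, d10.x), max(d01.x, d11.x));
        float furthest = min(min(d00.y, d10.y), min(d01.y, d11.y));
        CurMip[id] = float2(closest, furthest);
    }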


Thin GBuffer

The game renders the geometry again, after the depth prepass, outputting pixel normals + roughness in one texture and motion vectors in another. The main reason to do this is so that effects like SSAO, SSR, Temporal AA and other postprocesses that need these quantities can be computed. It’s a tradeoff that forward rendering engines have to make to be able to offload some of the work outside the main shader and compute fullscreen effects. Rendering the geometry again is a high price to pay but necessary; an alternative would be to output these quantities during the depth prepass.


Atmospheric Simulation and Fog

Transmittance Sunny

JWE has a realistic atmospheric simulation to go with its weather system that runs in several steps. One of the first is to compute a transmittance look up table (LUT). Transmittance is a way of expressing how dense the atmosphere is, and can be modeled as a function of height from the surface of the earth, and elevation angle, hence a 2D texture.

Transmittance Rainy

There are some nice God Rays/light shafts caused by mountains, and the solution is actually tailored to them. A pass renders only the mountain meshes to a custom depth buffer. This depth buffer is split into 4 sections, and each frame only one of those sections is rendered, distributing work across frames. It is processed and produces a single orange-looking texture. This texture stores the first two moments of a shadow map, essentially the depth value and the depth squared. These two quantities are blurred, giving it that fuzzy look. This is very typical of a family of techniques that precompute blurred shadows, such as variance shadow mapping and moment shadow mapping. It’s an interesting approach because the rest of the shadows do not use it. Typically you’ll go with pre-blurring shadows if a raymarching pass is going to happen, as it adds stability.

Shadow Moments

This is used in conjunction with the transmittance texture to compute fog, both for reflection/diffuse cubemaps and for the entire scene at low resolution, upscaled later to fullscreen. Low resolution saves bandwidth and is suitable because fog is generally low frequency. Note the screen space fog texture has content from previous frames. This is an optimization if you can get away with it, because you don’t need to clear the texture, clawing back precious fractions of a millisecond. The texture is RGBA16F and includes inscattering (the “fog color”) in rgb and outscattering (the “fog amount”) in the alpha channel, and is read in the main lighting pass to composite with opaque geometry.
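
Given that layout, the composite in the lighting shader presumably boils down to something like this (the names are mine):

    Texture2D<float4> FogTexture;          // rgb = inscattering, a = outscattering
    SamplerState      LinearClampSampler;

    float3 ApplyFog(float3 litColor, float2 screenUV)
    {
        // The low resolution fog texture is bilinearly upscaled by the sample
        float4 fog = FogTexture.SampleLevel(LinearClampSampler, screenUV, 0);
        return litColor * fog.a + fog.rgb;  // attenuate by outscattering, add inscattering
    }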


Reflections

There are three systems to produce reflections in JWE: cubemaps, screen-space reflections and planar reflections. We’re going to go through them in sequence and see how they all fit together.

Cubemap Generation

One of the first color passes to happen in the frame is a cubemap face rendering. In my capture, one face of two different cubemaps is rendered (both of them include depth + lighting). Interestingly, the sky is colored ff00ff pink in some of the faces, as it turns out cubemaps are blended later based on this mask and only one final cubemap is ever used. Because rendering of these cubemaps is expensive, the cost is amortized across frames and each frame only a subset of the full cubemap is produced. The cubemaps also aren’t very detailed: they include the sky (produced by a compute shader we’ll mention in a moment), low poly landscape/foliage and lights.


BRDF Texture

Cubemaps are used for two purposes: one is to generate reflection captures (for specular lighting), the other is to generate ambient lighting via irradiance cubemaps (for diffuse lighting). The image sequence shows the blending of two different cubemaps, which is not correct but illustrates the purpose (keep in mind one of them is still being processed!). The last sequence is irradiance generation, which integrates the lighting coming from multiple directions. A different process also downsamples the cubemaps, this time to approximate a rough specular response, using the split sum approximation described by Brian Karis in Real Shading in Unreal Engine 4.

Screen-Space Reflections

One of the most popular ways of doing reflections these days, SSR is present in many games. The particular approach JWE has taken is one I have seen used in games like Killzone Shadow Fall and Assassin’s Creed Black Flag, but people have used this technique in different ways. An initial compute pass takes in the pixel normals and depth buffer, as well as a small randomization texture, and outputs the UV of the reflection, the reflection depth and a confidence value to a half resolution RGB10A2 texture. The confidence value is binary, and measures how sure we are that we have found a valid reflection. Invalid reflections can happen for a multitude of reasons: because the ray went off screen, because the reflection is blocked by an object, or even because we’ve decided some surfaces aren’t worthy of reflections.


One valid question to ask is: what do we use as the source of reflections? In a deferred engine, there is some hope to compute many parts of the lighting equation before SSR happens, but in a forward engine that would be very hard. The answer here is that we sample last frame’s lighting buffer. It has all the lighting, including transparent objects, fog, etc. The main caveat is that when the camera moves, the current pixel shouldn’t sample last frame’s buffer directly, but needs to go through a process called reprojection. We know where the camera was last frame so we can modify the UV we use to sample it. For moving objects, one would normally use motion vectors as well to do the reprojection, but JWE doesn’t do this. Once we have all the information, there is a dilation/blurring process for the highest mip available, and a mipchain generation to simulate roughness.
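
A sketch of that camera-only reprojection, assuming the usual matrix-based approach (all names are mine, and the clip-space conventions may differ from Cobra’s):

    Texture2D<float4> PrevFrameLighting;
    SamplerState      LinearClampSampler;
    cbuffer Reprojection { float4x4 InvViewProjection; float4x4 PrevViewProjection; }

    float3 SampleLastFrameLighting(float2 uv, float deviceDepth)
    {
        // Reconstruct the world position of the reflected point from its UV and depth
        float4 clip  = float4(float2(uv.x, 1.0 - uv.y) * 2.0 - 1.0, deviceDepth, 1.0);
        float4 world = mul(InvViewProjection, clip);
        world /= world.w;

        // Project it with last frame's camera to find where it was on screen
        float4 prevClip = mul(PrevViewProjection, world);
        float2 prevUV   = prevClip.xy / prevClip.w * float2(0.5, -0.5) + 0.5;

        return PrevFrameLighting.SampleLevel(LinearClampSampler, prevUV, 0).rgb;
    }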


Planar Reflections

There is a lot of water in JWE (dinosaurs need somewhere to drink and swim). It turns out water uses a pretty classic, but effective, technique: planar reflections. The first step renders a full resolution depth prepass where objects that need to be reflected are rendered “upside down” from the point of view of an imaginary camera that is below the reflection plane looking upwards. The objects rendered here are lower quality: for example, trees are plain alpha-tested quads that read from a tree atlas, and dinosaurs are low poly. In the same spirit as tiled lighting, a step computes the min and max depth buffer for the tiled light classification. The first object rendered in the color pass is a fullscreen quad for the sky, which takes a 128×128 cubemap as input with the sky, sun and low resolution clouds. It takes advantage of the prepass to only render where there are no opaque objects occluding it. The lighting pass happens in a similar way as the main pass.


Shadow Mapping

JWE takes the fairly standard cascaded shadow mapping approach for its main directional light, of which there’s only ever one. There’s plenty of point lights around for the security fences and spotlights for the car headlights, but none of them are shadowed. It would probably be quite taxing to render that many shadowed lights, especially considering how geometry-heavy this game is. The cascades for the directional light are contained in a 2D texture array that contains 4 slices, and use reverse depth for rendering (1 is near plane, 0 is far plane). The capture I’m presenting is not very exciting as the directional light is fully occluded, so here’s what it looks like in another sunny capture.

As you can see, there are a lot of black regions where no geometry has been rendered. I think these regions are overlaps between cascades where it would make no sense to render, plus some form of exclusion outside the frustum.

Shadow Mask

This shadow map isn’t used directly during the lighting pass. A fullscreen pass that produces a shadow mask is used instead. This helps reduce the computational load on an already packed main lighting pass. The shadow mask is produced using a compute shader that takes in the depth buffer and the shadow cascades to produce an image similar to this. Again, this shadow mask is not from our original capture, as the rainy scene has no direct shadows.
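
The per-pixel work probably resembles the following; the cascade selection heuristic and all names are guesses on my part:

    Texture2DArray<float>   ShadowCascades;   // 4 slices, reverse depth
    SamplerComparisonState  ShadowSampler;    // hardware PCF comparison
    cbuffer ShadowConstants { float4x4 CascadeViewProj[4]; float4 CascadeSplits; }

    float DirectionalShadow(float3 worldPos, float viewDepth)
    {
        // Pick a cascade based on distance bands
        uint cascade = viewDepth < CascadeSplits.x ? 0 :
                       viewDepth < CascadeSplits.y ? 1 :
                       viewDepth < CascadeSplits.z ? 2 : 3;

        // Orthographic light projection, so no perspective divide is needed
        float4 pos = mul(CascadeViewProj[cascade], float4(worldPos, 1.0));
        float2 uv  = pos.xy * float2(0.5, -0.5) + 0.5;

        // With reverse depth the hardware comparison is greater-equal
        return ShadowCascades.SampleCmpLevelZero(ShadowSampler, float3(uv, cascade), pos.z);
    }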

Ambient Occlusion

JWE takes the same approach as Rise of the Tomb Raider for its highest quality ambient occlusion, HBAO, so I will only touch on it briefly here. The screen is divided into 16 textures, and each computes ambient occlusion using a fixed set of random directions. The results are then combined back and blurred. Many games use this technique, as NVIDIA popularized it and even provides source code and easy integration for its licensees. At lower qualities the ambient occlusion uses a slightly different multipass technique that struck me as similar to one developed by Intel, although I cannot say for certain. The ambient occlusion is packed alongside the shadow mask so that the main shader samples the texture once.

Main Pass: Opaque + Transparent

Opaque objects are rendered first, where the early Z technique we mentioned earlier kicks in. The depth test can actually be set to Equal, i.e. only pixels whose depth is equal to the one already present in the depth buffer are rendered; the rest are discarded. This cuts the pixel shader cost to only the visible pixels.


Bug alert! Car headlights are supposed to have a neat volumetric effect applied to them. However, for some weird reason, they are rendered before all the opaque objects in the frame so they end up completely occluded! Look carefully at the carousel above to see this in action. This highlights the importance of proper ordering in the frame. The transparent pass comes right after that and conceptually is not very different, other than the blend modes change to mimic glass and they are sorted back to front. I hacked a quick composite to show what the car headlights should have looked like.

Forward rendering can be powerful in terms of shader flexibility: since everything happens inside the main pass, every attribute of the material is present, and lots of possibilities in terms of lighting models (hair, cloth) and material variety come more naturally than in a deferred pipeline. This flexibility comes with a few performance implications: expensive geometry passes and feature coupling, where everything from material evaluation to lighting, reflections, global illumination and the sampling of fullscreen textures like ambient occlusion, SSR and shadows is packed into a single, enormous shader.

In terms of performance, a relatively straightforward lighting pass shader in JWE has around 600 instructions, while for comparison a fairly average GBuffer shader in Ryse: Son of Rome is ~70 instructions. Very long shaders tend to suffer from a GPU phenomenon called low occupancy, where the GPU is blocked due to resource contention. In general, low occupancy is not a good thing to have, although reality is always more complex. You can read more here. To alleviate this, forward engines tend to pull features out of the main pass. As we saw, to implement SSAO and SSR, normals and roughness were output in an extra pass, duplicating vertex and pixel work.

Rain

Rain is an important part of the weather system, where a series of processes work together to give a convincing illusion.

Rain Ripples

A compute shader calculates rain textures later applied as normal maps, e.g. on the landscape. The source for the rain is a couple of ripple textures containing the height of the rain waves created by the impact of a rain drop. From a height map it is straightforward to derive a normal map, which then gets blended with that of the target. The texture is tileable so it can be repeated many times across a given surface. The reason they come in pairs is that they are consecutive frames of an animation loop, blended for a smoother animation. A series of compute shaders produces the mipmap chain for those textures.
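
Deriving the normal map from the ripple heightmap is probably something along these lines (names and the ripple strength are placeholders):

    Texture2D<float>    RippleHeight;   // tileable rain ripple heightmap
    RWTexture2D<float4> RippleNormal;
    static const float  RippleStrength = 4.0;  // arbitrary strength of the ripples

    [numthreads(8, 8, 1)]
    void CSRippleHeightToNormal(uint2 id : SV_DispatchThreadID)
    {
        uint2 size;
        RippleHeight.GetDimensions(size.x, size.y);

        // Central differences with wrapping, since the texture is tileable
        float hL = RippleHeight[(id + uint2(size.x - 1, 0)) % size];
        float hR = RippleHeight[(id + uint2(1, 0)) % size];
        float hD = RippleHeight[(id + uint2(0, size.y - 1)) % size];
        float hU = RippleHeight[(id + uint2(0, 1)) % size];

        float3 normal = normalize(float3((hL - hR) * RippleStrength, (hD - hU) * RippleStrength, 1.0));
        RippleNormal[id] = float4(normal * 0.5 + 0.5, 1.0);  // pack into [0, 1]
    }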

Rain Drops and Splashes

There are several rain layers, all rendered as camera-facing rectangles with rain textures in them that simulate falling rain. A rain generation compute pass drives the number of rain drops depending on how intense the rainfall is.


I call rain puffs the very light cloud of rain you get when rain is decomposed into multiple droplets and almost floats after impacting. The texture for these looks like fireworks; it’s an interesting take on it. I also call rain sheet a very large polygon that has a multitude of rain drops in it simply scrolling downwards.

Rain Puff

Rain Drops

Rain Sheet

The source textures for each of these stages are relatively straightforward. However, there is an interesting twist on the rain splash texture, where a clever lighting technique gives it more volume. It is essentially a 3-channel texture where each channel gives some information on what the drop looks like lit from the left, from the middle, and from the right. Remember we’re rendering a flat quad! During rendering, we know where the light source is with respect to the normal vector of the quad, and based on this we can blend the channels appropriately, giving more weight to the channel that more accurately matches the light direction. See below to see it in action. All we’re doing is blending 3 images, and already it looks like a light is moving left and right.
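
In shader terms the trick could look like this; the weighting scheme and the names are made up for illustration, not the game’s actual code:

    Texture2D<float4> SplashTexture;       // r/g/b = droplet lit from the left / middle / right
    SamplerState      LinearClampSampler;

    float3 ShadeRainSplash(float2 uv, float3 lightDirTS, float3 splashColor)
    {
        float3 litFrom = SplashTexture.Sample(LinearClampSampler, uv).rgb;

        // lightDirTS.x runs from -1 (light on the left) to +1 (light on the right)
        float3 weights;
        weights.x = saturate(-lightDirTS.x);          // left channel
        weights.z = saturate( lightDirTS.x);          // right channel
        weights.y = 1.0 - weights.x - weights.z;      // middle channel
        return dot(litFrom, weights) * splashColor;
    }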


Splashes are also animated when they impact, their animation sourced from a big spritesheet that contains variations for 8 rain splash types, each composed of 8 frames; an example is shown below.

Temporal AA

JWE uses Temporal AA as its antialiasing technique. Aliasing produced by foliage, small geometry, thin cables, and small lights is very hard for the alternatives to reduce in the general case. TAA belongs to a family of techniques called temporal supersampling. These distribute computations across multiple frames and bring those results together to improve quality. For TAA this means the camera is constantly jittering, producing a slightly different image every frame and revealing detail that isn’t present in others. Accumulating these results gives a better image.


There are a few edge cases we need to consider:

  • If the camera moves, we blend two unrelated images introducing ghosting. We can fix this using motion vectors, output during the thin gbuffer pass. They are an offset in screen UV space that tells us where in the previous frame a pixel was
  • If objects are occluding something one frame but not the next. This is called disocclusion
  • If an object’s previous position is outside the screen
  • Jittering can flicker in static images, caused by very thin geometry or high frequency normal maps

All these edge cases are handled differently in different engines. Despite all its imperfections, TAA creates a much better resulting image than the alternatives. The before and after results speak for themselves.
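
To make the idea concrete, here is the skeleton of a typical TAA resolve (not JWE’s actual shader; its heuristics for disocclusion and flicker are certainly more elaborate):

    Texture2D<float4> CurrentFrame;
    Texture2D<float4> HistoryBuffer;
    Texture2D<float2> MotionVectors;
    SamplerState      LinearClampSampler;

    float3 ResolveTAA(uint2 pixel, float2 uv)
    {
        float3 current = CurrentFrame[pixel].rgb;

        // Reproject using the motion vectors written in the thin gbuffer pass
        float2 velocity = MotionVectors[pixel];
        float3 history  = HistoryBuffer.SampleLevel(LinearClampSampler, uv - velocity, 0).rgb;

        // Clamp history to the neighbourhood of the current pixel to limit ghosting
        float3 minC = current, maxC = current;
        [unroll]
        for (int y = -1; y <= 1; ++y)
        {
            [unroll]
            for (int x = -1; x <= 1; ++x)
            {
                float3 c = CurrentFrame[int2(pixel) + int2(x, y)].rgb;
                minC = min(minC, c);
                maxC = max(maxC, c);
            }
        }
        history = clamp(history, minC, maxC);

        return lerp(current, history, 0.9);  // accumulate most of the history every frame
    }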


Bloom, Lens Dirt and Tonemapping

Bloom uses the very classical but no less mesmerizing technique of downscale + upscale, the original implementation of which is often credited to Masaki Kawase in 2003 and henceforth called Kawase Blur. It works by creating an image from thresholding the main lighting buffer (e.g. take all values above 1) and successively blurring + downscaling that texture until it becomes an unrecognizable blob. From there it starts upscaling each mip while blending with the previous one. It does this all the way up to Mip 1 and finally composites it with a texture containing camera imperfections and applies tonemapping.
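
A sketch of the two key steps, the threshold and the upscale-blend that walks back up the chain (the exact weights and filters are my guesses):

    Texture2D<float3> SceneColor;
    Texture2D<float3> BloomChain;     // the blurred mip chain
    SamplerState      LinearClampSampler;

    float3 BloomThreshold(float2 uv)
    {
        float3 c = SceneColor.SampleLevel(LinearClampSampler, uv, 0);
        return max(c - 1.0, 0.0);     // keep only the values above 1
    }

    float3 BloomUpsample(float2 uv, float mip)
    {
        // Blend the blurrier lower mip into the current one on the way back up
        float3 lower   = BloomChain.SampleLevel(LinearClampSampler, uv, mip + 1);
        float3 current = BloomChain.SampleLevel(LinearClampSampler, uv, mip);
        return lerp(current, lower, 0.5);
    }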


We’ve seen color cubes in the past in Shadow of Mordor; in this case tonemapping is packed into a cube instead of executed via a formula, which gives the image a peculiar saturated orange/green tint. There’s another color correction pass right after, but the concept is similar. With this, the frame has essentially ended.

Blur and UI

The main screenshot has no UI as I hid it for the beauty shot, but there is one during gameplay and it’s actually quite nice. One of the last steps produces a blurred downscaled copy of the lighting buffer later used in the UI to give it a frosted, translucent look.


Top-down Color Map

A very interesting process happens at the end of the frame. A top-down “color” map is produced from the contents of the lighting and depth buffers. The breakdown is:

  1. Dispatch 240×136 threads. Each thread maps to one of every 8 pixels in the depth buffer
  2. Read depth, and reconstruct the world position of the pixel (see the sketch after this list)
  3. Read the lighting buffer, compute the luminance and manipulate the color based on some formulas
  4. Read stencil and, based on some criteria, output the values. The criteria seem to be whether the object is static and the depth is not too far away, or something similar
  5. The output color is the current color plus the difference between the new and old values, and the last channel is luminance
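
A sketch of steps 1 to 3. The projection into the top-down map is an assumed helper and the blending criteria of steps 4 and 5 are omitted; all names are mine:

    Texture2D<float>    DepthBuffer;
    Texture2D<float4>   LightingBuffer;
    RWTexture2D<float4> TopDownColorMap;
    cbuffer Constants { float4x4 InvViewProjection; float2 InvScreenSize; }

    // Assumed helper mapping a world position to a texel of the top-down map
    uint2 WorldToTopDown(float3 worldPos);

    [numthreads(8, 8, 1)]
    void CSTopDownColor(uint2 id : SV_DispatchThreadID)
    {
        uint2 pixel = id * 8;                              // one of every 8 pixels
        float depth = DepthBuffer[pixel];

        // Reconstruct the world position of the pixel
        float2 uv    = (pixel + 0.5) * InvScreenSize;
        float4 clip  = float4(float2(uv.x, 1.0 - uv.y) * 2.0 - 1.0, depth, 1.0);
        float4 world = mul(InvViewProjection, clip);
        world /= world.w;

        float3 color     = LightingBuffer[pixel].rgb;
        float  luminance = dot(color, float3(0.2126, 0.7152, 0.0722));
        TopDownColorMap[WorldToTopDown(world.xyz)] = float4(color, luminance);
    }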

The essence of this algorithm, as far as I can tell, is to build a color map based on what you are seeing at the moment! As you move around the world, the top-down content slowly gets populated. I’m not sure where this is used (maybe next frame? maybe bounced lighting?) as it’s done at the very end and not used again. I guess code, uh, finds a way.

With this we end our analysis; we’ve covered a lot and I hope you enjoyed it and perhaps even learned something. I’d like to thank the team at Frontier for their excellent games, and as always mention Adrian Courrèges, who first inspired me to write these with his own analyses, and who hosts a neat page where he keeps all the different studies from other people.

Rendering Line Lights

Within the arsenal of lights provided by game engines, the most popular are punctual lights such as point, spot or directional because they are cheap. On the other end, area lights have recently produced incredible techniques such as Linearly Transformed Cosines and other analytic approximations. I want to talk about the line light.

Update [04/09/2020] When I originally wrote the article there were no public images showing Jedi or lightsabers, so I couldn’t make the connection (though a clever reader could have concluded what they might be for!). I can finally show this work off as it’s meant to be.

In Unreal Engine 4, modifying ‘Source Length’ on a point light elongates it as described in this paper. It spreads the intensity along the length so a longer light becomes perceptually dimmer. Frostbite also has tube lights, a complex implementation of the analytical illuminance emitted by a cylinder and two spheres. Unity includes tube lights as well in their HD Render Pipeline (thanks Eric Heitz and Evegenii Golubev for pointing it out) based on their LTC theory, which you can find a great explanation and demos for here. Guerrilla Games’ Decima Engine has elongated quad lights using an approach for which they have a very attractive and thorough explanation in GPU Pro 5’s chapter II.1, Physically Based Area Lights. This is what I adapted to line lights.

Most Representative Point

The method is inspired by Monte Carlo importance sampling, where a biasing function modifies uniform input samples into samples that are non-uniformly distributed according to the shape of the function. The typical scenario in rendering is to efficiently sample a specular BRDF, where uniform samples produce suboptimal results at low roughnesses. MRP takes the idea to the extreme, using a single most important sample. Past literature explores this idea in detail here and here. The core of the algorithm is to find the point that provides the greatest contribution and treat that as a point light, leveraging existing BRDF and falloff functions. I imagine a light that “travels” with the shaded pixel bound by some rules, the result looking like a light with some dimensionality. All the above engines use this idea in varying forms. We’ll describe the line light as a segment formed by points A and B, and globally define P as the shading point.

Diffuse

A key insight for me was discovering that there are actually two most representative points: diffuse and specular. Each point does its part and after evaluating the BRDF we add their contributions together. According to Guerrilla’s paper, the most representative point for a diffuse BRDF is the intersection point between two vectors:

  1. The half vector formed by the vector from P to A and the vector from P to B
  2. The vector defined by the line direction AB

Here I have shown three shading points P1-3, to illustrate how the position of the virtual point light L1-3 moves with the shading point. The moment L reaches A or B, it can’t travel any further and stops at that endpoint, which we’ll perceive as a segment.

There are two main approaches to compute L, intersection and geometric. I will briefly mention both, as they follow my original thought process. For the intersection approach we first compute H, the half vector between PA and PB. We then find the intersection point between vector AB and H. The derivation and proof for a robust algorithm to do this is shown in Real Time Rendering, Third Edition, p.782, or a small excerpt formula here. In code:
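
Something along these lines (an HLSL-style sketch with my own variable names):

    // Intersection of the ray (P, H) with the line through A and B
    float3 DiffuseMRP_Intersection(float3 P, float3 A, float3 B)
    {
        float3 AB = B - A;
        float3 H  = normalize(normalize(A - P) + normalize(B - P));  // half vector

        float3 cHAB = cross(H, AB);
        float  t    = dot(cross(A - P, H), cHAB) / dot(cHAB, cHAB);  // parameter along AB
        return A + saturate(t) * AB;                                 // clamp to the segment ends
    }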

This works and is robust but expensive, so we resort to our knowledge of geometry. The half vector is what is called the bisector of an angle, i.e. it cuts the angle exactly in half. The angle bisector theorem says that there is a proportion between the lengths a and b of the segments that form the angle and the lengths x and y of the two segments that the intersection produces, more specifically

a / b = x / y

Calculating the length of x would allow us to simply offset point A using vector AB to get to the desired point, which is much more efficient. In code:
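
Again a sketch with my own names, following the proportion above:

    // Angle bisector theorem: the bisector splits AB in the ratio |PA| : |PB|
    float3 DiffuseMRP_Bisector(float3 P, float3 A, float3 B)
    {
        float lenPA = length(A - P);
        float lenPB = length(B - P);
        float t     = lenPA / (lenPA + lenPB);   // normalized distance x along AB
        return A + t * (B - A);                  // same point as before, much cheaper
    }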

Specular

The most significant specular contribution from a light is going to be around the reflection vector. If we can find the point on the line that is closest to that vector, we can use it as our specular point light. In the following diagram we can see the reflection vector R of the view direction around the normal N. From that ray we can calculate the closest point on segment AB to R.

For this calculation I followed the derivation here which is explained in a lot of detail. My solution has assumed that the reflection vector is normalized and therefore the dot product with itself is 1.
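
A sketch of the result (assuming R is normalized, as mentioned; the names are mine):

    // Closest point on segment AB to the reflection ray (P, R)
    float3 SpecularMRP(float3 P, float3 R, float3 A, float3 B)
    {
        float3 D = B - A;
        float3 w = P - A;

        float b = dot(R, D);
        float c = dot(D, D);
        float d = dot(R, w);
        float e = dot(D, w);

        // dot(R, R) == 1 because R is normalized; c - b*b is 0 only if AB is parallel to R
        float s = (e - b * d) / (c - b * b);
        return A + saturate(s) * D;              // clamp to the segment
    }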

Horizon Handling

Up to now the algorithm works pretty well, but it breaks down when the segment intersects the plane defined by the shaded pixel and its normal vector, because it can end up selecting a point behind the plane that doesn’t represent the light. The solution is to determine the segment-plane intersection point, and limit point A or B (depending on the case) to that point. We effectively only consider the part of the segment that is on the positive side of the plane, and do our calculations as described before.
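
A sketch of that clamp, using the signed distance of each endpoint to the shading plane (names are mine):

    // Clamp segment AB to the positive side of the plane through P with normal N
    void ClampSegmentToHorizon(inout float3 A, inout float3 B, float3 P, float3 N)
    {
        float dA = dot(A - P, N);   // signed distance of each endpoint to the plane
        float dB = dot(B - P, N);

        if (dA < 0.0 && dB < 0.0) return;                    // fully below the horizon: no contribution
        if (dA < 0.0) A = lerp(A, B, dA / (dA - dB));        // pull A up to the intersection
        else if (dB < 0.0) B = lerp(B, A, dB / (dB - dA));   // pull B up to the intersection
    }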

Light Textures

Typically, lights can have projected textures that tint the light. For line lights we might want to use a cylindrical texture surrounding the light. The correct vector to sample such a texture is neither the one used for diffuse nor the one used for specular, but rather a vector perpendicular to the line that passes through the shaded point. Essentially, we need to calculate the closest point from the shaded pixel to the line light, and use that to sample the texture.

An alternative is to use a cubemap and treat it as if it was a point light. Simply calculate the center of the segment (before horizon handling), get the vector from the shading point to that center, and use it as the sampling vector for the cubemap.

Tube Light Extension

If you wanted to turn this into an actual tube light, a simple approach is to intersect the light with a cylinder of radius R. If we already have the closest point to the line (or we can calculate it in the same way as the closest point to the segment) we can compute, using similar triangles, a distance along the vector we use for diffuse or specular, to obtain a new point on the surface of the light. To account for points on the inside of the light, we must clamp the distance to the length between the shading point and the point on the light, or we risk selecting a point behind the surface. What this means is that for all points within the surface of the tube we’ll get the maximum intensity.

Shadow Mapping

For shadow mapping there is a simple option which is to treat it like a point light, and make shadows emanate from the center, using a cubemap or dual paraboloid which are popular shadowing methods. This is what Unreal Engine does.

The other option is to create a custom shadow projection for the light, which would probably be cylindrical with some special treatment for the caps. Neither MRP is useful for the sampling, so the vector to the closest point on the line from the shading point would probably be the most adequate. I have not implemented this, so this part is theoretical only.

Shadertoy

If some of the above did not make sense to you, open the shadertoy implementation and hack away. You’ll probably learn a lot that way too! I’ve not implemented the tube light extension or the shadow mapping in the shadertoy.

We’re Hiring!

One last thing I’d like to mention is that we’re always doing cool stuff at Tt Games. Be it rendering, simulation, networking or tools there are always open positions for talented people who want to make awesome Lego games. Tune in at http://ttgames.com/careers/ to see what fits you!

The Rendering of Rise of the Tomb Raider

Rise of the Tomb Raider (2015) is the sequel to the excellent Tomb Raider (2013) reboot. I personally find both refreshing as they move away from the stagnating original series and retell the Croft story. The game is story focused and, like its prequel, offers enjoyable crafting, hunting and climbing/exploring mechanics.

Tomb Raider used the Crystal Engine, developed by Crystal Dynamics and also used in Deus Ex: Human Revolution. For the sequel a new engine called Foundation was used, previously developed for Lara Croft and the Temple of Osiris (2014). Its rendering can be broadly classified as a tiled light-prepass engine, and we’ll see what that means as we dive in. The engine offers the choice between a DX11 and DX12 renderer; I chose the latter for reasons we’ll see later. I used Renderdoc 1.2 to capture the frame, on a Geforce 980 Ti, and turned on all the bells and whistles.

The Frame

I can safely say without spoilers that in this frame bad guys chase Lara because she’s looking for an artifact they’re looking for too, a conflict of interest that absolutely must be resolved using weapons. Lara is inside the enemy base at nighttime. I chose a frame with atmospheric and contrasty lighting where the engine can show off.

Depth Prepass

A customary optimization in many games, a small depth prepass takes place here (~100 draw calls). The game renders the biggest objects (rather, the ones that take up the most screen space) to take advantage of the Early-Z capability of GPUs. A concise article by Intel explains further. In short, the GPU can avoid running a pixel shader if it can determine it’s occluded behind a previous pixel. It’s a relatively cheap pass that will pre-populate the Z-buffer with depth.

An interesting thing I found is a level of detail (LOD) technique called ‘fizzle’ or ‘checkerboard’. It’s a common way to fade objects in and out at a distance, either to later replace them with a lower quality mesh or to make them disappear completely. Take a look at this truck. It seems to be rendering twice, but in reality it’s rendering a high LOD and a low LOD at the same position, each rendering to the pixels the other is not rendering to. The first LOD is 182226 vertices, whereas the second LOD is 47250. They’re visually indistinguishable at a distance, and yet one is almost 4 times cheaper in vertex count. In this frame, LOD 0 has almost disappeared while LOD 1 is almost fully rendered. Once LOD 0 completely disappears, only LOD 1 will render.

A pseudorandom texture and a probability factor allow us to discard pixels that don’t pass a threshold. You can see this texture used in ROTR. You might be asking yourself why not use alpha blending. There are many disadvantages to alpha blending over fizzle fading.

  1. Depth prepass-friendly: By rendering it like an opaque object and puncturing holes, we can still render into the prepass and take advantage of early-z. Alpha blended objects don’t render into the depth buffer this early due to sorting issues.
  2. Needs extra shader(s): If you have a deferred renderer, your opaque shader doesn’t do any lighting. You need a separate variant that does if you’re going to swap an opaque object for a transparent one. Aside from the memory/complexity cost of having at least an extra shader for all opaque objects, they need to be accurate to avoid popping. There are many reasons why this is hard, but it boils down to the fact they’re now rendering through a different code path.
  3. More overdraw: Alpha blending can produce more overdraw and depending on the complexity of your objects you might find yourself paying a large bandwidth cost for LOD fading.
  4. Z-fighting: z-fighting is the flickering effect when two polygons render to a very similar depth such that floating point imprecision causes them to “take turns” to render. If we render two consecutive LODs by fading one out and the next one in, they might z-fight since they’re so close together. There are ways around it like biasing one over the other but it gets tricky.
  5. Z-buffer effects: Many effects like SSAO rely on the depth buffer. If we render transparent objects at the end of the pipeline when ambient occlusion has run already, we won’t be able to factor them in.

One disadvantage of this technique is that it can look worse than alpha fading, but a good noise pattern, post-fizzle blurring or temporal AA can hide it to a large extent. ROTR doesn’t do anything fancy in this respect.
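
A minimal sketch of what fizzle fading boils down to in a pixel shader; the noise texture, its size and the probability factor are placeholders, not ROTR’s actual shader:

    Texture2D<float> DitherNoise;                     // small tiling pseudorandom pattern
    cbuffer FadeConstants { float FadeProbability; }  // 0 = fully visible, 1 = fully faded

    void ApplyFizzle(float4 svPosition)   // pass in SV_Position from the pixel shader
    {
        uint2 noisePixel = uint2(svPosition.xy) % 64;   // assuming a 64x64 noise texture
        float threshold  = DitherNoise[noisePixel];
        clip(threshold - FadeProbability);              // discard pixels below the threshold
    }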

Normals Pass

Crystal Dynamics uses a relatively unusual lighting scheme for its games that we’ll describe in the lighting pass. For now suffice it to say that there is no G-Buffer pass, at least not in the sense that other games have us accustomed to. Instead, the objects in this pass only output depth and normals information. Normals are written to an RGBA16_SNORM render target in world space. As a curiosity, this engine uses Z-up as opposed to Y-up which is what I see more often in other engines/modelling packages. The alpha channel contains glossiness, which will be decompressed later as exp2(glossiness * 12 + 1.0). The glossiness value can actually be negative, as the sign is used as a flag to indicate whether a surface is metallic or not. You can almost spot it yourself, as the darker colors in the alpha channel are all metallic objects.

R: Normal.x | G: Normal.y | B: Normal.z | A: Glossiness + Metalness
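
Decoding that alpha channel presumably looks something like this; the sign handling is my guess based on the description above:

    Texture2D<float4> NormalsBuffer;   // RGBA16_SNORM: world-space normal + glossiness/metalness

    void DecodeNormals(uint2 pixel, out float3 normalWS, out float specularPower, out bool isMetal)
    {
        float4 data   = NormalsBuffer[pixel];
        normalWS      = data.xyz;
        isMetal       = data.w < 0.0;                    // the sign doubles as the metalness flag
        specularPower = exp2(abs(data.w) * 12.0 + 1.0);  // the decompression mentioned above
    }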


A real life pinhole camera

When I got married last year, me and my wife went on our honeymoon to Thailand. Their king Bhumibol had died just a month before and the whole country was mourning, so everywhere we found memorials and good wishes for their king, and people would dress in black and white as a sign of sorrow. The Thai are a gentle and polite people, who like to help out; we’d ask for directions and people with no notions of English would spend twenty minutes trying to understand and answer our questions. Thailand has a rich history of rising and falling kingdoms, great kings and battles, and unification and invasions by foreign kingdoms. There are some amazing ruins of these kingdoms. Thailand also lives by a variant of Buddhism reflected in all of their beautiful temples. Some of the architectural features I found most interesting are the small reflective tiles that cover the outer walls, animal motifs like the Garuda (bird creatures that can be seen on the rooftops) and snake-like creatures called Naga. It is in this unexpected context that I found a real-life pinhole camera. I always wear my graphics hat, so I decided to capture it and later make a post.

First, a little background. A pinhole camera (also known as camera obscura after its Latin name) is essentially the simplest camera you can come up with. If you conceptually imagine a closed box that has a single, minuscule hole in one of its faces, such that only a single ray from each direction can come inside, you’d get a mirrored, inverted image on the inner face opposite the pinhole. An image is worth more than a thousand explanations, so here’s what I’m talking about.

 


As you can see, the concept is simple. If you were inside the room, you’d see an inverted image of the outside. The hole is so small that the room would be fairly dark, so even the faint light now bouncing back towards you would still be visible. I made the pinhole a hexagon, as I wanted to suggest the fact that it is effectively the shutter of a modern camera. Louis Daguerre, one of the fathers of photography, used this model in his famous daguerreotype circa 1835, but Leonardo da Vinci had already described this phenomenon as an oculus artificialis (artificial eye) in one of his works as early as 1502. There are plenty of additional resources if you’re interested and even a pretty cool tutorial on how to create your own.

Now that we understand what this camera is, let’s look at the real image I encountered. I’ve aligned the inside and outside images I took and cast rays so you can see what I mean.

 


The image of the inside looks bright, but I had to take it with 1 second of exposure and it still looks relatively dark. On top of that, the day outside was very sunny, which helped a lot in getting a clear “photograph”.

The Rendering of Middle Earth: Shadow of Mordor

Middle Earth: Shadow of Mordor was released in 2014. The game itself was a great surprise, and the fact that it was a spin-off within the storyline of the Lord of the Rings universe was quite unusual and it’s something I enjoyed. The game was a great success, and at the time of writing, Monolith has already released the sequel, Shadow of War. The game’s graphics are beautiful, especially considering it was a cross-generation game and was also released on Xbox 360 and PS3. The PC version is quite polished and features a few extra graphical options and hi-resolution texture packs that make it shine.

The game uses a relatively modern deferred DX11 renderer. I used Renderdoc to delve into the game’s rendering techniques. I used the highest possible graphical settings (ultra) and enabled all the bells and whistles like order-independent transparency, tessellation, screen-space occlusion and the different motion blurs.

The Frame

This is the frame we’ll be analyzing. We’re at the top of a wooden scaffolding in the Udun region. Shadow of Mordor has similar mechanics to games like Assassin’s Creed where you can climb buildings and towers and enjoy some beautiful digital scenery from them.

Depth Prepass

The first ~140 draw calls perform a quick prepass to render the biggest elements of the terrain and buildings into the depth buffer. Most things don’t end up appearing in this prepass, but it helps when you’ve got a very big number of draw calls and a far range of view. Interestingly the character, who is always in front and takes a decent amount of screen space, does not go into the prepass. As is common for many open world games, the game employs reverse z, a technique that maps the near plane to 1.0 and far plane to 0.0 for increased precision at great distances and to prevent z-fighting. You can read more about z-buffer precision here.

 

G-buffer

Right after that, the G-Buffer pass begins, with around 2700 draw calls. If you’ve read my previous analysis of Castlevania: Lords of Shadow 2 or other similar articles, you’ll be familiar with this pass. Surface properties are written to a set of buffers that are read later on by lighting passes to compute their response to light. Shadow of Mordor uses a classical deferred renderer, but uses a comparatively small number of G-buffer render targets (3) to achieve its objective. Just for comparison, Unreal Engine uses between 5 and 6 buffers in this pass. The G-buffer layout is as follows:

Normals Buffer
R: Normal.x | G: Normal.y | B: Normal.z | A: ID

The normals buffer stores the normals in world space, in 8-bit per channel format. This is a little bit tight, sometimes not enough to accurately represent smoothly varying flat surfaces, as can be seen in some puddles throughout the game if paying close attention. The alpha channel is used as an ID that marks different types of objects. Some that I’ve found correspond to a character (255), an animated plant or flag (128), and the sky is marked with ID 1, as it’s later used to filter it out during the bloom phase (it gets its own radial bloom).


Photoshop Blend Modes Without Backbuffer Copy

For the past couple of weeks, I have been trying to replicate the Photoshop blend modes in Unity. It is no easy task; despite the advances of modern graphics hardware, the blend unit still resists being programmable and will probably remain fixed for some time. Some OpenGL ES extensions implement this functionality, but most hardware and APIs don’t. So what options do we have?

1) Backbuffer copy

A common approach is to copy the entire backbuffer before doing the blending. This is what Unity does. After that it’s trivial to implement any blending you want in shader code. The obvious problem with this approach is that you need to do a full backbuffer copy before you do the blending operation. There are certainly some possible optimizations like only copying what you need to a smaller texture of some sort, but it gets complicated once you have many objects using blend modes. You can also do just a single backbuffer copy and re-use it, but then you can’t stack different blended objects on top of each other. In Unity, this is done via a GrabPass. It is the approach used by the Blend Modes plugin.

2) Leveraging the Blend Unit

Modern GPUs have a little unit at the end of the graphics pipeline called the Output Merger. It’s the hardware responsible for taking the output of a pixel shader and blending it with the backbuffer. It’s not programmable, as making it so has quite a lot of complications (you can read about it here), so current GPUs don’t have a programmable one.

The blend mode formulas were obtained here and here. Use them as a reference to compare with what I provide. There are many other sources. One thing I’ve noticed is that the provided formulas often neglect to mention that Photoshop actually uses modified formulas and clamps quantities in a different manner, especially when dealing with alpha. Gimp does the same. This is my experience recreating the Photoshop blend modes exclusively using a combination of the blend unit and shaders. The first few blend modes are simple, but as we progress we’ll have to resort to more and more tricks to get what we want.

Two caveats before we start. First off, Photoshop blend modes do their blending in sRGB space, which means if you do them in linear space they will look wrong. Generally this isn’t a problem, but due to the amount of trickery we’ll be doing for these blend modes, many of the values need to go beyond the 0 – 1 range, which means we need an HDR buffer to do the calculations. Unity can do this by setting the camera to be HDR in the camera settings, and also setting Gamma for the color space in the Player Settings. This is clearly undesirable if you do your lighting calculations in linear space. In a custom engine you would probably be able to set this up in a different manner (to allow for linear lighting).

If you want to try the code out while you read ahead, download it here.

A) Darken

Formula: min(SrcColor, DstColor)
Shader Output: (see the sketch below)
Blend Unit: Min(SrcColor · One, DstColor · One)

darken

As alpha approaches 0, we need the minimum value to tend to DstColor, which we do by forcing SrcColor towards the maximum possible color, float3(1, 1, 1).
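
In other words, the shader output is presumably something like this (my reconstruction from the note above, not necessarily the article’s exact code):

    // src is the blended layer's colour and alpha; the blend unit then takes
    // Min(output.rgb, DstColor)
    float4 DarkenOutput(float4 src)
    {
        // As alpha goes to 0, push the source towards white so min() returns DstColor
        return float4(lerp(float3(1.0, 1.0, 1.0), src.rgb, src.a), 1.0);
    }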

B) Multiply

Formula: SrcColor · DstColor
Shader Output: (see the sketch below)
Blend Unit: SrcColor · DstColor + DstColor · OneMinusSrcAlpha

multiply
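
The Multiply output would then be the premultiplied colour (again my reconstruction from the blend equation above):

    // The blend unit computes output.rgb * Dst + Dst * (1 - output.a)
    //   = src.rgb * src.a * Dst + Dst * (1 - src.a) = lerp(Dst, Src * Dst, src.a)
    float4 MultiplyOutput(float4 src)
    {
        return float4(src.rgb * src.a, src.a);
    }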


The Rendering of Castlevania: Lords of Shadow 2

Castlevania: Lords of Shadow 2 was released in 2014, a sequel that builds on top of Lords of Shadow, its first installment, which uses a similar engine. I hold these games dear and, being Spanish myself, I’m very proud of the work MercurySteam, a team from Madrid, did on all three modern reinterpretations of the Castlevania series (Lords of Shadow, Mirror of Fate and Lords of Shadow 2). Out of curiosity and pure fandom for the game I decided to peek into the Mercury Engine. Despite the first Lords of Shadow being, without shadow of a doubt (no pun intended), the best and most enjoyable of the new Castlevanias, out of justice for their hard work I decided to analyze a frame from their latest and most polished version of the engine. Despite being a recent game, it uses DX9 as its graphics backend. Many popular tools like RenderDoc or the newest tools by Nvidia and AMD don’t support DX9, so I used Intel Graphics Analyzer to capture and analyze all the images and code from this post. While it has a bit of graphics parlance, I’ve tried to include as many images as possible, with occasional code and in-depth explanations.

Analyzing a Frame

This is the frame we’re going to be looking at. It’s the beginning scene of Lords of Shadow 2, Dracula has just awakened, enemies are knocking at his door and he is not in the best mood.

CLOS2 Castle Final Frame

Depth Pre-pass

LoS2 appears to do what is called a depth pre-pass. What it means is you send the geometry once through the pipeline with very simple shaders, and pre-emptively populate the depth buffer. This is useful for the next pass (GBuffer), as it attempts to avoid overdraw: pixels with a depth value higher than the one already in the buffer (essentially, pixels that are behind) get discarded before they run the pixel shader, therefore minimizing pixel shader runs at the cost of extra geometry processing. Alpha-tested geometry, like hair and a rug with holes, is also included in the pre-pass. LoS2 uses both the standard depth buffer and a depth-as-color buffer to be able to sample the depth buffer as a texture in a later stage.

The game also takes the opportunity to fill in the stencil buffer, an auxiliary buffer that is part of the depth buffer and generally contains masks for pixel selection. I haven’t thoroughly investigated why precisely all these elements are marked, but for instance wax presents higher subsurface scattering, and hair and skin have their own shading, independent of the main lighting pass, which the stencil mask allows them to bypass.

  • Dracula: 85
  • Hair, skin and leather: 86
  • Window glass/blood/dripping wax: 133
  • Candles: 21

The first image below shows what the overdraw is like for this scene. A depth pre-pass helps if you have a lot of overdraw. The second image is the stencil buffer.


GBuffer Pass

LoS2 uses a deferred pipeline, fully populating 4 G-Buffers. 4 buffers is quite big for a game that was released on Xbox 360 and PS3; other games get away with 3 by using several optimizations.

Normals (in World Space):

R: normal.r | G: normal.g | B: normal.b | A: sss

The normal buffer is populated with the three components of the world space normal and a subsurface scattering term for hair and wax (interestingly not skin). Opaque objects only transform their normal from tangent space to world space, but hair uses some form of normal shifting to give it anisotropic properties.


Albedo:

R: albedo.r | G: albedo.g | B: albedo.b | A: alpha * AOLevels

The albedo buffer stores all three albedo components plus an ambient occlusion term that is stored per vertex in the alpha channel of the vertex color and is modulated by an AO constant (which I presume depends on the general lighting of the scene).


Specular:

R: specular.r | G: specular.g | B: specular.b | A: Fresnel multiplier

The specular buffer stores the specular color multiplied by a Fresnel term that depends on the view and normal vectors. Although LoS2 does not use physically-based rendering, it includes a Fresnel term, probably inspired in part by the Schlick approximation, to try and brighten things up at glancing angles. It is not strictly correct, as it is done independently of the real-time lights. The Fresnel factor is also stored in the w component.


Replay system using Unity

Unity is an incredible tool for making quality games at a blazing fast pace. However, like all closed systems there are some limitations to how you can extend the engine and one such limitation is developing a good replay system for a game. I will talk about two possible approaches and how to solve other issues along the way. The system was devised for a Match 3 prototype, but can be applied to any project. There are commercial solutions available, but this post is intended for coders.

If the game has been designed with a deterministic outcome in mind, the most essential parts of recording are input events, delta times (optional) and random seeds. What this means is that the only input available to the game will be the players’ actions; the rest should be simulated properly to arrive at the same outcome. Since storing random seeds and loading them as appropriate is more or less straightforward, we will focus on the input.

1) The first issue is how to capture and replay input in the least disturbing way possible. If you have used Unity’s Input class, Input.GetMouseButton(i) should look familiar. The replay system was added after developing the main mechanics, and we didn’t want to go back and rewrite how the game worked. Such a system should ideally work for future or existing games, and Unity already provides a nice interface that other programmers use. Furthermore, plugins use this interface, and not sticking to it can severely limit your ability to record games.

The solution we arrived at was shadowing Unity’s Input class by creating a new class with the same name and accessing it through the UnityEngine namespace inside of the new class. This allows for conditional routing of Unity’s input, therefore passing recorded values into the Input.GetMouseButtonX functions, and essentially ‘tricking’ the game into thinking it is playing real player input. You can do the same with keys.

There are many functions and properties to override, it can take time and care to get it all working properly. Once you have this new layer you can create a RecordManager class and start creating methods that connect with the new Input class.

2) The second issue is trickier to get properly working, due to common misconceptions (myself included) about how Unity’s Update loops work. Unity has two different Update loops that serve different purposes, Update and FixedUpdate. Update runs every frame, whereas FixedUpdate runs at a fixed, specified time interval. FixedUpdate has absolutely nothing to do with Update. No rule says that for every Update there should be a FixedUpdate, or that there should be no more than one for every Update.

Let’s explain it with a few use cases. For all of them, the FixedUpdate interval is 0.017 s (~60 fps).

a) Update runs at 60 fps (same as FixedUpdate). The order of updates would be:

b) Update runs faster (120 fps). I have chosen this number because it is exactly double that of FixedUpdate. In this case, the order of updates would be as follows:
There is one FixedUpdate every two Updates.

c) Update runs slower (30 fps). Same rule as above, but 30 = 60/2

Since FixedUpdate can’t keep up with Update, it updates twice to compensate.

This brings up the following question: where should I record input events, and where should I replay them? How can I replay something I recorded on one computer on another, and have the same output?

The answer to the first question is to record in Update. It is guaranteed to run on every Unity tick, and doing so in FixedUpdate will cause you to miss input events and mess up your recording. The answer to the second question is a little more open, and depends on how you recorded your data.

One approach is to record the deltaTime in Update for every Update, and shadow Unity’s Time class the same way we did with Input to be able to read a recorded Time.deltaTime property wherever it’s used. This has two possible issues, namely precision (of the deltaTime) and storage.

The second approach is to save events and link them to their corresponding FixedUpdate tick; that way you can associate many events with a single tick (if Update goes too fast) or none (if Update goes too slow). With this approach you can only execute your code in FixedUpdate, and execute it as many times as there are recorded Updates. It’s also important to save the average Update time of the original recording and set it as the FixedUpdate interval. The simulation will not be 100% accurate in that Update times won’t fluctuate as they did in the original recording session, but it is guaranteed to execute the same code.

ScriptExecutionOrder

There is one last setting that’s needed to properly record events, which is to set the RecordManager to record all input at the beginning of every frame. Unity has a Script Execution Order option under Project Settings where you can set the RecordManager to run before any other script. That way recording and replaying are guaranteed to run before any other script consumes the input.

Java and Vector Math

Some time ago, I had to develop a 3D vector utility class for Java. Because the Android platform only uses Java, this is a must if you’re developing for it (unless you’re directly using the NDK or some middleware such as Unity. I’ll get back to this later).

Java, like all programming languages, has its virtues and weaknesses. I found trying to develop a solid Vector3 class to be one such weakness, because Java lacks two main features that I consider core to what I was trying to achieve: stack-allocated objects and operator overloading. These missing features make operating with vectors in Java an annoyance beyond measure.

As much as some people seem to dislike operator overloading, vector/matrix math is one domain where I consider they excel, and the lack of it is going to force me to always go through functions for even the simplest of operations, such as adding/subtracting two vectors or multiplying/dividing by a scalar.

Compare the following lines of code:

Doesn’t seem too bad, does it? Let’s try something different, like obtaining a direction from two points, normalizing, scaling by a factor, and adding it to a point (a relatively frequent operation)

It’s either that or separating it into several lines so it becomes clearer and a bit more readable. This is clearly an undesirable way of working with vectors, but the only one at our disposal when using Java.

There’s another caveat, though, one that is implicit in the way we have used the equals operator until now. We have been assigning the Vector3 by reference all this time, invalidating the erroneous assumption that we get a new Vector3 out of the operation. What we want is a copy of the resulting Vector3, which means creating a new Vector3 using the new operator and copying the values into it. Therefore, line [2] would become something along the lines of:

or

In any case it is a very confusing way of working with vectors.

There is yet another annoyance to be wary of when working with instantiations on the heap in a garbage collected language such as Java which is, precisely, the dreaded Garbage Collector. Vector3 operations typically go inside long loops where interesting calculations take place for many objects in the virtual world, and creating all those new objects in the heap is asking for trouble when the GC comes to inspect your pile of vector rubbish and tries to clean up the mess. This is due to the fact that there are no stack-allocated objects in Java, leaving us with one option – creating temporary Vector3’s and reusing them. This has its fair share of problems too – mainly readability, the ability to use several temporary vectors for intermediate calculations, and having to pass the Vector3 reference as a parameter instead of returning it as a normal function return value. Let’s go back to our example.

Definitely not ideal. In contrast, and for all its similarity with Java, C# has both these features, which makes it a language of choice for these kinds of applications, and I wonder if it was one among the multiple reasons why Unity chose it as their main development language.