## Rendering Line Lights

Within the arsenal of lights provided by a game engine, the most popular are punctual lights such as point, spot or directional because they are cheap. On the other end, area lights have recently produced incredible techniques such as Linearly Transformed Cosines and other analytic approximations. The type of light I want to talk about is the line light.

In Unreal Engine 4, modifying ‘Source Length’ on a point light elongates it as described in this paper. It spreads the intensity along the length so a longer light becomes perceptually dimmer. Frostbite also has tube lights, a complex implementation of the analytical illuminance emitted by a cylinder and two spheres. Unity includes tube lights as well in their HD Render Pipeline (thanks Eric Heitz and Evegenii Golubev for pointing it out) based on their LTC theory, which you can find a great explanation and demos for here. Guerrilla Games’ Decima Engine has elongated quad lights using an approach for which they have a very attractive and thorough explanation in GPU Pro 5’s chapter II.1, Physically Based Area Lights. This is what I adapted to line lights.

### Most Representative Point

The method is inspired by Monte Carlo importance sampling, where a biasing function turns uniform input samples into samples that are non-uniformly distributed according to the shape of the function. The typical scenario in rendering is to efficiently sample a specular BRDF, where uniform samples produce suboptimal results at low roughnesses. MRP takes the idea to the extreme, using a single most important sample. Past literature explores this idea in detail here and here. The core of the algorithm is to find the point that provides the greatest contribution and treat that as a point light, leveraging existing BRDF and falloff functions. I imagine a light that “travels” with the shaded pixel bound by some rules, the result looking like a light with some dimensionality. All the above engines use this idea in varying forms. We’ll describe the line light as a segment formed by points A and B, and globally define P as the shading point.

### Diffuse

A key insight for me was discovering that there are actually two most representative points: diffuse and specular. Each point does its part and after evaluating the BRDF we add their contributions together. According to Guerrilla’s paper, the most representative point for a diffuse BRDF is the intersection point between two vectors:

1. The half vector formed by the vector from P to A and the vector from P to B
2. The vector defined by the line direction AB

Here I have shown three shading points P1-3, to illustrate how the position of the virtual point light L1-3 moves with the shading point. The moment L reaches A or B, it can’t travel any further and stops at that point, which is why we perceive the light as a segment.

There are two main approaches to compute L, intersection and geometric. I will briefly mention both as it was my original thought process. For the intersection approach we first compute H, the half vector between PA and PB. We then find the intersection point between vector AB and H. The derivation and proof for a robust algorithm to do this is shown in Real Time Rendering, Third Edition, p.782, or a small excerpt formula here. In code:
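As a rough sketch of the intersection approach (in Python rather than shader code, and not the original listing), the function below intersects the bisector of the angle APB with the line AB and clamps the result to the segment. The helper names and the degenerate-case fallback are my own assumptions, and it assumes P does not coincide with A or B.

```python
import math

def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def add(u, v): return tuple(a + b for a, b in zip(u, v))
def mul(u, s): return tuple(a * s for a in u)
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])
def normalize(u): return mul(u, 1.0 / math.sqrt(dot(u, u)))

def diffuse_mrp_intersection(p, a, b):
    """Intersect the bisector of angle APB with line AB, clamped to the segment."""
    # Half vector between the normalized directions P->A and P->B.
    h = add(normalize(sub(a, p)), normalize(sub(b, p)))
    ab = sub(b, a)
    n = cross(ab, h)
    denom = dot(n, n)
    if denom < 1e-12:
        # Assumption: P lies on the line, so there is no unique intersection.
        return a
    # Parameter along AB where the line A + s * AB meets the line P + t * H.
    s = dot(cross(sub(p, a), h), n) / denom
    s = min(max(s, 0.0), 1.0)
    return add(a, mul(ab, s))
```

For example, with P at the origin, A = (0, 2, 0) and B = (1, 0, 0), the bisector points along (1, 1, 0) and the function returns a point at roughly (2/3, 2/3, 0).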

This works and is robust but expensive, so we resort to our knowledge of geometry. The half vector is what is called the bisector of an angle, i.e. it cuts the angle exactly in half. The angle bisector theorem says that there is a proportion between the lengths a and b of the segments that form the angle and the lengths x and y of the two segments that the intersection produces, more specifically

${ a / b = x / y }$

Calculating the length of x would allow us to simply offset point A using vector AB to get to the desired point, which is much more efficient. In code:
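A minimal sketch of the geometric approach, again in Python and with names of my own choosing. Since x + y = |AB| and a / b = x / y, it follows that x = |AB| · a / (a + b), so the point is just A offset along AB by the fraction a / (a + b).

```python
import math

def diffuse_mrp_bisector(p, a, b):
    """Diffuse MRP on segment AB for shading point P via the angle bisector theorem."""
    len_a = math.dist(p, a)  # a = |PA|
    len_b = math.dist(p, b)  # b = |PB|
    total = len_a + len_b
    if total == 0.0:
        # Assumption: degenerate segment with P sitting on it.
        return tuple(a)
    t = len_a / total        # x / |AB|, always within [0, 1]
    return tuple(ai + (bi - ai) * t for ai, bi in zip(a, b))
```

Note that the fraction a / (a + b) can never leave the [0, 1] range, so this version needs no explicit clamp to the segment.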

### Specular

The most significant specular contribution from a light is going to be around the reflection vector. If we can find the point on the line that is closest to that vector, we can use it as our specular point light. In the following diagram we can see the reflection vector R, which is the view vector reflected about the normal N. From that ray we can calculate the point on segment AB that is closest to R.

For this calculation I followed the derivation here, which is explained in a lot of detail. My solution assumes that the reflection vector is normalized, so its dot product with itself is 1.
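A sketch of what that closest-point calculation can look like, in Python with hypothetical helper names. It minimizes the distance between the line P + s·R and the line A + t·AB, uses R·R = 1 to simplify, and clamps t to the segment. As a simplification of my own, it treats the reflection ray as an infinite line and does not reject negative s.

```python
def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def add(u, v): return tuple(a + b for a, b in zip(u, v))
def mul(u, s): return tuple(a * s for a in u)
def dot(u, v): return sum(a * b for a, b in zip(u, v))

def reflect(v, n):
    """Reflect unit vector v (surface to camera) about the unit normal n."""
    return sub(mul(n, 2.0 * dot(n, v)), v)

def specular_mrp(p, r, a, b):
    """Point on segment AB closest to the line through P with unit direction R."""
    w = sub(a, p)
    d = sub(b, a)
    d_dot_r = dot(d, r)
    # dot(r, r) is assumed to be 1, which removes one term from the denominator.
    denom = dot(d, d) - d_dot_r * d_dot_r
    if denom < 1e-12:
        # Assumption: segment parallel to R, every point is equally close.
        return a
    t = (dot(w, r) * d_dot_r - dot(w, d)) / denom
    t = min(max(t, 0.0), 1.0)
    return add(a, mul(d, t))
```

With P at the origin, R = (0, 1, 0), A = (-1, 2, 1) and B = (1, 2, 1), the closest point is the middle of the segment, (0, 2, 1).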

### Horizon Handling

Up to now the algorithm works pretty well but breaks down when the segment intersects the plane defined by the shaded pixel and its normal vector, because it can end up selecting a point behind the plane that doesn’t represent the light. The solution is to determine the segment-plane intersection point, and limit the points A or B (depending on the case) to that point. We effectively only consider the segment that is on the positive side of the plane, and do our calculations as described before.
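The clipping step could be sketched like this in Python. The function name and the choice to return `None` when the whole segment is below the horizon are assumptions of mine, not something taken from the shader.

```python
def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def add(u, v): return tuple(a + b for a, b in zip(u, v))
def mul(u, s): return tuple(a * s for a in u)
def dot(u, v): return sum(a * b for a, b in zip(u, v))

def clip_segment_to_horizon(p, n, a, b):
    """Keep only the part of AB on the positive side of the plane (P, N)."""
    da = dot(sub(a, p), n)  # signed distance of A to the plane (N is unit length)
    db = dot(sub(b, p), n)  # signed distance of B to the plane
    if da <= 0.0 and db <= 0.0:
        return None         # the whole light is behind the surface
    if da < 0.0:
        # Move A up to the plane: the signed distance reaches zero at t = da / (da - db).
        a = add(a, mul(sub(b, a), da / (da - db)))
    elif db < 0.0:
        b = add(b, mul(sub(a, b), db / (db - da)))
    return a, b
```

The clipped endpoints then replace A and B in the diffuse and specular calculations.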

### Light Textures

Typically, lights can have projected textures that tint the light. For line lights we might want to use a cylindrical texture surrounding the light. The correct vector to sample such a texture is neither the one used for diffuse nor the one used for specular, but rather the vector perpendicular to the line that passes through the shaded point. Essentially, we need to calculate the closest point on the line light to the shaded pixel, and use that to sample the texture.
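Finding that closest point is a simple projection onto the line. Here is a small Python sketch; the `clamp_to_segment` flag is a hypothetical option I added for cases where you would rather stay within the segment.

```python
def closest_point_on_line(p, a, b, clamp_to_segment=False):
    """Project P onto the line through A and B (optionally clamped to segment AB)."""
    ab = tuple(bi - ai for ai, bi in zip(a, b))
    ap = tuple(pi - ai for ai, pi in zip(a, p))
    t = sum(x * y for x, y in zip(ap, ab)) / sum(x * x for x in ab)
    if clamp_to_segment:
        t = min(max(t, 0.0), 1.0)
    return tuple(ai + abi * t for ai, abi in zip(a, ab))
```

The sampling vector for the cylindrical texture is then the difference between the shading point and the returned point.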

An alternative is to use a cubemap and treat it as if it were a point light. Simply calculate the center of the segment (before horizon handling), get the vector from the shading point to that center, and use it as the sampling vector for the cubemap.

### Tube Light Extension

If you wanted to turn this into an actual tube light, a simple approach is to intersect the light with a cylinder of radius R. If we already have the closest point to the line (or we can calculate it in the same way as the closest point to the segment) we can compute, using similar triangles, a distance along the vector we use for diffuse or specular, to obtain a new point on the surface of the light. To account for points on the inside of the light, we must clamp the distance to the length between the shading point and the point on the light, or we risk selecting a point behind the surface. What this means is that for all points within the surface of the tube we’ll get the maximum intensity.
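Since I haven't implemented this part, the following Python sketch is only my reading of the similar-triangles idea. The perpendicular distance to the axis grows linearly from 0 at the point L on the line to d at the shading point P, so the tube surface is reached after travelling a fraction R / d of the way from L towards P, clamped to 1 for points inside the tube. All names are hypothetical.

```python
import math

def tube_surface_point(p, l, a, b, radius):
    """Move the representative point L (on line AB) onto a tube of the given radius."""
    ab = tuple(bi - ai for ai, bi in zip(a, b))
    ap = tuple(pi - ai for ai, pi in zip(a, p))
    t = sum(x * y for x, y in zip(ap, ab)) / sum(x * x for x in ab)
    closest = tuple(ai + abi * t for ai, abi in zip(a, ab))
    d_perp = math.dist(p, closest)  # distance from P to the tube axis
    # Fraction of the way from L to P at which the axis distance equals the radius.
    frac = 1.0 if d_perp <= radius else radius / d_perp
    return tuple(li + (pi - li) * frac for li, pi in zip(l, p))
```

When P is inside the tube the fraction clamps to 1 and the function returns P itself, which matches the "maximum intensity inside the tube" behaviour described above.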

For shadow mapping there is a simple option which is to treat it like a point light, and make shadows emanate from the center, using a cubemap or dual paraboloid which are popular shadowing methods. This is what Unreal Engine does.

The other option is to create a custom shadow projection for the light, which would probably be cylindrical with some special treatment for the caps. Neither of the two MRPs is useful for sampling it, so the vector from the shading point to the closest point on the light would probably be the most adequate. I have not implemented this so this part is theoretical only.

If some of the above did not make sense to you, open the shadertoy implementation and hack away. You’ll probably learn a lot that way too! I’ve not implemented the tube light extension or the shadow mapping in the shadertoy.

### We’re Hiring!

One last thing I’d like to mention is that we’re always doing cool stuff at Tt Games. Be it rendering, simulation, networking or tools there are always open positions for talented people who want to make awesome Lego games. Tune in at http://ttgames.com/jobs/ to see what fits you!

## The Rendering of Rise of the Tomb Raider

Rise of the Tomb Raider (2015) is the sequel to the excellent Tomb Raider (2013) reboot. I personally find both refreshing as they move away from the stagnating original series and retell the Croft story. The game is story focused and, like its predecessor, offers enjoyable crafting, hunting and climbing/exploring mechanics.

Tomb Raider used the Crystal Engine, developed by Crystal Dynamics and also used in Deus Ex: Human Revolution. For the sequel a new engine called Foundation was used, previously developed for Lara Croft and the Temple of Osiris (2014). Its rendering can be broadly classified as a tiled light-prepass engine, and we’ll see what that means as we dive in. The engine offers the choice between a DX11 and DX12 renderer; I chose the latter for reasons we’ll see later. I used RenderDoc 1.2 to capture the frame, on a GeForce 980 Ti, and turned on all the bells and whistles.

## The Frame

I can safely say without spoilers that in this frame bad guys chase Lara because she’s looking for an artifact they’re looking for too, a conflict of interest that absolutely must be resolved using weapons. Lara is inside the enemy base at nighttime. I chose a frame with atmospheric and contrasty lighting where the engine can show off.

#### Depth Prepass

A customary optimization in many games, a small depth prepass takes place here (~100 draw calls). The game renders the biggest objects (rather the ones that take up the most screen space), to take advantage of the Early-Z capability of GPUs. A concise article by Intel explains further. In short, the GPU can avoid running a pixel shader if it can determine it’s occluded behind a previous pixel. It’s a relatively cheap pass that will pre-populate the Z-buffer with depth.

An interesting thing I found is a level of detail (LOD) technique called ‘fizzle’ or ‘checkerboard’. It’s a common way to fade objects in and out at a distance, either to later replace them with a lower quality mesh or to make them disappear completely. Take a look at this truck. It seems to be rendering twice, but in reality it’s rendering a high LOD and a low LOD at the same position, each rendering to the pixels the other is not rendering to. The first LOD is 182226 vertices, whereas the second LOD is 47250. They’re visually indistinguishable at a distance, and yet the second has roughly a quarter of the vertices. In this frame, LOD 0 has almost disappeared while LOD 1 is almost fully rendered. Once LOD 0 completely disappears, only LOD 1 will render.

A pseudorandom texture and a probability factor allow us to discard pixels that don’t pass a threshold. You can see this texture used in ROTR. You might be asking yourself why not use alpha blending. There are many disadvantages to alpha blending over fizzle fading.

1. Depth prepass-friendly: By rendering it like an opaque object and puncturing holes, we can still render into the prepass and take advantage of early-z. Alpha blended objects don’t render into the depth buffer this early due to sorting issues.
2. Needs extra shader(s): If you have a deferred renderer, your opaque shader doesn’t do any lighting. You need a separate variant that does if you’re going to swap an opaque object for a transparent one. Aside from the memory/complexity cost of having at least an extra shader for all opaque objects, they need to be accurate to avoid popping. There are many reasons why this is hard, but it boils down to the fact they’re now rendering through a different code path.
3. More overdraw: Alpha blending can produce more overdraw and depending on the complexity of your objects you might find yourself paying a large bandwidth cost for LOD fading.
4. Z-fighting: z-fighting is the flickering effect when two polygons render to a very similar depth such that floating point imprecision causes them to “take turns” to render. If we render two consecutive LODs by fading one out and the next one in, they might z-fight since they’re so close together. There are ways around it like biasing one over the other but it gets tricky.
5. Z-buffer effects: Many effects like SSAO rely on the depth buffer. If we render transparent objects at the end of the pipeline when ambient occlusion has run already, we won’t be able to factor them in.

One disadvantage of this technique is that it can look worse than alpha fading, but a good noise pattern, post-fizzle blurring or temporal AA can hide it to a large extent. ROTR doesn’t do anything fancy in this respect.
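To make the fizzle idea concrete, here is a toy Python sketch. A seeded random grid stands in for the pseudorandom texture and `fade` is the probability factor; both names are my own, and the game's actual shader isn't shown here. Each pixel is kept by exactly one of the two LODs, so together they cover the object with no overlap.

```python
import random

def fizzle_masks(width, height, fade, seed=0):
    """Return per-pixel keep masks for two LODs cross-fading with a noise threshold.

    fade = 0.0 draws only LOD 0, fade = 1.0 draws only LOD 1.
    """
    rng = random.Random(seed)
    # Stand-in for the pseudorandom texture: one value in [0, 1) per pixel.
    noise = [[rng.random() for _ in range(width)] for _ in range(height)]
    # A pixel of LOD 0 survives where the noise passes the threshold...
    lod0 = [[n >= fade for n in row] for row in noise]
    # ...and LOD 1 fills exactly the pixels LOD 0 discarded.
    lod1 = [[n < fade for n in row] for row in noise]
    return lod0, lod1
```

In a real shader the comparison would be followed by a `discard` (or `clip`) instead of building a mask.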

#### Normals Pass

Crystal Dynamics uses a relatively unusual lighting scheme for its games that we’ll describe in the lighting pass. For now suffice it to say that there is no G-Buffer pass, at least not in the sense that other games have us accustomed to. Instead, the objects in this pass only output depth and normals information. Normals are written to an RGBA16_SNORM render target in world space. As a curiosity, this engine uses Z-up as opposed to Y-up which is what I see more often in other engines/modelling packages. The alpha channel contains glossiness, which will be decompressed later as exp2(glossiness * 12 + 1.0). The glossiness value can actually be negative, as the sign is used as a flag to indicate whether a surface is metallic or not. You can almost spot it yourself, as the darker colors in the alpha channel are all metallic objects.
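As a small illustration of that decoding, here is a Python sketch. I'm assuming that the absolute value is what goes into the exponent and that a negative sign means metallic (which is what the darker metallic objects in the alpha channel suggest); the function name is made up.

```python
def decode_gloss(alpha):
    """Decode the normals target's alpha into (specular power, is_metallic)."""
    is_metallic = alpha < 0.0  # assumption: negative sign flags a metallic surface
    power = 2.0 ** (abs(alpha) * 12.0 + 1.0)  # exp2(glossiness * 12 + 1.0)
    return power, is_metallic
```

Under these assumptions the decoded value ranges from 2 at zero glossiness to 8192 at full glossiness.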

| R | G | B | A |
|---|---|---|---|
| Normal.x | Normal.y | Normal.z | Glossiness + Metalness |

## A real life pinhole camera

When I got married last year, my wife and I went on our honeymoon to Thailand. Their king Bhumibol had died just a month earlier and the whole country was mourning, so everywhere we found memorials and good wishes for their king, and people would dress in black and white as a sign of sorrow. The Thai are a gentle and polite people, who like to help out; we’d ask for directions and people with no notions of English would spend twenty minutes trying to understand and answer our questions. Thailand has a rich history of rising and falling kingdoms, great kings and battles, and unification and invasions by foreign kingdoms. There are some amazing ruins of these kingdoms. Thailand also lives by a variant of Buddhism reflected in all of their beautiful temples. Some of the architectural features I found most interesting are the small reflective tiles that cover the outer walls, animal motifs like the Garuda (bird creatures that can be seen on the rooftops) and snake-like creatures called Naga. It is in this unexpected context that I found a real-life pinhole camera. I always wear my graphics hat so I decided to capture it and later make a post.

First, a little background. A pinhole camera (also known as camera obscura after its Latin name) is essentially the simplest camera you can come up with. If you imagine a closed box that has a single, minuscule hole in one of its faces, such that only a single ray from each direction can come inside, you’d get a mirrored image on the inner face opposite the pinhole. An image is worth more than a thousand explanations, so here’s what I’m talking about.
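The geometry behind this is a single similar-triangles relation. As a tiny Python sketch under my own conventions (pinhole at the origin, scene along +z, back wall at a distance `depth` behind the hole), a point projects like this:

```python
def pinhole_project(point, depth):
    """Project a 3D point through a pinhole at the origin onto a wall at z = -depth."""
    x, y, z = point
    if z <= 0.0:
        raise ValueError("the point must be in front of the pinhole (z > 0)")
    # Similar triangles; the minus signs are what flip the image.
    return (-depth * x / z, -depth * y / z)
```

A point up and to the right of the hole lands down and to the left on the wall, which is why the image appears inverted.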

As you can see, the concept is simple. If you were inside the room, you’d see an inverted image of the outside. The hole is so small that the room would be fairly dark, so even the faint light now bouncing back towards you would still be visible. I made the pinhole a hexagon, as I wanted to suggest the fact that it is effectively the aperture of a modern camera. Louis Daguerre, one of the fathers of photography, used this model in his famous daguerreotype circa 1835, but Leonardo da Vinci had already described this phenomenon as an oculus artificialis (artificial eye) in one of his works as early as 1502. There are plenty of additional resources if you’re interested and even a pretty cool tutorial on how to create your own.

Now that we understand what this camera is, let’s look at the real image I encountered. I’ve aligned the inside and outside images I took and cast rays so you can see what I mean.

The image of the inside looks bright but I had to take it with 1 second of exposure and it still looks relatively dark. On top of that the day outside was very sunny which helped a lot in getting a clear “photograph”.

## The Rendering of Middle Earth: Shadow of Mordor

Middle Earth: Shadow of Mordor was released in 2014. The game itself was a great surprise, and the fact that it was a spin-off within the storyline of the Lord of the Rings universe was quite unusual and it’s something I enjoyed. The game was a great success, and at the time of writing, Monolith has already released the sequel, Shadow of War. The game’s graphics are beautiful, especially considering it was a cross-generation game and was also released on Xbox 360 and PS3. The PC version is quite polished and features a few extra graphical options and high-resolution texture packs that make it shine.

The game uses a relatively modern deferred DX11 renderer. I used RenderDoc to delve into the game’s rendering techniques. I used the highest possible graphical settings (ultra) and enabled all the bells and whistles like order-independent transparency, tessellation, screen-space occlusion and the different motion blurs.

## The Frame

This is the frame we’ll be analyzing. We’re at the top of a wooden scaffolding in the Udun region. Shadow of Mordor has similar mechanics to games like Assassin’s Creed where you can climb buildings and towers and enjoy some beautiful digital scenery from them.

#### Depth Prepass

The first ~140 draw calls perform a quick prepass to render the biggest elements of the terrain and buildings into the depth buffer. Most things don’t end up appearing in this prepass, but it helps when you’ve got a very big number of draw calls and a far range of view. Interestingly the character, who is always in front and takes a decent amount of screen space, does not go into the prepass. As is common for many open world games, the game employs reverse z, a technique that maps the near plane to 1.0 and far plane to 0.0 for increased precision at great distances and to prevent z-fighting. You can read more about z-buffer precision here.
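To get a feel for why reverse z helps, here is a small Python sketch. It uses a standard 0 to 1 perspective depth mapping and its reversed counterpart, with near and far values I picked arbitrarily, and rounds the results to 32-bit floats to mimic a floating point depth buffer (the benefit comes from pairing reverse z with a float buffer).

```python
import struct

NEAR, FAR = 0.1, 10000.0  # hypothetical clip planes, not taken from the game

def to_f32(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def standard_depth(z):
    """Near plane maps to 0.0, far plane maps to 1.0."""
    return FAR / (FAR - NEAR) * (1.0 - NEAR / z)

def reversed_depth(z):
    """Near plane maps to 1.0, far plane maps to 0.0."""
    return NEAR / (NEAR - FAR) * (1.0 - FAR / z)

# Two distant surfaces one unit apart.
z0, z1 = 9000.0, 9001.0
collide_standard = to_f32(standard_depth(z0)) == to_f32(standard_depth(z1))
collide_reversed = to_f32(reversed_depth(z0)) == to_f32(reversed_depth(z1))
```

With these numbers the two surfaces land on the same 32-bit value with the standard mapping (values bunch up near 1.0, where floats are coarse) but stay distinct with the reversed one (values sit near 0.0, where floats are dense).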

#### G-buffer

Right after that, the G-Buffer pass begins, with around 2700 draw calls. If you’ve read my previous analysis for Castlevania: Lords of Shadow 2 or have read other similar articles, you’ll be familiar with this pass. Surface properties are written to a set of buffers that are read later on by lighting passes to compute the surface’s response to the light. Shadow of Mordor uses a classical deferred renderer, but uses a comparatively small number of G-buffer render targets (3) to achieve its objective. Just for comparison, Unreal Engine uses between 5 and 6 buffers in this pass. The G-buffer layout is as follows:

##### Normals Buffer
| R | G | B | A |
|---|---|---|---|
| Normal.x | Normal.y | Normal.z | ID |

The normals buffer stores the normals in world space, in 8-bit per channel format. This is a little bit tight, sometimes not enough to accurately represent smoothly varying flat surfaces, as can be seen in some puddles throughout the game if paying close attention. The alpha channel is used as an ID that marks different types of objects. Some that I’ve found correspond to a character (255), an animated plant or flag (128), and the sky is marked with ID 1, as it’s later used to filter it out during the bloom phase (it gets its own radial bloom).

## Photoshop Blend Modes Without Backbuffer Copy

For the past couple of weeks, I have been trying to replicate the Photoshop blend modes in Unity. It is no easy task; despite the advances of modern graphics hardware, the blend unit still resists being programmable and will probably remain fixed for some time. Some OpenGL ES extensions implement this functionality, but most hardware and APIs don’t. So what options do we have?

### 1) Backbuffer copy

A common approach is to copy the entire backbuffer before doing the blending. This is what Unity does. After that it’s trivial to implement any blending you want in shader code. The obvious problem with this approach is that you need to do a full backbuffer copy before you do the blending operation. There are certainly some possible optimizations like only copying what you need to a smaller texture of some sort, but it gets complicated once you have many objects using blend modes. You can also do just a single backbuffer copy and re-use it, but then you can’t stack different blended objects on top of each other. In Unity, this is done via a GrabPass. It is the approach used by the Blend Modes plugin.

### 2) Leveraging the Blend Unit

Modern GPUs have a little unit at the end of the graphics pipeline called the Output Merger. It’s the hardware responsible for getting the output of a pixel shader and blending it with the backbuffer. It’s not programmable, as making it so brings quite a lot of complications (you can read about it here), so current GPUs don’t offer a programmable one.

The blend mode formulas were obtained here and here. Use them as a reference to compare against what I provide. There are many other sources. One thing I’ve noticed is that the provided formulas often neglect to mention that Photoshop actually uses modified formulas and clamps quantities in a different manner, especially when dealing with alpha. Gimp does the same. This is my experience recreating the Photoshop blend modes exclusively using a combination of blend unit and shaders. The first few blend modes are simple, but as we progress we’ll have to resort to more and more tricks to get what we want.

Two caveats before we start. First off, Photoshop blend modes do their blending in sRGB space, which means if you do them in linear space they will look wrong. Generally this isn’t a problem, but due to the amount of trickery we’ll be doing for these blend modes, many of the values need to go beyond the 0 – 1 range, which means we need an HDR buffer to do the calculations. Unity can do this by setting the camera to be HDR in the camera settings, and also setting Gamma for the color space in the Player Settings. This is clearly undesirable if you do your lighting calculations in linear space. In a custom engine you would probably be able to set this up in a different manner (to allow for linear lighting).

#### A) Darken

| Formula | Blend Unit |
|---|---|
| min(SrcColor, DstColor) | Min(SrcColor · One, DstColor · One) |

As alpha approaches 0, the result needs to tend to DstColor. We achieve this in the shader by pushing SrcColor towards the maximum possible color, float3(1, 1, 1), so that the minimum always picks DstColor.
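Emulated on the CPU, the idea could look like the following Python sketch. The function name is mine; the lerp towards white plays the role of the shader and the `min` plays the role of the blend unit.

```python
def darken(src, dst, alpha):
    """Darken blend: fade src towards white by alpha, then take the per-channel min."""
    # "Shader" step: lerp(1, src, alpha), so a transparent source becomes white.
    faded = tuple(s * alpha + (1.0 - alpha) for s in src)
    # "Blend unit" step: Min(SrcColor * One, DstColor * One).
    return tuple(min(f, d) for f, d in zip(faded, dst))
```

With alpha at 0 the source becomes pure white and the destination passes through untouched (for destination values within the 0 to 1 range); with alpha at 1 we get the plain min(SrcColor, DstColor).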

#### B) Multiply

| Formula | Blend Unit |
|---|---|
| SrcColor · DstColor | SrcColor · DstColor + DstColor · OneMinusSrcAlpha |
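A quick way to sanity-check this blend setup is to emulate it on the CPU. In the Python sketch below I'm assuming the shader outputs the source color premultiplied by alpha (that step isn't spelled out above), which makes the blend unit's result equal to DstColor · lerp(1, SrcColor, alpha).

```python
def multiply(src, dst, alpha):
    """Multiply blend: SrcColor * DstColor + DstColor * OneMinusSrcAlpha."""
    # Assumed "shader" step: premultiply the source color by its alpha.
    out = tuple(s * alpha for s in src)
    # "Blend unit" step with factors (DstColor, OneMinusSrcAlpha).
    return tuple(o * d + d * (1.0 - alpha) for o, d in zip(out, dst))
```

At alpha 1 this reduces to SrcColor · DstColor, and at alpha 0 it leaves DstColor untouched.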

## The Rendering of Castlevania: Lords of Shadow 2

Castlevania: Lords of Shadow 2 was released in 2014, a sequel that builds on top of Lords of Shadow, its first installment, which uses a similar engine. I hold these games dear and, being Spanish myself, I’m very proud of the work MercurySteam, a team from Madrid, did on all three modern reinterpretations of the Castlevania series (Lords of Shadow, Mirror of Fate and Lords of Shadow 2). Out of curiosity and pure fandom for the game I decided to peek into the Mercury Engine. Despite the first Lords of Shadow being, without a shadow of a doubt (no pun intended), the best and most enjoyable of the new Castlevanias, out of justice for their hard work I decided to analyze a frame from their latest and most polished version of the engine. Despite being a recent game, it uses DX9 as its graphics backend. Many popular tools like RenderDoc or the newest tools by Nvidia and AMD don’t support DX9, so I used Intel Graphics Analyzer to capture and analyze all the images and code from this post. While the post has a bit of graphics parlance, I’ve tried to include as many images as possible, with occasional code and in-depth explanations.

# Analyzing a Frame

This is the frame we’re going to be looking at. It’s the beginning scene of Lords of Shadow 2, Dracula has just awakened, enemies are knocking at his door and he is not in the best mood.

## Depth Pre-pass

LoS2 appears to do what is called a depth pre-pass. What it means is you send the geometry once through the pipeline with very simple shaders, and pre-emptively populate the depth buffer. This is useful for the next pass (GBuffer), as it attempts to avoid overdraw: pixels with a depth value higher than the one already in the buffer (essentially, pixels that are behind) get discarded before they run the pixel shader, minimizing pixel shader runs at the cost of extra geometry processing. Alpha tested geometry, like hair and a rug with holes, is also included in the pre-pass. LoS2 uses both the standard depth buffer and a depth-as-color buffer to be able to sample the depth buffer as a texture in a later stage.
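The saving is easy to see with a toy model. The Python sketch below counts how many fragments would run the expensive pixel shader for a list of `(pixel, depth)` fragments submitted in draw order, with and without a pre-pass. It is a simplified model of early-z that I made up for illustration, not how the hardware or the game actually schedules work.

```python
def shaded_fragments(fragments, use_prepass):
    """Count fragments that run the expensive pixel shader."""
    depth = {}
    if use_prepass:
        # Pre-pass: cheap depth-only rendering fills in the nearest depth per pixel.
        for pixel, z in fragments:
            depth[pixel] = min(z, depth.get(pixel, float("inf")))
        # Main pass: only fragments matching the stored depth survive the depth test.
        return sum(1 for pixel, z in fragments if z == depth[pixel])
    runs = 0
    for pixel, z in fragments:
        # No pre-pass: anything closer than what is currently stored gets shaded.
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
            runs += 1
    return runs
```

For three surfaces covering the same pixel and drawn back to front, the version without a pre-pass shades all three, while the pre-pass version shades only the nearest one.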

The game also takes the opportunity to fill in the stencil buffer, an auxiliary buffer that is part of the depth buffer and generally contains masks for pixel selection. I haven’t thoroughly investigated why precisely all these elements are marked, but for instance wax presents higher subsurface scattering, and hair and skin have their own shading, independent of the main lighting pass, which the stencil allows the main pass to skip.

• Dracula: 85
• Hair, skin and leather: 86
• Window glass/blood/dripping wax: 133
• Candles: 21

The first image below shows what the overdraw is like for this scene. A depth pre-pass helps if you have a lot of overdraw. The second image is the stencil buffer.

## GBuffer Pass

LoS2 uses a deferred pipeline, fully populating 4 G-Buffers. Four buffers is a lot for a game that was released on Xbox 360 and PS3; other games get away with 3 by using several optimizations.

#### Normals (in World Space):

| R | G | B | A |
|---|---|---|---|
| normal.r | normal.g | normal.b | sss |

The normal buffer is populated with the three components of the world space normal and a subsurface scattering term for hair and wax (interestingly not skin). Opaque objects only transform their normal from tangent space to world space, but hair uses some form of normal shifting to give it anisotropic properties.

#### Albedo:

| R | G | B | A |
|---|---|---|---|
| albedo.r | albedo.g | albedo.b | alpha * AOLevels |

The albedo buffer stores all three albedo components plus an ambient occlusion term that is stored per vertex in the alpha channel of the vertex color and is modulated by an AO constant (which I presume depends on the general lighting of the scene).

#### Specular:

| R | G | B | A |
|---|---|---|---|
| specular.r | specular.g | specular.b | Fresnel multiplier |

The specular buffer stores the specular color multiplied by a Fresnel term that depends on the view and normal vectors. Although LoS2 does not use physically-based rendering, it includes a Fresnel term, probably inspired in part by the Schlick approximation, to try and brighten things up at glancing angles. It is not strictly correct, as it is done independently of the real-time lights. The Fresnel factor is also stored in the w component.
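For reference, this is what the textbook Schlick approximation looks like in Python. To be clear, this is the standard formula and not the game's actual shader code, which only seems to be loosely based on it; here `cos_theta` would be the dot product of the normal and view vectors.

```python
def schlick_fresnel(f0, cos_theta):
    """Schlick's approximation: F = F0 + (1 - F0) * (1 - cos_theta)^5."""
    c = min(max(cos_theta, 0.0), 1.0)
    return f0 + (1.0 - f0) * (1.0 - c) ** 5
```

Looking straight at the surface (cos_theta = 1) returns the base reflectance F0, while at grazing angles (cos_theta near 0) the result approaches 1, which is the brightening effect described above.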