GPU Path Tracing
3rd Year Computer Science Project
Michael Savage
2014

Contents

1 Introduction
1.1 Motivation
1.2 Overview of ray tracing
1.3 Overview of path tracing
1.4 Accomplishments
2 Ray Tracing in Depth
2.1 Setting up the camera
2.1.1 Calculating the camera's coordinate system
2.1.2 Building the focal plane
2.2 Tracing rays
2.3 Shapes
2.3.1 Planes
2.3.2 Spheres
2.4 Anti-aliasing
3 Ray Tracing in CUDA
3.1 A brief overview of CUDA
3.2 Implementing a ray tracer in CUDA
3.3 Limitations of CUDA
4 Path Tracing in Depth
4.1 The rendering equation
4.1.1 BRDFs and BTDFs
4.2 Monte Carlo integration
4.3 Monte Carlo path tracing
4.3.1 Reflectance models
5 Path Tracing in Rust
5.1 A brief overview of Rust
5.2 Implementing a path tracer in Rust
5.2.1 Materials
5.2.2 Sampling
5.2.3 Shapes
5.2.4 Worlds
5.2.5 Piecing everything together
6 Conclusion
6.1 Performance and evaluation
6.1.1 Linearity with number of threads
6.1.2 Noise reduction with number of samples
6.1.3 Adjusting the recursion limit
6.1.4 Tweaking the light parameters
6.2 Future work
6.2.1 Russian Roulette path termination
6.2.2 Materials
6.2.3 Acceleration structures
6.2.4 Do away with pixels
6.2.5 Better integration methods
6.3 Final thoughts
References

Chapter 1

Introduction

1.1 Motivation

Physically based renderers are pieces of software which aim to produce photorealistic images by using physically accurate models of how light and materials in the real world work. They have historically been too computationally intensive to use in real-time applications, instead being applied in offline rendering situations where time is not so much of an issue.
Examples of such problems could be movie making or architecture, where waiting an hour per frame in intensive scenes is acceptable. Even in situations like this, however, the software involved tends to be based on hybrid methods, using both typical rasterisation and physically based methods. An example of such a renderer would be Pixar’s Renderman1 . Things are starting to look up though, as physically based rendering methods tend to be embarrassingly parallel (as you will see later in this paper) and with computers based on, or at least with components based on, massively parallel2 architectures becoming increasingly commonplace, it’s more and more feasible to use purely physically based renderers. The most commonly found massively parallel architecture today is in graphics process units (GPUs). As the name implies, they are specialised hardware which excel at the kind of number crunching graphics workloads require and have hundreds of floating point units. Up until recently, they have been limited to running functions exposed through the graphics library (OpenGL3 /DirectX4 ) and running simple programs called shaders, which are restricted to operating on pixels or vertices. While general-purpose programming was possible, it wasn’t 1 http://renderman.pixar.com/view/renderman 2 Computing using a large number of processors/machines 3 http://www.opengl.org/ 4 http://en.wikipedia.org/wiki/DirectX 3 the intended use of the hardware and programs required a lot of shoehorning to fit into the graphics library’s model. Today we are seeing the rise of GPGPU – general-purpose computing on graphics processing units – with the major GPU manufacturers, NVIDIA and AMD, providing frameworks for running less restricted code on their hardware. AMD and Apple are pushing OpenCL5 , a general-purpose, open source framework for parallel computation, while NVIDIA are pushing their proprietary GPU framework, CUDA6 . The latter, being more specialised, provides a lower level interface and given that most GPGPU applications are written with performance in mind, has led to it seeing greater success than OpenCL. We are even beginning to see real-time global illumination work its way into modern renderers, such as Unity7 , CryeENGINE8 and Octane/Brigade9 . The first two are primarily designed as game engines, with Unity using the Enlighten10 library and CryENGINE using a technique called Light Propagation Volumes(Anton Kaplanyan 2009), which to my knowledge does not handle certain physical effects such as refraction and caustics. Octane is designed to be a real-time, physically accurate renderer and uses path tracing. I have chosen to investigate methods similar to Octane’s as path tracing is a natural extension of ray tracing, which I’m already familiar with and accurate images are more interesting than “good enough” ones. 1.2 Overview of ray tracing Ray tracing is a method for rendering scenes by simulating how light rays bounce from light sources, to objects in the scene, to the viewer’s eyes. There are many advantages to using ray tracing over rasterisation. Ray tracing is conceptually simple and easy to understand. It can simulate many optical effects that rasterisation is unable to handle, such as reflection, refraction and caustics, or lens effects like depth of field. The major disadvantage of ray tracing is that it’s too slow to be used in real-time applications. Let’s start with a few definitions: ray: A line extending to infinity from a point in one direction. 
Can be expressed mathematically as: R(t) = p + td, where t ≥ 0, p is the point it originates from and d is the direction it extends in. focal plane: A plane placed between the focal point of the camera and the scene, which the image gets drawn onto 5 https://www.khronos.org/opencl/ 6 http://www.nvidia.co.uk/object/cuda_home_new.html 7 http://unity3d.com/ 8 http://www.crytek.com/home 9 http://render.otoy.com/ 10 http://www.geomerics.com/enlighten/ 4 irradiance: Irradiance is the flux per unit area incident on a surface. Or more simply, the intensity of the incoming light. Ray tracing works by firing rays for each pixel from the focal point, through the pixel’s corresponding location on the focal plane and into the scene. Upon intersection with an object, the algorithm traces additional rays towards the lights in the scene to determine if they are visible from the intersection point. This is done because ray tracing makes the assumption that only visible lights contribute to the irradiance at a point. Finally, the algorithm may decide to shoot even more rays back into the scene depending on the material properties at the intersection point. For example, if the object is a mirror, it will cast another ray back into the scene in the direction of the reflection, from the intersection point. Similarly, if the object is not opaque, we should cast a refraction ray into the object. \begin{figure}[htbp] \caption[]% {An image explaining ray tracing, taken from Wikipedia11 } \end{figure} As ray tracing assumes that only visible lights make any lighting contribution, ray tracing can be said to only account for direct illumination. Picture an outdoor scene with a wall in an open area, and only the sun providing illuminance. We can say from experience of being in the real world that it’s unlikely the area shadowed by the wall would be pitch black. To account for this, ray tracers commonly enforce a constant ambient term for lighting in the 11 http://en.wikipedia.org/wiki/File:Ray_trace_diagram.svg 5 scene. This might produce acceptable results in my contrived example, but it’s clearly not physically accurate and is sidestepping the actual problem. The mathematics behind ray tracing and its implementation are covered in more detail in chapter 2. 1.3 Overview of path tracing As mentioned above, ray tracing only considers direct illumination in a scene. Even with the addition of an ambient term, this can produce incorrect results in simple scenes. For example: mirrored wall Figure 1.1: A scene that ray tracing performs poorly on. Above is an image depicting a room with a wall separating both sides, except for a gap at the top. The entire top wall of the room is a mirror and a light source is positioned to one side. In the real world, the entire room would be lit as the reflection of the light in the mirror would provide illuminance to parts of the room that can’t see the light directly (shaded in the above image). In a ray traced scene however, only the non-shaded area, where the light is directly visible, would receive illumination from the light source. Conversely, path tracing is an algorithm that tries to render all scenes correctly by evaluating the global illumination in a scene. Path tracing drops the assumption that direct lighting is the only source of illumination in a scene, and instead adopts the key idea that there should be no difference between light emitted from a light source and light reflected from a surface or scattered by participating media. 
Upon impact with any surface, including diffuse materials, we spawn secondary rays and cast them back into the scene. If these rays encounter a light source, we can transfer illuminance down the path to the eye. This gives us the effect that every surface the path interacts with affects the throughput, and therefore the total irradiance.

Since the tree of all possible light paths is both infinitely wide and infinitely deep, we need to intelligently sample paths to get an efficient implementation. I explain the mathematics of Monte Carlo integration and its application to path tracing in chapter 4.

1.4 Accomplishments

In my time working on this project, I built a ray tracing renderer in CUDA and outlined a path tracing renderer which would also be implementable in CUDA. I have elaborated on the implementation and its limitations in chapter 3. Due to the limitations of CUDA, I decided it would be a more efficient use of my time to implement the path tracing algorithm in CPU code. To keep in the spirit of this project, I paid special attention to which language features would and, more importantly, would not be available when writing GPU code. I have gone into more detail on both the path tracing algorithm and my implementation in chapters 4 and 5.

Chapter 2

Ray Tracing in Depth

In this chapter, I will expand on my earlier explanation of ray tracing to include the mathematics and details relevant to implementing a ray tracing renderer.

2.1 Setting up the camera

To begin, we need a camera. Specifically, given the focal point/eye, the focal length, the field of view (FOV)1, the aspect ratio2 of the output image, the view direction or a point to look at, and an up vector, we can compute the coordinates of pixels on the focal plane and the camera's coordinate system.

1 The FOV of the camera is the angle between the lines from the focal point to the top/bottom of the focal plane. I have drawn an example in Figure 2.1.
2 The aspect ratio is the ratio between the image's width and height. Again see Figure 2.1.

2.1.1 Calculating the camera's coordinate system

Suppose we are given the world's up direction – which need not be the up direction of the camera – and the direction we want the camera to point in. If we are given a point to look at instead of a view direction, we can calculate the view direction by subtracting the focal point from the point we want to look at. From here on I will denote the camera's forward, left and up vectors with c_f, c_l and c_u respectively, and the world up vector with u. I will also assume that c_f and u are normalised3.

3 A vector is normalised when it has length 1. A vector can therefore be normalised by dividing it by its length.

We can compute the camera's left and up vectors as follows:

c_l = (u × c_f) / |u × c_f|
c_u = c_f × c_l

Note that c_u is automatically of length one as c_f and c_l are orthogonal unit vectors. The calculation of c_l degenerates and gives a zero-length result when u = ±c_f, i.e. the camera is looking straight down or straight up.
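To make this concrete, here is a minimal sketch of the construction in Rust. It is not the project's code (the renderer's own maths module appears in chapter 5 and targets a pre-1.0 compiler); the Vec3 type, the function name and the tolerance used to detect the degenerate case are all illustrative choices.

#[derive(Clone, Copy)]
struct Vec3 { x: f64, y: f64, z: f64 }

impl Vec3 {
    fn cross(self, o: Vec3) -> Vec3 {
        Vec3 {
            x: self.y * o.z - self.z * o.y,
            y: self.z * o.x - self.x * o.z,
            z: self.x * o.y - self.y * o.x,
        }
    }

    fn length(self) -> f64 {
        (self.x * self.x + self.y * self.y + self.z * self.z).sqrt()
    }

    fn normalised(self) -> Vec3 {
        let l = self.length();
        Vec3 { x: self.x / l, y: self.y / l, z: self.z / l }
    }
}

// Build the camera's coordinate system from a normalised view direction c_f
// and the world up vector u. Returns None in the degenerate case where the
// camera looks straight up or down, i.e. u and c_f are parallel.
fn camera_basis(c_f: Vec3, u: Vec3) -> Option<(Vec3, Vec3, Vec3)> {
    let left = u.cross(c_f);
    if left.length() < 1e-9 {
        return None; // u = ±c_f: the caller must pick a different up vector
    }
    let c_l = left.normalised();
    let c_u = c_f.cross(c_l); // already unit length: c_f and c_l are orthogonal unit vectors
    Some((c_f, c_l, c_u))
}

A caller that is given a point to look at rather than a view direction would first compute c_f by subtracting the focal point from that point and normalising the result.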
2.1.2 Building the focal plane

The inputs we need to build the focal plane are: the coordinate system we just calculated, the vertical FOV and aspect ratio of the camera, and the focal length.

Figure 2.1: A diagram of the camera parameters (side and rear views, showing the focal plane, c_f, c_l, c_u, the vertical FOV θ, the focal length l and the aspect ratio = width / height). The focal point is at the origin.

In the above diagram, the camera's vertical FOV is denoted by θ and the focal length is denoted by l. To begin, we need to compute the actual dimensions of the focal plane. This can be done with some trigonometry.

Let r denote the camera's aspect ratio, f_h be half the height of the focal plane and f_w be half its width. Using half lengths makes the extents calculations a little simpler and we won't be using the full lengths. We can then compute the dimensions as follows:

f_h = l tan(θ / 2)
f_w = r f_h

Additionally, we need the extents of the focal plane. Let f_tl, f_tr and f_br be the points at the top left, top right and bottom right corners of the focal plane respectively. We can find the top left corner with:

f_tl = o + l c_f + f_h c_u + f_w c_l

where o is the focal point of the camera. The other corners are analogous and omitted. Given the three corners, it's easy to linearly interpolate between them to find the centre coordinates of each pixel on the focal plane.

2.2 Tracing rays

The next job of a ray tracer is to trace rays. We begin by firing rays from the focal point (o) into the scene through their corresponding point/pixel on the focal plane. The pseudocode for this looks like:

for x in [0, width) {
    for y in [0, height) {
        let p = CentreOfPixel( x, y )
        let ray = Normalise( p - o )
        pixels[ x, y ] = Irradiance( o, ray )
    }
}

The equation for finding the world coordinates of a pixel given its x and y coordinates looks like:

let t_x = x / width, t_y = y / height
p_{x,y} = f_tl (1 − t_x) + f_tr t_x + t_y (f_br − f_tr)

After we have spawned the rays, we need to intersect them against objects in the scene. The implementation of Irradiance would look like:

Irradiance( start, dir ) {
    // hit is an object containing information
    // about the intersection
    let hit = Intersect( start, dir )
    if hit == null {
        return black
    }

    let ret = black
    for l in lights {
        let shadow_ray = Normalise( l.pos - hit.pos )
        let offset = hit.pos + shadow_ray * EPSILON
        if !DoesIntersect( offset, l.pos - hit.pos ) {
            ret += l.colour * DotProduct( shadow_ray, hit.normal )
        }
    }

    if object hit is reflective {
        let r = Reflect( dir, hit.normal )
        ret += Irradiance( hit.pos + r * EPSILON, r )
    }
    // also spawn a secondary ray for refractive materials

    return ret
}

There are a few parts of the above code that I haven't yet explained. The first is the EPSILON term when casting secondary rays. This is required because of floating point imprecision. The intersection point of a ray and an object might lie on the opposite side of the object's face. When the outgoing shadow or reflection ray is fired from this point, it immediately intersects with the same object, which is not what we want. EPSILON can be bounded using formal analysis of floating point arithmetic (Matt Pharr 2010, 112) if you are implementing a serious renderer, but for our purposes it's fine to just define it as a small value, and it won't produce any noticeable effect on the resulting image.

The second thing that needs explaining is the scaling by DotProduct(...) when adding the light source's contribution. This is a result of Lambert's cosine law, which says the intensity of light reflected off a diffuse surface is proportional to the cosine of the angle between the incident light and the surface normal. I have given a diagram (Figure 2.2) which should make it clear what is happening. As an aside, this is also one of the reasons why the Earth is hotter at the equator than at the poles. It's fitting that Lambert was also an astronomer.

Figure 2.2: Lambert's cosine law. Light incident at a shallow angle is spread over a larger area dA than light arriving perpendicular to the surface, and therefore the irradiance is reduced.

Next, the implementation of Reflect.
This is a standard result in graphics so I won’t be including a full derivation, instead I will just give the equation. Suppose we are given a surface normal n and the direction an incoming ray is coming from4 , v. We then compute the reflected ray direction r as follows: r = −v + 2 (n · v) n Lastly, the implementation of Intersect. A naive implementation iterates over every object in the scene and intersects the ray with each of them, keeping track of the nearest intersection. This can be improved by introducing an object hierarchy of some kind, for example a BSP5 /k-d tree6 , or bounding volume hierarchy7 . The implementation of DoesIntersect takes the start point of the ray and a non-normalised ray direction and returns a boolean indicating whether there was an intersection along the length of the ray. It can also call separate predicate intersection tests on primitives, as these can often be made faster than methods that actually need to compute the intersection point. This requires us to have intersection routines for each kind of primitive we expect to see in the scene, which is what I will cover, amongst other things, in this next section. 4 Note this is the negative of the direction of the ray. This is simply convention and makes certain calculations a little more intuitive. 5 http://en.wikipedia.org/wiki/Binary_space_partitioning 6 http://en.wikipedia.org/wiki/K-d_tree 7 http://en.wikipedia.org/wiki/Bounding_volume_hierarchy 12 2.3 Shapes For the sake of simplicity, I only implemented planes and spheres. With proper abstraction, it should be easy to add more shapes to the renderer. 2.3.1 Planes We will parameterise an infinite plane by its normal and perpendicular distance from the origin. Or more formally, a point X lies on the plane with normal n and perpendicular distance d when: X·n=d Also see Figure 2.3 for a digram. n d 0 Figure 2.3: A plane and its parameters. 2.3.1.1 Intersection Given a ray parameterised by R(t) = p + td (recall the definition from Section 1.2), we can compute the value of t at the intersection point between it and a plane with the following: (p + td) · n = d p · n + td · n = d d−p·n t= d·n And check whether t ≥ 0. We also need to handle the case where the ray is parallel to the plane and doesn’t lie along it, i.e. d · n = 0 and p · n 6= d. 13 Interestingly enough, we can choose to handle the degenerate case by not handling it, in that a division by zero produces +∞. Note that the possibility of it producing −∞ is ruled out by the t ≥ 0 test. This works because the Intersect routine only keeps track of the nearest intersection. 2.3.1.2 Surface normal The surface normal at any point on a plane is trivially n. 2.3.1.3 UV coordinates Any two dimensional surface can have every position on it represented by a pair of coordinates. These are conventionally called u and v, hence UV coordinates. For a plane, the logical UV parameterisation is to introduce orthogonal u and v axes in the plane, like x and y coordinates in 2D cartesian space. Therefore, to compute the UV coordinates we should transform the plane so it lies parallel to the XY plane. Then we take the x and y coordinates of our point as u and v respectively. One way to think about this is to picture the transformation as rotating the plane so its normal becomes k8 . Therefore, we need a method to construct a transformation matrix that represents the rotation required to line up a pair of direction vectors, i.e. 
given a pair of unit vectors n and n0 , we should construct a matrix M that satisfies M n = n0 . I will also denote the angle between them as θ, although this is never explicitly computed. The steps involved in finding this matrix are: 1. Take the cross product, n × n0 and normalise to obtain the axis of rotation, a. 2. Compute n · n0 = cos θ. 3. Construct the matrix corresponding to a rotation of θ about a. I have omitted the maths involved in building a rotation matrix for an arbitrary axis as they are a standard result in computer graphics. Nevertheless, observe that it only involves sin θ and cos θ and never just θ, so we can use the identity sin2 θ + cos2 θ = 1 as an optimisation. The first step degenerates when the supplied vectors are parallel. We should explicitly handle two cases: • n = n0 : Return the identity matrix. • n = −n0 : Return the negative identity matrix. 8 i, j and k form the basis of the 3D cartesian coordinate system, being unit vectors in the direction of the x, y and z axes respectively. 14 2.3.2 Spheres A sphere is parameterised by its centre c and its radius r. Again more formally, a point X lies on the surface of the sphere if it satisfies: |X − c| = r 2.3.2.1 Intersection When intersecting with solid objects such as spheres, we treat the inside as hollow, i.e. if a ray starts inside the object, we still consider the intersection point to lie on the surface and not where t = 0. This makes it possible to implement specular transmission9 . Following the derivation from (Christer Ericson 2004, 177): If we substitute X for the ray equation and square both sides of the above and use the identity |u|2 = u · u and let m = p − c, we can derive a quadratic equation in t: (m + t · d) · (m + t · d) = r2 (d · d) t2 + 2 (m · d) t + (m · m) = r2 t2 + 2 (m · d) t + (m · m) − r2 = 0 There are two cases we need to handle with this: Let d be the discriminant d = b2 − c, where b = 2 (m · d) and c = (m · m) − r2 . • d < 0: The quadratic has no real solutions, i.e. the ray missed the sphere. √ • d ≥ 0: The roots of the equation are given by t = −b ± b2 − c10 . We want the smaller t that is still positive. If both roots are negative, we have a situation where the ray started outside the sphere and was pointing away from it. 9 The 10 As refraction of light through non-opaque a surface. the quadratic is of the form x2 + 2bx + c = 0, we can simplify the quadratic equation a little. 15 2.3.2.2 Surface normal If we make the assumption that the supplied point, X, lies on the surface, we can skip the sqrt involved in computing the length of the vector by instead taking the length to be r. n= 2.3.2.3 X−c r UV coordinates A possible UV mapping for a sphere would be to convert cartesian coordinates to spherical coordinates11 denoted by the tuple (r, θ, φ), but without r as it’s unnecessary and with θ and φ scaled to lie in the interval [0, 1]. Again this is a standard transformation so I won’t be including it here. 2.4 Anti-aliasing If we implemented a ray tracer as described, we would end up with hard edges between objects, shadows, reflections of objects and shadows, and so on. This is ugly. In practice we apply a technique called anti-aliasing to reduce the artifacts around edges. Figure 2.4: A before and after image showing the effect of anti-aliasing in my path tracing renderer (Chapter 5). 
Anti-aliasing can be performed in many ways, with techniques such as fast approximate anti-aliasing 12 (FXAA) and multisample anti-aliasing 13 (MSAA) being the common approaches to anti-aliasing rasterised graphics. With a physically based renderer, we can get more accurate anti-aliasing by applying small random adjustments to the ray when we cast it from the camera and into the scene. We repeat this process as many times as necessary and take the mean value across all the samples. Again this is not the only way of reducing aliasing, and is probably not even the best as discussed in Section 6.2.4. 11 http://en.wikipedia.org/wiki/Spherical_coordinate_system 12 http://en.wikipedia.org/wiki/Fast_approximate_anti-aliasing 13 http://en.wikipedia.org/wiki/Multisample_anti-aliasing 16 Chapter 3 Ray Tracing in CUDA In this chapter, I will briefly explain the architecture behind CUDA and talk a bit about the implementation of the ray tracer I wrote in CUDA. I will finish by discussing some of the limitations I encountered when writing my implementation, which is my justification for switching to CPU code for the remainder of my project. 3.1 A brief overview of CUDA CUDA is a massively parallel architecture that allows a subset of C/C++ code to be run on a GPU. Since modern GPUs have many hundreds of processors, they lend themselves well to executing algorithms which benefit from parallelism without requiring much, or any, data to be shared between them. In ray tracing, rays fired from the camera don’t rely on information from any other rays which are also being cast into the scene. This gives us an obvious and very clean separation of work between threads, in which we assign each thread to have complete ownership of one or more pixels. This model can scale to over a million processors depending on the output image’s resolution. CUDA also supports enough language features to make implementing a ray tracer possible. To be specific, the only feature we really need is shared memory arrays that all the threads can write colour data to, which is provided by cudaMalloc. Having recursion – available since CUDA 3.1 – also makes implementing the algorithms a little easier, but isn’t required as the algorithms can be reworked to use tail recursive, and therefore non-recursive, calls without much effort. NVIDIA also provide a library for generating pseudo-random numbers in device code called cuRAND1 . Pseudo-random numbers are necessary for sampling 1 https://developer.nvidia.com/curand 17 methods described in Chapter 4, and not having to implement our own pseudo-random number source is a plus. 3.2 Implementing a ray tracer in CUDA The first problem to overcome was how to deliver work to each thread. My original idea was to have a pair of queues for each thread, with threads reading work from one and adding work for the next frame to the other. This model would have potentially come in handy when I moved onto writing a path tracing renderer as it would allow me to spawn many secondary rays from each intersection. Sadly, CUDA doesn’t allow dynamic allocations to be made in GPU code which raised questions like “what should the maximum length of each queue be?”. This would most likely have required a lot of trial and error to answer optimally. While searching for a replacement model for delivering work, I realised there is no need to queue arbitrary amounts of work and we only need to spawn one secondary ray for each intersection. 
Some examples of an algorithm like this can be found in smallpt2 and (Dietger van Antwerpen 2010). I started by using Sean Barret’s stb_image_write.h3 for writing images, which required me to copy pixel data from the GPU. Moving data across the main memory/GPU divide is expensive, but as it only happened once it didn’t really matter. I wanted the renderer to be making samples continuously as long as it was executing, constantly displaying its progress at regular intervals. I felt it would be more interesting and would help me diagnose errors in my code. In theory CUDA makes this easy to do by allowing you to write to a framebuffer object (FBO) and then treating that as a texture to be painted on a quad across the entire viewport with a single call. CUDA doesn’t allow you to work with an FBO while it is bound in the graphics context, which means I would need to write some kind of synchronisation code to avoid race conditions. I wanted to avoid assigning too many pixels that are difficult to render to the same thread. Depending on the exact implementation, this could have either slowed the renderer down with many idle threads as it waited for the last busy thread to complete its work. It could also have led to a situation where some of the threads were lagging behind the others, which may have lead to artifacts being present in the output image for longer than necessary. I thought the most common cause of overworking a thread would be if I assigned pixels to threads in stripes or blocks. If only a small, localised part of the scene was difficult to render, this would have caused a small number of threads to have the majority of the work. 2 http://www.kevinbeason.com/smallpt/ – specifically the forward.cpp modification. 3 http://nothings.org/stb/stb_image_write.h 18 To get around this, I decided to assign every Nth pixel to each thread. See Figure 3.1 for an example of this mapping scheme with 25 threads. Figure 3.1: Each pixel has been coloured according to which thread it was assigned to. There are 25 separate threads. The CUDA code for initialising the ray tracer and spawning the kernel threads looks like: sphere scene[] = { /* ... */ }; size_t spheres = sizeof( scene ) / sizeof( scene[ 0 ] ); queue* d_work; sphere* d_scene; job** d_work; cudaMalloc( ( void** ) &d_work, sizeof( job* ) * N ); cudaMalloc( ( void** ) &d_scene, sizeof( scene ) ); cudaMemcpy( d_scene, &scene, sizeof( scene ), cudaMemcpyHostToDevice ); // initialise N work queues on the GPU init_work<<< 1, 1 >>>( d_work, N ); // wait for init_work to terminate cudaDeviceSynchronize(); // (omitted) initialise FBO // set d_image to point to the FBO data // spawn N worker threads worker<<< 1, N >>>( d_work, d_scene, spheres, d_image ); cudaDeviceSynchronize(); // (omitted) destroy FBO cudaFree( d_work ); cudaFree( d_scene ); 3.3 Limitations of CUDA In the end, I decided to stop using CUDA because I felt like I was wasting a lot of time Googling to resolve API struggles in order to perform what I felt were basic tasks. 19 Since I am more familiar with C than C++, and I was unsure how much of the C++ object system CUDA actually supports, I wrote my CUDA code in more of a C style. This led to code like: __device__ vec vec_add( const vec u, const vec v ) { return ( vec ) { u.x + v.x, u.y + v.y, u.z + v.z, }; } This is not ideal as operator overloading leads to neater code, but it wasn’t too much of a problem in practice. Another problem that I aware of is that CUDA code is very hard to debug. 
It requires some non-obvious switches4 on the compiler command line to enable usage of printf, which made things a bit easier but figuring this out felt like time that could be better spent. The CUDA compiler also requires another command line switch5 before it will allow you to supply more than one source file containing kernel code, which I found confusing. Another problem that stalled progress for a few days near the start of my project was that my code would fail and give no indication of why. After some searching and generous sprinkling of debugging code, I managed to coerce an “Out of memory” error out of CUDA, which was only being thrown when I ran my (non-allocating) kernel above a certain number of threads. After a few days of not very enthusiastic debugging, I narrowed the problem down to the following: __global__ void worker( Queue* const queues, ... ) { Queue q = queues[ threadIdx.x ]; } The C specification says that struct/fixed size array assignments are performed by creating a new copy of the data. In my code, queues was a very large array. I had incorrectly assumed when writing this that the compiler would have skipped this copy since I was only ever reading from that queue. Instead this was causing my GPU to run out of memory when I ran it. The fix was to replace the second line with Queue& q = queues[ threadIdx.x ];. 4 -gencode arch=compute_30,code=\"sm_30,compute_30\" 5 -rdc=true 20 Careful readers will have noticed that I said “in theory” before saying CUDA makes it easy to write to FBOs to use as textures. I have provided a screenshot of the output of the final version of my CUDA renderer (Figure 3.1) to elaborate on what I mean. Figure 3.2: The straw that broke the camel’s back. My attempt at rendering from FBOs in the CUDA implementation was unsuccessful. 21 Chapter 4 Path Tracing in Depth In this chapter, I formalise the concept of rendering by introducing the rendering equation. Next, I will introduce Monte Carlo integration as a method for numerically solving integrals we can’t integrate directly. Finally, I shall introduce Monte Carlo path tracing as one possible method for finding a numerical solution to the rendering equation. 4.1 The rendering equation The rendering equation was introduced as a way of mathematically expressing irradiance at a point in space. It is based on the idea of the conservation of energy in that light reflected off a point is equal to the light incident at a point, minus some that is absorbed by the material. Lo (x, ωo ) = Le (x, ωo ) + Z Li (x, ωo )f (x, ωo , ωi ) |cos θ| dωi S2 Where the meaning of the terms is as follows: x: The point we are considering. ωo : The direction of outgoing light we are considering from x. Lo (x, ωo ): The total light output at x in the direction of ωo . L R e (x, ωo ): The emittance at x in the direction of ωo . . . . dωi : The integral of all directions (ωi ) over the 3D unit sphere. S2 f (x, ωo , ωi ): The evaluation of the BRDF/BTDF at x for the given outgoing and incident directions. These terms are used to model the physical response of light to a surface and are explained in the following section. 22 |cos θ|: Recall Lambert’s cosine law from Section 2.2 and Figure 2.2. θ is the angle between the surface normal and ωi . We take the absolute value of the cosine so light arriving at the back of the suface also reflects/transmits correctly. 4.1.1 BRDFs and BTDFs The bidirectional reflectance distribution function (BRDF) of a material is a function that describes how it reflects incoming light. 
The bidrectional transmission distribution function (BTDF) describes how a surface transmits1 incoming light. They both behave similarly and when referring to a function that could be one or the other, I am going to say BxDF. These functions can be approximations to real materials, or they can be derived from real world measurements. Some examples of the latter kind would be the measured BRDFs from Cornell University’s graphics lab2 , or from the Mitsubishi Electric Research Laboratories BRDF database3 . It is worth mentioning that physically based BxDFs satisfy two properties: • Reciprocity: if you swap the incident and outgoing light directions the result is unchanged, i.e. for any BxDF f , f (x, ωi , ωo ) = f (x, ωi , ωo ) for all ωi and ωo . • Conservation of energy: the total energy of light reflected or transmitted is less or equal to the total energy of incident light, or where θ is the angle between ωi and ωo : Z f (x, ωi , ωo ) cos θ dωi ≤ 1 S2 While perfectly diffuse and perfectly specular materials do not exist in the real world, they are a good base to start from. Perfectly diffuse materials reflect all incoming light evenly across the hemisphere aligned with the surface normal. Perfectly specular materials reflect light according to perfect specular reflection (recall Reflect from section 2.2). I shall provide implementations of BRDFs for these materials later. 4.2 Monte Carlo integration Monte Carlo integration is a numerical method for evaluating integrals that would otherwise be difficult or impossible to solve exactly. 1 Recall that specular transmission means refraction. 2 http://www.graphics.cornell.edu/online/measurements/reflectance/ 3 http://www.merl.com/brdf/ 23 Rb Suppose we want to evaluate the integral a f (x)dx. I shall base the derivation of the Monte Carlo estimator from (Matt Pharr 2010, 642). Start by assuming we are able to draw samples Xi from a distribution with probability distribution function (PDF) p(x). The possible values of the distribution should be bounded and the PDF should be non-zero within those bounds, i.e. for some a and b: x 6∈ [a, b] =⇒ p(x) = 0 x ∈ [a, b] =⇒ p(x) > 0 Next, denote the Monte Carlo estimator given N samples as: FN = N 1 X f (Xi ) N i=1 p(Xi ) We can show that the expected value of the estimator, E [FN ] is equal to the integral. The expected value of a function is defined as: E [FN ] = Z f (x)p(x) dx Next we show, using some standard results from probability, that the expected value of the estimator is in fact equal to the integral. " # N 1 X f (Xi ) E [FN ] = E N i=1 p(Xi ) N f (Xi ) 1 X = E N i=1 p(Xi ) N Z 1 X b f (x) = p(x) dx N i=1 a p(x) N Z 1 X b = f (x) dx N i=1 a Z b f (x) dx = a Although I am not going to√show it here, the error in the Monte Carlo estimator decreases at a rate of O( n). For a proof of this, see (Eric Veach 1997, 39). This means to decrease the error to one half of what it was, we need to take four times as many samples, and so on. 24 4.3 Monte Carlo path tracing I have now covered everything required to outline an algorithm for Monte Carlo path tracing. We will reuse the code for initialising the camera and focal plane as described in chapter 2. We do, however, need to modify the Irradiance function and the code for spawning initial rays. 
4.3 Monte Carlo path tracing

I have now covered everything required to outline an algorithm for Monte Carlo path tracing. We will reuse the code for initialising the camera and focal plane as described in chapter 2. We do, however, need to modify the Irradiance function and the code for spawning initial rays. The code for the former now looks like:

Irradiance( start, dir, depth ) {
    // don't loop forever
    if depth > 5 {
        return black
    }

    // hit is an object containing information
    // about the intersection
    let hit = Intersect( start, dir )
    if hit == null {
        return black
    }

    // specular reflection is now handled by the BRDF
    let ( outgoing, reflectance, pdf ) = SampleBxDF( hit.bxdf, hit.normal, -dir )

    let emittance = black
    if hit.light != null {
        emittance = Emittance( hit.light, hit.normal, -dir )
    }

    // no need for explicit light sampling
    return emittance + Irradiance( hit.pos + outgoing * EPSILON, outgoing, depth + 1 )
        * reflectance * DotProduct( hit.normal, outgoing ) / pdf
}

The most interesting code lies in SampleBxDF. This function takes the surface BxDF at the intersection point, the surface normal and the direction of outgoing light – towards the point we are computing the irradiance for. It then returns a tuple: an outgoing ray to consider next, the evaluation of the BxDF with that ray, and the probability density for choosing that outgoing ray. Observe that the result of Irradiance looks similar to the Monte Carlo estimator and the rendering equation.

Also note that Irradiance can be transformed into tail recursive code, and therefore into iterative code. This makes it easily translatable to GPU code. We require a maximum recursion depth and a corresponding depth parameter, otherwise the Irradiance function would loop forever. Using a hard depth limit like this does introduce error into the renderer, and I discuss how to mitigate it in section 6.2.1.

I also mentioned that we need to change the code that spawns the initial rays from the camera. We now need to keep track of the sum of evaluations of the Monte Carlo estimator, and the number of samples made for each pixel. In pseudocode:

let N = 0
while true {
    N += 1
    for x in [0, width) {
        for y in [0, height) {
            let p = CentreOfPixel( x, y )
            let ray = Normalise( p - o )
            totals[ x, y ] += Irradiance( o, ray, 0 )
            pixels[ x, y ] = totals[ x, y ] / N
        }
    }
}
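As noted above, Irradiance can be rewritten without recursion by carrying the accumulated reflectance – the throughput – along the path. The sketch below shows that shape in Rust, the language of the implementation in the next chapter; the Bounce struct and the trace closure are simplified stand-ins for intersection plus BxDF sampling, not the renderer's actual interface, and Russian roulette (section 6.2.1) would slot in where the loop checks the depth limit.

#[derive(Clone, Copy)]
struct Rgb { r: f64, g: f64, b: f64 }

impl Rgb {
    fn black() -> Rgb { Rgb { r: 0.0, g: 0.0, b: 0.0 } }
    fn add(self, o: Rgb) -> Rgb { Rgb { r: self.r + o.r, g: self.g + o.g, b: self.b + o.b } }
    fn mul(self, o: Rgb) -> Rgb { Rgb { r: self.r * o.r, g: self.g * o.g, b: self.b * o.b } }
    fn scale(self, s: f64) -> Rgb { Rgb { r: self.r * s, g: self.g * s, b: self.b * s } }
}

// What one bounce reports back: the emitted light at the hit point, the BxDF
// value and pdf for the sampled direction, the cosine term, and the next ray.
struct Bounce {
    emittance: Rgb,
    reflectance: Rgb,
    cos_term: f64,
    pdf: f64,
    next: (/* start */ [f64; 3], /* dir */ [f64; 3]),
}

// Iterative form of Irradiance: instead of recursing, carry the running
// product reflectance * cos / pdf (the throughput) and weight each emittance
// term by it.
fn irradiance_iterative<F>(mut start: [f64; 3], mut dir: [f64; 3], trace: F, max_depth: u32) -> Rgb
where
    F: Fn([f64; 3], [f64; 3]) -> Option<Bounce>,
{
    let mut total = Rgb::black();
    let mut throughput = Rgb { r: 1.0, g: 1.0, b: 1.0 };

    for _ in 0..max_depth {
        let bounce = match trace(start, dir) {
            Some(b) => b,
            None => break, // the ray escaped the scene
        };

        total = total.add(throughput.mul(bounce.emittance));
        throughput = throughput.mul(bounce.reflectance).scale(bounce.cos_term / bounce.pdf);

        let (next_start, next_dir) = bounce.next;
        start = next_start;
        dir = next_dir;
    }

    total
}

Because the recursion has been unrolled into a loop with a fixed iteration bound, this form maps directly onto GPU code.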
4.3.1 Reflectance models

Here I will present the implementation of BRDFs for a perfectly diffuse, or Lambertian4, material and a perfectly specular material.

4 http://en.wikipedia.org/wiki/Lambertian_reflectance

4.3.1.1 Lambertian reflection

The Lambertian reflectance model says that all incident light is scattered evenly. Another way of putting this is that the BRDF for a Lambertian material is constant. The naive implementation would be to return a uniform random direction from the unit hemisphere5 around the surface normal. If, however, we recall Lambert's cosine law and the scaling by a cosine term in the rendering equation, we can do better.

5 The set of directions {(x, y, z) | x² + y² + z² = 1, z ≥ 0}.

Light incident at shallow angles will add very little to the total irradiance. We can't choose to ignore these contributions entirely, as that would introduce error into the resulting image. Instead, we can consider such directions less often, and the division by the PDF in the Monte Carlo estimator will ensure that the expected value is unchanged. Specifically, we can sample the unit hemisphere about the surface normal, weighted by the cosine of the angle between the outgoing ray and the surface normal.

Instead of directly sampling the unit hemisphere, we will use an algorithm called Malley's method (Matt Pharr 2010, 668). First, we uniformly sample the unit disk6 to obtain a point p. This is easiest to do by sampling the disk in polar coordinates7, then converting to cartesian. Given a pair of uniform random samples a and b from the interval [0, 1), we can compute p by doing:

let r = √a, θ = 2πb
p = (r cos θ, r sin θ)

6 The set of points {(x, y) | x² + y² ≤ 1}.
7 http://en.wikipedia.org/wiki/Polar_coordinate_system

Next, we project p upwards until it reaches the surface of the hemisphere (Figure 4.1). We call this second point p′, which can be computed as follows:

p′ = (p_x, p_y, √(1 − p_x² − p_y²))

Figure 4.1: Malley's method. A point p on the unit disk is projected up onto the hemisphere to give p′.

Since the unit hemisphere is aligned with the z axis, we need to transform the hemisphere and our sample to be aligned with the surface normal. The construction of the rotation matrix to do this is the same as in Section 2.3.1.3, so I won't include the derivation again here.

We can take the transformed p′ to be our ωi, so the last step required is to evaluate the PDF, p(ωi). I won't include the derivation here, but recalling again from section 2.3.1.3 that k is the unit vector in the direction of the z axis, and denoting the surface normal as n, we can evaluate it as follows:

p(ωi) = (1/π) (ωi · n)
      = (1/π) (p′ · k)
      = p′_z / π

4.3.1.2 Specular reflection

The BRDF for perfect specular reflection is considerably simpler. We take ωi to be the reflection – see Reflect in section 2.2 – about the surface normal, and the probability density to be one. One thing to note is that we don't want Lambert's cosine law to apply here, as we don't want reflective surfaces to get dimmer if we look at them from shallow angles. In order to keep the Irradiance function simple, we divide the reflectance returned by the specular BRDF by cos θ.
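Before moving on to the implementation, here is a small sketch of the cosine-weighted hemisphere sampling described above, written in current Rust. The function name and the choice to take the two uniform random numbers as parameters are mine rather than the renderer's (its own hemisphere::sample_cos appears in the next chapter); the returned direction lies in the hemisphere around +z and still needs to be rotated to the surface normal as described in Section 2.3.1.3.

// A direction in the unit hemisphere around +z, plus the probability density
// with which it was chosen. The uniform inputs a and b are assumed to lie in
// [0, 1).
fn sample_cos_hemisphere(a: f64, b: f64) -> ([f64; 3], f64) {
    use std::f64::consts::PI;

    // Uniformly sample the unit disk in polar coordinates: r = sqrt(a)
    // rather than r = a, otherwise samples would bunch up near the centre.
    let r = a.sqrt();
    let theta = 2.0 * PI * b;
    let (x, y) = (r * theta.cos(), r * theta.sin());

    // Project the disk sample up onto the hemisphere (Malley's method).
    let z = (1.0 - x * x - y * y).max(0.0).sqrt();

    // Cosine-weighted sampling has pdf(w) = cos(theta_w) / pi = z / pi.
    let pdf = z / PI;
    ([x, y, z], pdf)
}

Note that the returned pdf is exactly the p′_z / π evaluated above, which is what gets divided out in the Monte Carlo estimator.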
Chapter 5

Path Tracing in Rust

As mentioned previously, I decided to switch to CPU code for my path tracing implementation. The language I chose to use for this was Rust1. In this chapter I will briefly describe why I think Rust was a good choice, and then discuss details of my implementation. From here on I will refer to objects in a scene as entities, to prevent confusion between them and objects in the OOP sense.

1 http://www.rust-lang.org/

5.1 A brief overview of Rust

Rust is a systems language being developed by Mozilla2. It is currently still considered unstable, with backwards incompatible changes happening fairly often. This isn't such a big problem, as recent versions have been reliable enough that I don't feel the need to use nightly builds. Rust offers pattern matching and a type system that sits between C and Haskell, which makes it pleasant to program in. It also has a CSP-like3 concurrency model built in to the language. Having shared memory is still possible, but you have to sacrifice some of the compiler's safety guarantees in code that uses it. It is worth mentioning that these abstractions can also be mapped onto features available in GPU code.

2 http://www.mozilla.org/
3 Rust favours channels and message passing over shared memory.

5.2 Implementing a path tracer in Rust

I will begin with a table describing the file system layout of my project. I have made a list of notable files and folders and their purposes in Table 5.1. The design is largely inspired by the renderer described in (Matt Pharr 2010).

Table 5.1: An overview of the modular separation of my renderer.

Path            Purpose
lights          Defines Light trait. Provides implementation of area lights.
materials       Defines Material and BxDF traits. Provides implementations of matte, specular and checkerboard materials. Also provides Lambertian and specular BRDFs.
maths           Implementation of vectors, rotation transformations, RGB colours, degrees/radians abstractions and generic interpolation.
maths/sampling  Sampling routines for the triangular distribution (section 5.2.2), unit disk and unit hemisphere.
shapes          Defines Shape trait. Provides sphere and plane implementations.
worlds          Defines World trait. Worlds are objects containing a set of entities and expose methods for intersecting rays with the scene. Provides implementations of a naive world and the union of two worlds.
entity.rs       Combines area lights, materials and shapes to represent real objects in the scene.
main.rs         Spawns the rendering threads and drawing loop. Contains the Monte Carlo path tracing implementation.

5.2.1 Materials

In addition to implementing matte and specular materials, I thought it would be interesting to add a checkerboard material. You can see an example of what it looks like on the cover of this report. I wanted the checkerboard to be more flexible than just allowing two different colours, so I made it possible to use any pair of materials for the checks. This is what made it possible to have alternating matte and mirrored checks like on the cover. The interesting section of code for the checkerboard material looks like:

impl Material for Checkerboard {
    fn get_bxdf( &self, u : f64, v : f64 ) -> ~BxDF {
        // ten checks for every unit distance
        let u10 = ( u * 10.0 ).floor() as i64;
        let v10 = ( v * 10.0 ).floor() as i64;

        // compute UV coordinates within the check
        let uc = u * 10.0 - u10 as f64;
        let vc = v * 10.0 - v10 as f64;

        // odd and even parity use separate materials
        if ( u10 + v10 ) % 2 == 0 {
            return self.mat1.get_bxdf( uc, vc );
        } else {
            return self.mat2.get_bxdf( uc, vc );
        }
    }
}

I also thought my implementation of the Lambertian BRDF was interesting, as it shows off Rust's type inference and support for tuples.

impl BxDF for Lambertian {
    fn sample_f( &self, normal : Vec3, _ : Vec3 ) -> ( Vec3, RGB, f64 ) {
        let rot = Rotation::between( Vec3::k(), normal );
        let sample = hemisphere::sample_cos();

        let out = rot.apply( sample );
        let pdf = sample.z * Float::frac_1_pi();

        return ( out, self.reflectance, pdf );
    }
}

Rotation::between gives the rotation matrix that represents the rotation from one unit vector to another (section 2.3.1.3). sample is a random direction in the unit hemisphere aligned with the z axis, chosen by Malley's method (section 4.3.1.1). pdf holds the probability density for the chosen direction.

5.2.2 Sampling

The only thing I haven't already explained in the sampling directory is the triangular distribution4 sampling code. This module provides a function that returns a value in the interval (−1, 1) distributed by a symmetric triangular PDF with a mean of zero. It is used to generate random pixel offsets for anti-aliasing, and in my code I have called it tent.

4 http://en.wikipedia.org/wiki/Triangular_distribution

5.2.3 Shapes

Objects that implement the Shape trait are required to provide three methods, all corresponding to operations defined in Section 2.3. The interface is defined as:

pub trait Shape {
    // return smallest positive t, or None if no intersection
    fn intersection( &self, start : Vec3, dir : Vec3 ) -> Option< f64 >;

    // return the surface normal at point
    fn normal( &self, point : Vec3 ) -> Vec3;

    // return the UV coordinates of a point
    fn uv( &self, point : Vec3 ) -> ( f64, f64 );
}
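To show how the three methods hang together, here is a standalone sketch of a sphere written in current Rust syntax, using a plain [f64; 3] in place of the project's Vec3. It follows the derivations from Section 2.3.2 and the spherical-coordinate UV mapping suggested there, but it is an illustration rather than the renderer's shapes module; in particular it assumes dir is normalised, as the rays in this renderer are.

type Vec3 = [f64; 3];

fn sub(a: Vec3, b: Vec3) -> Vec3 { [a[0] - b[0], a[1] - b[1], a[2] - b[2]] }
fn dot(a: Vec3, b: Vec3) -> f64 { a[0] * b[0] + a[1] * b[1] + a[2] * b[2] }

struct Sphere { centre: Vec3, radius: f64 }

impl Sphere {
    // Smallest positive t along start + t * dir, or None if the ray misses.
    // Follows the quadratic from Section 2.3.2.1; with dir normalised the
    // t^2 coefficient is 1.
    fn intersection(&self, start: Vec3, dir: Vec3) -> Option<f64> {
        let m = sub(start, self.centre);
        let b = dot(m, dir);
        let c = dot(m, m) - self.radius * self.radius;

        let discriminant = b * b - c;
        if discriminant < 0.0 {
            return None; // the ray misses the sphere entirely
        }

        let sqrt_d = discriminant.sqrt();
        // Prefer the nearer root; fall back to the far root when the ray
        // starts inside the sphere, since the inside is treated as hollow.
        let t = if -b - sqrt_d >= 0.0 { -b - sqrt_d } else { -b + sqrt_d };
        if t >= 0.0 { Some(t) } else { None }
    }

    // Surface normal, assuming the point lies on the sphere (Section 2.3.2.2),
    // so we can divide by the radius instead of taking a square root.
    fn normal(&self, point: Vec3) -> Vec3 {
        let d = sub(point, self.centre);
        [d[0] / self.radius, d[1] / self.radius, d[2] / self.radius]
    }

    // One possible UV mapping: spherical coordinates scaled into [0, 1]
    // (Section 2.3.2.3).
    fn uv(&self, point: Vec3) -> (f64, f64) {
        use std::f64::consts::PI;
        let n = self.normal(point);
        let u = 0.5 + n[1].atan2(n[0]) / (2.0 * PI);
        let v = n[2].clamp(-1.0, 1.0).acos() / PI;
        (u, v)
    }
}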
Rust does not have a null value. Instead, it uses the Option type to annotate a value with the fact that it might not exist. This is just like Maybe from Haskell. The intersection method can therefore either return None to represent no intersection, or Some( t ), where t corresponds to the t in the ray equation.

5.2.4 Worlds

A World is a representation of a scene, containing information about the entities within it and their emissive properties. It also exposes a method for generating intersections with entities in the scene, which returns an object containing information about the intersection. Specifically, the Intersection object contains the value of t for the intersection, the entity the ray collided with, and the result of evaluating the ray equation for us, i.e. the intersection point.

I wrote two implementations of World in my renderer: one called SimpleWorld, which naively intersects rays with every entity in the scene, and UnionWorld, which takes a pair of worlds and intersects rays with both of them. The latter is more interesting, so here is what the UnionWorld implementation looks like:

impl World for UnionWorld {
    fn intersection< 'a >( &'a self, start : Vec3, dir : Vec3 )
            -> Option< Intersection< 'a > > {
        let oi1 = self.w1.intersection( start, dir );
        let oi2 = self.w2.intersection( start, dir );

        match ( oi1, oi2 ) {
            ( Some( i1 ), Some( i2 ) ) => {
                if i1.t < i2.t {
                    return Some( i1 );
                }

                return Some( i2 );
            }
            ( x, None ) => x,
            ( None, x ) => x
        }
    }
}

In Rust, 'a refers to a lifetime variable5. The Rust compiler uses lifetime variables to prove that references which are in scope will never point to recycled areas of memory. I'm using them here because Intersection contains a pointer to the entity it intersected with, and Rust requires the annotations so it can infer that the entity will never be deallocated before all Intersection objects become unreachable.

5 http://static.rust-lang.org/doc/master/guide-lifetimes.html
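The report only shows UnionWorld, so for completeness here is a rough sketch of the linear scan a SimpleWorld-style implementation performs, in current Rust. The Entity and Intersection types below are pared-down stand-ins – a closure takes the place of the Shape trait – rather than the renderer's own definitions.

type Vec3 = [f64; 3];

// Simplified stand-ins for the renderer's Entity and Intersection types.
struct Entity {
    // A closure standing in for Shape::intersection.
    intersect: Box<dyn Fn(Vec3, Vec3) -> Option<f64>>,
}

struct Intersection<'a> {
    t: f64,
    entity: &'a Entity,
    pos: Vec3,
}

struct SimpleWorld {
    entities: Vec<Entity>,
}

impl SimpleWorld {
    // Intersect the ray against every entity and keep the nearest hit.
    fn intersection(&self, start: Vec3, dir: Vec3) -> Option<Intersection<'_>> {
        let mut nearest: Option<Intersection<'_>> = None;
        for entity in &self.entities {
            if let Some(t) = (entity.intersect)(start, dir) {
                if nearest.as_ref().map_or(true, |hit| t < hit.t) {
                    let pos = [
                        start[0] + t * dir[0],
                        start[1] + t * dir[1],
                        start[2] + t * dir[2],
                    ];
                    nearest = Some(Intersection { t, entity, pos });
                }
            }
        }
        nearest
    }
}

This linear scan is what the acceleration structures discussed in section 6.2.3 would replace.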
5.2.5 Piecing everything together

All of the above comes together in main.rs. I will start by presenting the code for spawning the rendering threads:

for n in range( 0, THREADS ) {
    // (omitted) increment Arcs

    spawn( proc() {
        // (omitted) pull data out of Arcs

        sampler( eye, pixels[ n ], centres[ n ], world, up_pixel, left_pixel );
    } );
}

The above code uses several variables which are defined elsewhere. They are as follows:

eye: The camera's focal point.

pixels: Maps threads to the pixel indices they should be drawing.

centres: For each pixel in pixels, the corresponding element in centres contains the centre of that pixel.

world: An instance of World.

up_pixel, left_pixel: These are vectors holding the distance between vertically and horizontally adjacent pixels on the focal plane. We use them for anti-aliasing.

Since I am sharing data between threads, the Rust compiler requires that each thread increments an atomic reference counter (Arc6) at the start, and decrements it when the thread terminates. This is so Rust knows when it can free those objects without resorting to a garbage collector.

6 http://static.rust-lang.org/doc/0.10/sync/struct.Arc.html

The purpose of the sampler function is to take irradiance samples for each pixel it controls and update the output image data accordingly.

fn sampler( eye : Vec3, pixels : &[ uint ], centres : &[ Vec3 ],
        world : &World, up_pixel : Vec3, left_pixel : Vec3 ) {
    let mut samples = 0;

    loop {
        samples += 1;

        for i in range( 0, pixels.len() ) {
            let dx = left_pixel * tent::sample();
            let dy = up_pixel * tent::sample();
            let ray = ( centres[ i ] + dx + dy - eye ).normalised();
            let color = irradiance( world, eye, ray, 0 );

            unsafe {
                let p = pixels[ i ];
                image[ p ] = ( samples, image[ p ].val1() + color );
            }
        }

        if samples % 10 == 0 {
            println!( "{}", samples );
        }
    }
}

Recall that tent::sample() is sampling a triangular distribution. It is also worth mentioning that Rust implicitly creates separate random number generators for each thread7, so generating samples is thread-safe and isn't slowed down by any synchronisation code.

7 http://static.rust-lang.org/doc/0.10/rand/fn.random.html

In the above code, image is a shared array of tuples holding the number of samples taken for each pixel and the sum of said samples. Rust mandates that writing to shared variables is placed inside an unsafe block, which relaxes the restrictions the compiler places on your code. Inside this block I also use the val1() method on pairs, which extracts the first value.

All that remains is the implementation of irradiance:

fn irradiance( world : &World, start : Vec3, dir : Vec3, depth : uint ) -> RGB {
    if depth > 5 {
        return RGB::black();
    }

    let ois = world.intersection( start, dir );
    return ois.map_or( RGB::black(), | is | {
        let normal = is.other.shape.normal( is.pos );
        let ( u, v ) = is.other.shape.uv( is.pos );
        let bxdf = is.other.material.get_bxdf( u, v );

        let ( outgoing, reflectance, pdf ) = bxdf.sample_f( normal, -dir );
        let emittance = /* (omitted) */ ;
        let throughput = abs( normal.dot( outgoing ) ) / pdf;

        return emittance + ( reflectance
            * irradiance( world, is.pos, outgoing, depth + 1 ) ).scale( throughput );
    } );
}

In Rust, map_or is a method on Option values that behaves like maybe in Haskell. It takes a default value and a function, where the default is returned immediately if the Option value is None. Otherwise, the function is applied to the value inside the Some and the result is returned. I omitted the right hand side of emittance because nested map_ors are noisy to read and it's unnecessarily complex for a code fragment.

We still need to offset outgoing rays by a small epsilon term; in my code I moved this inside the intersection method in World.

Chapter 6

Conclusion

In this chapter I will evaluate the performance of my renderer and show the effects of adjusting various parameters. I will also discuss briefly how I feel this project is related to my course. Finally, I will talk about the ideas I would like to have followed through on if I had more time.

6.1 Performance and evaluation

Even though it is easy to predict how my renderer will perform, it is still helpful to provide empirical data so we can analyse it more formally.

6.1.1 Linearity with number of threads

Since there are no dependencies between threads, we would expect the time to perform a fixed number of samples per pixel to be inversely proportional to the number of running threads. As my desktop has four CPU threads, I have made measurements for one, two, four and eight threads with 50 samples per pixel. Beyond four threads, we would expect to see performance plateau, and then drop off as context switching starts to dominate.
Table 6.1: Measuring how varying the number of threads affects the time required to take 50 samples per pixel. We can see the speed-up is more or less linear until I run out of hardware threads.

Threads  Time   Time × threads
1        73.0s  73.0
2        37.4s  74.8
4        20.2s  80.8
8        20.0s  160

6.1.2 Noise reduction with number of samples

When you watch the renderer running, there is a very obvious correlation between how many samples the renderer takes and the output image quality. Images with a low number of samples per pixel will have high variance, which manifests itself as noise. As a demonstration of this, I have taken screenshots of the renderer after 10, 100 and 500 samples per pixel (Figures 6.1, 6.2 and 6.3 respectively). Additionally, the image on the cover was taken after 25,000 samples per pixel. The signature graininess of path tracing renderers is very visible in the first two images.

Figure 6.1: A screenshot taken after 10 samples per pixel. We can barely make out what the scene is supposed to be.

Figure 6.2: A screenshot taken after 100 samples per pixel. We have a much better idea of what the scene looks like than with 10 samples.

Figure 6.3: A screenshot taken after 500 samples per pixel. The graininess is still noticeable, but it is clearly an improvement over 100 samples.

6.1.3 Adjusting the recursion limit

If we decrease the maximum recursion depth of my renderer, we can expect to see a performance increase in terms of the number of samples made; however, the image quality will decline as paths struggle to find a light source. We will also introduce error into specular reflections between objects as a result of paths being truncated prematurely.

I have saved the output image after 200 samples per pixel and with depth limits of five, three and two (Figures 6.4, 6.5 and 6.6 respectively). I have also measured the time it took to finish the renders. We would expect the time taken to be proportional to the depth limit.

Table 6.2: The time is not quite increasing linearly. I expect it is slightly better than linear because anti-aliasing makes spawning the initial rays more expensive.

Depth  Time  Time ÷ depth
2      46s   23
3      59s   20
5      85s   17

Figure 6.4: Taken with a depth limit of five. We can see the spheres reflecting each other many times.

Figure 6.5: Depth limit three. The image is darker as paths struggle to find a light source and we have lost some of the reflections.

Figure 6.6: Depth limit two. The image is darker still and the spheres' reflections of each other and in the floor are now just black circles. We have also almost entirely lost the colour bleeding on the back wall.

6.1.4 Tweaking the light parameters

Adjusting the size of the light will affect the convergence rate of the path tracing algorithm, as it will change the likelihood that an individual path reaches the light source. If we increase the size of the light and decrease its emittance so its overall power1 does not change, we can expect the image to be more or less the same but converge more quickly.

1 It's surprisingly difficult to find an exact formula for power, but it is proportional to the surface area of the light.

For Figure 6.7, I have rendered a scene where the light's radius has been doubled and its emittance has been quartered. This render has 500 samples per pixel so it is comparable with Figure 6.3.

Figure 6.7: The same number of samples as Figure 6.3, but with a bigger light. Notice how the image is less grainy, but otherwise very similar.
It can be seen from the above analysis in this section that my renderer performs as expected when we tweak various parameters, which makes me feel confident that my implementation is correct. 6.2 Future work My renderer is neither perfect nor complete. If I had more time to work on it, I would like to have implemented the following, roughly in order of priority. 6.2.1 Russian Roulette path termination Recall how I mentioned in section 4.3 that the hard recursion limit was introducing error into the renderer. We can avoid this error by using a technique known as Russian roulette. Before recursing, we randomly truncate paths with probability based on their throughput, and scale the throughput of surviving paths accordingly. While this does increase the variance of the estimator, it allows us to stop spending time on paths that don’t contribute much. Therefore, in theory we can make more samples in a fixed period of time. 6.2.2 Materials Only having perfectly diffuse and specular surfaces is rather limiting. My renderer would be much more flexible if it supported more reflection models and could load images from disk to use as textures. It would also be useful to have methods for combining surface properties to form reflection models that lie between what has been explicitly implemented. 6.2.3 Acceleration structures The naive SimpleWorld implementation doesn’t scale to large scenes that contain many entities and lights. As mentioned in section 2.2, we can improve on this by splitting the scene up. 40 I would have liked to implemented a decoder for the Quake 3 BSP2 format. Quake 3 levels are distributed with a pre-built BSP tree as part of the file format, and implementing ray intersections against a BSP tree is simple (Christer Ericson 2004, 376). 6.2.4 Do away with pixels The current implementation of anti-aliasing requires a pair of sqrts for every primary ray. Additionally, if a ray is offset to lie at the half-way point between two pixels, it should be able to contribute to both equally and not just to the pixel it was cast for. Both of these problems are solvable by sampling the focal plane continuously and combining all nearby samples to compute final pixel colours. This would improve the rate of convergence of my renderer, and would also make it easy to decouple the reconstruction filter from the ray tracing implementation. 6.2.5 Better integration methods There has been a lot of research on speeding up the path tracing algorithm without introducing error. These methods include: Bidirectional path tracing: In addition to shooting rays from the camera, we also shoot rays originating from the light sources in the scene. We then join the vertices of both paths to form many more that definitely contribute light to the sample. See (Eric Veach 1997, 297). Metropolis Light Transport3 : This is the application of the MetropolisHastings algorithm4 to path tracing. Roughly, it works by applying small adjustments to promising paths in the hope that it leads to more paths with high throughput. 6.3 Final thoughts I feel like I learned a lot whilst working on my project, in both computer graphics algorithms and new languages. I believe that my renderer is a good first step towards writing a more complete renderer. In addition, the design decisions I made will make it easy to extend and study in the future. I also feel I was successful in leaving open the option of converting my renderer back to GPU code. 
2 http://www.mralligator.com/q3/
4 http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm

Rust was an interesting language to learn because of its non-standard approach to memory management and powerful type system. I hope to find more projects that I can use it in.

My project draws material from both the Computer Graphics and Computer Architecture courses, and builds on the things I learned in those courses. In the Computer Graphics course, we learned about the ray tracing algorithm and reflectance models, both of which I have applied and expanded on in my project. The Computer Architecture course covers GPGPU programming, which sparked my original idea for this project.

In this report, with the guidance of Dr Joe Pitt-Francis, I have described algorithms for producing photorealistic images and provided an example implementation of such a renderer. In addition, I have laid the groundwork for myself to continue expanding my knowledge in this field.

References

Anton Kaplanyan. 2009. "Light Propagation Volumes in CryEngine 3." http://www.crytek.com/download/Light_Propagation_Volumes.pdf.

Christer Ericson. 2004. "Real-Time Collision Detection." Morgan Kaufmann.

Dietger van Antwerpen. 2010. "Unbiased Physically Based Rendering on the GPU." http://repository.tudelft.nl/view/ir/uuid%3A4a5be464-dc52-4bd0-9ede-faefdaff8be6/.

Eric Veach. 1997. "Robust Monte Carlo Methods for Light Transport Simulation." http://window.stanford.edu/papers/veach_thesis/thesis.pdf.

Matt Pharr, Greg Humphreys. 2010. "Physically Based Rendering, Second Edition." Morgan Kaufmann.