Real Time Global Illumination Solutions to the Radiosity Algorithm using Hybrid CPU/GPU Nodes

Team Members:	Oluwapelumi Adenikinju¹, Julian Gilyard², Joshua Massey¹, and Thomas Stitt³
Graduate Research Assistant:	Jonathan Graf⁴ Xuan Huang⁴, and Samuel Khuvis⁴
Faculty Mentor:	Matthias K. Gobbert⁴
Client:	Yu Wang¹ and Marc Olano¹

¹Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County,
²Department of Computer Science, Wake Forest University,
³Department of Computer Science and Engineering, Pennsylvania State University
⁴Department of Mathematics and Statistics, University of Maryland, Baltimore County,

team5
Team 5, from left to right: Joshua Massey, Matthias K. Gobbert, Oluwapelumi Adenikinju, Julian Gilyard, Thomas Stitt, Marc Olano.

About the Team

Our team consists of four undergraduates, Oluwapelumi Adenikinju, Julian Gilyard, Joshua Massey and Thomas Stitt. Oluwapelumi is rising senior in computer engineering at UMBC. Julian is a rising junior in computer science at Wake Forest University. Joshua is a rising sophomore in computer engineering at UMBC. Thomas is a senior in mathematics and computer at Pennsylvania State University.

Problem

In computer graphics, an integral portion of the rendering timeline is solving for the light distribution of a scene. When realism is important, a solution involving global illumination – allowing light rays to bounce before hitting the viewer – is desired. One method that awards particularly good results for diffuse surfaces is the radiosity algorithm, despite the fact that this method is computationally expensive – using it for near-real to real time rendering is not generally practical. Starting with the existing computational radiosity solver rrv, we determine how runtimes can be reduced through a combination of multi-core CPU to massively-parallel GPU architectures available in the cluster maya of the UMBC High Performance Computing Facility .

Computational Methods

Jacobi Method: The original code in rrv solves for the solution of the radiosity equation using a fixed number of iterations in the Jacobi method. This can be physically thought of as one light bounce per iteration, as illustrated visually in the flow chart in Figure 1.

Flowchart
Figure 1: Flow chart of iterations in the Jacobi method.
To provide certainty that the method has actually converged and to save iterations potentially, we reimplement the Jacobi method to compute only until an equilibrium bouncing of light is reached as measured by a tolerance of 10^-6 on the relative residual.

BiCG-STAB Method: The Jacobi method can be slow to converge. The BiCG-STAB method is an alternative iterative method, whose convergence speed increases with iterations. It requires twice as much work per iteration, hence might take less time for complex problems that require a large number of iterations with the Jacobi method.

Distributed Computing

CUDA: CUDA is NVIDIA’s library for leveraging the massively parallel architecture of GPUs (graphics processing units). GPUs are designed for SIMD (single instruction multiple data) applications like those used in solving the radiosity equation. CUDA has an efficient package (cuBLAS) for solving linear system, but since the system is composed of vector elements with multiple values (arrays of structures), these methods cannot be used. We implement a dot-product and matrix-vector product for vectors of this type using binary tree reduction and a simple axpby operation for vector scaling and addition.

OpenMP: OpenMP (Open Multi-Processing) is an application programming interface that provides simplified and cross-platform shared-memory parallelization. Memory is considered “shared” when each worker (thread) shares it with all other workers, unless marked worker-private – which is beneficial here since the system matrix of the radiosity equation is large. With OpenMP, portions of serial code are flagged to be run in parallel by work distribution among available threads. We utilize parallelization much like with CUDA to speed up the matrix-vector, dot, and axpby functions.

Results

Runtime results were obtained for four sample scenes of varied patch count and complexity. We see in Table 1 that the scenes used were not complex enough to make apparent the advantages of the reduced number of iterations of BiCG-STAB method, due to its increased cost per iteration.

		Jacobi	Jacobi	BiCG-STSB	BiCG-STAB
Scene ID	Patch count	iter	runtime	iter	runtime
1	1312	8	0.009	3	0.010
2	3360	28	0.045	16	0.058
3	9680	36	0.410	17	0.435
4	17128	32	0.993	17	1.157

Table 1Parallelization on the other hand showed marked improvement with observed run times in units of seconds in Table 2 dropping over an order of magnitude.

Scene ID	Original	Serial	CUDA	OpenMP
1	0.028	0.031	0.006	0.009
2	0.857	0.677	0.115	0.045
3	10.072	6.973	1.209	0.410
4	27.855	19.394	3.415	0.993

Table 2

Conclusions

Both computational methods and parallelization were explored with the aim of near-real to real time global illumination solutions to the radiosity algorithm. Given the scenes we tested, OpenMP and CUDA both show substantial runtime improvements while the change from the Jacobi method to the BiCG-STAB method actually resulted in increased runtime due to the method’s complexity. It appears that global illumination problems are not in general best suited for mathematical reformulations, but taking advantage of work distribution seems favorable, no matter the scene complexity.

Links

Oluwapelumi Adenikinju, Julian Gilyard, Joshua Massey, Thomas Stitt, Jonathan Graf, Xuan Huang, Samuel Khuvis, Matthias K. Gobbert, Yu Wang, and Marc Olano. Real Time Global Illumination Solutions to the Radiosity Algorithm using Hybrid CPU/GPU Nodes. Technical Report HPCF-2014-15, UMBC High Performance Computing Facility, University of Maryland, Baltimore County, 2014. Reprint in HPCF publications list

Poster presented at the Summer Undergraduate Research Fest (SURF)

Click here to view Team 1’s project
Click here to view Team 2’s project
Click here to view Team 3’s project
Click here to view Team 4’s project

REU Site: Interdisciplinary Program in High Performance Computing

REU Site: Interdisciplinary Program in High Performance Computing

Real Time Global Illumination Solutions to the Radiosity Algorithm using Hybrid CPU/GPU Nodes

About the Team

Problem

Computational Methods

Distributed Computing

Results

Conclusions

REU Site: Interdisciplinary Program in High Performance Computing

About the Team

Problem

Computational Methods

Distributed Computing

Results

Conclusions

Subscribe to UMBC Weekly Top Stories

I am interested in: