Feb 16, 2006

Multithreading Game Engines for Multicore Machines

With all the latest developments in game consoles and the industry's move to multicore processors, I started investigating what it will take to make an existing game engine, written to run on one CPU, run on a multicore, multi-CPU system. I must confess that I’m not happy with the direction we are taking. Not that I have something else to propose, but multithreading a beast like a game engine was always something I wanted to avoid! But we must face the truth… we can’t get far with one CPU. Just imagine a system with 20+ cores and software that is able to harvest all that processing power… it would be awesome!

So before we start, I really think we should escape the narrow-mindedness I have been seeing in developers lately. Since we have to make the cut at this breaking point anyway, we should make it deep. Most developers set out to redesign their engines to run on 2 cores, others on 3, and so on. The basic design principle should instead be that the engine can run on any number of cores and achieve an almost linear performance boost in the number of cores. So designs like "there is one rendering thread, one game simulation thread and one physics simulation thread" are out of the question. I don’t think we should even bother with constant multipliers… I would rather wait for CPUs to double or triple in speed!

What we need is to alter the algorithms involved in a game engine to make them parallel. And these algorithms should have good CPU complexity. What do I mean by that? Consider calculating the maximum element of an array of n integers. On one CPU this is a task of O(n) time complexity. Suppose now that you have n^2 CPUs. On an idealized parallel machine, where every CPU compares one pair of elements at the same time, this allows us to find the maximum in constant O(1) time. Even if this algorithm is not efficient in the number of CPUs it uses, its running time scales with the number of CPUs, and this is what we need…

The basic design I’m starting to feel comfortable with is a pool of worker threads that are assigned work as needed. Then, as with ordinary optimization, we find the bottlenecks and parallelize them. For example, it is not so important that the physics engine runs on a different thread, but that it is written in a way that can utilize any number of available cores. This can be achieved by processing each body island in a different thread. The same goes for collision detection.

In general, the first thing we must do is convert every heavy for-loop to a parallel for-loop, whenever that is possible. In a stencil shadow volume engine, another candidate is the for-loop that calculates the silhouettes. This can be done in parallel for every object. No actual change is made to the silhouette calculation algorithm; the only thing that changes is the for-loop on the outside. You will be surprised how many things have no dependencies and no critical sections and can be parallelized efficiently this way.

The hard part to parallelize will be the game code, the code that the entities run. This part has many critical sections, since many entities affect others. How hard it is to parallelize depends on the engine and how it was designed. For Sylphis this will be easier, since the game core is already thread aware (in a sense)…
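
To make the parallel for-loop idea a bit more concrete, here is a minimal sketch in C++ (not actual Sylphis code). The names `parallel_for`, `SceneObject` and `compute_all_silhouettes` are hypothetical and exist only for illustration, and the silhouette body is a placeholder. For brevity the sketch spawns fresh threads on every call; a real engine would more likely hand the chunks to the persistent pool of worker threads described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for every i in [0, count), splitting the
// range into contiguous chunks, one chunk per worker thread.
// Assumes the iterations are independent (no shared writes, no critical sections).
inline void parallel_for(std::size_t count,
                         const std::function<void(std::size_t)>& body,
                         unsigned num_threads = std::thread::hardware_concurrency()) {
    assert(body && "parallel_for needs a callable body");
    if (count == 0) return;
    // hardware_concurrency() may return 0 when the core count is unknown.
    if (num_threads == 0) num_threads = 1;
    const std::size_t workers = std::min<std::size_t>(num_threads, count);
    const std::size_t chunk = (count + workers - 1) / workers;  // ceiling division

    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(count, begin + chunk);
        if (begin >= end) break;  // nothing left for the remaining workers
        threads.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    // Wait for every chunk to finish before returning to the caller.
    for (std::thread& t : threads) t.join();
}

// Hypothetical stand-in for a scene object. The per-object algorithm stays
// exactly as it was in the single-threaded engine.
struct SceneObject {
    int edge_count = 0;
    int silhouette_edges = 0;
    void compute_silhouette() { silhouette_edges = edge_count / 2; }  // placeholder work
};

// Only the outer loop changes: each iteration touches objects[i] and nothing
// else, so no locking is needed.
inline void compute_all_silhouettes(std::vector<SceneObject>& objects) {
    parallel_for(objects.size(), [&objects](std::size_t i) {
        objects[i].compute_silhouette();
    });
}
```

The design choice to note is that the thread count is just a parameter. Nothing in `compute_all_silhouettes` knows whether it runs on 2 cores or 20, which is the property argued for above. The same wrapper would only be safe for physics islands or collision pairs under the same assumption, namely that the iterations really are independent.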