Multithreading Game Engines for Multicore Machines

With all the latest developments in the game console systems and the endorsement of multicore processors, I really got into the path of investigating what will it take to make an existing game engine that was written to run on one CPU, to run on a multicore-multicpu system.

I must confess that I’m not happy with the direction we take. Not that I have something else to propose, but multi-threading a beast, like a game engine, was always something I wanted to avoid! But we must face the truth… we can’t get far with one CPU… just imagine a system with 20+ cores and software that will be able to harvest the processing power… it would be awesome!

So before we start I really think that we should escape the narrow-mindness that I witness on developers lately. At this breaking point, that we make the cut, we should make it deep.

Most developers start out to redesign the engines to run on 2 cores. Others at 3.. etc… The basic principle for the design should be that we should be able to run on any number of cores, and be able to achieve almost linear performance boost on the number of cores. So designs like: There is one rendering thread, one game simulation thread and one physics simulation thread…, are out of the question. I don’t think we should even bother with constant multipliers… I better wait for CPU to double or triple in speed!

What we need is to alter the algorithms that are involved in a game engine to and make them parallel. And these algorithms should have good CPU complexity. What I mean by that… Consider that we need to calculate the maximum element of an array of integers. This is a task of O(n) time complexity. Suppose now that you have n^2 CPUs… this would allow us to find the maximum in constant O(1) time. Even if this algorithm is not so efficient regarding the number of CPUs, it scales according to the number of CPUs.. this is what we need…

The basic design I’m starting to feel comfortable with concerns a pool of work threads, that are assigned as needed. Then like with ordinal optimizations we find the bottlenecks and parallelize them. For example it is not so important that the physics engine runs on a different thread, put that is written in a way that can utilize any number of available cores. This can be archived by processing each body island in a different thread. The same goes for collision detection too. In general what we must do first is convert any heavy for-loop to a parallel-for-loop. Whenever that is possible. In a stencil shadow volume engine an other possible for-loop is the one that calculates the silhouettes. This can be done in parallel for every object. No actual change will be made to the silhouette calculation algorithm. The only thing that changes is the for-loop on the outside. You will be surprised how many things have no dependencies and no critical section and can be parallelized efficiently this way.

The hard part to parallelize would be the game code. The code that the entities run. This part has many critical sections since many entities affect others. How hard it would be to parallelize this depends on the engine and how it was designed. For Sylphis this will be easier since the game core is already thread aware (in a sense)….

  • stefan stoeff

    There is an law, called Amidahls Law, that states: Let P be the parallel part of your program and S the serial and P+S=1 then the speedup that you can obtain with many computing units is

    speedup=1 / ( P/N +S)

    This way one can know how the code will benefit from multicore, multiprocessors etc.

  • Mincetro

    I can’t see any benefit for the normal, everyday game develpoer from dual-Core CPU’s. It just adds more work trying to “fill unnessesary space” i think the term is…

  • You should take a look at what tim sweeney had to say about what features are needed in a future (game-) programming language. The ability to easily scale in a parallel environment is one of the bigger points he talks about.

  • Carl

    If you start running all of your algorithms asynchronously then it is going to be really hard to break up when you want to receive your results. For example you have a single game loop and in that you have graphics functions, etc. When you call your math functions you need to then start your process and then receive the results. See below of what I think this would look like:

    startcalculateplayersposition startcheckforcollision

    more calculations

    position = endcalculateplayersposition collision = endcheckforcollision

    The other thing to think about is how well this work due to the fact that the end calls need to block until they finish executing so if you have a long running calculation then you may still only get 1 thread performance.

    I would be interested in what you were thinking about this.

    Thanks, Carl.

  • Bucko

    There are lots of game engine tasks that beg for paralellization: Pathfinding: Make sure the pathfinder copies what it needs for pathfinding at startup. I think this is the perfect example as pathfinding can take several frames to finish. Physics simulation: As long as it has a copy of the static collision data somehow it can run alone with little need for talking with the rest of the game. Cloth simulation Higher level ai etc

    • I’m just wondering cause I was rolnlig the bakugan around the drawer and it opened up (which means it touched something magnet) and it opened by a cd. But it could have been the drawer becuase that is also magnetic. I heard when you hold a magnet over a video tape, it will start to erase. I’m concered, is it the same with cds? PLEASE ANSWER MY QUESTION!!!

  • John

    >pool of work threads, that are assigned as needed

    Have used this design concept in the past for large scale VPN server design on multi-cpu and it scales very well. Some algorithms will need design changes, the author makes a good point of this.

    Our own 3D engine has been undergoing multi-core design changes for months and the improvements far out weigh any complexity changes in design.

    Faster is always better, there so much more that is untapped.

  • Great post Harry!

    >pool of work threads, that are assigned as needed

    I too agree with this approach. I think those who embrace the fact that this is the way the near future of game development is headed, will be better off than those who don’t. As to whether or not we are headed permanently in a direction of multi-core processors and a parallel processing environment remains to be seen. While there is certainly a lot of power to be had with setups like that, there is no arguing that it makes development a bit harder. Hopefully there will someday be a spead breakthrough again and processor speeds can continue to grow inline with what Moore’s law calls for. I get the feeling that the multi-core design became mainstream so quickly due to the lack of CPU speed increase over the last few years. However, I recently read something about IBM (or maybe somebody else) testing 30-nanometer chips and it sounded promising.

  • Bucko

    The picture gets more complicated when you deal with the PS3. It has one PowerPC core (with ‘hyperthreading’ ie two hardware threads) and 6 SPU with very little memory, memory management handled by you the programmer and its own compiler and binaries that differ from the PowerPC one. A mian game thread on the PowerPC and worker threads on the SPUs seem the way to go but that would underutilize the PowerPC as its crappy branch predictor and in order execution would have it stalling most of the time. Having workthreads that can run on PPC 2 as well as the SPUs would be pretty hard to do as they are so different.

  • Asfand Yar Qazi

    Lenient evaluation

    I was looking at OCaml – I’ve heard good things about it. How does it compare to Tim Sweeny’s ‘next mainstream programming language’?


  • Have you looked at OpenMP? It seems well suited to parallelize most of the examples you mention in an easy, readable, and non-intrusive fashion.

  • Yes I’m aware of OpenMP! Visual Studio 2005 supports it also!

    Oblivion uses it also for running on multicore systems (like the XBOX360 and multicpu PCs)…

  • Pingback: Mark Powell’s Dev Journal » jME: Multithreaded Madness()

  • Incebrot

    Hi all! Best site in web! Bye