Thoughts Serializer

From Python to Lua

(This blog was originaly posted at #AltDevBlogADay)

All game developers, sooner or later, learn to appreciate scripting languages. That magical thing that allows for letting others do your job, better scaling of the team, strengthening the game code/engine separation, sandboxing, faster prototyping of ideas, fault isolation, easy parametrization, etc. Every game has to be somehow data driven to be manageable, and stopping at simple configuration files, with many different custom parsers, without going the extra mile of adding a full scripting language, is 90% of the times a bad design choice.

Today the developer can choose from a large variety of scripting languages, or even go crazy and implement one on his own. It happens that the most favored language for game developers is Lua. Its easy to understand why Lua is the favorite but other options are used as well. For example Python and the lately upcoming force of  Javascript.

Here I would like to share some of my experience of moving a game engine from Python to Lua.

Transition

My first experience with embedding a scripting language was with Python. More specifically with Stackless Python. When I started implementing my game engine I chose Stackless Python mainly because it was my language of choice for all development when not doing it in C++, and due to the support for tasklets. However the last year I got into porting the engine into the iOS platform and actually “pulling�? it down in graphical features in order for it to run efficiently on the iPhone. One of the major changes was the transition from Python to Lua.

Better Memory Management

Python is a very powerful language will many bells and whistles and has served me well on the “full�? version of Sylphis3D. However on the mobile platform the language was making things hard on performance. One of the major problems I was facing was the memory management. For those that don’t know, Python uses reference counting for memory management, coupled with a garbage collector (GC). What reference counting means, is that Python keeps track of the number of references an object has, and when that goes down to zero the object is no longer needed. That works well until you have two objects referencing each other, or forming a cycle of references. The reference count will never go to zero for those objects even if the cycle is not connected to the main object reference graph. So this is work for the GC who finds these lost clusters of objects by waking all reachable objects.

The problem with games is that we have to be fast. Fast and predictable. If you want the game to run at 60 frames per second, it means that you must do all the work in under 16.7ms. Whenever you miss that deadline -even by little- you have to wait for the next screen refresh. The worst thing that can happen is to cross that time limit sporadically and in a random pattern. This will result in motion shuttering, and it is most likely to give the user a better experience by going down to 30 FPS and sticking to it. So how does Python’s memory management system fits into the above picture? – On mobile… not so well…

The reference counting is completely predictable and consistent. An object will be released exactly the moment it is no longer needed. We have predictable and immediate release of resources. This is ideal for game development as long as you don’t create cycles. You have to be careful not to create cycles and if you do, make sure you break them. This however requires the scripter (which can be a non programmer) to be aware of implementation details. The ugly truth is that reference cycles will happen, and when they do it’s hard to find them. You must have a garbage collector, and Python does. However Python’s GC has a mind of his own and of course does not know of our 16.7ms time limit.

Python allows you to tune the GC, you can turn it on and off, but you can’t make it keep the processing low. When it will decide to run you are most likely to skip frames. Very bad.

On a desktop however you can easily get away with the following things combined:

  1. Reference counting
  2. Relaxed memory amount constrains
  3. Explicitly calling the GC at key moments
Actually this was what I was doing on the desktop version. It turns out that since most of the deallocation of memory will be handled by the reference counting system, and that system is not too constrained on memory for possible reference cycles to be a problem, we can get away with manually calling the GC at key moments (e.g. before level changing). You might get a bit heavier on memory usage, but this is small price to pay for having a predictable frame length.

But on the iPhone you don’t get relaxed memory amount constrains. Memory is tight. There is no page file. You either turn the GC on and get frame skips when it runs, or you run it at least when you get a memory warning. In practise this doesn’t work out well. Not as well as I want it to at least. Which lead us to the decision to try out Lua.

Lua needs no introduction to game developers as it is well known for being lightweight, easy to embed and… featuring an incremental GC! This was the major push for adopting Lua in the mobile version of the engine. Having an incremental GC basically means that you can spread a full garbage collection cycle over many frames. This way you can keep it under a specific time budget. All you have to do is disable the automatic GC like this:

luagc(L, LUAGCCOLLECT, 0);
and then do a:
luagc(L, LUAGCSTEP, 200);
at the end of every frame to do a single garbage collection step. Depending of how big the parameter you pass to LUA_GCSTEP you will need more or less “steps�? to complete a full garbage collection cycle. The choice of the step parameter however is left for the village magician to determine! To quote the Lua documentation:
…larger values mean more steps in a non-specified way.
So you have to play around with the variable to find the sweet-spot that does not use up too much of your precious frame time, and also keeps the garbage generated from the frame to rational amounts. This can be hard, and at the end we get something close to what I had with Python and full GC on, as at some point even a single GC step will spike up to 7ms. And that is logical since the Lua GC has far more work to do that the Python’s one, as it is not backed by reference counting.

But the swich to Lua was not in vain since the problem can almost completely go away by dropping the GCSTEP parameter down to zero and calling the GC repeatedly until a specific time frame expires. Something that looks like this:

double endTime = getTime() + 0.001;
do {
  luagc(L, LUAGCSTEP, 0);
} while( getTime() < endTime );
With the above code we almost get it to run as we want it. However notice that the above code does not turn the Lua GC into a realtime GC. The above will run the GC for a minimum of 1ms. Every frame the GC will takeup at least 1ms, even when there is no need to. That doesn’t feel really good doing on a device running on batteries. Apart from that we have an implementation that will hardly skip a frame due to the GC. And that is a win over the Python version.

Closing words

In general the overall transition can be called a success and it was worth the effort. I now have an engine that has much smaller footprint in memory size, more consistent behaviour and I got to keep all the good things. Here someone can raise several arguments like the loss of bitwise operations, native integer number support, even the count from one in arrays. But in the end as it turns out, it is not that much of a problem after all. The true power of Lua lies in its flexibility. The language is so open-ended that allows you to easily mimic any other language contruct (I still do miss the powerful concept of channels in Python though).

Also the simplicity of the Lua implementation as an interpreter can allow for easy understanding of the inner workings and makes it possible to even make changes right away. For example I was able to improve upon the garbage collection solution I laid out earlier so that I have an almost “realtime�? GC, with steps constrained in duration, just by changing a few lines of code in the interpreter…

…I will save that for an other post though.