Predictable garbage collection with Lua

In one of my previous posts I talked about how you can make the Lua garbage collector (GC) more predictable in running time. This is a virtue that is highly valued in a GC used in games where you don’t have the luxury of going over you frame time. In that post I described a solution to the problem which works fine most of the time, leaving little space for garbage collection times that will hurt the framerate. However I ended that post with a promise to provide a better solution and in this post, I deliver.

The ideal situation would be to have the GC run for a specific amount of time. This way the game engine will be able to assign exact CPU time to the GC based on the situation. For example one strategy would be to give a constant amount of time to the GC per frame. Lets say 2ms every frame. Or it can be more clever and take into consideration other parameters, like the amount of time it took to do the actual frame. Is there enough time left for this frame? If there is, spend some for GC, if not, hold it for the next frame when things might not be too tight. Other parameters can be memory thresholds, memory warnings, etc.

All of the above depend on a GC that can be instructed to run for an exact amount of time. This kind of GC is what we call a realtime GC. And Lua does not have one. However it turns out that we can get very close to realtime with minor changes to the Lua GC.

The patch below modifies the behavior of the GC in the way we need it:

--- a/src/lgc.c
+++ b/src/lgc.c
@@ -609,15 +617,14 @@ static l_mem singlestep (lua_State *L) {
 
 void luaC_step (lua_State *L) {
   global_State *g = G(L);
-  l_mem lim = (GCSTEPSIZE/100) * g->gcstepmul;
-  if (lim == 0)
-    lim = (MAX_LUMEM-1)/2;  /* no limit */
   g->gcdept += g->totalbytes - g->GCthreshold;
+  double start = getTime();
+  double end = start + (double)g->gcstepmul / 1000.0;  
   do {
-    lim -= singlestep(L);
+    singlestep(L);
     if (g->gcstate == GCSpause)
       break;
-  } while (lim > 0);
+  } while (getTime() < end);
   if (g->gcstate != GCSpause) {
     if (g->gcdept < GCSTEPSIZE)
       g->GCthreshold = g->totalbytes + GCSTEPSIZE;  /* - lim/g->gcstepmul;*/

The only missing part from the patch above is the getTime() that can be something like this:

double getTime() {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (tp.tv_sec) + tp.tv_usec/1000000.0;
}

I guess however that everyone will want to use their own time function.

The patch modifies the code so that is stops based on a time limit and not based on a calculated target memory amount to be freed. The simplicity of the patch also comes from the fact that we “reuse” the STEPMUL parameter that is no longer used to carry the aggressiveness of the GC. We now use it to hold the exact duration we want the GC to run in milliseconds. So the usage will be this:

lua_gc(L, LUA_GCSETSTEPMUL, gcMilliSeconds);
lua_gc(L, LUA_GCSTEP, 0);

The above code will run the GC for gcMilliSeconds ms. This way you will never be out of your frame time budget, because the garbage collection took a little longer to execute. Problem solved!

  • This is a clever approach but there’s one potential problem: what if more garbage is accumulated than can be collected in the allotted time? Did you ever run into this kind of issue?

    What about the simplest and most effective solutions of all: simply call GC manually each frame. That’s how I’ve seen it done in the game engines which make use of Lua. Any garbage accumulated this frame is collected right away, instead of accumulating until a threshold is reached which causes the GC to collect everything in one frame.

    Combined with a GC log statement that prints out a warning if more than a certain threshold of garbage memory is collected in a frame, it helps to keep the collector in check AND pinpoint problematic code (ie lots of string concatenation).

    I guess my question is whether you did this optimization out of an actual need, and if so, what are your observations prior to and after making the GC changes?

  • Hi Steffan! I did it out of need because even with small step sizes there where frames that the GC step took a spike and that caused problems. I needed to “scissor” that spikes in GC time and distribute that time to the next frame. It is true that you can have memory accumulating if you set the time limit too low. That is easily solvable however by addaping the time limit based some threashhold crossing even falling back to regular steping for a while. I didn’t have to do that yet though…

    • Did you find these spikes with Instruments or measuring the GC time manually? Anyway, I’ll certainly keep an eye on the GC and will try your modifications just to see the difference.

  • Kaj

    I know this article is very old, but I tried this patch to avoid some serious spiking in our game and it doesn’t work. Singlestep can call the atomic function in one of its phases and that function alone currently takes 9 ms in our game. Yes, we have a fair amount of lua memory in use and a lot of garbage, but bottom line is, the timing constraint there only works for the sweep and incremental mark phase, the actual final mark phase takes as long as it does, and for a game using lots of lua in a non garbage optimized way it’s going to take long :o(