Thoughts Serializer

Computer graphics, Games, Personal

Optimizing script language performance with custom memory allocators

Last weekend I did some exploring of scripting language execution performance, specifically the memory allocation side of things, and I would like to share my findings.

Script languages and memory usage

As you probably know, scripting languages (most of them at least, like Python, Lua, etc.) tend to make a huge number of small allocations on the heap. Almost everything is stored on the heap, and if you care about performance, you start to feel homesick for your beloved C stack! Anyway, nothing comes for free, and a scripting language has to take something from you in exchange for all the goods it gives you back. So the best you can do is make sure that you have the best memory allocator for the job.

Doing too many small allocations and releases on the heap can cause memory fragmentation, along with all the evil that comes with it. The common approach is to create a specialized memory allocator that serves small, fixed-size blocks of memory to the scripting language, taken from a bigger chunk of memory reserved from the system. This is common in all “realtime” and intensive applications like games, and something I have done many times to gain performance.
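To make the idea concrete, here is a minimal sketch of such a small-block allocator. It is not the allocator used in this post; the class name `FixedBlockPool` and its interface are hypothetical. It reserves one big chunk with malloc() and hands out equally sized blocks from a free list threaded through the unused blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical fixed-size block pool (an illustration, not the post's allocator).
// One big chunk is reserved up front and carved into equal blocks; free blocks
// form an intrusive singly linked list, so allocate/release are O(1).
class FixedBlockPool {
public:
    FixedBlockPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(roundUp(blockSize)), blockCount_(blockCount) {
        assert(blockCount_ > 0);
        chunk_ = static_cast<unsigned char*>(std::malloc(blockSize_ * blockCount_));
        if (!chunk_) throw std::bad_alloc();
        // Store in each free block a pointer to the next free block.
        for (std::size_t i = 0; i < blockCount_; ++i) {
            unsigned char* block = chunk_ + i * blockSize_;
            void* next = (i + 1 < blockCount_) ? block + blockSize_ : nullptr;
            *reinterpret_cast<void**>(block) = next;
        }
        freeList_ = chunk_;
    }

    ~FixedBlockPool() { std::free(chunk_); }

    FixedBlockPool(const FixedBlockPool&) = delete;
    FixedBlockPool& operator=(const FixedBlockPool&) = delete;

    // Pops the first free block, or returns nullptr when the pool is exhausted.
    void* allocate() {
        if (!freeList_) return nullptr;
        void* block = freeList_;
        freeList_ = *static_cast<void**>(block);
        return block;
    }

    // Pushes a block back on the free list; it will be the next one handed out.
    void release(void* block) {
        if (!block) return;
        *static_cast<void**>(block) = freeList_;
        freeList_ = block;
    }

private:
    // Blocks must hold a pointer and stay aligned for any object type.
    static std::size_t roundUp(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        if (size < sizeof(void*)) size = sizeof(void*);
        return (size + align - 1) / align * align;
    }

    std::size_t blockSize_;
    std::size_t blockCount_;
    unsigned char* chunk_;
    void* freeList_;
};
```

A real allocator for a scripting language would typically keep several such pools, one per size class, and fall back to the system allocator for requests above the largest class.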

Can’t beat the standard malloc

What I discovered with my latest attempt was that it has gotten quite hard to beat the GNU implementation of malloc(), something that used to be easy when you focused on a specialized case (e.g. small blocks of memory). Not that you can’t do better if you try hard, but at this point the malloc() implementation is already super-fast for 99.9% of applications on the desktop. Rest assured that you will not be able to do much better. However, that is not the case for embedded devices that don’t share the same virtual memory benefits as desktop computers.

My hand-tuned specialized memory allocator for small blocks of memory (<= 256 bytes) was no more than 1% faster than the native malloc() on OS X 10.6. However, on the iPhone the same allocator was twice as fast as the native malloc()! Since the target was the iPhone from the beginning, that seemed like a big win! However, when I set up a small benchmark in the scripting environment that allocated game engine objects and released them again in various patterns, the results were disappointing. My specialized (and twice as fast) allocator improved execution speed by only about 5% in a memory-intensive benchmark, and in some tests it was even slower! That was odd and, most of all, not good!

Why I was failing

After some inspections and tests that made the case of me doing something really stupid less probable, I narrowed down the cause.

In most uses of a scripting language you have some classes defined in C++ that you instantiate from the scripting language. Take for example a 3D vector class “CVector3” defined in C++. When you instantiate it in the scripting language you get two allocations: one in the scripting language, which allocates the “proxy” object, and one in the C++ environment. When you give the scripting language a new allocator for its allocations, you only “optimize” the first allocation. The one in C++ still goes through the system default allocator.

And since you optimize half of the allocations you expect to have half the performance boost… well… wrong. It turns out that you can even be slower this way. The secret here is the CPU cache. By doing the above, you have two memory blocks that are usually accessed together, but are far apart in memory. This can really hurt performance badly on a device with slow memory like the iPhone.

The solution

The solution was of course to use the same allocator on the C++ side by overriding the “new” operator of the class. This placed the blocks of memory allocated on the script side close to the blocks allocated on the C++ side. This way, accessing the object touches only one region of memory and gives nice cache hits. Performance went up by 30%, which was nice and expected.
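As a rough sketch of what overriding “new” for a class looks like: the `script_heap` namespace below is a hypothetical stand-in for whatever allocator was handed to the scripting language (here just a tiny bump arena so the example is self-contained), and the members of `CVector3` are assumed for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for the allocator given to the scripting language.
// A bump arena is enough to show where blocks end up; it never reuses memory.
namespace script_heap {
constexpr std::size_t kArenaSize = 4096;
constexpr std::size_t kAlign = alignof(std::max_align_t);

alignas(std::max_align_t) unsigned char arena[kArenaSize];
std::size_t used = 0;

void* alloc(std::size_t size) {
    const std::size_t rounded = (size + kAlign - 1) / kAlign * kAlign;
    if (rounded > kArenaSize - used) return nullptr;
    void* p = arena + used;
    used += rounded;
    return p;
}

void release(void*) {
    // Individual frees are no-ops in this sketch.
}
}  // namespace script_heap

// C++ side object. Its class-level operator new/delete route through the same
// allocator the script side uses, so proxy and object land near each other.
class CVector3 {
public:
    CVector3(float px = 0.0f, float py = 0.0f, float pz = 0.0f)
        : x(px), y(py), z(pz) {}

    static void* operator new(std::size_t size) {
        void* p = script_heap::alloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }

    static void operator delete(void* p) noexcept { script_heap::release(p); }

    float x, y, z;
};
```

With this in place, a proxy block requested by the scripting language through `script_heap::alloc()` and the `CVector3` created right after it with `new` come out of the same arena, next to each other.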

One other interesting thing that I found from this is that, on the iPhone, if I just override the “new” operator of a class and make it allocate the memory with plain malloc() and don’t use my allocator at all, the system is again faster!

This is probably because “new” does not go through plain malloc() (I didn’t bother to check), as the scripting language environment does. So the allocated blocks end up in different arenas in different parts of memory, and performance is lost for the same reason as above!
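The malloc()-only variant can be sketched as a small base class. The names `MallocBacked` and `CQuaternion` are hypothetical, and the live-block counter exists only to make the routing visible; any engine class deriving from the base gets its storage straight from malloc().

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical base class: objects of derived classes created with "new" get
// their storage from plain malloc()/free(), the same calls a C-based
// scripting runtime would use by default.
class MallocBacked {
public:
    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        ++liveBlocks();
        return p;
    }

    static void operator delete(void* p) noexcept {
        if (!p) return;
        --liveBlocks();
        std::free(p);
    }

    // Number of blocks currently allocated through this base class
    // (only here so the example can show that "new" really lands here).
    static int& liveBlocks() {
        static int count = 0;
        return count;
    }
};

// Example engine class (assumed for illustration).
class CQuaternion : public MallocBacked {
public:
    float x = 0.0f, y = 0.0f, z = 0.0f, w = 1.0f;
};
```

Whether this actually helps depends on how the platform's default operator new is implemented, so it is worth measuring on the target device.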

So, keep your related allocations close together when crossing the language barrier!

Ray Tracing into a Sparse Voxel Octree

And just when you thought you were through with tracing things all over the place… John Carmack strikes back with a mortal blow with something about ray tracing into a sparse voxel octree!!

The article doesn’t really say much (nothing actually) about the algorithm, and this is where the fun/fuss starts! I can’t wait to see all the amazing/crazy ideas people from all over the world will come up with about what John is actually talking about. Plots upon plots will emerge… flames…

NVIDIA to Acquire AGEIA Technologies

According to this press release, nVidia will acquire Ageia Technologies. Yeap! The well known physics software and hardware vendor. In my mind this means that the future nVidia based accelerators will support physics acceleration, too. It will basically mean the death of the PhysX processor, since the GPU can do that easily with no extra cost.

Actually the PPU solution was never going to work. I find it quite hard to believe people would ever…

The wait is over… Sylphis3D is open source!

I just released the source to Sylphis3D! Check out the story at the Developer Network.

The wait is over! Sylphis3D is officially released under the GNU GPL ver.2 (with the classpath exception for those that need closed source solutions). The engine weighs in at around 45000 lines of source code written in C++ and Python. The source code can be obtained from the download page of the [sourceforge.net project page](http://www.sf.net/projects/sylphis3d). Later on the source will be added to the subversion repository for easier access. The source code compiles under Microsoft Visual Studio .NET 2003. The makefiles and SConstruct files, for compiling with GCC, are out of date. However the mapcompiler is up to date. The source should compile out of the box.

Tomorrow the Release

The time has come… tomorrow is the release day of Sylphis3D as an open source project. I’m very excited about this new beginning! This is going to be my biggest contribution to the open source community so far.

The source that is going to be released counts ~45000 lines of code in C++ and Python, as counted with SLOCCount, and the development cost was estimated at $1,500,000!!!

Oh.. well…. :)

sylphis3d, release, open source, GPL, 3d, engine, opengl

Open Sourcing Progress Update

As an update, I’m now pasting the license into the source files and getting the release ready. I’m going to publish on the SourceForge site where the Sylphis Generalized Triangle optimizer is already published in CVS. The engine is going to be hosted using SVN. Maybe there will be downloadable versions too, but I’m not sure yet. So get your SVN clients ready…

… until then happy Easter people!!

Opensource License

These last few days I have been spending most of my time considering open source licenses and what would be the appropriate license for Sylphis3D. I must say that it is a very brain-melting procedure. I can see now why I could never become a lawyer!

I initially started considering two licenses, the GPL and the BSD. Both are recognized as free software licenses by the FSF. The GPL is the de facto open source license today and has proven its value. Most open source software today is released under the GPL, including Linux. The license has proven able to protect and empower the freedom of the software, by forcing code to be contributed back to the original GPLed software. The BSD, on the other hand, is a more liberal license. It requires adopters of the code to do no more than mention the code that was used. They are not required to release their code back. This is frowned upon by some open source people because it allows closed source projects to benefit from open source without ever contributing back. The classic example here is the Windows operating system, which used the networking stack of the FreeBSD operating system; no code was ever contributed back by Microsoft.

The problem with the GPL is that it is not an easy solution when it comes to 3D game engines. A GPLed engine…

No Second Carnival of Game Programming

It’s been a month since the last Carnival of Game Programming and unfortunately there is not going to be a second one. The number of submissions was low and I can’t publish a carnival like that.

When a critical mass of posts is collected we will have a second carnival… no fixed dates and deadlines…

Sylphis3D goes open source : BSD or GPL ?

I think this is good news on the doorstep today! Yeap! After much thought I came to the decision to finally open the source code of Sylphis3D. This is going to be a big step for the development of the game engine and…

HDR Procedural Skies

The development branch of Sylphis3D is going to support high quality terrain rendering. At the moment the terrain rendering code is in place and produces some very nice views! However, the sky support was limited to skybox rendering. At first I thought I would just go for some HDR textured skybox. This was good until I realized that it would be stupid to have an engine that supports realtime shadowing and lighting and still have outdoor areas with static lights because of a static skybox. It was obvious that a dynamic sky was needed, so that day/night cycles can be simulated.

After a lot of experimentation and book studying about light scattering and stuff, we had results!