OpenGL VBOs vs Display Lists: Batching


Benchmark:

I wrote a benchmark that renders a procedural triangle mesh of grass blades using either display lists or vertex buffer objects (VBOs). Each display list contains a single vertex array drawn with glDrawElements, with 3 floats per vertex.

The blades are grouped in batches. Each batch is rendered using glBegin/glEnd, vertex arrays (one array per attribute), a display list of vertex arrays, or separate VBOs (one VBO per attribute).

The total number of rendered triangles can be set arbitrarily. Vertices can optionally carry additional attributes: GL_FLOAT vertex normals and GL_UNSIGNED_BYTE vertex colors.

Source code + Windows binary:
grass_benchmark-1.2.zip (November 3, 2005)


Too many draw calls:

Here is a graph showing the frame rate as a function of the number of batches when rendering scenes of 0.5M, 1.0M, and 2.0M triangles, under Linux with a GeForce 6800 GT and NVIDIA's driver 71.74.

Using separate VBOs for vertex coordinates, normals and colors:

  • NVIDIA:

Using display lists of separate vertex arrays:

  • ATI:

On ATI cards with Catalyst 8.162, the same experiments showed performance independent of the number of batches.


Not enough draw calls:

VBOs that contain more vertices than a certain threshold (the size of the pre-transform, or pre-T&L, cache) can make the frame rate drop dramatically. The table below lists the pre-T&L cache sizes estimated by the benchmark on several graphics cards.

Command line:
$ ./grass_benchmark -tris 10000000 -pre-tl -pre-tl-jump 0.1 -method VBO

Graphics card        Estimated pre-T&L cache size
Quadro 4 900 XGL     7,602 vertices
GeForce MX 4000      39,998 vertices
GeForce 6600 GT      63,630 vertices
GeForce 6800 GT      63,630 vertices
GeForce 7800 GTX     10^6 vertices
GeForce 7900 GTX     10^6 vertices


Louis Bavoil