OpenGL VBOs vs Display Lists: Batching
Benchmark:
I wrote a benchmark that renders a procedural triangle mesh using display lists or vertex buffer objects (VBOs). Each display list contains a single vertex array, drawn with glDrawElements, with 3 floats per vertex.
The blades of grass that make up the mesh are grouped in batches. Each batch is rendered using glBegin/glEnd, vertex arrays (one array per attribute), a display list of vertex arrays, or separate VBOs.
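As a rough illustration of the batching idea (a sketch only, not taken from the benchmark source), the triangles can be divided as evenly as possible among a chosen number of batches. The names BatchRange and batch_range are hypothetical, and the sketch assumes batches are contiguous slices of the triangle list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not from the benchmark source): the slice
   of the triangle list covered by one batch. */
typedef struct {
    size_t first_triangle;  /* index of the first triangle in the batch */
    size_t triangle_count;  /* number of triangles in the batch */
} BatchRange;

/* Split total_triangles as evenly as possible into batch_count batches
   and return the range of batch number batch_index (0-based). The first
   (total_triangles % batch_count) batches get one extra triangle. */
BatchRange batch_range(size_t total_triangles, size_t batch_count,
                       size_t batch_index)
{
    assert(batch_count > 0 && batch_index < batch_count);

    size_t base  = total_triangles / batch_count;
    size_t extra = total_triangles % batch_count;

    BatchRange r;
    r.triangle_count = base + (batch_index < extra ? 1 : 0);
    r.first_triangle = batch_index * base
                     + (batch_index < extra ? batch_index : extra);
    return r;
}
```

Each range would then correspond to one draw call (one display list or one set of VBOs), so the number of batches directly sets the number of draw calls per frame.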
The total number of rendered triangles can be changed arbitrarily. Vertices can be rendered with additional attributes: GL_FLOAT vertex normals, and GL_UNSIGNED_BYTE vertex colors.
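To give a feel for how the optional attributes change the amount of data per vertex, here is a small sketch that adds up the bytes stored per vertex across the attribute arrays. It is an illustration under assumptions, not the benchmark's code: the flag names and vertex_bytes are hypothetical, normals are assumed to have 3 float components, and colors are assumed to have 4 unsigned byte components (RGBA), which the description above does not state.

```c
#include <stddef.h>

/* Hypothetical flags selecting the optional per-vertex attributes. */
enum {
    ATTR_NORMAL = 1 << 0,  /* GL_FLOAT normals, assumed 3 components */
    ATTR_COLOR  = 1 << 1   /* GL_UNSIGNED_BYTE colors, assumed RGBA  */
};

/* Bytes stored per vertex, summed over all enabled attribute arrays.
   The position (3 floats) is always present. */
size_t vertex_bytes(unsigned attrs)
{
    size_t bytes = 3 * sizeof(float);            /* position x, y, z */
    if (attrs & ATTR_NORMAL)
        bytes += 3 * sizeof(float);              /* normal x, y, z   */
    if (attrs & ATTR_COLOR)
        bytes += 4 * sizeof(unsigned char);      /* color r, g, b, a */
    return bytes;
}
```

With 4-byte floats, this gives 12 bytes per vertex for positions only and 28 bytes with both normals and colors enabled.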
Source code + Windows binary:
grass_benchmark-1.2.zip (November 3, 2005)
Too many draw calls:
Here is a graph showing the frame rate as a function of the number of display lists when rendering scenes of 0.5M, 1.0M, and 2.0M triangles, under Linux with a GeForce 6800 GT and NVIDIA's driver 71.74.
Using separate VBOs for vertex coordinates, normals and colors:
- NVIDIA:
Using display lists of separate vertex arrays:
- ATI:
In the same experiments on ATI cards with Catalyst 8.162, performance was independent of the number of batches.
Not enough draw calls:
VBOs that hold more than a certain number of vertices (the size of the pre-transform cache) can make the frame rate drop dramatically. Here are the pre-transform cache sizes estimated by the benchmark on different graphics cards.
Command line:
$ ./grass_benchmark -tris 10000000 -pre-tl -pre-tl-jump 0.1 -method VBO
Graphics Card          Estimated pre-T&L cache
Quadro 4 900 XGL       7,602 vertices
GeForce MX 4000        39,998 vertices
GeForce 6600 GT        63,630 vertices
GeForce 6800 GT        63,630 vertices
GeForce 7800 GTX       10^6 vertices
GeForce 7900 GTX       10^6 vertices
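One way to act on such an estimate is to pick the smallest number of VBOs that keeps each one at or below the limit. The sketch below shows only that arithmetic. The name vbo_count_for_limit is hypothetical, and it assumes that vertices are not shared between VBOs. Whether staying under the limit actually avoids the slowdown on a given card is something to measure, not something this sketch guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest number of VBOs such that, when total_vertices are divided
   among them, none needs to hold more than max_vertices_per_vbo
   vertices. This is a ceiling division; it returns 0 for an empty mesh. */
size_t vbo_count_for_limit(size_t total_vertices,
                           size_t max_vertices_per_vbo)
{
    assert(max_vertices_per_vbo > 0);
    return total_vertices / max_vertices_per_vbo
         + (total_vertices % max_vertices_per_vbo != 0 ? 1 : 0);
}
```

For example, with the 63,630-vertex estimate from the table, a mesh of 100,000 vertices would be split into at least 2 VBOs.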
Louis Bavoil