
7.8.2012

OpenGL (4.3) compute shader example

Introduction

OpenGL 4.3 was released yesterday, and among the larger updates were compute shaders. Since I couldn't find a tutorial or example on Google, today I'm going to show you how to use them.

Compute shaders in the pipeline

The important thing to note is that while the other shaders have a fixed execution order, compute shaders can essentially alter any data anywhere. Shader objects within a program object are implicitly pipelined one after another, and a program object is "ready to go" as it is. Compute shaders cannot be baked into a program object alongside other shaders because their execution order is not fixed. Instead, compute shaders have to be placed into program objects by themselves, and the application has to tell OpenGL the execution order explicitly by switching the compute shader program object on and off and calling DispatchCompute*() to run the compute shaders.

OpenGL compute shaders are written in GLSL and are similar to other shaders: you can read textures, images, and buffers, and write images and buffers. Just like with other GPGPU implementations, threads are grouped into work groups, and one compute shader dispatch processes a number of work groups. The work group size is specified along with the kernel source code, and the number of work groups launched is given by the application as arguments to DispatchCompute*().
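To make the relationship between work groups and threads concrete, here is a minimal CPU-side sketch of how GLSL derives a thread's global ID from its work group ID and local ID. The UVec3 struct and both function names are hypothetical stand-ins made up for this illustration; they are not part of OpenGL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for GLSL's uvec3; not part of any OpenGL header.
struct UVec3 {
    std::uint32_t x, y, z;
};

// Mirrors the relationship between the GLSL built-ins:
//   gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
UVec3 globalInvocationID(UVec3 workGroupID, UVec3 workGroupSize, UVec3 localInvocationID) {
    // A local ID always lies inside its work group.
    assert(localInvocationID.x < workGroupSize.x);
    assert(localInvocationID.y < workGroupSize.y);
    assert(localInvocationID.z < workGroupSize.z);
    return {workGroupID.x * workGroupSize.x + localInvocationID.x,
            workGroupID.y * workGroupSize.y + localInvocationID.y,
            workGroupID.z * workGroupSize.z + localInvocationID.z};
}

// Total number of threads launched by one dispatch:
// (number of work groups) x (threads per work group).
std::uint64_t totalInvocations(UVec3 numGroups, UVec3 workGroupSize) {
    return static_cast<std::uint64_t>(numGroups.x) * numGroups.y * numGroups.z *
           workGroupSize.x * workGroupSize.y * workGroupSize.z;
}
```

For instance, with 16x16 work groups, the thread with local ID (5, 7) in work group (2, 3) has global ID (37, 55), and dispatching 32x32 such groups launches 512x512 threads.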

Example

You should know when to choose a compute shader over the other shaders for your algorithm (this is not one such example). The reasons to use GPGPU are universal and have nothing to do with OpenGL compute shaders specifically.

You can grab the full example program here, but the important files are main.cpp and opengl_cs.cpp. In main.cpp we create an OpenGL 4.3 context (I'm being strict and using a forward-compatible core profile, but you don't have to), a texture for the compute shader to write and the fragment shader to read, and two program objects. One object is for the compute shader and the other is for rendering (vertex + fragment shaders). After that we go into a loop where we update a counter in the compute shader, fill in the texture (as image2D), and blit the texture onto the screen.

#include "opengl.h"

GLuint renderHandle, computeHandle;

void updateTex(int);
void draw();

int main() {
    initGL();
    GLuint texHandle = genTexture();
    renderHandle = genRenderProg(texHandle);
    computeHandle = genComputeProg(texHandle);

    for (int i = 0; i < 1024; ++i) {
        updateTex(i);
        draw();
    }

    return 0;
}

void updateTex(int frame) {
    glUseProgram(computeHandle);
    glUniform1f(glGetUniformLocation(computeHandle, "roll"), (float)frame*0.01f);
    glDispatchCompute(512/16, 512/16, 1); // 512^2 threads in blocks of 16^2
    // Image stores are not automatically visible to later draws, so a barrier is
    // needed before draw() reads the texture. Both bits are set because how the
    // fragment shader reads it (sampler or image load) is an assumption here.
    glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT | GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
    checkErrors("Dispatch compute shader");
}

void draw() {
    glUseProgram(renderHandle);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    swapBuffers();
    checkErrors("Draw screen");
}

main.cpp
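The 512/16 in the dispatch call only works because the texture size is an exact multiple of the work group size. For other sizes the group count has to be rounded up. A minimal sketch, where groupCount is a hypothetical helper name and not part of the example program:

```cpp
#include <cassert>

// Hypothetical helper: number of work groups needed along one axis to cover
// `items` threads when each work group holds `localSize` of them.
unsigned groupCount(unsigned items, unsigned localSize) {
    assert(localSize > 0);
    // Integer ceiling division; assumes items + localSize does not overflow.
    return (items + localSize - 1) / localSize;
}
```

With this, the call would read glDispatchCompute(groupCount(width, 16), groupCount(height, 16), 1), where width and height are assumed names for the texture dimensions. Rounding up launches threads past the edge of the texture, so the shader would typically compare gl_GlobalInvocationID against the image size and return early for those.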

The compute shader set-up should look familiar as it's just another shader. (There are some specifics which are documented in the GLSL specification.)

#include "opengl.h"
#include <stdio.h>
#include <stdlib.h>

GLuint genComputeProg(GLuint texHandle) {
    // Creating the compute shader, and the program object containing the shader
    GLuint progHandle = glCreateProgram();
    GLuint cs = glCreateShader(GL_COMPUTE_SHADER);

    // In order to write to a texture, we have to introduce it as image2D.
    // An image without a format layout qualifier has to be declared writeonly.
    // local_size_x/y/z layout variables define the work group size.
    // gl_GlobalInvocationID is a uvec3 variable giving the global ID of the thread,
    // gl_LocalInvocationID is the local index within the work group, and
    // gl_WorkGroupID is the work group's index
    const char *csSrc[] = {
        "#version 430\n",
        "uniform float roll;\
         writeonly uniform image2D destTex;\
         layout (local_size_x = 16, local_size_y = 16) in;\
         void main() {\
             ivec2 storePos = ivec2(gl_GlobalInvocationID.xy);\
             float localCoef = length(vec2(ivec2(gl_LocalInvocationID.xy)-8)/8.0);\
             float globalCoef = sin(float(gl_WorkGroupID.x+gl_WorkGroupID.y)*0.1 + roll)*0.5;\
             imageStore(destTex, storePos, vec4(1.0-globalCoef*localCoef, 0.0, 0.0, 0.0));\
         }"
    };

    glShaderSource(cs, 2, csSrc, NULL);
    glCompileShader(cs);
    int rvalue;
    glGetShaderiv(cs, GL_COMPILE_STATUS, &rvalue);
    if (!rvalue) {
        fprintf(stderr, "Error in compiling the compute shader\n");
        GLchar log[10240];
        GLsizei length;
        glGetShaderInfoLog(cs, 10239, &length, log);
        fprintf(stderr, "Compiler log:\n%s\n", log);
        exit(40);
    }
    glAttachShader(progHandle, cs);

    glLinkProgram(progHandle);
    glGetProgramiv(progHandle, GL_LINK_STATUS, &rvalue);
    if (!rvalue) {
        fprintf(stderr, "Error in linking compute shader program\n");
        GLchar log[10240];
        GLsizei length;
        glGetProgramInfoLog(progHandle, 10239, &length, log);
        fprintf(stderr, "Linker log:\n%s\n", log);
        exit(41);
    }
    glUseProgram(progHandle);

    // destTex refers to image unit 0
    glUniform1i(glGetUniformLocation(progHandle, "destTex"), 0);

    checkErrors("Compute shader");
    return progHandle;
}

opengl_cs.cpp
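To check what the shader should produce, the per-pixel formula can be reproduced on the CPU. The following is a rough reference sketch, not part of the example program: redAt is a hypothetical name, and the 16x16 work group size is taken from the shader source above.

```cpp
#include <cassert>
#include <cmath>

// CPU reference for the red channel the compute shader stores at pixel (x, y).
// Assumes the 16x16 work group size declared in the shader.
float redAt(unsigned x, unsigned y, float roll) {
    assert(std::isfinite(roll));
    const unsigned groupSize = 16;

    // Recover gl_LocalInvocationID.xy and gl_WorkGroupID.xy from the global position.
    unsigned localX = x % groupSize, localY = y % groupSize;
    unsigned groupX = x / groupSize, groupY = y / groupSize;

    // length(vec2(ivec2(gl_LocalInvocationID.xy) - 8) / 8.0)
    float dx = static_cast<float>(static_cast<int>(localX) - 8) / 8.0f;
    float dy = static_cast<float>(static_cast<int>(localY) - 8) / 8.0f;
    float localCoef = std::sqrt(dx * dx + dy * dy);

    // sin(float(gl_WorkGroupID.x + gl_WorkGroupID.y) * 0.1 + roll) * 0.5
    float globalCoef = std::sin(static_cast<float>(groupX + groupY) * 0.1f + roll) * 0.5f;

    return 1.0f - globalCoef * localCoef;
}
```

localCoef is zero at local position (8, 8) of every work group, so that pixel is always stored as 1.0 regardless of roll, while globalCoef varies with the work group index and with roll, which is what changes the picture from frame to frame.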

compute shader demo

Discussion

But why did Khronos introduce compute shaders in OpenGL when they already had OpenCL and its OpenGL interoperability API? Well, OpenCL (and CUDA) are aimed at heavyweight GPGPU projects and offer more features. Also, OpenCL can run on many different types of hardware (not just GPUs), which makes the API thick and complicated compared to lightweight compute shaders. Finally, the explicit synchronization between OpenGL and OpenCL/CUDA is troublesome to do without crudely blocking (some of the required extensions are not even supported yet). With compute shaders, however, OpenGL is aware of all the dependencies and can schedule work more intelligently. This reduction in overhead might, in the end, be the most significant benefit for graphics algorithms, which often execute for less than a millisecond.

Comments

31.8.2012

Great article, thanks!!!

- Rich

10.9.2012

Thank you very much!

- Aavci

12.9.2012

Nice! Thank you!

- linsnos

21.12.2012

Why do you pass texHandle as an argument to genRenderProg() and genComputeProg()? You don't even use it internally. I don't know how it's supposed to work that way...

- Wonderer

25.12.2012

Oh yeah you're right; I'm not using the parameter, so it's ignored.  There's no need to use it since it's bound to GL_TEXTURE0 during creation and kept bound throughout the program.

- wili

2.11.2013

Thank you

2.5.2014

Very helpful !!
-agb

- AB

13.9.2014

to anyone having problems compiling/running this with a nvidia card, try -L/usr/lib/nvidia-xxx with g++ (xxx being your driver version) and change "uniform image2D destTex" in the shader code to "writeonly uniform image2D destTex"

- meepo

3.9.2015

thank you @meepo

- nozam

19.9.2015

Thanks!

- jimmi

27.11.2016

Thank you!  This was easy to duplicate.  Well done.

- freeflyclone

4.10.2020

FYI, getting the following error while trying to run: 

Window depth 24, 800x600
OpenGL:
        vendor Intel Open Source Technology Center
        renderer Mesa DRI Intel(R) HD Graphics 5500 (Broadwell GT2) 
        version 4.6 (Core Profile) Mesa 19.3.4
        shader language 4.60
Extension "GL_ARB_compute_shader" found
Error in compiling the compute shader
Compiler log:
0:2(21): error: image uniforms not qualified with `writeonly' must have a format layout qualifier

- Misha

4.10.2020

I changed to 
		 uniform writeonly image2D destTex;\
then it worked.

- Misha

wili
Ville Timonen

hack blog

