29.5.2012 CUDA (or OpenGL) video capture in Linux


CUDA (or OpenGL) video capture in Linux

Video capturing using CUDA... that sounds a bit odd, doesn't it? Well, the motivation for me was this: many of the graphics algorithms I develop rely on GPGPU, so the rendering result is first available in a CUDA buffer. Also, capturing video without slowing down the actual application is not a trivial task, and CUDA offers nicely explicit asynchronous transfer modes for batching transfers off the GPU in the background as much as possible.

This can be used in conjunction with an entirely OpenGL-based engine as well, if you're willing to accept CUDA as a dependency: simply pass your OpenGL render target into CUDA using the CUDA OpenGL interoperability API, and feed the mapped CUDA buffer as the input to this video capturer. The overhead of mapping the render target in CUDA should be minuscule considering the whole task.

So the key idea is as follows

The resulting lossless PNG frames can be easily encoded into a video format of your liking by using mencoder.
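The usage comments in this post mention PNG writer threads. Here is a minimal sketch of that idea, assuming a plain queue drained by a few std::thread workers so the main loop never blocks on encoding. FrameWriterPool, Frame and Sink are hypothetical names made up for illustration, not part of vidCap.h, and the sink callback merely stands in for the actual PNG encoding and file writing.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration, not the actual CUDAVidCap internals.
struct Frame {
    std::size_t index;                 // frame number, e.g. for the file name
    std::vector<std::uint8_t> pixels;  // host copy of the frame data
};

class FrameWriterPool {
public:
    using Sink = std::function<void(const Frame&)>;

    FrameWriterPool(std::size_t numThreads, Sink sink) : sink_(std::move(sink)) {
        for (std::size_t i = 0; i < numThreads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~FrameWriterPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    FrameWriterPool(const FrameWriterPool&) = delete;
    FrameWriterPool& operator=(const FrameWriterPool&) = delete;

    // Called from the main loop; only enqueues, so it returns quickly.
    void submit(Frame frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(frame));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            Frame frame;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // shutting down, nothing left
                frame = std::move(queue_.front());
                queue_.pop();
            }
            sink_(frame);  // runs outside the lock; stands in for PNG writing
        }
    }

    Sink sink_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Frame> queue_;
    bool done_ = false;
    std::vector<std::thread> workers_;
};
```

The queue here is unbounded for brevity. A real capturer would probably cap it (or reuse a fixed set of buffers) so that slow disk writes cannot eat all host memory.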

The video capturer is used like this:

#include "vidCap.h"

// CUDA has to be initialized prior to this

// 256MB device buffer, 4 PNG writer threads
vidCap = new CUDAVidCap(256*1024*1024);

// Use floating point channels

// We're using borders here, which we cut from the video
vidCap->dimensions(W + borderW*2, H + borderH*2);
// Cropping is optional
vidCap->setCrop(borderW, borderH, W + borderW, H + borderH);

while (mainLoop) {

// Wait for the transfers and PNG writers
// (..or let the destructor do it)

Public domain. Enjoy!

