
29.5.2012

CUDA (or OpenGL) video capture in Linux

Video capturing using CUDA... that sounds a bit odd, doesn't it? Well, the motivation for me was this: many graphics algorithms I develop rely on GPGPU, and the rendering result is first available in a CUDA buffer. Also, capturing video without slowing down the actual application is not a trivial task, and CUDA offers nicely explicit asynchronous transfer modes that move frames off the GPU in the background as much as possible.

This can also be used with an entirely OpenGL-based engine, if you're willing to accept CUDA as a dependency: simply pass your OpenGL render target into CUDA using the CUDA OpenGL interoperability API, and feed the mapped CUDA buffer as the input to this video capturer. The overhead of mapping the render target in CUDA should be minuscule considering the whole task.

So the key idea is as follows:

Initialize the video capturer, give it a pointer to your CUDA buffer that will contain the rendered frame
In your main rendering loop, ask the video capturer to take a snapshot of the data
This snapshot is a fast synchronous copy into device memory, which holds a buffer of multiple frames
This buffer is asynchronously fed to the host memory through PCI-E
Frames on the host side are converted into their final pixel format and crop size (in a separate thread) and stored into a pool of frames that resizes on-the-fly
One or more writer threads consume frames from this pool as fast as they can and write the frames as PNG images onto disk
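
The last two steps (a pool of converted frames drained by writer threads) can be sketched roughly as below. This is not the code from vidCap.tar.gz: the names Frame, FramePool and startWriters are made up for illustration, the pool is simply unbounded rather than resizing in any clever way, and writeFrame is a placeholder for whatever actually encodes the PNG.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical frame type: one converted 8-bit image plus its sequence number.
struct Frame {
    std::size_t index;
    std::vector<std::uint8_t> pixels;
};

// Unbounded pool: the producer never blocks, writers sleep until work arrives.
class FramePool {
public:
    void push(Frame f) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!closed_ && "push after close");
            frames_.push_back(std::move(f));
        }
        cv_.notify_one();
    }

    // Signal that no more frames will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a frame is available; returns false once the pool is
    // closed and fully drained.
    bool pop(Frame &out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !frames_.empty(); });
        if (frames_.empty()) return false;
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Frame> frames_;
    bool closed_ = false;
};

// Start numWriters threads; each one calls writeFrame for every frame it pops.
// writeFrame stands in for the PNG encoder and must be safe to call
// concurrently from several threads.
std::vector<std::thread> startWriters(FramePool &pool, int numWriters,
                                      std::function<void(const Frame &)> writeFrame) {
    std::vector<std::thread> writers;
    for (int i = 0; i < numWriters; ++i) {
        writers.emplace_back([&pool, writeFrame] {
            Frame f;
            while (pool.pop(f)) writeFrame(f);
        });
    }
    return writers;
}
```

The conversion thread would call push() for each finished frame. At shutdown, call close() and join the writer threads, so that every frame already in the pool still gets written.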

The resulting lossless PNG frames can be easily encoded into a video format of your liking by using mencoder.

The video capturer is used like this:

#include "vidCap.h"

// CUDA has to be initialized prior to this

// 256MB device buffer, 4 PNG writer threads
vidCap = new CUDAVidCap(256*1024*1024, 4);
vidCap->setSource((void*)yourCudaBuffer);

// Use floating point channels
vidCap->floatSource(true);

// RGBA
vidCap->numComponents(4);

// We're using borders here, which we cut from the video
vidCap->dimensions(W + borderW*2, H + borderH*2);

// Cropping is optional
vidCap->setCrop(borderW, borderH, W + borderW, H + borderH);

vidCap->picPrefix("capturedFrames/frame");

while (mainLoop) {
    render();
    vidCap->snap();
}

// Wait for the transfers and PNG writers
// (..or let the destructor do it)
vidCap->wait();
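
To get a feel for what these parameters imply, here is a small standalone sketch of two things: how many float frames fit into a device buffer of a given size, and what cropping plus conversion to 8 bits per channel might look like on the host. It is not taken from vidCap. The helper names are hypothetical, and it assumes that the crop arguments mean (x0, y0, x1, y1) with an exclusive lower-right corner and that channel values lie in [0, 1]; the real implementation may well differ on both counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of whole frames that fit into a buffer of bufferBytes.
std::size_t framesThatFit(std::size_t bufferBytes, std::size_t width,
                          std::size_t height, std::size_t components,
                          std::size_t bytesPerChannel) {
    const std::size_t frameBytes = width * height * components * bytesPerChannel;
    assert(frameBytes > 0);
    return bufferBytes / frameBytes;
}

// Crop the rectangle [x0, x1) x [y0, y1) out of an interleaved float image
// and convert it to 8 bits per channel.
// Assumption: channel values are in [0, 1]; anything outside is clamped.
std::vector<std::uint8_t> cropToBytes(const std::vector<float> &src,
                                      std::size_t srcW, std::size_t srcH,
                                      std::size_t components,
                                      std::size_t x0, std::size_t y0,
                                      std::size_t x1, std::size_t y1) {
    assert(x0 < x1 && x1 <= srcW && y0 < y1 && y1 <= srcH);
    assert(src.size() == srcW * srcH * components);

    std::vector<std::uint8_t> dst;
    dst.reserve((x1 - x0) * (y1 - y0) * components);
    for (std::size_t y = y0; y < y1; ++y) {
        for (std::size_t x = x0; x < x1; ++x) {
            for (std::size_t c = 0; c < components; ++c) {
                const float v = src[(y * srcW + x) * components + c];
                const float clamped = std::min(1.0f, std::max(0.0f, v));
                // Round to nearest 8-bit value.
                dst.push_back(static_cast<std::uint8_t>(clamped * 255.0f + 0.5f));
            }
        }
    }
    return dst;
}
```

For example, a 1920x1080 RGBA frame with float channels takes 1920 * 1080 * 4 * 4 = 33,177,600 bytes, so a 256MB buffer has room for 8 such frames before the transfers to the host have to catch up.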

Public domain. Enjoy!
vidCap.tar.gz

Comments

20.6.2021

Hey, Ville,

   Still there? I'm still using your cool cudavidcap. Since 2013.

   Jerome

- JBB@JeromeBerryhill.com

19.3.2023

Yeah still here.  Surprised to see this found use.  Cool! :-)

- wili

wili
Ville Timonen

hack blog
