6.1.2012

CUDA (Driver API) + nvcc autoconf macro

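There are two main ways to use CUDA: the runtime API, where host and device code typically live together in .cu files that nvcc compiles at build time, and the driver API, where the host program explicitly loads device code (PTX or cubin modules) and launches kernels through the cu* functions of libcuda.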

I've written programs using both, but I find the driver API more to my liking. Things are explicit, host and device code are hard to confuse with one another, and it feels like you have more control over how things are done. Not too long ago the driver API also exposed some CUDA features that the runtime API didn't, such as half float textures, but this is no longer the case. In fact, since the runtime API is the de facto standard today, you will have better luck with library support, documentation, tutorials, etc. using the runtime API. First impressions die hard though, I suppose, which is half the reason I keep picking the driver API given the choice.

My favourite way of using the driver API is to procedurally generate the device code at runtime, compile it with nvcc on the fly, and upload the PTX source. The main reason is that I often have algorithms that have complicated configuration switches, and the GPU code can be optimized for specific values of them. Making the GPU kernels dynamic enough to handle all configurations is a waste of clock cycles, and preprocessor directives are simply not powerful enough to handle complex cases of code customization. Besides, you can't have runtime configurations with preprocessor magic, and that's too static for my taste.

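Let the above serve as an intro to what I'm looking for in a CUDA autoconf macro: it should check that the driver API header and library and nvcc exist, and tell the program where nvcc is so that it can be invoked at runtime. Build-time compilation of .cu files is not needed.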

So I hacked together the following m4 macro. Insert AX_CHECK_CUDA in your configure.ac, and if CUDA is not installed in its default path /usr/local/cuda, point the configure script to it with, for example, --with-cuda=/opt/cuda. The rest should be explained in the comments.

# Figures out if CUDA Driver API/nvcc is available, i.e. existence of:
#   cuda.h
#   libcuda
#   nvcc
# If something isn't found, fails straight away.
# Locations of these are included in
#   CUDA_CFLAGS and
#   CUDA_LDFLAGS.
# Path to nvcc is included as
#   NVCC_PATH
# in config.h

# The author is personally using CUDA such that the .cu code is generated
# at runtime, so don't expect any automake magic to exist for compile time
# compilation of .cu files.
# Public domain
# wili


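AC_DEFUN([AX_CHECK_CUDA], [

# Provide your CUDA path with this
AC_ARG_WITH([cuda], [AS_HELP_STRING([--with-cuda=PREFIX], [Prefix of your CUDA installation])], [cuda_prefix=$withval], [cuda_prefix="/usr/local/cuda"])

# Setting the prefix to the default if only --with-cuda was given
if test "$cuda_prefix" = "yes"; then
    cuda_prefix="/usr/local/cuda"
fi

# Checking for nvcc
AC_MSG_CHECKING([nvcc in $cuda_prefix/bin])
if test -x "$cuda_prefix/bin/nvcc"; then
    AC_MSG_RESULT([found])
    AC_DEFINE_UNQUOTED([NVCC_PATH], ["$cuda_prefix/bin/nvcc"], [Path to nvcc binary])
else
    AC_MSG_RESULT([not found!])
    AC_MSG_FAILURE([nvcc was not found in $cuda_prefix/bin])
fi

# We need to add the CUDA search directories for header and lib searches

# Saving the current flags
ax_save_CFLAGS="${CFLAGS}"
ax_save_LDFLAGS="${LDFLAGS}"

# Announcing the new variables
AC_SUBST([CUDA_CFLAGS])
AC_SUBST([CUDA_LDFLAGS])

# Assumed locations under the prefix; change lib to lib64 if your installation uses it
CUDA_CFLAGS="-I$cuda_prefix/include"
CFLAGS="$CUDA_CFLAGS $CFLAGS"
CUDA_LDFLAGS="-L$cuda_prefix/lib"
LDFLAGS="$CUDA_LDFLAGS $LDFLAGS"

# And the header and the lib
AC_CHECK_HEADER([cuda.h], [], [AC_MSG_FAILURE([Couldn't find cuda.h])], [#include <cuda.h>])
AC_CHECK_LIB([cuda], [cuInit], [], [AC_MSG_FAILURE([Couldn't find libcuda])])

# Returning to the original flags
CFLAGS="${ax_save_CFLAGS}"
LDFLAGS="${ax_save_LDFLAGS}"

])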
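
For reference, a minimal configure.ac that calls the macro might look roughly like the sketch below. The project name, the version, and the assumption that the macro is saved as m4/ax_check_cuda.m4 are placeholders for illustration, not something the macro itself requires.

```m4
# configure.ac (sketch; "myprog", "0.1" and the m4/ directory are placeholders)
AC_INIT([myprog], [0.1])
AC_CONFIG_MACRO_DIR([m4])        # assumed location of ax_check_cuda.m4
AC_CONFIG_HEADERS([config.h])    # NVCC_PATH is written here
AM_INIT_AUTOMAKE([foreign])
AC_PROG_CC

# Stops configure with an error if nvcc, cuda.h or libcuda is missing
AX_CHECK_CUDA

AC_CONFIG_FILES([Makefile])
AC_OUTPUT
```

After autoreconf -i, running ./configure --with-cuda=/opt/cuda should pick up a CUDA installation outside the default prefix. Since the macro returns CFLAGS and LDFLAGS to their original values at the end, the CUDA search paths have to be passed to the compiler explicitly, for example by using CUDA_CFLAGS in your Makefile.am, while NVCC_PATH is available to the program as a string from config.h.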


