8.1.2010	The year I started blogging (blogware)
9.1.2010	Linux initramfs with iSCSI and bonding support for PXE booting
9.1.2010	Using manually tweaked PTX assembly in your CUDA 2 program
9.1.2010	OpenCL autoconf m4 macro
9.1.2010	Mandelbrot with MPI
10.1.2010	Using dynamic libraries for modular client threads
11.1.2010	Creating an OpenGL 3 context with GLX
11.1.2010	Creating a double buffered X window with the DBE X extension
12.1.2010	A simple random file read benchmark
14.12.2011	Change local passwords via RoundCube safer
5.1.2012	Multi-GPU CUDA stress test
6.1.2012	CUDA (Driver API) + nvcc autoconf macro
29.5.2012	CUDA (or OpenGL) video capture in Linux
31.7.2012	GPGPU abstraction framework (CUDA/OpenCL + OpenGL)
7.8.2012	OpenGL (4.3) compute shader example
10.12.2012	GPGPU face-off: K20 vs 7970 vs GTX680 vs M2050 vs GTX580
4.8.2013	DAViCal with Windows Phone 8 GDR2
5.5.2015	Sample pattern generator

6.1.2012

CUDA (Driver API) + nvcc autoconf macro

There are two main ways to use CUDA:

With the runtime API, you write all your GPU-related code (device + host) in .cu files and compile them with nvcc. Nvcc is responsible for calling gcc/g++ on the host code and for generating device binaries from the device (C/C++) code.
The driver API requires you to compile device code into binary (cubin) or PTX format manually. CUDA is used as a regular library from host code that you compile directly with gcc, and device code is loaded via library calls.

I've written programs both ways, but I find the driver API more to my liking. Things are explicit, host and device code are hard to confuse with one another, and it feels like you have more control over how things are done. Not too long ago the driver API also exposed some CUDA features that the runtime API didn't, such as half-float textures, but this is no longer the case. In fact, since the runtime API is the de facto standard today, you will have better luck with library support, documentation, tutorials, etc. when using it. First impressions die hard though, I suppose, which is half the reason I keep picking the driver API when given the choice.

My favourite way of using the driver API is to procedurally generate the device code at runtime, compile it with nvcc on the fly, and upload the PTX source. The main reason is that I often have algorithms that have complicated configuration switches, and the GPU code can be optimized for specific values of them. Making the GPU kernels dynamic enough to handle all configurations is a waste of clock cycles, and preprocessor directives are simply not powerful enough to handle complex cases of code customization. Besides, you can't have runtime configurations with preprocessor magic, and that's too static for my taste.

Let the above serve as an introduction to what I'm looking for in a CUDA autoconf macro:

For compilation, I need the driver API header cuda.h (comes with the CUDA toolkit)
For linking, I need the location of the driver API library libcuda.so (kernel driver)
For runtime, I need the path of nvcc (CUDA toolkit).

So I hacked together the following m4 macro. Insert AX_CHECK_CUDA in your configure.ac file, and if CUDA is not installed in its default location, /usr/local/cuda, point the configure script to it with e.g. --with-cuda=/opt/cuda. The rest should be explained in the comments.

	
#####
#
# SYNOPSIS
#
# AX_CHECK_CUDA
#
# DESCRIPTION
#
# Figures out if CUDA Driver API/nvcc is available, i.e. existence of:
#   cuda.h
#   libcuda.so
#   nvcc
#
# If something isn't found, fails straight away.
#
# Locations of these are included in
#   CUDA_CFLAGS and
#   CUDA_LDFLAGS.
# Path to nvcc is included as
#   NVCC_PATH
# in config.h
#
# The author is personally using CUDA such that the .cu code is generated
# at runtime, so don't expect any automake magic to exist for build-time
# compilation of .cu files.
#
# LICENCE
# Public domain
#
# AUTHOR
# wili
#
#####

AC_DEFUN([AX_CHECK_CUDA], [

# Provide your CUDA path with this
AC_ARG_WITH([cuda],
    [AS_HELP_STRING([--with-cuda=PREFIX], [Prefix of your CUDA installation])],
    [cuda_prefix=$withval],
    [cuda_prefix="/usr/local/cuda"])

# Setting the prefix to the default if only --with-cuda was given
# (POSIX test uses "="; "==" is a bashism that fails under e.g. dash)
if test "$cuda_prefix" = "yes"; then
    cuda_prefix="/usr/local/cuda"
fi

# CUDA is mandatory for this macro, so --without-cuda is an error
if test "$cuda_prefix" = "no"; then
    AC_MSG_FAILURE([CUDA is required; --without-cuda is not supported])
fi

# Checking for nvcc
AC_MSG_CHECKING([nvcc in $cuda_prefix/bin])
if test -x "$cuda_prefix/bin/nvcc"; then
    AC_MSG_RESULT([found])
    AC_DEFINE_UNQUOTED([NVCC_PATH], ["$cuda_prefix/bin/nvcc"], [Path to nvcc binary])
else
    AC_MSG_RESULT([not found!])
    AC_MSG_FAILURE([nvcc was not found in $cuda_prefix/bin])
fi

# We need to add the CUDA search directories for header and lib searches

# Saving the current flags (header checks use CPPFLAGS, link checks LDFLAGS)
ax_save_CPPFLAGS="${CPPFLAGS}"
ax_save_LDFLAGS="${LDFLAGS}"

# Announcing the new variables
AC_SUBST([CUDA_CFLAGS])
AC_SUBST([CUDA_LDFLAGS])

CUDA_CFLAGS="-I$cuda_prefix/include"
CPPFLAGS="$CUDA_CFLAGS $CPPFLAGS"

# 64-bit toolkits keep their libraries in lib64, so search both
CUDA_LDFLAGS="-L$cuda_prefix/lib64 -L$cuda_prefix/lib"
LDFLAGS="$CUDA_LDFLAGS $LDFLAGS"

# And the header and the lib; the failure actions are quoted,
# as the Autoconf manual recommends for macro arguments
AC_CHECK_HEADER([cuda.h], [], [AC_MSG_FAILURE([Couldn't find cuda.h])])
AC_CHECK_LIB([cuda], [cuInit], [], [AC_MSG_FAILURE([Couldn't find libcuda])])

# Returning to the original flags
CPPFLAGS=${ax_save_CPPFLAGS}
LDFLAGS=${ax_save_LDFLAGS}
])

ax_check_cuda.m4

wili
Ville Timonen

hack blog

Table of
contents

6.1.2012

CUDA (Driver API) + nvcc autoconf macro

Comments