5.5.2015

Sample pattern generator

Introduction

Good sampling patterns are central to many kinds of numerical approximation, but here I'm focusing on features that matter for graphics. So, if you are implementing percentage-closer filtering (PCF), screen-space ambient occlusion (SSAO), or screen-space reflections (SSR), or generating rays for a path tracer (e.g. by drawing from a GGX distribution), this post could be of interest to you.

Requirements

Although I like theorizing about different sampling patterns, in practical implementations I'm only concerned with a handful of properties: the samples should follow an arbitrary, user-specified density function; they should cover the sample space evenly; they should be divisible into evenly distributed subsets; and corresponding samples across subsets should be spatially coherent.

Implementation

To nail the above points, this sample pattern generator works as follows:

  1. It generates all points for all subsets in one pool. First the sample space is discretized into a 256x256 grid (to match an 8-bit-per-component output texture), and samples are inserted one by one. For each inserted point the entire sample space is traversed to find the cell whose distance from the nearest existing point in the set is the largest. The PDF is taken into account at this point by numerically integrating distances between points and using the weighted value instead of the plain Cartesian distance. The density function is a user-specified black box, and it doesn't have to be normalized.
  2. Essentially the same process is repeated to distribute the samples into subsets: points are drawn from the full pool and inserted into subsets in a round-robin fashion. The PDF is respected in this phase as well, and each subset in turn takes the point from the full pool that is farthest from the points already in that subset.
  3. Points in the subsets are spatially sorted with respect to the first subset: the first subset remains unchanged, while points in the other subsets are reordered to match the spatial locations of the points in the first subset as closely as possible.
  4. Finally a PNG file (or a printout) is written with the sample data. You should customize the writer function if you want a non-default layout. Optionally, the inverse of the probability coefficients is also written; multiply a sample by this coefficient to undo the bias introduced by the PDF.
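The greedy farthest-point insertion in step 1 can be sketched as follows. This is a simplified re-implementation, not the actual code from the tarball: the grid size and names are made up, and instead of numerically integrating the density between points it just scales the squared distance by the density at the candidate cell.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Sketch of step 1: greedy farthest-point insertion on a discretized grid.
// NOTE: illustrative only -- the real generator uses a 256x256 grid and
// numerically integrates the density between points; here the squared
// distance is simply scaled by the density at the candidate cell.
struct Point { float x, y; };
using Density = std::function<float(float, float)>;

std::vector<Point> generatePool(int numSamples, int grid, const Density& pdf) {
    std::vector<Point> pool;
    pool.push_back({0.5f, 0.5f});                  // arbitrary seed sample
    while ((int)pool.size() < numSamples) {
        Point best = {0.0f, 0.0f};
        float bestScore = -1.0f;
        // traverse the entire sample space for every insertion
        for (int gy = 0; gy < grid; ++gy) {
            for (int gx = 0; gx < grid; ++gx) {
                const float cx = (gx + 0.5f) / grid;
                const float cy = (gy + 0.5f) / grid;
                // squared distance to the nearest already-chosen sample
                float nearest = 1e30f;
                for (const Point& p : pool) {
                    const float dx = cx - p.x, dy = cy - p.y;
                    nearest = std::min(nearest, dx * dx + dy * dy);
                }
                // density weighting: dense regions count as "farther",
                // so they receive proportionally more samples
                const float score = nearest * pdf(cx, cy);
                if (score > bestScore) { bestScore = score; best = {cx, cy}; }
            }
        }
        pool.push_back(best);
    }
    return pool;
}
```

With a uniform density this degenerates to plain farthest-point (blue-noise-like) sampling; a peaked density pulls samples toward the peak.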

ASCII awesomeness

The whole process is visualized in ASCII. This example uses a 1/(1 + r^2) density function. First the initial pool of samples is generated:
[animation: generating the sample pool]

Next the generated subsets are shown:
[images: subset 3, subset 7]

And after sorting, samples at each index for all subsets:
[images: index 4, index 9]

Finally, the full set. Each subset has its own color, and each index has its own symbol:
[image: all sets combined]
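For reference, a 1/(1 + r^2) density like the one above can be supplied as an ordinary black-box callback. A minimal sketch, assuming r is measured from the center of the unit square (the function name and the choice of origin are mine, not from the actual source):

```cpp
// Unnormalized 1/(1 + r^2) falloff around the center of the sample space;
// the generator only cares about relative densities, so no normalization
// constant is needed.
float radialDensity(float x, float y) {
    const float dx = x - 0.5f;
    const float dy = y - 0.5f;
    return 1.0f / (1.0f + dx * dx + dy * dy);  // r^2 = dx^2 + dy^2
}
```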

Download

Don't hate me, but this is again UNIX-only. On the other hand, it depends only on libpng and runs fine on OS X. Tweak the defines in main.cpp to your liking, then: make && ./sampling
sampling-1.0.tar.gz

Enjoy your PNG!

Caveats

Firstly, the initial sample set doesn't cover the area perfectly uniformly, i.e. it doesn't follow a maximal Poisson disk distribution. If this is a problem, a different initial distribution should be easy to drop in; the literature offers several alternatives.

Secondly, the spatial sorting of the sets can only maintain good coherence for all but the last few indices. This gives you optimal cache coherency for the majority of the indices at the expense of the last few. I reckon this is generally better than having uniform but suboptimal coherency for all indices, but it depends on the HW and the cache characteristics of your algorithm.
