Ville Timonen

contents

8.1.2010 | The year I started blogging (blogware) |

9.1.2010 | Linux initramfs with iSCSI and bonding support for PXE booting |

9.1.2010 | Using manually tweaked PTX assembly in your CUDA 2 program |

9.1.2010 | OpenCL autoconf m4 macro |

9.1.2010 | Mandelbrot with MPI |

10.1.2010 | Using dynamic libraries for modular client threads |

11.1.2010 | Creating an OpenGL 3 context with GLX |

11.1.2010 | Creating a double buffered X window with the DBE X extension |

12.1.2010 | A simple random file read benchmark |

14.12.2011 | Change local passwords via RoundCube safer |

5.1.2012 | Multi-GPU CUDA stress test |

6.1.2012 | CUDA (Driver API) + nvcc autoconf macro |

29.5.2012 | CUDA (or OpenGL) video capture in Linux |

31.7.2012 | GPGPU abstraction framework (CUDA/OpenCL + OpenGL) |

7.8.2012 | OpenGL (4.3) compute shader example |

10.12.2012 | GPGPU face-off: K20 vs 7970 vs GTX680 vs M2050 vs GTX580 |

4.8.2013 | DAViCal with Windows Phone 8 GDR2 |

5.5.2015 | Sample pattern generator |

Although I like theorizing about different sampling patterns, in practical implementations I'm only concerned with a handful of properties:

- Even coverage of the sample space (often unit circle)
- Divisible into subsets that optimally complement each other, which is necessary for interleaved sampling both spatially and temporally
- Each subset is an even distribution across the sample space in itself
- Samples at a given index in all subsets are spatially close to each other, which helps cache efficiency in SIMT machines (GPUs)
- Samples can be generated according to a density function (for SSAO for instance)
- If such a density function is used, inverse coefficients are exported to unbias the results after sampling

To nail the above points, this sample pattern generator works as follows:

- It generates all points for all subsets in one pool. First the sample space is discretized into a 256x256 grid (to match an 8-bit-per-component output texture) and samples are inserted one by one. For each inserted point the entire sample space is traversed to find the cell whose distance to the nearest existing point in the set is the largest. The PDF is taken into account at this point by numerically integrating the density along the path between points and using the weighted value instead of the plain Cartesian distance. The density function is a user-specified black-box function, and it doesn't have to be normalized.
- Essentially the same process is repeated to distribute the samples into subsets: points are drawn from the full set and inserted into the subsets in a round-robin fashion. The PDF is respected in this phase as well, and each subset retrieves the point from the full set that is as far as possible from the points already in that subset.
- Points in the subsets are spatially sorted w.r.t. the other subsets; the first subset remains unchanged, but points in the other sets are sorted to match the spatial location of the points in the first set as closely as possible.
- Finally a PNG file (or a printout) is written with the sample data. You should customize the writer function if you want a non-default layout. Optionally, the inverses of the probability coefficients are also written: just multiply the sample with its coefficient to undo the bias introduced by the PDF.

The whole process is visualized in ASCII. This example uses a 1/(1 + r^2) density function.
First the initial pool of samples is generated:

Next the generated subsets are shown:

And after sorting, samples at each index for all subsets:

Finally, the full set. Each subset has its own color, and each index has its own symbol:

Don't hate me, but this is again only for UNIX. On the other hand, it only depends on libpng and runs fine on OS X. Tweak the defines in main.cpp to your liking and run: make && ./sampling

sampling-1.0.tar.gz

Enjoy your PNG!

Firstly, the initial sample set doesn't cover the area perfectly uniformly, i.e. it doesn't follow a maximal Poisson disk distribution. If this is a problem, a different initial distribution should be easy to drop in. Check out the literature; ways exist.

Secondly, the spatial sorting of the sets can only maintain good coherence for all but the last couple of indices. This gives you optimal cache coherency for the majority of the indices at the expense of the last few. I reckon this is generally better than having uniform but suboptimal coherency across all indices, but it depends on the hardware and the cache characteristics of your algorithm.