Some Experiments with Fast Wavetable Oscillators

The following is adapted from a message sent to an SGI computer-music alias maintained by Adrian Freed . Drop Adrian some e-mail if you'd like to be added to the alias. At any rate, this is a little old, and some of the findings are out-of-date considering newer processors with faster floating-point, and newer compilers with better floating-point scheduling...

I've spent a bit of time experimenting with wavetable oscillators lately, trying to come up with the fastest algorithm which meets certain constraints. I have some simple code results I'm happy to share, in the hope that others are also willing to share any improvements or suggestions.

I know everyone's requirements are a bit different, and I know that some pooh-pooh the very concept of wavetable oscillators. These oscillators are clearly nothing new; I know that wavetable oscillators are as old as the hills and everyone's got a favorite algorithm squirreled away somewhere. But it's because of their very prevalence that I'm interested in understanding the very best ways to code them. I proffer the results of a few midnight experiments as a starting-point for discussion. I want to thank Adrian for his helpful feedback on these ramblings, and also Gints Klimanis, for the code that served as a starting-point.

My constraints include: that the code should be written in C; that the wavetable be of arbitrary size (up to a reasonable maximum) and the loop points at arbitrary locations (many program restrict these to powers of two, but that'a unacceptable in general); that the oscillator use [at least] linear interpolation between sample-points; and that the wavetables should occupy a reasonable amount of memory.

The main variable in the architecture seems to be the choice of data-type for the phase accumulator (BTW, Adrian has a nice discussion of data type choices for efficient signal processing in C ). The choice of data-type should be governed by the set of operations performed, which in this case are: add, take the integer part of, take the fractional part of, and conditional subtract (to handle wrapping).

Fixed-point seems to have a number of advantages over floating-point for the phase accumulator. First, the addition tends to be a bit faster. Second, conversion to integer requires only a shift. Third, the conditional subtract can be performed *without branches* using logical operations (even for the non-power-of-two case; see the code for how to do this). The removal of the branch in the inner loop makes an enormous performance difference, for a number of reasons (neither pipelined processors, nor compiler optimizers, like branches -- avoid them in inner loops almost at all costs!) Finally, taking the fractional part tends to be similar in performance between floating-point and fixed-point. I keep saying "tends to be" because different processors yield slightly different results; I'm trying to classify trends.

The second choice of data-type is that of the wavetable itself. The signal math should be done in floating-point, since FP multiplies and multiply/adds are faster, and because of the superior dynamic range. So if the wavetable is kept as 16-bit integers, a number of additional int->fp conversions are required. If it is kept as float, no conversions are required, but the program goes through the cache twice as quickly, and requires twice as much memory.

Whether or not the cache costs outweigh the conversion savings depends upon how quickly the program processes a cache-line of data. This, in turn, depends upon the range of phase increments to be used. For relatively small phase increments, the amount of time spent fetching a cache-line is buried in the relatively large amount of time spent processing the cache-line. In these cases, the FP wavetable is faster. As the phase increment grows larger, however, above some threshold the FP wavetable oscillator becomes slower than the 16-bit integer wavetable oscillator, because it takes cache misses frequently enough to outweigh the saved int->fp conversion time. In the limit, however, as the phase increment continues to grow, the two oscillators are both taking a cache miss on every increment, and the FP oscillator is again faster, though not by as much. I benchmarked two oscillators, one using an FP wavetable, and one with a 16-bit wavetable, which were otherwise identical; the results seem to fit this model. Here are the results, with execution time, in seconds, on the Y axis, and phase increment, on a log scale, on the X axis.

I include two (nearly) functionally equivalent programs with different phase accumulators: one in double-precision, and one in 32-bit fixed-point. (I do have a 64-bit fixed-point version, but it requires some not-yet-released OS support to perform well; in performance it is similar to the 32-bit version). These are easy to modify and experiment with different variations on the algorithm. They're by no means full-featured; in fact, they're so stripped down as to be almost useless except as a means for experimenting. As they're set up , they don't actually make sound -- they just run for a certain number of iterations, and I time them. I don't do any envelopes or mixing -- just the raw oscillators.

I should note that I experimented quite a bit with the clever floating-point tricks Nick Thompson presented at the recent ICMC [Thomp95]. Either I misunderstood the best way to use these, or there are considerable architectural differences between the RS/6000 and the MIPS processors that make Nick's tricks actually *slower* than the regular double-precision oscillator (they still work, because they are dependent only upon IEEE floating-point support). If someone has different results on an R4K, I would really like to see them and understand what I might have done wrong.

Well, that's all for now. Hope this stuff is useful! Please send your feedback to cook@sgi.com .

[Thomp95] Thompson, N., and Dannenberg, R. "Optimizing Software Synthesis Performance," Proceedings of the 1995 International Computer Music Conference , p. 235.

Doug Scott
dscott@sgi.com