Art of Softsynth Development: Avoid batch processing (if you can)

Aug 09, 2013

Synthesizer sound generation involves multiple stages: e.g. oscillator generation, filtering, enveloping, effects. Usually we group samples so each stage runs separately on a number of sequential samples (a batch) before we move on to the next stage. For instance, we might have a fixed block size of 16 samples. That means that for the current block of 16 samples, we first generate all 16 oscillator samples; then filter all 16 samples; then envelope all 16 samples; and then apply effects to all 16 samples. This is referred to as batch (or block) processing.
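To make the stage-by-stage idea concrete, here is a minimal sketch of one block being rendered. The function names, the one-pole filter and the single gain value standing in for the envelope are all assumptions chosen for illustration; they are not taken from any particular synth.

```python
import math

SAMPLE_RATE = 44100
BLOCK_SIZE = 16  # the fixed block size used in the example above


def oscillator(block, phase, freq):
    """Fill the block with a sine wave; return the phase for the next block."""
    step = 2.0 * math.pi * freq / SAMPLE_RATE
    for i in range(len(block)):
        block[i] = math.sin(phase)
        phase += step
    return phase


def lowpass(block, state, coeff):
    """One-pole lowpass run in place over the block; return the filter state."""
    for i in range(len(block)):
        state += coeff * (block[i] - state)
        block[i] = state
    return state


def apply_gain(block, gain):
    """Stand-in for the envelope stage: one gain value for the whole block."""
    for i in range(len(block)):
        block[i] *= gain


def render_block(phase, state):
    """Run each stage over all 16 samples before moving on to the next stage."""
    block = [0.0] * BLOCK_SIZE
    phase = oscillator(block, phase, freq=440.0)
    state = lowpass(block, state, coeff=0.2)
    apply_gain(block, gain=0.5)
    return block, phase, state
```

Each stage is a tight loop over the same 16 sequential samples, and the state that has to survive between blocks (oscillator phase, filter state) is passed in and returned explicitly.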

The reason for batching computations is better performance. A couple of factors contribute to that, the primary one being cache efficiency in both the instruction and data caches.

For the instruction cache, it is inefficient to run through all the computations needed to generate a single sample in one go, because the number of instructions needed to support those computations is large. If the code for the whole signal chain does not fit in the instruction cache, it has to be loaded again for every sample, and the cache is barely exploited. It is much more efficient to separate the computations into stages, and run each stage for multiple samples at a time. This exploits the instruction cache much better because instruction loading is amortized over multiple samples.

The same basic argument applies to the data cache. The sample blocks are allocated sequentially in memory, which is very cache efficient: once you’ve loaded the first sample in the block, it costs very little to access subsequent ones.

For these reasons, batch processing is nearly ubiquitous in softsynth programming. It is how almost everyone does their computations, and for good reasons.

Problem?

Batch processing is great for performance, but complicates other parts of your synth, particularly in event handling and modulation updating.

Events

Event handling is mostly about note on and note off events. The issue here is that grouping your sample processing into batches means that you cannot introduce events in the middle of a batch. That results in a loss of precision in the timing of your note events, creating a jittering effect where the note time is noticeably imprecise.

Whether this will actually be discernible or not depends on a few factors. It seems obvious that the amount of jitter will be proportional to the size of your batches. The second factor is the tempo of the note events; the jitter will be easier to hear if the notes are coming fast. From personal experience I’d say that block sizes of 4 are fine, but block sizes of 128 are probably not.
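To put rough numbers on that, here is a small sketch that computes an upper bound on how long an event can wait for the next block boundary. The function name is made up, and the 44.1kHz sample rate is an assumption.

```python
def worst_case_jitter_ms(block_size, sample_rate=44100):
    """Upper bound on how long an event waits for the next block, in ms."""
    return 1000.0 * block_size / sample_rate


for n in (4, 16, 128):
    print(n, round(worst_case_jitter_ms(n), 2))
```

At 44.1kHz this comes out at roughly 0.09 ms for 4 samples, 0.36 ms for 16 samples and 2.9 ms for 128 samples.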

Modulation

Modulation is primarily about LFOs and envelopes - they are computed internally in the synth, and continually influence parameters.

Modulation is another reason why batch processing performs better: Modulating parameters means that you must recompute internal variables, a type of computation that is often a bit costly, especially for filter coefficients where it involves transcendental functions and lots of math. Solely for this reason, batch processing is often attractive.

However, if modulated parameters aren’t updated every sample, that will change the sound. As an extreme example, imagine an LFO-modulated filter cutoff that was only updated once a second. Of course the lack of updates will influence the sound. You might also consider interpolating between actual updates.

So you need a modulation update strategy; and your batch processing strategy and modulation update strategy are highly intertwined, so you need to have both in mind when designing this part of your synth.

Strategies

There are many valid strategies for batching + modulation + event handling. Let me try to enumerate the most common ones:

No batching (per-sample updates): Updating events and modulation every sample is by far the simplest strategy, since it completely sidesteps the issue. Modulation and events will automatically work. However, your synth will be slower. Maybe twice as slow, maybe more.
Laissez-faire batching: Leave the strategizing to the calling code. In a VSTi context, this would be the VST host. The calling code tells the synth how many samples to generate. At the start of each batch, the synth handles events and updates the parameter modulation, and then naively generates samples to fill the block without further consideration for jittering and modulation update frequencies.
Fixed batch size: Fixing the batch size at a relatively low number of samples lets you make some guarantees about jittering and modulation update frequencies, while only updating every n samples.
Adaptive batch size: Try to figure out algorithmically how long each block should optimally be. Batch size will mostly be limited by events, but modulation matters too: fast changes during the attack phase of an envelope, a high-frequency LFO, or a discontinuous LFO about to make a jump should all shorten the block.
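As a sketch of how the fixed and adaptive strategies relate, the hypothetical helper below cuts a buffer of samples into blocks of at most a fixed size, and shortens blocks so that every event lands exactly on a block start. With no events it degenerates into plain fixed-size batching. It only covers the event part of the adaptive strategy and ignores the modulation considerations.

```python
def split_blocks(total, max_block, event_offsets=()):
    """Return (start, length) blocks covering `total` samples.

    Blocks are at most `max_block` long and are cut short so that every
    event offset falls exactly on the start of a block.
    """
    # Offsets at 0 or beyond the buffer need no extra cut.
    cuts = sorted(o for o in set(event_offsets) if 0 < o < total)
    cuts.append(total)
    blocks = []
    start = 0
    for cut in cuts:
        while start < cut:
            length = min(max_block, cut - start)
            blocks.append((start, length))
            start += length
    return blocks
```

For example, 40 samples with a maximum block size of 16 and one event at offset 20 give the blocks (0, 16), (16, 4), (20, 16) and (36, 4).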

Interpolation

Beyond choosing a batch size strategy, you could also choose to interpolate your parameter modulation. This can be useful, although it is probably overkill if you apply it to everything.

There are two options for what to interpolate. You could interpolate the modulated parameters themselves, or you could interpolate the derived numbers used in the actual calculations. For instance, for a modulated filter, you could either interpolate the modulated cutoff and resonance, or you could interpolate the filter coefficients which are derived from the cutoff and resonance, and which are used in the actual computations.

The derivation of filter coefficients from the cutoff frequency might be an expensive computation, so it is much more efficient to interpolate the coefficients directly. But this comes with its own set of problems, since interpolating coefficients will not necessarily give you a filter with the qualities you’re after in the intermediate steps. It might even make your filter unstable, especially if your block size is long.
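The difference between the two options can be sketched with a very simple filter. The example below assumes a one-pole lowpass of the form y += a * (x - y) with a = 1 - exp(-2*pi*cutoff/samplerate); that filter and all the function names are my choices for illustration, not something the discussion above depends on.

```python
import math

SAMPLE_RATE = 44100


def one_pole_coeff(cutoff_hz):
    """Coefficient a of y += a * (x - y); needs one exp() per call."""
    return 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / SAMPLE_RATE)


def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def coeffs_from_interpolated_cutoff(cutoff_start, cutoff_end, n):
    """Option 1: interpolate the parameter, derive a coefficient per sample."""
    return [one_pole_coeff(lerp(cutoff_start, cutoff_end, i / n))
            for i in range(n)]


def interpolated_coeffs(cutoff_start, cutoff_end, n):
    """Option 2: derive the coefficient at the block edges, interpolate it."""
    a0 = one_pole_coeff(cutoff_start)
    a1 = one_pole_coeff(cutoff_end)
    return [lerp(a0, a1, i / n) for i in range(n)]
```

Option 2 calls exp() twice per block instead of once per sample, but the two lists only agree at the start of the block; in between, the filter you get is not the one the interpolated cutoff describes. For this particular filter the interpolated coefficient always stays between the two endpoint values, so stability is not at risk here. The instability caveat concerns more complicated filters, which this sketch does not attempt to show.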

The other thing you might consider is interpolation methods. I don’t think there is any point in going beyond linear interpolation, unless your batch size is very big.

Statelessness to the rescue, again

There are various ways to make interpolation happen, but the simplest is to keep your modulations stateless. We saw the principle of statelessness at work in The Stateless Envelope. In that article we saw the advantages of expressing the ADSR envelope as a function of time alone, with no additional state. You can express your modulations the same way. For instance, it is trivial to make a sine LFO stateless: just express it as sin(time) rather than explicitly keeping track of the phase. Other modulations might be trickier to make stateless, and it can be especially hard to retrofit statelessness in a complicated system. Anything a stateful implementation can compute, a stateless one can compute too, though.

If your modulations and envelopes are stateless, it’s trivial to do interpolation, since it is easy to evaluate modulation at different time points.
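Here is a minimal sketch of what that looks like for the sine LFO. The 5Hz rate, the sample rate and the function names are assumptions for the sake of the example.

```python
import math

SAMPLE_RATE = 44100


def lfo(t_seconds, rate_hz=5.0):
    """Stateless sine LFO: a pure function of time, no stored phase."""
    return math.sin(2.0 * math.pi * rate_hz * t_seconds)


def lfo_block(first_sample, block_size):
    """Evaluate the LFO at both block edges and interpolate in between."""
    start = lfo(first_sample / SAMPLE_RATE)
    end = lfo((first_sample + block_size) / SAMPLE_RATE)
    step = (end - start) / block_size
    return [start + step * i for i in range(block_size)]
```

Because lfo() depends on nothing but the time you pass in, evaluating it at the end of the block costs nothing extra in bookkeeping: there is no phase to advance and then rewind.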

Recommendations

If you can afford it, stick to per-sample modulation updates and event handling. This is a really great simplification. 4k synths work this way, because it is smaller, and they can afford to make the performance tradeoff (they don’t really have a choice). Some commercial synths are also implemented this way, most often because the way they work necessitates updating per sample.

But for most other synths, batch processing is a reality. Considering Moore’s Law, this is actually a tradeoff that we will see synth developers be less and less willing to make in a few years. We’re almost at the point where per-sample updates are the best way to go, performance issues be damned.

The second most usable strategy is fixed-size batches. 16 samples per block is reasonable. At a 44.1kHz sample rate, that gives you an update frequency of roughly 2.8kHz (44100 / 16), which is probably plenty. You’ll have one problem though: anything that directly modulates the amplitude of the signal (like your ADSR envelope) will give a noise band around that update frequency, and it’s very audible when the synth is outputting otherwise clean sounds. The fix in this case is interpolation. Since your ADSR envelope is already stateless, that shouldn’t be an issue: just do linear interpolation over your block.
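The stepping problem and its fix can be sketched as follows. The envelope here is a made-up stateless attack/decay/sustain curve (not the one from The Stateless Envelope), and the times and function names are assumptions.

```python
SAMPLE_RATE = 44100
BLOCK_SIZE = 16


def envelope(t, attack=0.01, decay=0.1, sustain=0.5):
    """Hypothetical stateless attack/decay/sustain curve, a function of time."""
    if t < attack:
        return t / attack
    if t < attack + decay:
        return 1.0 + (sustain - 1.0) * (t - attack) / decay
    return sustain


def stepped_gains(first_sample):
    """One envelope value held for the whole block (the source of the steps)."""
    g = envelope(first_sample / SAMPLE_RATE)
    return [g] * BLOCK_SIZE


def interpolated_gains(first_sample):
    """Envelope evaluated at both block edges, ramped linearly in between."""
    g0 = envelope(first_sample / SAMPLE_RATE)
    g1 = envelope((first_sample + BLOCK_SIZE) / SAMPLE_RATE)
    return [g0 + (g1 - g0) * i / BLOCK_SIZE for i in range(BLOCK_SIZE)]


def largest_jump(gains_for_block, blocks=8):
    """Largest sample-to-sample gain change over the first few blocks."""
    gains = []
    for block_index in range(blocks):
        gains.extend(gains_for_block(block_index * BLOCK_SIZE))
    return max(abs(nxt - cur) for cur, nxt in zip(gains, gains[1:]))
```

During the attack, the stepped version jumps by a whole block’s worth of envelope change once every 16 samples, while the interpolated version spreads the same change evenly over the block. Comparing largest_jump(stepped_gains) with largest_jump(interpolated_gains) shows the difference.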

Fixed size blocks might also have a slight performance advantage, because they leave the compiler with more information to optimize from. If the block size is known, the compiler might unroll a loop, or be able to reduce some computations to constant expressions.

Laissez-faire batching really isn’t relevant in a demoscene context. You’d still have to have the logic somewhere, so given the lack of guarantees about jittering or modulation update frequency, let’s forget about this one for now, although it might be a usable design decision to not let the synth itself worry about batch sizes.

Adaptive block sizes are interesting, and can be quite attractive for performance reasons, since you actually only rarely need a high update frequency. Unfortunately the logic to support it is complicated. Your willingness to accept complicated logic in exchange for performance needs to be very high for this to be viable.

In summary: Go without batch processing if you can. Otherwise go with a (low) fixed block size. If you have special needs, consider other options. Use linear interpolation where necessary, for instance for the ADSR amplitude envelope.

Bonus: Fixing jittering

If you are concerned about jittering, there’s one more thing you can do. If you’ve implemented a VST instrument, you might be aware of the deltaFrames parameter given with events. This parameter indicates how many samples into the next batch the event actually occurs; that is, it gives you sample-accurate timing for an event.

It can be hard to use this information effectively, so most people don’t use it directly (but use a smaller fixed block size instead, which is a lot simpler). But it’s not impossible to do it correctly, so let me give you a few pointers.

First and foremost, you have to fix the ADSR envelopes, so that the voice amplitude is sample correct. This is the most important step. Fortunately, this is easy to do if your envelopes are stateless. Keep track of deltaFrames, and apply it as a time offset when you get values from the envelope.
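A minimal sketch of the time offset idea, assuming a made-up stateless attack ramp and a delta_frames value that has already been read from the event (the function names are hypothetical):

```python
SAMPLE_RATE = 44100


def envelope(t, attack=0.01):
    """Hypothetical stateless attack ramp; silent before the note starts."""
    if t < 0.0:
        return 0.0
    return min(t / attack, 1.0)


def block_gains(block_size, delta_frames):
    """Per-sample gains for the block in which a note-on arrives.

    `delta_frames` is the event's offset into the block, so sample i is
    (i - delta_frames) samples into the note.
    """
    return [envelope((i - delta_frames) / SAMPLE_RATE)
            for i in range(block_size)]
```

With delta_frames = 5, the first five samples of the block are silent and the envelope starts from zero at sample 5. The same subtraction works if you only evaluate the envelope at the block edges and interpolate.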

The second most important thing to fix is other modulation, e.g. LFOs. If your modulation is stateless, this is easily fixable using the same time offset technique as for the envelope. If not, you’re in trouble.

The third problem to think about is oscillator phase. Oscillator phase might matter, especially in the attack phase of the sound. For this to be audible, you need to have the correct combination of envelope, waveform and pitch. You might just conclude that this doesn’t matter, and forget about keeping correct oscillator phase. That is your first option.

Your second option is making the phase stateless. But this defeats the very idea of performance wins by batching. In practice, the performance loss might not be all that bad though. Try it.

Your third option is keeping the computation stateful and special-casing the note-on-in-block situation. You could forgo updating the phase until the note is actually on, or you could try working backwards to arrive at a set of states that will lead to the correct starting state when the note is actually on. That would probably be hard though.
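The simpler of those two variants, holding the phase until the note is actually on, might look something like this. It is only a sketch with a plain sine oscillator and hypothetical names; a real voice would have more state than a single phase.

```python
import math

SAMPLE_RATE = 44100


def render_note_on_block(block_size, delta_frames, freq):
    """Stateful oscillator whose phase only starts advancing at note-on."""
    phase = 0.0
    step = 2.0 * math.pi * freq / SAMPLE_RATE
    out = []
    for i in range(block_size):
        if i < delta_frames:
            out.append(0.0)  # note not on yet: stay silent, hold the phase
        else:
            out.append(math.sin(phase))
            phase += step
    return out, phase
```

The returned phase is what the next, ordinary block would continue from, so the special case is confined to the one block that contains the note-on.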

Those are your options. Be careful out there!