Sequencer Timing/Sync Issues

For music sequencing, the most important consideration is timing accuracy and resolution. After all, that's the purpose of a sequencer: to initiate musical events (such as sounding a note) at specific times. If the sequencer can't initiate an event exactly when you want, it is worthless, because it will never be able to recreate a performance identical to the way that the musician (ie, you) would play a piece of music.

Needless to say, humans are capable of rendering an event at any given time. Although we don't have a computer's millisecond accuracy, we also aren't slaves to some "master clock rate" which never varies. Unfortunately, computers can't render an event at just any arbitrary time. They can only do so at some defined clock rate which must be less than infinite. This rate may be limited to the times when the computer is not polling the keyboard, refreshing the display, checking the MIDI IN for midi input, etc. The computer can only sound a midi note every so often. Hopefully, this rate is so fast that a human perceives it as infinite, but with low resolutions, this may not be the case.

By resolution, I mean the maximum number of clock pulses (ie, the smallest units of time) that can occur within a given span of time. Most sequencers specify resolution in terms of Pulses Per Quarter Note (PPQN). This tells you how many clock pulses are (ideally) in every quarter note. For example, a PPQN of 24 means that each quarter note (ideally) spans 24 clocks. Therefore, each 8th note would span 12 clocks since an 8th note is 1/2 as long as a quarter note, each 16th note would span 6 clocks, and each 32nd note would span 3. If you continue to divide it down, you'll see that a 64th note would need 1.5 clocks, which is not an integer number of clock pulses. So, such a seq doesn't have enough resolution to record or render notes smaller than 32nd notes.
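To make that division concrete, here is a minimal Python sketch. The function names are hypothetical (they don't come from any sequencer), and it only considers plain power-of-two note values, not triplets or dotted notes.

```python
from fractions import Fraction

def ticks_per_note(ppqn, denominator):
    """Clock pulses spanned by a 1/denominator note (4 = quarter, 8 = 8th, ...)."""
    return Fraction(ppqn * 4, denominator)

def smallest_exact_note(ppqn, max_denominator=128):
    """Smallest power-of-two note value that still spans a whole number of pulses."""
    denominator = 4
    while (denominator * 2 <= max_denominator
           and ticks_per_note(ppqn, denominator * 2).denominator == 1):
        denominator *= 2
    return denominator

for d in (4, 8, 16, 32, 64):
    print(f"1/{d} note at 24 PPQN spans {float(ticks_per_note(24, d))} pulses")
print("Smallest exact note at 24 PPQN: 1/%d" % smallest_exact_note(24))
```

Using exact fractions rather than floats keeps the "is it a whole number of pulses?" check from being fooled by rounding.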

Human feel versus Quantization

Note that I used the word "ideally". This reflects the fact that a computer is capable of playing quarter notes for hours with each one having exactly 24 clock pulses in it, for example. No human being, no matter how accomplished, can duplicate this feat. At a tempo of 100 BPM, each clock pulse would be 0.025 secs long, and you would have to have exactly 24 of them per quarter note. So, each quarter note would have to be held exactly 0.6 secs. Humans are not capable of repeating such small timing intervals with perfect accuracy. One quarter note might actually span 26 pulses, the next might work out to 22 pulses, etc. We can manage to fit 100 quarter notes into a minute, more or less (100 BPM), but we'll almost certainly not be accurate to a fraction of a second. We'll keep getting ahead of or behind the ideal clock interval for each note. This imperfection is commonly known as the human feel. Our brains are even designed to tune into this phenomenon, and reject anything that repeats itself without any variation. (In fact, this observed human characteristic is a fundamental principle of psycho-acoustics.) So, the best way to make the human brain become bored by a particular musical passage is to cause the computer to adjust all of its events to perfect clock intervals (ie, if the seq's resolution is 24 PPQN, then all quarter notes span exactly 24 pulses, all 8th notes span 12 pulses, etc). Although your brain might not be able to recognize exactly what is going on, it will "know" that the performance is being rendered by a machine because the performance will not sound "human". The performance will have that artificially perfect timing.
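The figures above (0.025 secs per pulse, 0.6 secs per quarter note at 100 BPM and 24 PPQN) follow from one line of arithmetic. A small Python sketch, with hypothetical function names, shows it along with the 26-pulse and 22-pulse "human" quarter notes mentioned above:

```python
def quarter_note_seconds(bpm):
    """Length of one quarter note at the given tempo."""
    return 60.0 / bpm

def pulse_seconds(bpm, ppqn):
    """Length of one clock pulse: a quarter note divided into ppqn equal parts."""
    return 60.0 / (bpm * ppqn)

bpm, ppqn = 100, 24
print("pulse:", pulse_seconds(bpm, ppqn), "secs")
print("ideal quarter note:", quarter_note_seconds(bpm), "secs")
for pulses in (26, 22):
    # A human quarter note that runs long or short of the ideal 24 pulses.
    print(pulses, "pulses last about", round(pulses * pulse_seconds(bpm, ppqn), 3), "secs")
```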

Quantization is that notorious technique of "correcting" note timings to specific (ie, perfect) clock intervals. It can affect such things as rhythm, phrasing, and embellishments like trills, glissandos, vibrato, and a bunch of other stuff that I play all of the time but whose arbitrary music theory terms I have long since forgotten. Musicians often use quantization to "correct" rhythmically sloppy playing, not realizing that this also removes the human nuances that our brain tunes into.
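As a rough illustration of what full quantization does, here is a minimal Python sketch. It assumes note start times are stored as integer clock pulses and that the grid is a fixed number of pulses; the function name and the sample timings are made up for the example, and real sequencers may round or handle ties differently.

```python
def quantize(tick, grid):
    """Snap a pulse position to the nearest multiple of grid (ties round up)."""
    return ((tick + grid // 2) // grid) * grid

# Hypothetical 16th notes played at 96 PPQN, where a 16th note is 24 pulses.
played = [0, 26, 46, 73]
print([quantize(t, 24) for t in played])
```

Every start time ends up on an exact multiple of 24, which is precisely the "perfect" timing that discards the small early and late deviations in the original playing.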

Clock resolution as it affects recording human inflections

Let's consider how a low clock resolution could be especially destructive to recording an accurate, human-sounding performance. Let's assume that a musician plays a two note chord. That's two midi Note On events which must be recorded by the sequencer. Humans are not machines with microsecond perfect accuracy. It's doubtful that a real musician would sound those 2 notes at the exact same instant of time even if he "pushed the notes down simultaneously". Human "simultaneous" is a lot different than electronic "simultaneous". We "hear the notes together" even though there are minute differences between when each sounded. Let's also assume that the time difference between those two notes is less than the time difference between two clock pulses (ie, we have too low of a clock resolution to accurately record the musician's timing). Now, the seq has 2 choices. It can move the 2nd note to the next available clock pulse after the 1st note's clock pulse. Or, both events can be placed upon the same clock pulse. Either situation is going to change the "rhythm" of your performance. The first solution may make it sound as if you're a "spastic musician" (ie, your fingers aren't capable of depressing 2 keys together quickly enough to play a "cleanly executed" chord). The 2nd solution gets rid of your human touch, yielding that humanly improbable, perfect timing. The real solution is that we need a clock resolution that is fine enough such that the time between two clock pulses is not greater than the time between our human timing of those events. We need a new clock pulse upon which to place our second event. That pulse will be somewhere in between the other two pulses.
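Here is a small Python sketch of that two note chord being recorded at a low and a high resolution. The 8 msec gap between the notes, the 100 BPM tempo, and the assumption that the recorder rounds each event down to the pulse at or before it are all illustrative choices of mine, not measurements or a description of any particular sequencer.

```python
import math

def pulse_ms(bpm, ppqn):
    """Length of one clock pulse in milliseconds."""
    return 60000.0 / (bpm * ppqn)

def to_tick(seconds, bpm, ppqn):
    """Pulse index at or before the given time (assumes the recorder rounds down)."""
    return math.floor(seconds * bpm * ppqn / 60.0)

first, second = 1.000, 1.008  # hypothetical chord: two Note Ons 8 msecs apart
for ppqn in (24, 192):
    a, b = to_tick(first, 100, ppqn), to_tick(second, 100, ppqn)
    gap = (b - a) * pulse_ms(100, ppqn)
    print(ppqn, "PPQN: pulses", a, "and", b, "-> recorded gap of", gap, "msecs")
```

At 24 PPQN both notes land on the same pulse, so the 8 msec gap is recorded as zero. At 192 PPQN they land two pulses apart, so the gap survives as 6.25 msecs.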

Tests that Roland, one of the largest musical instrument manufacturers, conducted suggest that a minimum of 96 PPQN clock resolution is needed to capture most nuances of a human performance, and a resolution of at least 192 PPQN is ideal for capturing the most subtle human "irregularities" which clue our brain into the fact that a human is performing the music. In my own use of sequencers, I find that anything less than 192 PPQN is not adequate, and that 240 is ideal. I never fully quantize a performance because doing so removes the human nuances. Note that a few sequencers allow a form of "half-quantization" which only corrects the most "rhythmically off" events and then only by a random or partial amount. This is perfectly acceptable since it doesn't "wipe out" the subtle human element, but rather, corrects the most grotesque mistakes (and not in a computer perfect manner). In fact, I use this feature all of the time on my sequencer program.
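One plausible way to implement "half-quantization" is sketched below in Python. This is only my guess at the general idea, not how any particular sequencer does it: the `strength` and `window` parameters are hypothetical names, events already within `window` pulses of the grid are left alone, and everything else is moved only part of the way toward the grid.

```python
def partial_quantize(tick, grid, strength=0.5, window=0):
    """Move a pulse position part of the way toward the nearest grid line.

    Events already within `window` pulses of the grid are left untouched.
    """
    target = ((tick + grid // 2) // grid) * grid
    error = target - tick
    if abs(error) <= window:
        return tick
    return tick + round(error * strength)

# Hypothetical 16th notes at 192 PPQN, where a 16th note is 48 pulses.
played = [2, 60, 90, 150]
print([partial_quantize(t, 48, strength=0.5, window=4) for t in played])
```

With these sample values, the first note (only 2 pulses off) is not touched, while the badly late second note moves from pulse 60 to pulse 54 rather than all the way to 48.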

Timing Accuracy

A high resolution is a must for sequencing a human-sounding performance. So, does a higher PPQN number mean "better"? Not necessarily. A high resolution is good only when the sequencer can maintain and use such a steady and fast clock rate. It should be noted that it takes a finite amount of time for two midi events to be transmitted at MIDI baud rate. A clock resolution that allows two notes to be spaced at times less than the amount of time required to send them out the midi port is useless. Some sequencer programs boast resolutions of 480 PPQN or more. Are these resolutions useful? Yes, but not in a musical sense. They are only useful for hooking impressionable software buyers. It's especially ironic that these programs should boast such high rates since, when run on the personal computers they are typically used with, these programs are unable to keep up with these rates at many tempos. The hardware timer furiously counts down while the computer's CPU struggles to refresh the display and do a host of other duties while it should be outputting MIDI data. It's not uncommon for midi events to be output on the wrong clock pulse because the computer missed outputting them on some previous, correct pulse. I have found these "high resolution" sequencer programs to be very unstable when it comes to timing accuracy. They have this "funky timing feel" which isn't random enough to sound human. It just sounds annoying. You can certainly buy a computer that runs fast enough to maintain such high clock resolutions, but if you saddle it with an OS that has a lot of overhead, or even worse, run the sequencer software under a non-native environment, your CPU power ends up being used mostly for non-MIDI purposes. And, what's the point given MIDI bandwidth? The software would still be no better than a sequencer with a more realistic resolution, unless you went with multiple MIDI busses and managed to distribute your MIDI data evenly between the outputs. So, for typical computer systems, I recommend avoiding sequencer software that cranks up its timer to an unmanageable rate that actually causes more problems with timing than it solves.
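To put numbers on "the amount of time required to send them out the midi port", here is a rough Python sketch. It relies on figures from the MIDI 1.0 specification rather than from this article: 31250 bits per second, 10 bits on the wire per byte (8 data bits plus a start and a stop bit), and 3 bytes for a Note On sent without running status. The function names are hypothetical.

```python
MIDI_BAUD = 31250      # bits per second on a standard MIDI cable
BITS_PER_BYTE = 10     # 8 data bits plus a start bit and a stop bit

def message_seconds(num_bytes=3):
    """Time to transmit one message; a Note On is 3 bytes without running status."""
    return num_bytes * BITS_PER_BYTE / MIDI_BAUD

def pulse_seconds(bpm, ppqn):
    """Length of one clock pulse at the given tempo and resolution."""
    return 60.0 / (bpm * ppqn)

def resolution_exceeds_wire(bpm, ppqn):
    """True when one clock pulse is shorter than one 3-byte message on the wire."""
    return pulse_seconds(bpm, ppqn) < message_seconds()

print("Note On takes", message_seconds() * 1000, "msecs")
for bpm in (100, 120, 140, 180):
    print(bpm, "BPM at 480 PPQN: pulse shorter than a Note On?",
          resolution_exceeds_wire(bpm, 480))
```

Under these assumptions a Note On occupies the cable for 0.96 msecs, and at 480 PPQN a clock pulse becomes shorter than that once the tempo passes roughly 130 BPM.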

Furthermore, unless a seq can maintain its PPQN resolution throughout its usable tempo range, a PPQN claim is meaningless. Unfortunately, most ads omit this info, and so one is left to assume that this rate is accurate throughout a tempo range, which is not always a wise assumption. If using a seq that allows you to vary its clock resolution, you may actually find that you get more solid timing at fast tempos if you lower the resolution. This is due to the fact that, with a high resolution at a fast tempo, the seq may be getting into that situation described above (ie, the resolution vastly exceeds MIDI bandwidth, and so you're just wasting CPU cycles maintaining a fast counter when those cycles could be better used for other, more effective purposes). And at slower tempos, you could raise the clock resolution in order to avoid the low resolution problem described in the preceding section.
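If your seq lets you vary its resolution, a rule of thumb along these lines could be sketched in Python as follows. This is purely illustrative: the candidate resolutions are arbitrary, the rule "keep a pulse at least as long as one 3-byte MIDI message" is my own simplification, and the 0.96 msec figure assumes the standard 31250 baud MIDI rate with 10 bits per byte. It says nothing about CPU load, which you would have to judge by ear on your own system.

```python
MESSAGE_SECONDS = 3 * 10 / 31250  # one 3-byte message at 31250 baud, 10 bits per byte

def choose_ppqn(bpm, candidates=(24, 48, 96, 192, 240, 384, 480)):
    """Highest candidate resolution whose pulse is not shorter than one message."""
    usable = [p for p in candidates if 60.0 / (bpm * p) >= MESSAGE_SECONDS]
    return max(usable) if usable else min(candidates)

for bpm in (60, 100, 140, 200):
    print(bpm, "BPM ->", choose_ppqn(bpm), "PPQN")
```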

Sequencer Sync

As any designer can tell you, it's more efficient to solve a problem in hardware than it is to use a software implementation. On the other hand, software usually can be more flexible and easily upgraded. What I look for in a software sequencer is a program that can tap into some very useful hardware schemes. For example, the only way to overcome the limits of MIDI bandwidth today is to use a MIDI interface with multiple busses (or several interfaces). I prefer sequencers that support this (ie, the sequencer should be able to support outputting various tracks to different MIDI devices/outputs). I also prefer sequencers that can utilize any special features of the hardware that I use. For example, if using some hardware that has tape sync implemented in hardware, I prefer a program that utilizes that scheme rather than working around it with some sort of software scheme that is bound to be less efficient.

Up to now, we've concerned ourselves only with the computer supplying a clock pulse internally. What happens when a sequencer must be synced to an outside source like magnetic tape or digital audio recorders? For example, you may want to combine recorded vocals or acoustic instruments with midi'ed tracks, or sync to a video picture.

There are several types of sync signals. The worst is via MIDI clock. This has an appalling resolution of 24 PPQN, plus it uses up midi bandwidth (ie, uses up some of the time that could be used to send other midi messages like NOTE ONs, perhaps causing rhythmic irregularities and syncing "glitches"). It tracks tempo changes very poorly. You can't record MIDI CLOCKS directly onto magnetic tape. You need some device that records a different signal on tape (i.e. SMPTE or FSK), and then on playback, translates it to MIDI CLOCKS. Of course, having to recreate the sync "on the fly" wastes valuable time that simply aggravates timing problems.

Another sync protocol is MIDI Time Code (MTC). This also uses up midi bandwidth, causing the same problems as MIDI CLOCKS. Although it can have a much higher timing resolution, it's still a hack that can't be recorded to magnetic tape, and must therefore be created upon playback.

One of the most popular syncs is SMPTE TIME CODE. There are 2 types: VITC and LTC. LTC can be recorded onto magnetic tape (VITC is used for videotape, and there are devices which convert one to the other). It doesn't go over the midi bus so it doesn't use up valuable midi bandwidth. It was designed for video work, but musicians use it because it is so common, and also because it is mandatory when syncing to any kind of video source. SMPTE has a very coarse resolution. There is 1/30th of a second between SMPTE Frames. This translates to about 33 msecs, an amount of time which can be clearly perceived by humans. A SMPTE unit for music use must be capable of outputting SubFrames (which yield a much finer resolution). Its main drawback for music is that it is a steady clock rate. It doesn't take musical tempo into consideration. So, a seq must maintain its own clock to count off PPQN whose value is "checked against" the incoming SMPTE timing. This may take up valuable processor time if done in software (as in most implementations), and once again, is not ideal for tracking tempo changes. Tempo changes would only be accurate on frames, not subframes, because subframes are not recorded onto the tape. The format for subframes has never been standardized, so some SMPTE equipment may not match up with other SMPTE equipment. Nevertheless, this scheme is vastly superior to MTC or MIDI CLOCKS. Its ability to recover from errors in the sync pulse is excellent, whereas MTC recovers poorly and MIDI CLOCKS cannot recover at all. When syncing to digital audio recorders, SMPTE may be your only choice.
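The "checking against" step amounts to translating a SMPTE position into the seq's own PPQN count. Here is a simplified Python sketch of that translation. It assumes 30 frames per second non-drop time code and a single fixed tempo, and it ignores subframes entirely; the function names are hypothetical.

```python
def smpte_to_seconds(hours, minutes, seconds, frames, fps=30):
    """Elapsed time for a SMPTE position (assumes non-drop time code)."""
    return hours * 3600 + minutes * 60 + seconds + frames / fps

def seconds_to_ticks(elapsed, bpm, ppqn):
    """Position in clock pulses, assuming the tempo never changes."""
    return elapsed * bpm * ppqn / 60.0

def ticks_per_frame(bpm, ppqn, fps=30):
    """How many clock pulses go by during a single SMPTE frame."""
    return bpm * ppqn / (60.0 * fps)

position = smpte_to_seconds(0, 1, 0, 15)  # 00:01:00:15
print("position:", position, "secs =", seconds_to_ticks(position, 120, 192), "pulses")
print("pulses per frame at 120 BPM, 192 PPQN:", ticks_per_frame(120, 192))
```

At 120 BPM and 192 PPQN, nearly 13 clock pulses pass during each frame, which is why a seq working from frames alone has to fill in the pulses between them with its own clock.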

The last form of sync is FSK (Frequency Shift Keying). Once popular, it is hardly ever used today since magnetic tape has been supplanted by digital recorders. It doesn't use up midi bandwidth. It can be recorded to magnetic tape (but isn't as useful for video work, or syncing to digital audio gear). It's (usually) a variable rate based upon tempo, so no interpolation is needed by the sequencer (ie, the seq can use the incoming sync directly for its PPQN count), and it therefore tracks tempo changes the best of any sync. A variation of this is called "Smart FSK" (like on some MusicQuest MQX interfaces). Normally, FSK doesn't allow you to start the tape in the middle of a song and have the seq immediately advance to that section of the performance in sync (like SMPTE does). (This is known as "chasing"). Smart FSK has a facility for doing this. Smart FSK is the most stable and accurate form of sync for audio work, even though its immunity to random errors is not as good as SMPTE's. Unfortunately, the frequencies for FSK were never standardized either, so some units will not read another manufacturer's FSK properly. I personally prefer Smart FSK for audio-only work, but unfortunately, manufacturers didn't coordinate (ie, standardize) and promote its use enough, so today, SMPTE is much more widely supported and used.

MIDI Bandwidth's effect upon timing

OK, what else affects a sequencer's timing, and therefore its performance? When large amounts of midi data need to be output simultaneously (ie, an 8 note chord plus some drum events, a bass line, etc), then you may get what's referred to as "Midi Data Clogging". All of those events don't actually get output simultaneously. They must be output one at a time, because MIDI is a serial transmission. It takes a certain amount of time to output a midi event. Normally, this is so fast that we hear "bursts" of several midi events as simultaneous events (ie, the time between midi events is less than the time that the human ear needs to detect the delay). Unfortunately, if there are numerous events to be output, it's possible that the last event will be far enough behind the first that a human will perceive it as NOT being simultaneous. What is the remedy? You need multiple MIDI outputs so that you can send midi events for different devices on their own cables. No longer do the events for ALL midi devices need to "fight over" access to the one output cable.
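A back-of-the-envelope Python sketch shows how quickly the delay adds up. It assumes the standard MIDI rate of 31250 baud with 10 bits on the wire per byte, 3 bytes per event (ie, no running status), and events split as evenly as possible across the outputs. The 13 event burst (an 8 note chord, 4 drum hits, and a bass note) is a made-up example, as is the function name.

```python
import math

BYTE_SECONDS = 10 / 31250  # one byte on the wire: 8 data bits plus start and stop bits

def last_event_delay_ms(num_events, num_ports=1, bytes_per_event=3):
    """Delay before the last event of a burst starts going out its cable."""
    per_port = math.ceil(num_events / num_ports)
    return (per_port - 1) * bytes_per_event * BYTE_SECONDS * 1000.0

for ports in (1, 2, 4):
    print(ports, "output(s): last of 13 events starts",
          round(last_event_delay_ms(13, ports), 2), "msecs late")
```

With one output the last event starts about 11.5 msecs after the first. Spreading the same burst over two outputs cuts that to under 6 msecs.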

