Digital Audio on a computer

With the advent of computer audio cards that can record and play back digital audio, computer software has appeared that turns a computer into a device that can not only record and play back that digital audio, but also present the data in a way that makes it easy for musicians to view and edit it. For example, the data can be graphically displayed upon a computer monitor and manipulated with the mouse. Furthermore, computer software can perform its own manipulations of that data, yielding effects such as delay, transposition, chorus, compression, etc., sometimes even in real time (ie, while the audio data is playing back). The wide range of computer audio products also means that a computer digital audio system can be tailored to many budgets. And because computers are typically more easily upgraded than dedicated digital audio units (for example, by adding a second hard drive to accommodate more digital audio tracks), and can do other things besides digital audio work, they are often ultimately more versatile and cost-effective than dedicated digital audio units. In short, with good audio hardware and software, computers make very good digital audio workstations.

Examples of software that supports both digital audio recording and MIDI are CakeWalk Pro Audio, PG Music's Power Tracks, Steinberg's Cubase, etc. Examples of programs that specialize in digital audio recording (and may offer more powerful and easier-to-use digital audio editing features than the sequencers) are Cool Edit, Sound Forge, SAW, Samplitude, etc.

Digital Audio Recording

A typical computer system works with digital audio in the following way. First, to record digital audio, you need a card with Analog-to-Digital Converter (ADC) circuitry on it. This ADC is attached to the Line In (and Mic In) jack of your audio card, and converts the incoming analog audio to a digital signal that computer software can store on your hard drive, visually display on the computer's monitor, mathematically manipulate in order to add effects or process the sound, etc. (When I say "incoming analog audio", I'm referring to whatever you're pumping into the Line In or Mic In of your sound card: for example, the output from a mixing console, the audio output of an electronic instrument, or the sound of some acoustic instrument or voice being fed through a microphone plugged into the sound card's Mic In.) While the incoming analog audio is being recorded, the ADC is creating many, many digital values (at CD quality, 44,100 values per second for each channel) in its conversion to a digital audio representation of what is being recorded. Think of it as analogous to a cassette recorder. While you're recording some analog audio to a cassette tape, the tape is constantly passing over the record head, so the longer the passage of music you record, the more cassette tape you use. So too with the conversion to digital audio. The longer the passage of music you record (ie, digitize), the more digital values are created, and these values must be stored for later playback.
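To make the idea concrete, here is a minimal Python sketch of what an ADC does, under simplifying assumptions: the "analog" input is a hypothetical 440 Hz sine wave standing in for whatever is plugged into Line In, and the converter is ideal (no noise, no filtering). It takes 44,100 measurements per second and rounds each one to a 16-bit integer. Note how the number of values grows in direct proportion to the recording time, just like the tape in the cassette analogy.

```python
import math

SAMPLE_RATE = 44100  # measurements per second (CD quality)
BITS = 16            # resolution of each digital value

def analog_input(t):
    """Hypothetical stand-in for the analog signal at Line In: a 440 Hz sine."""
    return 0.8 * math.sin(2 * math.pi * 440.0 * t)

def digitize(seconds, rate=SAMPLE_RATE, bits=BITS):
    """Model an ideal ADC: measure the signal `rate` times per second and
    round each measurement to a signed integer of `bits` bits."""
    full_scale = 2 ** (bits - 1) - 1          # 32767 for 16-bit
    n = int(seconds * rate)                   # values created for this duration
    values = []
    for i in range(n):
        v = analog_input(i / rate)            # "measure" the voltage at time i/rate
        v = max(-1.0, min(1.0, v))            # a converter cannot exceed full scale
        values.append(round(v * full_scale))  # quantize to an integer
    return values

if __name__ == "__main__":
    for secs in (1, 2, 5):
        print(secs, "s ->", len(digitize(secs)), "values")
```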

Where are these digital values stored? Well, as your sound card creates each value, that data is passed to the card's software driver, which then passes it to the software program managing the recording process. Such software might be CakeWalk recording a digital audio track, or SAW, or Samplitude, or Windows Sound Recorder, or any other program capable of initiating and managing the recording of digital audio. Although the program may temporarily accumulate those digital audio values in the computer's RAM, the values eventually have to be stored upon some fixed medium for permanence. That medium is your computer's hard drive. (For this reason, people sometimes refer to the process of recording digital audio to a hard drive as "Hard Disk Recording". Henceforth, I will abbreviate "hard drive" as HD.) Usually, the program accumulates a "block" of data in RAM (while the sound card is digitizing the incoming analog audio), for example 4,000 data values, and then writes these 4,000 values into a file on your HD. (It's a lot more efficient to write 4,000 values to your HD at once than to write those 4,000 values one after the other separately.) All of these 4,000 values go into one file. Usually, the data within this file is arranged according to the WAVE file format. (But there are other formats that programs may also use to store digital audio values. For example, AIFF is often used on the Macintosh, and AU format is used on Sun computers. MP3 is a popular format nowadays because it compresses the data to a much smaller size. Any of these formats could be used on any computer, but WAVE is considered the standard on a Windows-based PC.)
Now, if the recording process is still going on, the software program will collect 4,000 more values in RAM, and then write them out to the same WAVE file on the HD without erasing the previously stored 4,000 values (ie, the values accumulate within the WAVE file, so that it now has 8,000 values in it). This process continues until the musician halts the recording process. (If you let recording continue long enough, it will eventually fill up your HD with digital audio values in one big WAVE file.) At that point, the WAVE file is complete, and contains all of the digital audio values representing the analog audio recorded.

So, digital audio recorded by most PC programs is predominantly stored in a WAVE file on your HD, and that file is created while the digital audio is being created/recorded.
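The block-at-a-time recording loop can be sketched with Python's standard wave module. This is an illustration, not how CakeWalk or any other named program is written: read_from_card is a hypothetical stand-in for the driver handing over values (here it just generates a tone), and the recording length is fixed in advance instead of waiting for the musician to press Stop.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 44100
BLOCK_SIZE = 4000  # values accumulated in RAM before each write to the HD

def read_from_card(start, count):
    """Hypothetical stand-in for values arriving from the sound card's driver:
    a 440 Hz tone, as signed 16-bit integers."""
    return [round(0.5 * 32767 * math.sin(2 * math.pi * 440 * (start + i) / SAMPLE_RATE))
            for i in range(count)]

def record(path, total_values):
    """Record `total_values` mono 16-bit values into a WAVE file, one block at a time."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per value
        wav.setframerate(SAMPLE_RATE)
        written = 0
        while written < total_values:
            count = min(BLOCK_SIZE, total_values - written)
            block = read_from_card(written, count)   # accumulate one block in RAM
            # Append the whole block in one write; earlier blocks stay in the file.
            wav.writeframes(struct.pack("<%dh" % count, *block))
            written += count
    return written

if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "take1.wav")
    print("recorded", record(path, 3 * SAMPLE_RATE), "values to", path)
```

Each writeframes() call appends to the same file, and the wave module takes care of updating the length fields in the WAVE header, so the finished file describes every block that was written.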

Digital Audio Playback

To play back this digital audio (ie, WAVE file), you need a card with Digital-to-Analog Converter (DAC) circuitry on it. Needless to say, most sound cards have both an ADC and a DAC so that the card can both record and play digital audio. This DAC is attached to the Line Out jack of your audio card, and converts the digital audio values back into analog audio that recreates what was initially recorded. This analog audio can then be routed to a mixer, speakers, or headphones so that you can hear it. You need software to manage the playback of digital audio, and not surprisingly, the same program that managed the recording process usually can also manage the playback. For example, CakeWalk can play back the digital audio track that it recorded. The playback process is almost an exact reverse of the recording process. The program reads a block of digital audio data from the WAVE file on the HD; for example, CakeWalk may read the first 4,000 data values. (It's more efficient to read 4,000 values off of the HD at once than to read those 4,000 values one after the other separately.) Then, the program passes each one of these values to the card's driver, which feeds it to the card's DAC. The program then reads the next 4,000 values from the WAVE file, and plays those back as described. In other words, the sound card is recreating the original analog audio while the program is reading the digital audio values off of the HD and passing them back to the card. This continues until the program has played all of the values in the WAVE file (or until the musician interrupts the playback).
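The playback loop is the mirror image. The sketch below uses Python's standard wave module to read a 16-bit WAVE file 4,000 frames at a time. send_to_dac is a hypothetical callback standing in for the card's driver (a real program would hand each block to the operating system's audio API instead), and make_tone_file merely creates something to play.

```python
import math
import os
import struct
import tempfile
import wave

BLOCK_SIZE = 4000  # frames read from the HD per read (one frame = one value per channel)

def make_tone_file(path, count, rate=44100):
    """Write `count` mono 16-bit values of a 440 Hz tone, so there is something to play."""
    values = [round(10000 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(count)]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % count, *values))
    return values

def play(path, send_to_dac):
    """Read a 16-bit WAVE file block by block, handing each block to `send_to_dac`.
    Returns the number of blocks read."""
    blocks = 0
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit WAVE files")
        while True:
            data = wav.readframes(BLOCK_SIZE)      # one read per block
            if not data:                           # no more data: playback ends
                break
            values = struct.unpack("<%dh" % (len(data) // 2), data)
            send_to_dac(values)                    # hypothetical driver hand-off
            blocks += 1
    return blocks

if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "playback_demo.wav")
    make_tone_file(path, 3 * 44100)
    played = []
    print("blocks read:", play(path, played.extend), "values played:", len(played))
```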

Data processing during playback

Some programs can optionally perform some mathematical manipulation of the digital audio values immediately before the data is passed to the card's driver (ie, during playback). Such manipulations may add effects such as reverb, chorus, delay, etc. SAW and Samplitude are examples of programs that do such real-time processing.
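The algorithms SAW and Samplitude actually use aren't described here, so the following is only a generic illustration: a simple echo (one of the delay-type effects mentioned above) applied to each block just before it would be passed to the driver. The class name Echo and its parameters are hypothetical. The key detail is that the effect must remember the last few input values between blocks; otherwise the echo would be cut off at every block boundary.

```python
from collections import deque

class Echo:
    """Simple echo for 16-bit mono values: y[n] = x[n] + gain * x[n - delay].

    The last `delay` input values are kept between calls, so the echo carries
    correctly from one playback block into the next."""

    def __init__(self, delay, gain):
        if delay < 1:
            raise ValueError("delay must be at least 1 sample")
        self.gain = gain
        # Oldest value first: history[0] is x[n - delay] for the next input x[n].
        self.history = deque([0] * delay, maxlen=delay)

    def process(self, block):
        out = []
        for x in block:
            delayed = self.history[0]
            self.history.append(x)                   # the oldest value drops off
            y = round(x + self.gain * delayed)
            out.append(max(-32768, min(32767, y)))   # clip to the 16-bit range
        return out

# Usage: a 250 ms echo at half volume, for audio recorded at 44.1 kHz.
echo = Echo(delay=int(0.25 * 44100), gain=0.5)
```

A player would call echo.process(block) on every block read from the WAVE file and pass the result to the driver instead of the original block.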

Some programs can also process the digital audio to do some valuable things such as "time-stretching" and "pitch shift". However, these are often too computationally demanding for today's computers to do during playback, so this processing usually has to be applied before playback, and may be "destructive" in that it permanently alters the digital audio data.

Time-stretching: When you play a one-shot (ie, non-looped) waveform, it lasts only so long. For example, maybe you've got digital audio tracks of a piece of music whose duration is 2 minutes. Sometimes, people need to adjust the length of time over which the waveform plays. For example, maybe the producer of a movie says "I want this piece of music to last exactly 2 minutes and 3 seconds in order to fit this filmed scene, which is that long". Well, if you had a MIDI track, you'd just slow the tempo a little in order to make the music last that extra 3 seconds. The music would sound exactly the same, but at a slightly slower tempo. OK, so how do you "slow down" the digital audio tracks? Well, you could reduce the playback rate a little. But if you've ever used a sampler, you know that when you play a waveform at a rate different from the one at which it was recorded, this changes the character of the waveform itself. You don't just hear a different tempo; you hear a different pitch, vibrato, tremolo, tone, etc. So, time-stretching was devised to take a waveform, analyze it, and change its length without (hopefully) changing its characteristics. The net result is the same effect that you get by changing the tempo of the MIDI track; ie, merely a change in tempo/duration rather than a change in the characteristics of the waveform. (Nevertheless, there is always some potential for a change in timbre when time-stretching algorithms are applied to a waveform.)
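No particular program's time-stretching algorithm is described here. As an illustration only, here is about the simplest possible one, plain windowed overlap-add (OLA): short, overlapping slices of the waveform are read from the input at one spacing and laid down in the output at a slightly wider spacing, so the result lasts longer while each slice keeps its original pitch. For the movie example, the stretch factor is 123 / 120 = 1.025 (2 minutes 3 seconds over 2 minutes); merely slowing playback by that much would also lower the pitch by 12 * log2(1.025), or about 0.43 semitone. The frame size of 1024 samples is an arbitrary choice.

```python
import math

def hann(n):
    """Hann window of length n: fades each slice in and out smoothly."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def ola_time_stretch(x, factor, frame=1024):
    """Lengthen (factor > 1) or shorten (factor < 1) x without changing the
    playback rate, using plain windowed overlap-add.

    Slices of `frame` samples are taken from the input every `ha` samples and
    added into the output every `hs` samples, so the output lasts about
    hs / ha = factor times as long."""
    hs = frame // 4                              # synthesis hop (output spacing)
    ha = max(1, round(hs / factor))              # analysis hop (input spacing)
    win = hann(frame)
    n_frames = math.ceil(max(0, len(x) - frame) / ha) + 1
    out_len = (n_frames - 1) * hs + frame
    out = [0.0] * out_len
    weight = [0.0] * out_len                     # sum of windows, for normalization
    for k in range(n_frames):
        a, s = k * ha, k * hs
        for i in range(frame):
            xi = x[a + i] if a + i < len(x) else 0.0  # zero-pad past the end
            out[s + i] += xi * win[i]
            weight[s + i] += win[i]
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, weight)]

if __name__ == "__main__":
    rate = 8000                                  # low rate keeps the demo quick
    tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(2 * rate)]
    stretched = ola_time_stretch(tone, 123 / 120)
    print(len(tone) / rate, "s ->", len(stretched) / rate, "s")
```

Plain OLA tends to produce audible phasing or warbling because the overlapping slices don't line up in phase, which is exactly the kind of timbre change mentioned above. Refinements such as WSOLA (which shifts each slice to line up with its neighbor) and the phase vocoder exist to reduce it.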

Pitch shift: This changes the note's pitch without altering the playback rate (which would alter other characteristics of the waveform). So, if you sampled the middle C note of a piano at 44.1 kHz, but when you play it back at 44.1 kHz you really want to hear a D note, you could apply pitch shift to it to create a waveform whose "root" (ie, the pitch you get when you play back at the same sample rate used for recording) is D. If you take a musical performance with "rhythms" in it and apply pitch shift, you should get a different pitch, but retain the same rhythms (ie, tempo).
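One common way to build a pitch shifter (not necessarily the method any program named here uses) is to combine two operations: time-stretch the waveform by the pitch ratio, then resample it by the same ratio, which is exactly like playing a sample back faster. The resampling restores the original duration and raises the pitch by the ratio. For C to D (2 semitones) the ratio is 2^(2/12), about 1.1225. The sketch below includes a compact plain overlap-add (OLA) stretcher so that it stands alone, and uses linear interpolation for the resampling; both are deliberately simple choices, and all function names are hypothetical.

```python
import math

def semitone_ratio(semitones):
    """Frequency ratio for a pitch change in equal temperament (C -> D is +2)."""
    return 2 ** (semitones / 12)

def ola_time_stretch(x, factor, frame=1024):
    """Plain windowed overlap-add: make x last `factor` times as long at the same rate."""
    hs = frame // 4
    ha = max(1, round(hs / factor))
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    n_frames = math.ceil(max(0, len(x) - frame) / ha) + 1
    out = [0.0] * ((n_frames - 1) * hs + frame)
    weight = [0.0] * len(out)
    for k in range(n_frames):
        for i in range(frame):
            j = k * ha + i
            out[k * hs + i] += (x[j] if j < len(x) else 0.0) * win[i]
            weight[k * hs + i] += win[i]
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, weight)]

def resample(x, ratio):
    """Step through x `ratio` samples per output sample (linear interpolation).
    Played at the original rate, this multiplies pitch by `ratio` and divides
    duration by `ratio`, just like playing a sample back too fast."""
    n_out = int((len(x) - 1) / ratio) + 1
    out = []
    for m in range(n_out):
        pos = m * ratio
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append(x[i] + frac * (nxt - x[i]))
    return out

def pitch_shift(x, semitones):
    """Change pitch but keep duration: stretch by r, then resample by r."""
    r = semitone_ratio(semitones)
    return resample(ola_time_stretch(x, r), r)
```

For the piano example, pitch_shift(c_note, 2) would return a waveform about as long as the original whose root is D instead of C (c_note being a hypothetical list of sample values).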

Virtual Tracks

Although most sound cards have only 2 discrete digital audio channels (ie, stereo digital audio capabilities), many programs such as CakeWalk support recording and playing many more tracks of digital audio. So how does CakeWalk play more than 2 digital audio tracks using such a card? It does so through the use of "virtual tracks". What this means is that the program allows you to record as many digital audio tracks as you like, mono and/or stereo. The only limitations are your HD space, plus how many tracks can be read off of your HD at the required rate of playback. (You can record only 2 mono tracks, or 1 stereo track, at one time, due to the card's limit of only 2 channels. But you can do as many iterations of the recording process as you wish to yield many more than 2 tracks. It's on playback that the concept of virtual tracks comes into play.) For example, you could record 4 mono tracks plus 3 stereo tracks (given a fast enough HD, which is doubtful with slow IDE drives). You can set the individual panning for each of the mono tracks. Then, upon playback (ie, when CakeWalk reads the data for all of those digital audio tracks from their WAVE files), CakeWalk itself mathematically mixes all of the digital audio tracks into one stereo digital audio mix, and outputs this mix to the sound card's stereo DAC. In other words, CakeWalk itself functions as a sort of "digital mixer", mixing many mono and stereo tracks together (during playback) into one stereo digital audio track. So, using any sound card with a stereo digital audio DAC, you can actually record as many tracks as you like. CakeWalk virtualizes the card so that it appears to have many digital audio tracks, mono and/or stereo. Most pro programs that work with digital audio support the concept of "virtual tracks".
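CakeWalk's internal mixing code isn't documented here, so the following Python sketch only illustrates the principle of virtual tracks: any number of mono tracks (each with its own pan setting) and stereo tracks are summed into a single left/right pair that one stereo DAC can play. The linear pan law and the function names are assumptions; real programs often use other pan laws.

```python
def pan_gains(pan):
    """Linear pan law (an assumption; programs differ): -1.0 = hard left,
    0.0 = center, +1.0 = hard right. Returns (left_gain, right_gain)."""
    return (1.0 - pan) / 2.0, (1.0 + pan) / 2.0

def mix_virtual_tracks(mono_tracks, stereo_tracks):
    """Mix any number of tracks into one stereo stream for a single stereo DAC.

    mono_tracks:   list of (samples, pan), where samples is a list of ints
    stereo_tracks: list of (left_samples, right_samples)
    Returns (left, right) as lists of 16-bit ints."""
    length = max([len(s) for s, _ in mono_tracks] +
                 [len(l) for l, _ in stereo_tracks] + [0])
    left = [0.0] * length
    right = [0.0] * length
    for samples, pan in mono_tracks:             # place each mono track in the stereo field
        gl, gr = pan_gains(pan)
        for i, v in enumerate(samples):
            left[i] += v * gl
            right[i] += v * gr
    for l_samples, r_samples in stereo_tracks:   # stereo tracks add channel by channel
        for i, v in enumerate(l_samples):
            left[i] += v
        for i, v in enumerate(r_samples):
            right[i] += v

    def to_16bit(v):
        return max(-32768, min(32767, round(v)))  # clip anything that overflows
    return [to_16bit(v) for v in left], [to_16bit(v) for v in right]
```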

There are a few drawbacks to this scheme, though. First, the more tracks that you record, the more that you have to back off on the individual volume of each track. Why? Because when CakeWalk mathematically sums all of the tracks together, if at any point the sum exceeds the range of a 16-bit value, you'll get clipping. The more tracks that you sum, the more likely you are to get clipping, unless you back off on the individual track volumes more with each added track. It doesn't matter if your card has 18-bit (eg, Tahiti) or 20-bit (eg, Pinnacle) DACs. CakeWalk itself performs the sum into a 16-bit mix, so there is an inherent 16-bit limitation to CakeWalk's output (or that of any other program that is limited to 16-bit digital audio. Some digital audio programs offer greater than 16-bit resolution, such as 24-bit or 32-bit, for their internal mixing, and these programs don't require you to back off on the individual wave volumes so much. If you're using a lot of virtual tracks, use a program with at least 24-bit resolution for its internal mixing.) So the more tracks you record, the less dynamic range you get for each individual track as you turn down its volume to avoid clipping the mix. I doubt that you'd want to mix more than 4 mono virtual tracks to one card. (On the plus side, if you get more digital audio cards, you can split up virtual tracks among them, since CakeWalk can use more than 1 card simultaneously. But remember that most ISA cards require at least 1 DMA channel for playback, and another for full-duplex recording, and a PC has only 3 16-bit DMA channels, so you're pretty much limited to 2 ISA sound cards, unless you get a Tahiti, Fiji, or Pinnacle, which use a proprietary, non-DMA method of output, or a PCI card, since those do not use motherboard DMA.)
Of course, you could get one of those cards with 8 digital audio channels, like a DAL V8, Antex SoundCard, EMagic Audiowerk8, or DigiDesign Session 8, which usually come with their own program to support their better-than-16-bit DACs, and perhaps offer better throughput.
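The arithmetic behind "backing off" is simple. If N tracks could all peak at full scale at the same instant, each must be turned down by a factor of N (20 * log10(N) dB) to guarantee that the sum fits in 16 bits, which costs each track about log2(N) bits of resolution: 4 tracks need about 12 dB, or 2 bits. The Python sketch below shows that arithmetic and contrasts summing straight into 16 bits with summing in a wider accumulator and scaling once at the end, roughly what a mixer with 24-bit or 32-bit internal resolution makes possible. The function names are hypothetical.

```python
import math

def attenuation_db(n_tracks):
    """Per-track gain reduction (dB) that guarantees n full-scale tracks can't clip."""
    return 20 * math.log10(n_tracks)

def bits_lost(n_tracks):
    """The same headroom expressed in bits (about 6.02 dB per bit)."""
    return math.log2(n_tracks)

def count_clipped(tracks):
    """Number of samples where the true sum would not fit into 16 bits."""
    return sum(1 for col in zip(*tracks) if not -32768 <= sum(col) <= 32767)

def mix_16bit(tracks):
    """Sum the tracks and force the result into 16 bits, clipping any overflow."""
    return [max(-32768, min(32767, sum(col))) for col in zip(*tracks)]

def mix_wide(tracks, master_gain):
    """Sum in a wide accumulator (Python ints never overflow, standing in for a
    24- or 32-bit mix bus), then scale once and convert to 16 bits for the DAC."""
    return [max(-32768, min(32767, round(sum(col) * master_gain))) for col in zip(*tracks)]

if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "tracks:", round(attenuation_db(n), 1), "dB,", bits_lost(n), "bits")
```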

Secondly, you may lose some of the individual hardware control of the card when CakeWalk takes over the task of digitally mixing the output. For example, CakeWalk has to operate a Roland RAP-10 in its stereo mode, so you lose the RAP-10's individual control over each track's reverb/delay and chorus (which is available only when the RAP-10's channels are used as 2 mono channels). In other words, most programs take a generic approach to virtual tracks, which may not support some of the more esoteric functions of certain cards. So, if you get a card that has something more than just a simple, stereo 16-bit DAC, it's important to make sure that you have software support for that additional functionality; otherwise it will be wasted by a program that treats your card as if it were a simple, stereo 16-bit DAC.

Hard Drive requirements

With digital audio tracks recorded to a HD, usually the limiting factor in how many virtual tracks can be played simultaneously, and how well the audio sounds, is the speed at which data can be read from and written to your HD. A slow HD (more than 12ms access time) will restrict how many virtual tracks you can use. Also, you need a HD that has no problems with thermal recalibration (ie, lengthy adjustments the HD may make during reading/writing that can delay access to the drive). These delays may cause a program to fail to feed a continuous stream of digital audio values to/from the sound card, and you'll then hear "glitches" in the audio. Newer HD designs have tended to minimize this problem. Select a good HD for digital audio; it may be even more critical than differences in sound card designs. If you go with an EIDE HD, make sure that you get one that supports bus mastering (in Mode 4) or at least DMA/33, and that you use it with a motherboard that has a PCI EIDE controller and drivers that support EIDE bus mastering. (Win95's Service Pack 2 added such support to Win95.) Otherwise, SCSI has an advantage in that most SCSI controllers support bus mastering, scatter-gather lists, and other features that make for efficient HD I/O, and which are supported by most operating systems' drivers.
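Here's a rough way to see why access time matters so much. Under a deliberately simplified model (an assumption, not how any particular program schedules its reads), each virtual track lives in its own WAVE file, so every block of every track costs one seek (the access time) plus the time to transfer the block, and all of the tracks' blocks must be read before one block's worth of audio finishes playing. The 3,000,000 bytes per second transfer rate used below is a hypothetical figure.

```python
SAMPLE_RATE = 44100     # values per second, per mono track
BYTES_PER_VALUE = 2     # 16-bit

def max_tracks(access_ms, transfer_bytes_per_s, block_values=4000):
    """Rough estimate of how many mono tracks one HD can stream without glitches.

    Assumed model: one seek plus one block transfer per track per block, and
    every track must be serviced within the time it takes to play one block."""
    block_play_s = block_values / SAMPLE_RATE
    per_track_s = access_ms / 1000 + block_values * BYTES_PER_VALUE / transfer_bytes_per_s
    return int(block_play_s // per_track_s)

if __name__ == "__main__":
    for access in (9, 12, 18):
        print(access, "ms:", max_tracks(access, 3_000_000), "tracks")
```

With a 12ms drive and 4,000-value blocks, this model allows only 6 mono tracks; reading one second of audio per block instead spreads the cost of each seek over more data and raises the estimate to 24. Either way, a slower access time lowers the count, and any extra stall, such as a thermal recalibration, eats directly into the time budget.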

Furthermore, since digital audio requires a constant stream of values throughout the recording process (unlike with MIDI), you need a large HD if you want to record long digital audio tracks. You'll use up about 5 MEG of HD space for every minute of a 16-bit mono digital audio track recorded at 44.1 kHz (and about 10 MEG for a stereo track). In other words, recording a CD-quality digital stereo track that is 5 minutes long will use up about 50 MEG of your HD.
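These figures come straight from the data rate: 44,100 values per second times 2 bytes per 16-bit value times 60 seconds is 5,292,000 bytes per mono minute (about 5.05 MEG if a MEG is counted as 1,048,576 bytes), and stereo doubles it. A minimal Python sketch of the calculation, ignoring the small WAVE file header:

```python
def audio_bytes(minutes, channels=1, rate=44100, bits=16):
    """Bytes needed for uncompressed PCM audio (the WAVE header is ignored)."""
    return rate * channels * (bits // 8) * minutes * 60

if __name__ == "__main__":
    for label, mins, ch in (("1 min mono", 1, 1), ("1 min stereo", 1, 2), ("5 min stereo", 5, 2)):
        b = audio_bytes(mins, ch)
        print(f"{label}: {b:,} bytes = {b / 1_048_576:.1f} MEG (1 MEG = 1,048,576 bytes)")
```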

Audio card requirements

Of course, this is not to say that the quality of your sound card isn't important too. If you've got a card with a cheap DAC or ADC, you're going to get noisy digital audio. (The result will typically sound like "graininess" in the audio, or even "tape hiss". On the other hand, problems with the HD usually manifest as horrid distortion or weird noises such as pops and clicks.) To get nicely recorded digital audio, and as many simultaneous virtual tracks as possible, you'll want both a fast, large HD with efficient controller I/O, and a card with a good DAC and ADC.

Unless you've got a card that has digital I/O so that you can run a digital connection right to a DAT deck, for example, you'll also want a card that has a clean (ie, low THD, or total harmonic distortion) and quiet (ie, high signal-to-noise ratio) audio output stage.
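For a sense of scale, an ideal N-bit converter's own rounding noise sets a ceiling on the signal-to-noise ratio of about 6.02 * N + 1.76 dB for a full-scale sine wave: roughly 98 dB for 16 bits, 110 dB for 18, and 122 dB for 20. Real cards fall short of this because of noise in their analog stages, which is why a quiet output stage matters. The Python sketch below checks the formula by quantizing a sine wave and measuring the rounding error; the test tone (997 Hz at 48 kHz) is an arbitrary choice.

```python
import math

def ideal_snr_db(bits):
    """Theoretical SNR of a full-scale sine quantized to `bits` bits."""
    return 6.02 * bits + 1.76

def measured_snr_db(bits, freq=997.0, rate=48000, n=48000):
    """Quantize a full-scale sine and compare signal power with rounding-error power."""
    scale = 2 ** (bits - 1) - 1
    sig_pow = err_pow = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i / rate)
        q = round(x * scale) / scale      # what an ideal converter would reproduce
        sig_pow += x * x
        err_pow += (q - x) ** 2
    return 10 * math.log10(sig_pow / err_pow)

if __name__ == "__main__":
    for b in (8, 16):
        print(b, "bits: ideal", round(ideal_snr_db(b), 1), "dB, measured",
              round(measured_snr_db(b), 1), "dB")
```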

For more information about digital audio cards themselves, see Digital Audio Cards.