Digital Audio

Audio Recording

A typical computer system works with digital audio in the following way. First, to record digital audio, you need a card with Analog to Digital Converter (ADC) circuitry on it. This ADC is attached to the Line In (and Mic In) jack of your audio card, and converts the incoming analog audio to a digital signal that computer software can store on your hard drive, visually display on the computer's monitor, mathematically manipulate in order to add effects or process the sound, etc. (When I say "incoming analog audio", I'm referring to whatever you're pumping into the Line In or Mic In of your sound card, for example, the output from a mixing console, or the audio output of an electronic instrument, or the sound of some acoustic instrument or voice being fed through a microphone plugged into the sound card's Mic In, etc). While the incoming analog audio is being recorded, the ADC is creating many, many digital values in its conversion to a digital audio representation of what is being recorded. Think of it as analogous to a cassette recorder. While you're recording some analog audio to a cassette tape, the tape is constantly passing over the record head. So the longer the passage of music you record, the more cassette tape you use to record that analog audio signal onto the tape. So too with the conversion to digital audio. The longer the passage of music you record (ie, digitize), the more digital values are created, and these values must be stored for later playback.
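To give a rough sense of how quickly those values pile up, here's a small C sketch that computes the data rate. The figures are assumptions for illustration (CD-quality 44,100 samples per second, 16-bit values, stereo); your card's actual settings may differ.

    #include <stdio.h>

    int main(void)
    {
        const long sample_rate     = 44100; /* samples per second, per channel (assumed) */
        const long bytes_per_value = 2;     /* 16-bit = 2 bytes per value                */
        const long channels        = 2;     /* stereo (assumed)                          */

        long bytes_per_second = sample_rate * bytes_per_value * channels;
        long bytes_per_minute = bytes_per_second * 60;

        printf("Data rate: %ld bytes/sec (about %.1f MB per minute)\n",
               bytes_per_second, bytes_per_minute / (1024.0 * 1024.0));
        return 0;
    }

At those settings, every minute of recording generates roughly 10 MB of digital values, which is why the storage question in the next paragraph matters.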

Where are these digital values stored? Well, as your sound card creates each value, that data is passed to the card's software driver, which then passes that data to the software program managing the recording process. Such software might be CakeWalk recording a digital audio track, or SAW, or Samplitude, or Windows Sound Recorder, or any other program capable of initiating and managing the recording of digital audio. Although the program may temporarily accumulate those digital audio values in the computer's RAM, those values will eventually have to be stored upon some fixed medium for permanence. That medium is your computer's hard drive. (For this reason, sometimes people refer to the process of recording digital audio to a hard drive as "Hard Disk Recording". Henceforth, I will abbreviate "hard drive" as HD). Usually, how this works is that the program accumulates a "block" of data in RAM (while the sound card is digitizing the incoming analog audio), for example 4,000 data values, and then writes these 4,000 values into a file on your HD. (It's a lot more efficient to write 4,000 values at once to your hard drive than it is to write those 4,000 values one after the other separately). All of these 4,000 values go into one file. Usually, the format for how the data is arranged within this file follows the WAVE file format. (But there are other formats that may also be used by programs to store digital audio values. For example, AIFF is often used on the Macintosh. AU format is used on Sun computers. MP3 is a popular format nowadays because it compresses the data's size. Etc. Any of these formats could be used on any computer, but WAVE is considered the standard on a Windows-based PC). Now, if the recording process is still going on, the software program will collect 4,000 more values in RAM, and then write them out to the same WAVE file on the HD (without erasing the previously stored 4,000 values -- ie, the values accumulate within the WAVE file, so that it now has 8,000 values in it). This process continues until the musician halts the recording process. (ie, if you let recording continue long enough, it will eventually fill up your HD with digital audio values in one, big WAVE file). At that point, the WAVE file is complete, and contains all of the digital audio values representing the analog audio recorded.
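Here's a minimal C sketch of that block-at-a-time buffering, under stated assumptions: next_sample_from_driver() is a hypothetical stand-in for whatever API your card's driver actually provides, and the RIFF/fmt headers that begin a real WAVE file are omitted so the sketch stays focused on how blocks accumulate in the file.

    #include <stdio.h>
    #include <math.h>

    #define BLOCK_VALUES 4000  /* matches the article's example block size */

    /* Hypothetical stand-in for the card's driver handing us one
       digitized value; here it just synthesizes a 440 Hz sine. */
    static short next_sample_from_driver(void)
    {
        static long t = 0;
        return (short)(3000.0 * sin(2.0 * 3.14159265 * 440.0 * t++ / 44100.0));
    }

    int main(void)
    {
        short block[BLOCK_VALUES];
        /* A real WAVE file begins with RIFF/fmt/data headers; they are
           omitted here, so this sketch writes raw values only. */
        FILE *f = fopen("capture.raw", "wb");
        if (f == NULL) return 1;

        for (int pass = 0; pass < 10; pass++) {  /* "recording still going on" */
            /* Accumulate one block of values in RAM... */
            for (int i = 0; i < BLOCK_VALUES; i++)
                block[i] = next_sample_from_driver();

            /* ...then append the whole block to the file in one write.
               Each fwrite() adds its block after the previous one, so
               the values accumulate: 4,000, then 8,000, and so on. */
            fwrite(block, sizeof(short), BLOCK_VALUES, f);
        }

        fclose(f);
        return 0;
    }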

So, digital audio recorded by most PC programs is predominantly stored in a WAVE file on your HD, and that file is created while the digital audio is being created/recorded.

Audio Playback

Some programs can optionally perform mathematical manipulation of the digital audio values immediately before the data is passed to the card's driver (ie, during playback). Such manipulations may add effects such as reverb, chorus, delay, etc. Programs that do such realtime processing include SAW and Samplitude.
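As a minimal sketch of that sort of realtime processing, here is one of the simplest effects, a single-tap delay, applied to each block of values just before it would be handed to the card's driver. The block size, delay length, and mono format are assumptions chosen for clarity, not taken from any particular program.

    #define BLOCK 512            /* values processed per callback (assumed) */
    #define DELAY_SAMPLES 4410   /* 100 ms at a 44.1 kHz rate (assumed)     */

    /* The delay line persists across blocks, since the effect must be
       continuous even though the data arrives one block at a time. */
    static short delay_line[DELAY_SAMPLES];
    static int   delay_pos;

    void process_block(short *block, int n)
    {
        for (int i = 0; i < n; i++) {
            short dry = block[i];
            short wet = delay_line[delay_pos];

            /* 50% dry + 50% delayed signal; summed in int so the
               intermediate value can't overflow a 16-bit short */
            int mixed = (dry + wet) / 2;
            block[i] = (short)mixed;

            /* feed the undelayed sample into the delay line */
            delay_line[delay_pos] = dry;
            delay_pos = (delay_pos + 1) % DELAY_SAMPLES;
        }
        /* block[] would now be passed to the card's driver for playback */
    }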

Some programs also can process the digital audio to do some valuable things such as "time-stretching" and "pitch shift" (although these are often too computationally complex for today's computers to do during playback. So, this processing usually has to be applied before playback, and may be "destructive" in that it permanently alters the digital audio data). Both are described below.

Time-stretching: When you play a one-shot (ie, non-looped) waveform, it lasts only so long (ie, in terms of time). For example, maybe you've got digital audio tracks of a piece of music whose duration is 2 minutes. Sometimes, people need to adjust the length of time over which the waveform plays. For example, maybe the producer of a movie says "I want this piece of music to last exactly 2 minutes and 3 seconds in order to fit with this filmed scene I have which is this long". Well, if you had a MIDI track, you'd just slow the tempo a little in order to make the music last that extra 3 seconds. The music would sound exactly the same, but at a slightly slower tempo. OK, so how do you "slow down" the digital audio tracks? Well, you could reduce the playback rate a little. But, if you've ever used a sampler, you'll notice that when you play a waveform at a rate different than the rate at which it was recorded, this changes the character of the waveform itself. You don't just hear a different tempo, you hear different pitch, vibrato, tremolo, tone, etc. So, time-stretching was devised to take a waveform, analyze it, and change its length without (hopefully) changing its characteristics. The net result is the same effect that you got by changing the tempo of the MIDI track; ie, merely a change in tempo/duration rather than a change in the characteristics of the waveform. (Nevertheless, there is always some potential for a change in timbre when time-stretching algorithms are applied to a waveform).
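Here is a rough C sketch of the simplest family of time-stretching algorithms, overlap-add (OLA), which copies short overlapping "grains" from the input and lays them down at a different spacing in the output. Real time-stretchers (WSOLA, phase vocoders) also align the grains before overlapping them, which is what reduces the timbre change mentioned above; this naive version skips that step and is meant only to show the idea. The grain size and overlap are arbitrary example figures.

    #include <math.h>
    #include <stdlib.h>

    #define WIN 1024            /* grain size in samples (example figure) */
    #define HOP_OUT (WIN / 2)   /* output spacing: 50% overlap            */

    /* stretch > 1.0 lengthens the audio. For the movie example above,
       stretching 2:00 to 2:03 means stretch = 123.0 / 120.0 = 1.025. */
    float *time_stretch(const float *in, long in_len, double stretch,
                        long *out_len)
    {
        const float PI = 3.14159265f;
        long hop_in = (long)(HOP_OUT / stretch); /* read grains closer together */
        long frames = (in_len - WIN) / hop_in;

        *out_len = (frames - 1) * HOP_OUT + WIN;
        float *out = calloc((size_t)*out_len, sizeof(float));
        if (out == NULL)
            return NULL;

        for (long k = 0; k < frames; k++) {
            const float *src = in + k * hop_in;   /* grain from the input   */
            float *dst = out + k * HOP_OUT;       /* its slot in the output */
            for (int i = 0; i < WIN; i++) {
                /* Hann window, so overlapping grains crossfade smoothly */
                float w = 0.5f * (1.0f - cosf(2.0f * PI * i / (WIN - 1)));
                dst[i] += src[i] * w;
            }
        }
        return out;
    }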

Pitch shift: This changes the note's pitch without altering the playback rate (which would alter other characteristics of the waveform). So, if you sampled the middle C note of a piano at 44KHz, but when you play it back at 44KHz, you really want to hear a D note, you could apply pitch shift to it to create a waveform whose "root" (ie, the pitch you get when you playback at the same sample rate as when recorded) is D. If you take a musical performance with "rhythms" in it, and apply pitch shift, you should get a different pitch, but retain the same rhythms (ie, tempo).
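Some worked numbers for that C-to-D example: in equal temperament, a pitch change of n semitones corresponds to a frequency ratio of 2^(n/12). This short C program computes the ratio and shows what a sampler would have to do instead (taking the "44KHz" above as the common 44,100 Hz rate):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int semitones = 2;   /* C up to D is a whole step = 2 semitones */
        double ratio = pow(2.0, semitones / 12.0);

        printf("Frequency ratio for %d semitones: %.4f\n", semitones, ratio);

        /* A sampler gets this pitch by raising the playback rate, which
           also shortens the duration and changes the character: */
        printf("Sampler equivalent: play the 44100 Hz data back at %.0f Hz\n",
               44100.0 * ratio);

        /* A pitch shifter produces the same ~1.12x ratio while keeping
           the playback rate, and thus the rhythms/tempo, unchanged. */
        return 0;
    }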

Virtual Tracks

Although most sound cards have only 2 discrete digital audio channels (ie, stereo digital audio capabilities), many programs such as CakeWalk support recording and playing many more tracks of digital audio. How does CakeWalk seemingly play more than 2 digital audio tracks using such a card? It is accomplished through the use of "virtual tracks". What this means is that the program allows you to record as many digital audio tracks as you like, mono and/or stereo. The only limitations are your HD space, plus how many tracks can be read off of your HD at the required rate of playback. (You can record only 2 mono tracks, or 1 stereo track, at one time, due to the card's limit of only 2 channels. But you can do as many iterations of the recording process as you wish to yield many more than 2 tracks. It's on playback that the concept of virtual tracks comes into play). For example, you could record 4 mono tracks, plus 3 stereo tracks (if you had a fast enough HD -- doubtful on slow IDE stuff). You can set the individual panning for each of the mono tracks. Then, upon playback (ie, when CakeWalk reads the data for all of those digital audio tracks from their WAVE files), CakeWalk itself mathematically mixes all of the digital audio tracks into one stereo digital audio mix, and outputs this mix to the sound card's stereo DAC. In other words, CakeWalk itself functions as a sort of "digital mixer", mixing many mono and stereo tracks together (during playback) into one stereo digital audio track. So, using any sound card with a stereo digital audio DAC, you can actually record as many tracks as you like. CakeWalk virtualizes the card so that it appears to have many digital audio tracks, mono and/or stereo. Most pro programs that work with digital audio support the concept of "virtual tracks".
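Here is a minimal C sketch of that kind of digital mixing, under stated assumptions: 4 mono virtual tracks, a small block size, and panning omitted (the output is a single mono block rather than a stereo pair) so the summing itself stays visible. This illustrates the general idea, not CakeWalk's actual code.

    #define NUM_TRACKS 4
    #define BLOCK 256

    /* Sum the current block of every virtual track into one output
       block, clamping the wide sum back into the 16-bit range that
       the card's DAC accepts. */
    void mix_block(short tracks[NUM_TRACKS][BLOCK], short *out)
    {
        for (int i = 0; i < BLOCK; i++) {
            long sum = 0;                      /* wider than 16 bits */
            for (int t = 0; t < NUM_TRACKS; t++)
                sum += tracks[t][i];

            /* If the sum exceeds the 16-bit range, it clips -- which
               is why track volumes must be backed off, as discussed
               next. */
            if (sum > 32767)  sum = 32767;
            if (sum < -32768) sum = -32768;
            out[i] = (short)sum;
        }
    }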

There are a few drawbacks to this scheme, though. First, the more tracks that you record, the more you have to back off on the individual volume of each track. Why? Because when all of the tracks are mathematically summed together by CakeWalk, if at any point the sum exceeds what a 16-bit value can hold, you'll get clipping. The more tracks that you sum, the more likely you are to get clipping -- unless you back off on the individual track volumes more with each added track. It doesn't matter if your card has 18-bit (ie, Tahiti) or 20-bit (ie, Pinnacle) DACs. CakeWalk itself performs the sum into a 16-bit mix, so there is an inherent 16-bit limitation to CakeWalk's output (or the output of any other program that is limited to 16-bit digital audio -- some digital audio programs offer greater than 16-bit resolution for their internal mixing, such as 24-bit or 32-bit. These programs don't require you to back off on the individual wave volumes so much. If you're using a lot of virtual tracks, use a program with at least 24-bit resolution for its internal mixing). So the more tracks you record, the less dynamic range you get for each individual track as you turn down the individual volume to avoid clipping the mix. I doubt that you'd want to mix more than 4 mono virtual tracks to one card. (On the plus side, if you get more digital audio cards, you can then split up the virtual tracks among them, since CakeWalk can use more than 1 card simultaneously. But remember that most ISA cards require at least 1 DMA channel for playback, and 1 for full-duplex recording, and a PC has only 3 16-bit DMA channels, so you're pretty much limited to 2 ISA sound cards -- unless you get a Tahiti, Fiji, or Pinnacle, which use a proprietary, non-DMA method of output, or a PCI card, as those do not use motherboard DMA). Of course, you could get one of those cards with 8 digital audio channels, like a DAL V8, Antex SoundCard, EMagic Audiowerk8, or DigiDesign Session 8, which usually have their own accompanying program to support better than 16-bit DACs and perhaps better throughput.
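To put numbers on that headroom problem: N full-scale tracks can sum to N x 32767, which needs about log2(N) bits beyond 16; equivalently, each track must be backed off by 20*log10(N) dB to guarantee the sum can't clip. That is also why the 24-bit internal mixing mentioned above helps: its 8 spare bits cover up to 256 full-scale tracks. A quick C check of those figures:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        for (int n = 2; n <= 16; n *= 2)
            printf("%2d tracks: worst-case sum needs %2.0f extra bits, "
                   "or back each track off by %.1f dB\n",
                   n, ceil(log2(n)), 20.0 * log10(n));
        return 0;
    }

For the article's 4-track example, that's 2 extra bits, or 12 dB shaved off every track's dynamic range within a 16-bit mix.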

Secondly, you may lose some of the individual hardware control of the card when CakeWalk takes over the task of digitally mixing the output. For example, CakeWalk has to operate a Roland RAP-10 in its stereo mode, so you lose the RAP's individual control over each track's reverb/delay and chorus (effective when the RAP-10 channels are used as 2 mono channels). In other words, most programs take a generic approach to virtual tracks which may not support some of the more esoteric functions of certain cards. So, if you get a card that has something more than just a simple, stereo 16-bit DAC, it's important to make sure that you have software support for that additional functionality, lest it be wasted by a program that treats your card as if it were a simple, stereo 16-bit DAC.