Chapter 3. In the Digital Domain
3.3. Digital Audio Workstation
The Digital Audio Workstation (DAW) is the brain of the audio system: it stores the digitized sound; it feeds that information to software plugins for digital processing; it then streams the processed digital data to the DA converter, which transforms it back into analog voltages.
The DAW, from a technical point of view, is a computer (hardware and operating system) with specialized processing software installed on it. The audio interface drivers are technically part of the operating system: you could say that the audio interface extends into the DAW, but that would not make things any clearer, would it?
When they speak about a DAW, most people mean the processing software itself rather than the whole system, most likely because hardware is so ubiquitous these days. Before you ask, both PCs and Macs are worthy choices for DAW hardware.
The table below presents the most popular DAWs. I am not going into the “which one is better” debate because, frankly, it is pointless: do I argue about what socks you should be wearing?
| Vendor | Name | Version | Price | Main selling points |
|---|---|---|---|---|
| Sony | Acid Pro | — | $150 | Focus on composing & looping |
| — | Audacity | — | Free | Free entry-level recording |
| — | — | — | — | Included instruments, focus on creation |
| — | — | — | — | Focus on producing & performing |
| Apple | Logic Pro | — | $200 | Included instruments |
| — | — | — | — | Audio industry standard |
| Cockos | Reaper | — | $60 (1) | Configurability, included plugins |
| — | — | — | — | Ease of use, included plugins |
| — | — | — | — | Included effect plugins |

Table 10 Mixing Software (DAW)
Working with modern DAWs is usually very easy: you start by recording tracks directly from the audio interface, or by importing tracks (usually in uncompressed WAV format) into the software. Arranging the tracks in some logical manner should be the next step: modern productions can contain over 100 tracks, the norm being in the 50-70 tracks range. Labelling, giving tracks a color, grouping tracks by instrument type – these methods help make sense of the structure of the recording. Routing information from one track to another, or from one track group (bus) to another, helps minimize the logical signal path.
Once that is done, the tracks can be mixed: a balance must be found between these instruments and voices. Some elements need to be refocused or corrected with the list of effects described in section 2.5: gates, reverbs, compressors, EQ, etc. Anything works, if it sounds good. These effects are usually associated with a track in the form of a software plugin in one of many formats, the most well-known being VST (Virtual Studio Technology, developed by Steinberg) and AAX (Avid Audio eXtension, developed by… Avid). Once the balance is found and every element of the mix has been put to good use, a portable version of the mix is exported to an audio file so that it can be distributed and consumed.
The data captured from the sampling described in section 3.1 is digitally stored in numbers represented by bits. The total number of bits available to store the information contained in each sample is the bit depth; the larger the bit depth, the more distinct amplitude values each sample can represent and store. Note that the bit depth does not impact the accuracy of the signal’s sampling in time: that is the job of the sampling rate.
What the bit depth has an influence on is the maximum loudness of a signal, represented by the dynamic range DRmax, or signal-to-noise ratio SNR, of the signal:

DRmax = 20 × log10(2^BD) ≈ 6.02 × BD dB
For a 16-bit system (BD=16), DRmax ∼ 96 dB. For a 24-bit system, DRmax ∼ 144 dB. The dynamic range represents the height of the sampling window: whatever is larger cannot be sampled accurately. The raging debate about which bit depth should be used (16 vs. 24 bits) can be put to rest with simple comparison tests: listen to a song at 16 bits and then at 24 bits. I can guarantee you that most people will not hear the difference – but everyone still uses a 24-bit depth, including me: you never know 🙂
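The numbers above can be checked with a few lines of code; this is a minimal sketch (the function name is my own, not any DAW’s API):

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Maximum dynamic range of a linear PCM system:
    20 * log10(2**BD), which works out to about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bit_depth)

print(round(dynamic_range_db(16), 1))  # ~96.3 dB for CD-quality audio
print(round(dynamic_range_db(24), 1))  # ~144.5 dB for a 24-bit system
```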
One note of importance: most DAWs will show you sound “volume” in dB FS (for Full Scale) with a negative value. Why? Because 0 dB FS is defined as the maximum value the AD converter can represent (determined by the bit depth); thus, any sound will have a negative value (lower than 0, which is the maximum).
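The conversion from a raw sample value to dB FS can be sketched as follows (an illustration only; the function name is mine):

```python
import math

def dbfs(sample: int, bit_depth: int) -> float:
    """Level of a signed PCM sample relative to full scale.
    Full scale is the largest magnitude the converter can hold,
    so the result is always 0 dB FS or lower."""
    full_scale = 2 ** (bit_depth - 1)  # e.g. 32768 for 16-bit audio
    return 20 * math.log10(abs(sample) / full_scale)

print(round(dbfs(-32768, 16), 1))  # 0.0  -> full scale
print(round(dbfs(16384, 16), 1))   # -6.0 -> half of full scale
```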
What happens when you work at 24 bits but then decide to export your file at 16 bits? We are losing information, are we not? And that is bad because losing information means losing the possibility to reconstruct the original sound without artifacts and distortions, right? Yes, but there is a way around the problem. A technique called dithering uses the fact that our ears are very good at picking out individual frequencies from a sound but bad at noticing low-level random noise spread across many frequencies: dithering mixes a small amount of random noise into the signal before the bit depth is reduced, turning what would be audible quantization distortion into benign background noise. Dithering is needed whenever you move down the bit-depth ladder; moving up loses no information, so no dithering is required.
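As a sketch of the idea, here is one common flavor of dithering (TPDF, triangular-probability noise) applied to a 24-bit to 16-bit reduction; the helper name and details are my illustration, not any particular DAW’s algorithm:

```python
import random

def dither_to_16_bit(samples_24bit, rng=random):
    """Requantize 24-bit samples to 16 bits with TPDF dither:
    add triangular noise spanning +/- one target LSB before rounding,
    so the quantization error becomes uncorrelated broadband noise."""
    lsb = 2 ** 8  # one 16-bit step expressed in 24-bit units
    out = []
    for s in samples_24bit:
        # Sum of two uniform randoms gives a triangular distribution
        noise = rng.uniform(-lsb / 2, lsb / 2) + rng.uniform(-lsb / 2, lsb / 2)
        q = round((s + noise) / lsb)
        q = max(-2 ** 15, min(2 ** 15 - 1, q))  # clamp to the 16-bit range
        out.append(q)
    return out

print(dither_to_16_bit([0, 2 ** 23 - 1, -2 ** 23]))
```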
Audio bandwidth is the amount of audio data passing through some device; it is expressed in kbps (for kilobits per second, i.e. 1000 bits per second) and is called the bit-rate:

BR = BD × SR × MS
where BD is the bit depth, SR is the sampling rate and MS is a mono/stereo variable (mono: MS=1, stereo: MS=2). Standard CD sound is stereo (MS=2), encoded in 16-bit words (BD=16) at 44.1 kHz (SR=44100), giving a bit-rate BR = 16 × 44100 × 2 = 1,411,200 bps = 1411.2 kbps. At that rate, a 3-minute (180 seconds) song weighs in at about 254 Mb, or roughly 32 MB (1 Byte = 8 bits).
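The calculation above can be reproduced in a few lines (function names are mine, for illustration):

```python
def bitrate_kbps(bit_depth, sample_rate, channels):
    """Uncompressed PCM bit-rate: BR = BD * SR * MS, in kilobits per second."""
    return bit_depth * sample_rate * channels / 1000

def song_size_mb(bitrate, seconds):
    """File size in megabytes: bit-rate (kbps) times duration,
    divided by 8 bits per byte."""
    return bitrate * seconds / 8 / 1000

cd = bitrate_kbps(16, 44100, 2)
print(cd)                               # 1411.2 kbps for CD audio
print(round(song_size_mb(cd, 180), 1))  # ~31.8 MB for a 3-minute song
```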
At the time when digital audio started to become popular with the use of CDs, network bandwidth and disk sizes were not large enough for files that size to be practical. Thus, audio files were compressed. Compression is the process by which the amount of information used to represent (or encode) a file is decreased. Lossy compression is the same process but with loss of information. For example, the popular MP3 format is the result of lossy compression (MP3 means MPEG-1 or MPEG-2 Audio Layer 3). The technique used is based upon an effect called masking: a weaker sound is masked by a louder sound at similar frequencies; this means that instead of representing the information of both sounds in the file, only the information of the louder sound is encoded in the file.
Standard MP3 usually has a bit rate of 128 kbps; that is about 10 times less than what our calculation above has come up with! But with that bit-rate, the song size is also divided by 10, which gives the standard 3 MB per song. Decoded naively as 16-bit stereo PCM, 128 kbps would correspond to a sampling rate of only 4 kHz (128,000 / (16 × 2) = 4000). Compare this to the CD standard sampling rate of 44.1 kHz and you will understand why certain people consider MP3-style compression an aberration.
Is it good enough for you? Only your ears can decide. Note that MP3 encoders can vary depending on the algorithms used; also, MP3 file formats allow bit rates up to 320 kbps to be used. With bigger bandwidth and larger storage capacities available, audiophiles prefer to listen to music encoded in lossless compression formats such as FLAC (Free Lossless Audio Codec) which can reduce the size of an audio file by 50-60% without altering its quality.
We know that a converter creates samples by measuring the incoming electrical voltage. As it works through the signal, the converter places the samples in a temporary memory location called a buffer. When the buffer is full, the computer retrieves all the stored data in one operation to process it. The reason for this mechanism is that the computer might be busy doing other things as it is processing the signal; if it is too busy, it might miss certain samples, introducing distortion artifacts in the rendered sound; imagine a factory worker putting pieces in a box: if the conveyor belt moves too fast, the worker will miss certain pieces. The latency L can be calculated as follows:

L = BS / SR
where BS is the buffer size (number of samples). Therefore, you will hear audio engineers say: “Man, this buffer size is total BS”.
For example, a fast computer might accommodate a buffer size of 64 samples for a sampling rate of 44.1 kHz since the latency produced will be about 1.5 ms (64 / 44100 = 0.00145 seconds, or 1.45 milliseconds). A slower computer might need 512 samples to avoid distortion, but this would generate a latency of about 11ms, which can be an issue if you are recording drums with a click track, for example. You can increase the sampling rate, thus reducing the latency; for example, a 512 buffer size with a sampling rate of 96 kHz produces a latency of about 5ms.
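The examples above can be verified with a tiny helper (the name is mine, not an ASIO API):

```python
def latency_ms(buffer_size, sample_rate):
    """Latency of one full buffer: L = BS / SR, expressed in milliseconds."""
    return buffer_size / sample_rate * 1000

print(round(latency_ms(64, 44100), 2))   # ~1.45 ms: fine for recording
print(round(latency_ms(512, 44100), 1))  # ~11.6 ms: audible against a click track
print(round(latency_ms(512, 96000), 1))  # ~5.3 ms: a higher SR shrinks the latency
```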
In turn, increasing the sampling rate might force the computer into missing some samples, thereby introducing the infamous distortion. ASIO (Audio Stream Input Output) drivers allow you to change the buffer size on the fly; this change can be expressed in latency terms; for example, you could set your buffer size to “high latency” for slower computers. Do you know which company invented ASIO drivers? Steinberg. Again? Yes, Steinberg was a huge player in the early developments of audio technology.
The time it takes for a computer to sample the incoming sound is in direct relation with its computing capacity (its processing speed). With a “slow” computer, the buffer size must be larger. If you try to reduce the buffer size, crackling noises will appear. To strike a balance, experiment with various settings; failing that, buy a faster computer or outsource the computing to outboard gear.