Chapter 2. From Analog Sound to Digital Audio
We now know that an audio signal has three characteristics: intensity (or volume), frequency and phase (see Equation 1). Each characteristic can be modified by cleverly placing electronic components in the sound’s signal path. Table 6 below lists the basic processor or effect units (or effects, or, for even lazier people, FX).
|Effect|Domain|Action|Result|
|---|---|---|---|
|Gate|Intensity|Decreases intensity if < threshold|Cleaner sound|
|Compressor|Intensity|Decreases intensity if > threshold|Cleaner sound|
|Expander|Intensity|Increases intensity if > threshold|Cleaner sound|
|Distortion|Frequency|Adds frequency information|Colored sound|
|Equalizer|Frequency|Increases or decreases intensity of certain frequencies|Cleaner sound|
|Phaser|Time|Changes phase|Colored sound|
|Delay|Time|Creates copies of original signal|Colored sound|
|Reverb, flanger, chorus|Time|Creates copies of original signal|Colored sound|
Let us study this table for a bit. First, you will notice that there are three main types of effects: those that work with intensity (or volume; these are also called dynamics processors), those that work in the frequency domain and those that work in the time domain. Technically speaking, the phaser obviously works with phase, but since it works with phase with respect to time, it is always put in the time domain processor category. Tremolo is an effect that modulates the volume of the original signal; in this sense, it is an offshoot of both the compressor (modulating down) and the expander (modulating up).
It might look like a theoretical separation, but knowing what you are working on can help you troubleshoot what an effect is doing when the results do not sound good; furthermore, generally speaking, clean-up effects should always be used before coloring effects, for a very simple reason: you do not want to be coloring or enhancing parts of the sound that you do not like. Second, what you obtain when passing a signal through one of these effects (the Result column) can be sorted into two categories: a cleaner sound or a colored sound. A cleaner sound means that the effect has been used to remove some undesirable feature of the sound.
For example, a gate might be used to clean up background noise; a compressor might be used to even out a performance; an expander might be used to bring up an interviewee’s speech level to the interviewer’s for a better listening experience; an EQ might be used to remove some low frequency build-up from a bass amp or an offending frequency caused by a cheap microphone. In all the examples above, the result is a sound cleaned up and prepared either for further processing (most likely) or for direct listening.
The phrase “colored sound” might sound weird: how can a sound be colored? What most people mean by that is how their perception of the sound has been altered, usually in a pleasing way. For example, you have seen some preamps labeled as delivering a “colored” sound in Table 5: the preamp adds frequency content (the “color”) by amplifying the signal through certain electronic components.
In the case of effects, the same principle applies. A compressor might color a snare drum sound by letting the initial hit through but reducing the volume afterwards to “tighten up” the sound. A transient shaper might make a cymbal hit ring longer by increasing the duration of the sound. A tremolo might give a guitar that old groovy 60s sound by rapidly varying its volume (not its frequency!). No need to explain how distortion changes the sound; you rarely hear a non-distorted electric guitar these days. An EQ might color the sound by adding high frequencies to an otherwise dull flute solo. A vocoder might restrict the frequency range of a vocal performance, giving it the famous robotic feel. A wah-wah pedal might make a guitar solo much funkier by modulating the original notes with a sweep of some frequency range (think Jimi Hendrix). A flanger or a chorus might make a guitar sound much weirder by changing its phase and remixing the processed signal with the original (think The Police). A delay or reverb might make a sound seem bigger by playing copies of the original signal right after it, emulating reflections from surrounding walls (think Pink Floyd, or any progressive rock band for that matter). Of course, these effects can be chained one after the other, making the creative possibilities of even a simple original sound infinite.
The original signal can be modified with those effects by two different means, or, more precisely, in two different domains. Because we are still in the analog domain (our signal still has not been converted to digital bits), those effects would logically be produced by hardware units which include tubes, filters, transformers, etc. Some of these units have become famous and will be mentioned in Section 2.6. In the digital age, some of these units’ effects have been modelled in software called plugins. Some of these plugins will be mentioned in Section 3.3.
These effects usually share one characteristic: the original (dry) sound can be mixed with the processed (wet) sound to change the amount of effect added to or subtracted from the signal. If the device does not possess this feature, splitting the signal to two tracks at mixing stage and varying the relative volume of both tracks yields the same result. Also, each of these effects can be used to trigger other effects, further increasing the complexity of what can be achieved with just a handful of processors.
Let us now dive into what each effect does and what its main parameters are.
A gate allows a signal to pass only if it is above a certain threshold. This can be used in various situations: to clean up unwanted noise from a recording, or to filter a signal to then pass on the “on/off” information to another signal for further processing (this is called triggering).
Figure 7. Gate Parameters. The graph displays the signal intensity with respect to time; the raw signal is the dashed line; the processed signal is the dotted line; the intensity threshold is the thick horizontal line; A stands for Attack time, H for Hold time, R for Release time.
In Figure 7 above, you can see an incoming raw signal increasing in intensity; a noise gate is placed in its path. When the signal’s intensity crosses the threshold, the gate starts to open and the processed signal appears; it finishes opening when the attack time has passed. At the end of the attack time, the unprocessed and processed signals are the same. When the incoming signal’s intensity falls below the threshold, the gate is programmed to wait a set amount of time (the hold time) before it starts shutting down; during the hold time, the unprocessed and processed signals are still the same. When the hold time is over, the gate starts to shut down and the processed signal’s intensity decreases; the gate finishes shutting down when the release time has passed; when the gate is closed, the processed signal disappears. You could also set the gate to reduce the signal’s intensity by some fixed amount – the range. In the example, the gate’s range is infinite: this means that when the gate is closed, the processed signal’s intensity is zero.
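The threshold, attack, hold and release logic above can be sketched in a few lines of Python. This is a toy model, not a description of any real unit: attack, hold and release are given in samples, the gain ramps are linear, and the level detector is a naive per-sample absolute value rather than a smoothed envelope; an infinite range is assumed, so a closed gate outputs silence.

```python
def gate(samples, threshold, attack=4, hold=4, release=4):
    # Toy noise gate: the gain envelope opens over `attack` samples when
    # the level crosses the threshold, stays open for `hold` samples after
    # the level drops below it, then closes over `release` samples.
    out = []
    gain = 0.0    # 0 = gate closed, 1 = gate fully open
    held = 0      # samples remaining in the hold phase
    for x in samples:
        if abs(x) >= threshold:
            held = hold                              # re-arm the hold timer
            gain = min(1.0, gain + 1.0 / attack)     # opening ramp
        elif held > 0:
            held -= 1                                # hold: stay open
        else:
            gain = max(0.0, gain - 1.0 / release)    # closing ramp
        out.append(x * gain)
    return out
```

Feeding this gate a quiet passage followed by a loud burst shows the ramped opening: the first loud sample comes out attenuated while the gate is still opening, and the quiet tail is silenced once hold and release have run out.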
That is where Digital Signal Processing comes in: it is an engineering discipline that studies how signals are measured and transformed. In a nutshell, reducing a signal’s intensity is achieved in the physical world simply by placing electronic components (filters) in its path; circuit design determines the type of filtering. In the plugin effect world, programming models the behavior of those components. For an excellent overview of Digital Signal Processing, please see reference  or go directly here.
A first example of gate use would be when you are trying to take out room noise from a live recording: set the threshold right above the noise maximum level and only the louder parts of the signal will go through. You could of course manually lower the volume of the signal every time nothing useful is being recorded, but that would be tedious and open to a lot of mistakes; recording these knob movements (here: the intensity of the signal) to replay them is called automation.
A second example of gate use would be if you would like to trigger an echo whenever the vocals go above a certain loudness: you would set up the threshold high so that only the loudest parts of the vocals would be let through by the gate and then use that information to trigger the echo; this process is called side-chaining: using the information from one signal to trigger changes on another signal.
Air is a natural compressor, so our ears are used to listening to a compressed sound: that is why the compressor is one of the most used effects in the industry, trying to recreate the sensation of listening to live sound. A compressor effectively smooths out signal peaks into a more uniform signal. You can use this effect, for example, to catch unwanted peaks in a vocal or a guitar recording; you could also use it to bring more coherence to an overall recording by limiting the difference between its quietest and loudest parts. Another way of saying the same thing is: a compressor limits the dynamic range of a signal. For a nice detailed discussion, see reference  and the corresponding Wikipedia article.
Dynamic range is the ratio of the loudest possible intensity (volume) to the noise floor, defined as the quietest possible intensity of the signal. By now, when you hear “ratio”, you think dB. We saw that a typical dynamic microphone outputs voltages in the 10 mV region, and that a typical processing device works with voltages in the 1 V range; from Equation 3, we know that a ratio of amplitudes A2/A1 is expressed in dB as 20·log10(A2/A1), so in our case the ratio is 20·log10(1 V / 10 mV) = 40 dB – that is the gain needed from the microphone preamp, but also the dynamic range available to the engineer recording the sound.
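The arithmetic is easy to check in a couple of lines, using the amplitude-ratio formula from Equation 3:

```python
import math

def ratio_db(a_out, a_in):
    # amplitude ratio expressed in dB: 20 * log10(A2 / A1)
    return 20 * math.log10(a_out / a_in)

print(ratio_db(1.0, 0.010))   # 40.0 : 1 V versus 10 mV
```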
A standard compressor has three main components: a level detector, a gain reduction module and an output amplifier. The level detector’s jobs are 1) to detect when the incoming signal’s intensity is above a set threshold, 2) to determine how fast it has to start sending the reduction information to the gain reduction module (the attack “time”), and 3) to determine how fast it has to stop sending it (the release “time”). Yes, you read that right: the attack and release “times” are speeds; depending on manufacturers or developers, the compressor filters are even specified differently for attack and release. For example, attack filters are usually specified as the amount of time it takes for a set percentage of the gain reduction target to be achieved, which means settling for less; that percentage ranges from 70% to 95%. Release filters are usually specified as the amount of time it takes for a 10-dB gain reduction to happen.
The gain reduction module’s job is to aim at reducing the signal’s intensity by using a series of filters; the compression goal is called the compression ratio. You will notice that I do not say “compresses” but “aims at reducing”; this is not me being pedantic: compressors do not simply reduce intensity, they reduce it by comparing the current intensity with the previous intensity, because that is how filters work; this means you cannot just “tell the signal to stick below a certain threshold”.
The output amplifier’s job is to increase the signal’s intensity by an amount called the makeup gain so that the sound has the same intensity it had before entering the compressor. The gain can be reduced right when the information is received by the level detector (hard knee); alternatively, it can be smoothed out by using a smaller ratio before using the full desired ratio on the signal (soft knee).
You might hear some people say that a compressor brings quiet parts of the signal up. That is not true of a compressor as a part of its standard operation: it merely aims at shaving off peaks going above the threshold. Only if you make up for the lost gain by increasing the processed signal’s output will the quiet parts of the processed signal be heard more: that is what the makeup gain is for.
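To make threshold, ratio, knee and makeup gain concrete, here is a sketch of a compressor’s static gain curve in Python. It is a simplification, not any manufacturer’s design: it ignores the attack and release filtering described above, and the soft knee uses one common quadratic blend.

```python
def compress_db(level_db, threshold_db, ratio, makeup_db=0.0, knee_db=0.0):
    # Static gain curve of a compressor, all values in dB: where the
    # output level should sit for a given input level.
    over = level_db - threshold_db
    if knee_db > 0 and abs(over) <= knee_db / 2:
        # soft knee: blend quadratically into the full ratio
        t = over + knee_db / 2
        out = level_db - (1 - 1 / ratio) * t * t / (2 * knee_db)
    elif over > 0:
        out = threshold_db + over / ratio   # hard compression above threshold
    else:
        out = level_db                      # below threshold: signal untouched
    return out + makeup_db

print(compress_db(-6, -18, 4))               # -15.0 : 12 dB over, 4:1 ratio
print(compress_db(-30, -18, 4))              # -30.0 : below threshold, untouched
print(compress_db(-6, -18, 4, makeup_db=5))  # -10.0 : makeup gain added back
```

Note how, with no makeup gain, the quiet parts are left exactly where they were: only the peaks are brought down, as described above.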
In the list of effects shown in Table 6, the compressor is the first major effect that you will be hearing a lot about in audio circles and online forums. One of the reasons is because it is very widely used, but I believe that it is also talked about with a shroud of mystery because it is very rarely completely understood – let us hope that what you just read put you on the right track.
A limiter is a compressor with a very large ratio. This is of course theoretical in the analog domain, as the components used limit what the limiter can achieve.
Transient is an adjective meaning “passing especially quickly into and out of existence” (from the Merriam-Webster dictionary); in the audio world, a transient (noun) is that concept applied to a signal or sound. A lot of people talk about transients and mean the percussive element or the initial attack of a sound, like a snare drum hit or a plucked guitar string; other people mean the shape of the waveform, and that is what a transient shaper refers to: shaping the waveform of a sound.
A transient shaper has only two parameters, which makes it simple to use but very powerful at the same time. The first parameter allows increasing or decreasing the gain on the attack of the transient; the corresponding knob is usually labeled attack. You might think that this is exactly what a compressor with make-up gain does; you would be right, except that there is a difference: the compressor only works above a certain threshold while the transient shaper diligently increases or decreases the gain on every voltage it sees. The second parameter, linked to the first, allows increasing or decreasing the gain on the tail end of a waveform; this button is usually labeled sustain or release. There usually also is a gain button to control the output of the effect.
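There is no single canonical design, but one way to sketch the idea (an assumption on my part, not a description of any particular unit) is to compare a fast envelope follower with a slow one: while the fast envelope sits above the slow one, the signal is in its attack portion and gets the attack gain; otherwise it gets the sustain gain. Note that, as stated above, no threshold is involved.

```python
def transient_shape(samples, attack_gain=1.0, sustain_gain=1.0):
    # Toy transient shaper: two envelope followers, one reacting instantly
    # and decaying fast, one smoothed; their comparison separates the
    # attack portion of the waveform from its tail.
    env_fast = 0.0
    env_slow = 0.0
    out = []
    for x in samples:
        level = abs(x)
        env_fast = max(level, env_fast * 0.5)     # instant attack, fast decay
        env_slow += (level - env_slow) * 0.2      # smoothed average level
        g = attack_gain if env_fast > env_slow else sustain_gain
        out.append(x * g)
    return out
```

On a loud onset followed by a quieter tail, the onset gets boosted by the attack gain while the tail is turned down by the sustain gain, which is exactly the “shaping” described above.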
Learning how to properly use a transient shaper can be a great help when a compressor does not work or an EQ is too blunt of a tool. I would even recommend using a transient shaper before starting to use a compressor simply because it is much more intuitive and easy to understand. I know their uses are different, but there are times when increasing the gain on a transient is better than trying to make the transient come out by reducing the dynamic range of the signal. If the compressor is the well-known superhero (think Iron Man), the transient shaper is the unsung superhero who comes out of nowhere to save your day when you think all has been lost (think Gollum).
An expander, sometimes also called an upward compressor, is the exact opposite of a compressor (or downward compressor): it increases the dynamic range of a signal. In this regard, it is a type of gate: by increasing the dynamic range, it can make quiet parts even quieter in comparison to louder parts. Expanders are not the most talked-about effects but are worth knowing about.
Distortion is an effect which is caused by inserting electronic devices in the signal path to limit or clip the original voltage signal. In audio terms, the result is that frequencies that were not present in the original signal appear in the processed signal. Odd or even harmonics appear in harmonic distortion, while combinations of the original frequencies appear in intermodulation distortion. The first type will always sound “musical” since multiples of the original frequencies remain harmonically related to it (the second harmonic, for example, is simply the octave above the original). For intermodulation distortion, this might not be the case, since a sum or difference of two frequencies might not yield a frequency related to the original frequencies. See reference  for more details on distortion.
Depending on the type of overtone added by the electronic circuit, the resulting signal will have a different shape. For example, adding an odd harmonic to the original signal will approximate a square wave, while adding an even harmonic will approximate a saw-tooth wave. Combining these will affect and change the original signal in different ways; see this article for a very nice graphical description of the distortion effect.
The guitarists among you will want to know the difference between distortion, overdrive and fuzz: are they the same? No, they are not. Overdrive is exactly what the name indicates: pushing (driving) the amplifier into generating harmonics by pushing the electronics in the amp past their intended power range; this means that in overdrive, the main goal is to gain power, with adding color a secondary objective; as a result, the processed sound is different from the original, but not by much. Distortion is the opposite: the goal is to mess with the original signal as much as possible, with some power gain along the way. Fuzz is obtained by inserting special transistors into the signal path which add frequencies to the sound, recognizable by its warm and “fuzzy” character (think Jimi Hendrix).
In our effect terminology, distortion (meant in the generic sense) is clearly a coloring tool. All sorts of different devices and tricks can be used to distort the original signal: permanently damaging amplifiers, using devices in unintended ways, etc. In the digital domain, people spend fortunes on software to try and reproduce those analog “warm-sounding” distortion effects.
An equalizer (EQ) effect modifies the gain of certain frequencies, either down (cut) or up (boost). It is a very well-known feature of many HiFi home stereo systems. It might be popular because turning knobs has a direct impact on the outgoing sound, and a repeatable impact at that. The same applies in the studio: the equalizer is the most famous effect in the audio engineer’s bag of tricks.
EQ effects can be simply specified with three parameters describing how the signal responds to the components placed in its path: the shape of the EQ change, the frequency at which the equalization happens and the speed with which the effect comes into play.
EQ shapes come in three kinds: pass, shelf, and band. A pass filter lets every frequency above (high pass) or below (low pass) a given frequency through. Another (opposite) way of thinking about it is to talk about cut instead of pass: low cut is high pass and high cut is low pass; I will stick to pass because I think it is more intuitive.
Figure 8. High Pass Filter. The graph displays the signal intensity with respect to frequency; HPF stands for High Pass Frequency.
In Figure 8, we see a high pass filter: the high-pass frequency is defined as the frequency at which the signal intensity falls 3 dB below that of the original signal. The speed (or slope) with which the EQ change happens is usually expressed in dB per octave or dB/oct. Specifying this type of EQ would then mean saying that it is a 300 Hz high-pass filter at -12 dB/oct. Remember that an octave is the frequency distance between two equivalent notes, e.g. C3 – C4, so, yes, this means that -12 dB/oct is not the same intensity reduction over a given number of Hz at 100 Hz as at 10 kHz.
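As a sanity check on the -3 dB definition and the dB/oct slope, here is the magnitude response of a first-order high-pass filter, which rolls off at 6 dB/oct (the -12 dB/oct example in the text would need a second-order filter, but the idea is the same):

```python
import math

def highpass_db(f, f_c):
    # Magnitude response, in dB, of a first-order high-pass filter with
    # cutoff frequency f_c: |H| = (f/f_c) / sqrt(1 + (f/f_c)^2)
    r = f / f_c
    return 20 * math.log10(r / math.sqrt(1 + r * r))

print(round(highpass_db(300, 300), 1))    # -3.0 : the cutoff sits 3 dB down
# one octave apart, well below the cutoff, the slope approaches 6 dB/oct:
print(round(highpass_db(300 / 8, 300) - highpass_db(300 / 16, 300), 1))   # 6.0
```

Note that the nominal slope is only reached well away from the cutoff; close to it, the curve bends gently, which is exactly what the knee in Figure 8 shows.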
A shelf filter is the combination of a pass and an inverted pass filter around a specific frequency. In Figure 9, we see a shelf filter consisting of high pass filters (dotted line), with a cut in low frequencies and a boost in high frequencies, and a shelf filter consisting of low pass filters (full line), with a boost in low frequencies and a cut in high frequencies. Exactly as for the pass filter, the slope in dB/octave determines how quickly the changes happen.
Both filters are obtained with what are called first order filters: in the analog world, that is what the combination of a single resistor and a single capacitor can produce. Because of their simple design, these filters can only produce the filter shapes depicted above. For more complex filters like the band pass filter, which lets frequencies within a range pass (you could describe it as a combination of a high and a low pass filter), we need to go to so-called second-order filters because of the mathematical functions involved in describing the frequency response of the filter. Here, an extra parameter applies: Q; it tells us how wide the frequency change is:

Q = ν0 / (νR − νL)

where ν0 is the center frequency, νR is the frequency to the right of ν0 at which the EQ curve is 3 dB down from its highest point and νL is the frequency to the left of ν0 at which the EQ curve is 3 dB down from its highest point. A high Q means that the EQ change will happen over a very narrow bandwidth.
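In code, the definition reads:

```python
def q_factor(f_center, f_left, f_right):
    # Q = center frequency / (-3 dB bandwidth); a narrow band gives a high Q
    return f_center / (f_right - f_left)

print(q_factor(1000, 900, 1100))   # 5.0 : a fairly narrow bell at 1 kHz
```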
Taking frequencies out of the signal is usually associated with cleaning up the sound (for example, taking low frequency rumble off a bass guitar recording). Of course, cleaning up in this fashion also colors the sound in some way – after all, we are changing how the sound sounds; however, I will stick to thinking about EQ cuts as cleaning because that is the easiest way to think about it. Cuts are generally made with a high Q (narrow width); this sounds more musical to a lot of professionals, but as always with audio, feel free to experiment. Adding frequencies is usually done with a smaller Q (wider width); again, experimenting is the key, but starting off with a smaller Q will help you keep your track sounding musical.
EQ cuts should be made before applying any other effects, especially compression, while EQ boosts should be made later. The reason is that compression will bring out the quieter more subdued parts of the sound, like breaths, or string noises. If you do not remove these parasites, for example with EQ cuts, they will appear to be louder in the compressed sound. EQ boosts, on the other hand, will stand out more if applied after compression, even if in theory, you could apply them before compression as well; in that case, keep the boost small and wide, otherwise compression will make the change even more dramatic.
I will mention one trick you can use to find offending frequencies. First, listen to the sound and try to determine where the issue is frequency-wise. Once you have the frequency range in mind, apply EQ to the track with a very high Q (a very narrow band) and a very large gain boost and sweep the frequency range, going back and forth a few times; once you have spotted the offending frequency, simply reverse the gain from a large boost to a large cut.
A vocoder is an EQ filter applied to a vocal track; it is a good example of a coloring effect using EQ cuts. Different settings can be used, but good starting points are a lower bound of 500 Hz and an upper bound of 3400 Hz.
The wah-wah effect is an EQ filter which periodically adds high frequencies to and subtracts them from a sound. The effect’s name in fact represents the effect itself: the “a” vowel represents the sound with added frequencies while the “w” represents the sound with subtracted frequencies.
A phaser or phasor is an effect that creates cuts and boosts in the sound’s frequencies by altering the sound’s phase over time. The way this is achieved is by placing an all-pass filter in the signal path: all frequencies go through, but the phase relationship between them is changed by varying the phase with respect to frequency. All filters alter the signal’s phase because their impedance “delays” it.
This is how it works: the incoming signal is split in two; the first signal has its phase altered as described above; it is then mixed back into the second (original) signal; what happens then is that the out-of-phase signal frequencies cancel out while the in-phase signal frequencies are boosted. The peaks thus created are unevenly spaced; this means that they are not in a harmonic (multiple) relationship with the original frequencies. This gives the characteristic “eeeaaaaaooooaaaaaeeee” effect known mostly as a guitar effect, but also worthwhile as a keyboard effect.
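A first-order all-pass section makes the “all frequencies pass, only the phase changes” claim easy to verify numerically. This is a sketch: real phasers cascade several such stages and sweep the coefficient with a low-frequency oscillator.

```python
import cmath

def allpass_response(a, w):
    # First-order all-pass section H(z) = (a + z^-1) / (1 + a * z^-1),
    # evaluated on the unit circle at normalized frequency w (radians/sample)
    z1 = cmath.exp(-1j * w)
    return (a + z1) / (1 + a * z1)

# every frequency passes at full intensity...
print(round(abs(allpass_response(0.6, 1.3)), 6))            # 1.0
# ...but mixing with the dry signal creates a notch where the phase nears pi
print(round(abs(1 + allpass_response(0.6, cmath.pi)), 6))   # 0.0
```

The second print is the cancellation described above: the processed copy arrives out of phase at that frequency, so summing it with the original removes it.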
Figure 10. Phaser. The incoming signal goes through the splitter (S); one version of the signal goes through a phase change before being recombined in the mixer (M) with the original signal.
Phasers might have different parameters depending on how they are built, but they have at least a frequency control to determine how quickly the phase is changed, a range of swept frequencies and a feedback control to determine how much (if any) of the processed signal should be fed back into the original unprocessed signal. Feeding the signal back into the input is a way to increase the magnitude of the effect. Van Halen is known for their use of phasers, especially in “Atomic Punk” (jump to 0:50), but other bands such as Queen, Genesis (on keyboards) and Radiohead have also used it.
A delay mixes a delayed version of a signal with itself. The result is a single echo or a series of echoes, depending on how many times the sound is split and fed back (delayed) into itself.
A delay effect essentially has one parameter: the amount of time the signal should be delayed by; this amount of time can be expressed in ms or in relation to the song’s rhythm, e.g. in 1/8 note increments; the latter can help keep the delay “musical”, meaning “with some level of coherence with the song”. A feedback control might also be present, as for the phaser. Because of the apparent space gained by delaying a signal by small amounts, delay effects are sometimes used to create fake stereo images from a mono signal; the danger of that technique is that when the song is listened to in mono, the combing effect resulting from the addition of the two delayed signals will produce a flanging effect (see below).
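The split-and-feed-back structure can be sketched as a toy feedback delay line (parameters in samples rather than ms, to keep the example self-contained):

```python
def delay(samples, delay_samples, feedback=0.5, mix=0.5):
    # Toy feedback delay line: the output mixes the dry input with the
    # delayed signal; the delayed signal is also fed back into the line,
    # producing a series of echoes that decay by `feedback` each repeat.
    buf = [0.0] * delay_samples   # circular buffer holding the delayed signal
    out = []
    for i, x in enumerate(samples):
        wet = buf[i % delay_samples]
        out.append((1 - mix) * x + mix * wet)
        buf[i % delay_samples] = x + feedback * wet
    return out

# a single impulse produces the dry hit, then echoes at 0.5, 0.25, 0.125, ...
print(delay([1.0] + [0.0] * 9, 3))
```

The feedback parameter plays the same role as on the phaser: each pass through the line adds another, quieter repeat.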
A delay effect can be hard to spot because very short delays (15-60ms depending on the relative delay level) fall below our conscious listening threshold. The larger the delay is, the quieter the delayed copy can be before the effect is detected. When the delay is increased beyond our hearing threshold, it first gives a sense of spaciousness, even if our brain does not recognize both signals as being distinct. If it is increased further, the delayed signal is perceived as a distinct echo of the original sound. Delay effects are used by a lot of guitarists to thicken their sound, but perhaps the most well-known user of delay is The Edge from U2 (check out the intro to “Where the Streets Have No Name” or “One Tree Hill”).
A reverb effect is a more complex version of a delay: it mimics the reverberation (hence the name) or reflections of a sound in a confined space. Each time the original sound bounces off a reflecting surface, it creates an echo (a delayed copy of itself for a certain listening position inside the space) which decays over time depending on the sound’s frequency and conditions inside the space. Reverb is what you hear when you place a source in a room, play it and then abruptly stop it. Those multiple slightly altered less powerful versions of the source sound are interpreted by our brains as an indication that the source was played in a space.
The time it takes for the sound to die off is called RT60, for Reverberation Time; it is the time it takes for the signal to decay by 60 dB. It depends on the volume V of the space and on the inverse of both the total surface area S of the room and the absorption coefficient a of the material used to define the space (valid at 20 degrees Celsius or 68 degrees Fahrenheit):

RT60 = 0.161 · V / (S · a)

with V in cubic meters and S in square meters.
It also depends on the frequency of the sound through the absorption coefficient. In a typical room of 4 meters by 3 meters by 2.4 meters (roughly 13 feet by 10 feet by 8 feet for you Imperials out there), RT60 = 0.08 / a. For plaster walls, a ~ 0.02 (see here), so RT60 is 4 seconds! Make your walls hardwood (a ~ 0.3) and the time falls to 0.267 seconds or 267 milliseconds. I am not going to make this an acoustics discussion, but the way your recording and mixing space is built (and treated, if necessary) is much more important than the gear you will buy; this is discussed in detail on the interwebz; for a start, check out this forum.
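The numbers above are easy to reproduce with Sabine's reverberation formula applied to that room:

```python
def rt60(volume_m3, surface_m2, absorption):
    # Sabine's formula (metric units, around 20 degrees Celsius):
    # RT60 = 0.161 * V / (S * a)
    return 0.161 * volume_m3 / (surface_m2 * absorption)

# the 4 m x 3 m x 2.4 m room from the text:
V = 4 * 3 * 2.4                          # 28.8 m^3
S = 2 * (4 * 3 + 4 * 2.4 + 3 * 2.4)     # 57.6 m^2 of walls, floor and ceiling
print(rt60(V, S, 0.02))   # plaster walls: about 4 s
print(rt60(V, S, 0.30))   # hardwood: about 0.27 s
```

Note that V/S for this room is 0.5 m, which is where the 0.08 / a shortcut in the text comes from (0.161 × 0.5 ≈ 0.08).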
The first analog reverb technique is simply to play a prerecorded sound in a room and have a microphone pick up the resulting sound – this is called a chamber reverb.
The second type of analog reverb is the plate reverb: sound is sent through a speaker-like device to the center of a thin metal sheet suspended in a sound-proofed box. Pickups are mounted on the plate to detect the vibrations induced by the transmitted sound; those vibrations move from the center of the plate to its edges and then back, like ripples on a pond. Because sound travels in a circular fashion and the plate is usually rectangular, the reflections coming back from the edges add complexity to the vibration transmission patterns, creating a complex network of reflections which gives the reverb a lush, dense character. The decay time of the plate can be altered by adding damping material on its surface. Because of the way plate reverb works, it does not adequately mimic a reverberated sound in a physical space (see this to understand why); thus, plate reverbs are usually used to color a sound rather than to give it a sense of space.
The last standard kind of analog reverb is called a spring reverb; it works by replacing the metal sheet with a spring, the sound reverberating up and down the spring. The resulting sound is more metallic and usually needs to be shaped by EQ to sound more musical.
Reverb parameters are many, and of course change depending on the type of reverb, but here are a few. First, the early life of a reverb is described by how the early reflections behave: this parameter describes how many there are and how strong they are; some reverb processors even allow you to define the first and second reflections individually. Second, how the reverb behaves in time (the reverb tail) is defined with pre-delay (the amount of time between the initial reverb and the appearance of the tail end of the reverb), decay time (related to RT60, see above; also sometimes called damping) and diffusion (how longer-lasting early reflections interact with the reverb tail). Other parameters allow the reverb to be shaped further: e.g. mid-side (or stereo width) allows controlling the dispersion of the reverb in the stereo field, and EQ filters can make the reverb darker (low pass) or brighter (high pass).
A flanger is a type of delay where the delay changes with time.
This gives the characteristic “eeeaaaaaooooaaaaaeeee” effect, like the phaser, but with more pronounced and deeper “eee” sounds than for the phaser; the sound of the effect is sometimes compared to a plane swooshing effect. Flanging sounds more natural to some people because the peaks obtained by adding a delayed version of a signal onto itself are evenly spaced (they are harmonically linked to the original frequencies). Flanging is used primarily for guitars, but can be used on any other instrument. A list of recordings where flanging is prominent can be found here.
Figure 12. Flanger. The incoming signal goes through the splitter (S); one version of the signal is delayed before being recombined in the mixer (M) with the original signal; the amount of delay varies with time.
A chorus mixes the original signal with a copy of itself that is both delayed and frequency shifted.
The frequency- (or pitch-) shifted signal is usually modulated with a low frequency oscillator (abbreviated LFO); this means that the increase or decrease in frequency of the original signal follows a sine-type wave with a low frequency. The result is a sound which appears to have multiple voices slightly different from one another; for example, “Pull Me Under” by Dream Theater has a clean chorus effect on the guitar at the beginning of the song.
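The delayed-and-pitch-shifted copy can be sketched with an LFO-modulated delay line (a toy model: the sample rate, LFO rate and depth are arbitrary illustrative values, and a single modulated voice is used where real chorus units often run several):

```python
import math

def chorus(samples, rate_hz, depth, base_delay, sr=8000, mix=0.5):
    # Chorus sketch: the copy's delay time (in samples) is swept by a
    # low-frequency sine (the LFO); the changing delay also shifts the
    # copy's pitch slightly, and the copy is mixed back with the dry signal.
    out = []
    for i, x in enumerate(samples):
        lfo = math.sin(2 * math.pi * rate_hz * i / sr)
        d = base_delay + depth * lfo      # time-varying delay in samples
        j = i - d                         # fractional read position
        k = int(j)
        frac = j - k
        if 0 <= k and k + 1 < len(samples):
            # linear interpolation between the two neighbouring samples
            wet = (1 - frac) * samples[k] + frac * samples[k + 1]
        else:
            wet = 0.0                     # not enough history yet
        out.append((1 - mix) * x + mix * wet)
    return out
```

Set the depth to zero and this collapses to a plain delay; a flanger is the same structure with a shorter base delay and, usually, feedback.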