Chapter 4. From Digital Audio to Analog Sound
4.4. Human Hearing
Sound is coming out of our speakers in our now well-treated room. We know it is because our volume meter is pumping up and down. But why do we hear this sound? How is it possible that simple pressure changes in the air transform into beautiful sounds?
Basic Hearing Mechanism
This apparently simple phenomenon is in fact very complex. To give you an overview of the process, here is what happens.
In the picture below (1), the biological process of hearing is represented as a system.
Air molecules moving back and forth under the varying pressure waves enter the external ear and travel to the tympanic membrane (also called the eardrum; you pierced it when you forgot to bring your ear plugs to that Taiwanese trash metal concert you went to last month); the shape of the ear helps concentrate the pressure waves into the ear canal. The membrane vibrates at the incoming sound frequencies and transmits them to the middle ear, a cavity containing the three smallest bones in the human body (the middle ear bones, or ossicles). The middle ear is there to do something we mentioned in Section 2.2: impedance matching! Impedance matching is needed at this stage because the inner ear is full of liquid while the external and middle ears are full of air; without it, most of the pressure wave would simply bounce off the air-liquid boundary, and the transmitted signal would be too faint for the brain to work with.
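To get a feel for the numbers behind this impedance matching, here is a back-of-the-envelope sketch; the anatomical values (eardrum and oval window areas, ossicular lever ratio) are common textbook approximations, not figures from this chapter.

```python
# Rough estimate of the middle ear's pressure gain (impedance matching).
# Values below are common textbook approximations (assumed, not from this book).
import math

eardrum_area_mm2 = 55.0    # effective area of the tympanic membrane
footplate_area_mm2 = 3.2   # area of the stapes footplate on the oval window
lever_ratio = 1.3          # mechanical advantage of the ossicular chain

# The same force concentrated on a smaller area yields a higher pressure,
# and the bone lever adds a little extra gain on top.
pressure_gain = (eardrum_area_mm2 / footplate_area_mm2) * lever_ratio
gain_db = 20 * math.log10(pressure_gain)

print(f"pressure gain ~{pressure_gain:.1f}x, i.e. ~{gain_db:.0f} dB")
```

With these assumed values the middle ear boosts pressure by roughly a factor of 22, i.e. on the order of 27 dB, which is what keeps the sound from being mostly reflected at the air-liquid boundary.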
Pressure is then transmitted from the middle ear to the inner ear through another membrane: the oval window. Instead of air pressure variations, we now have fluid pressure variations. Inside the inner ear is a long coiled tube (the cochlea) which serves as a mechanical-to-neural transducer: mechanical pressure (from the membrane moving back and forth) is transformed into electrical signals so that the brain can decode them. The tube is built in such a fashion that high-frequency pressure waves are detected at its entrance (through a resonance mechanism), while low-frequency vibrations are detected at its far end. This is why we can hear multiple frequencies simultaneously. Note that smell, taste and vision are chemical processes in nature; hearing, on the other hand, starts out fully mechanical.
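The place-frequency layout of the tube can be sketched with the Greenwood function, a standard empirical fit for the human cochlea; the formula and its constants are an outside assumption, not something derived in this book.

```python
# Greenwood's place-frequency map for the human cochlea (empirical fit,
# constants A=165.4, a=2.1, k=0.88 are standard values for humans).
def greenwood(x):
    """Best-responding frequency (Hz) at fractional distance x along the
    tube, from the far end (x=0, low frequencies) to the entrance
    (x=1, high frequencies)."""
    return 165.4 * (10 ** (2.1 * x) - 0.88)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"position {x:.2f} -> {greenwood(x):7.0f} Hz")
```

Note how the fit spans almost exactly the 20 Hz to 20 kHz range of human hearing, with each step along the tube multiplying the frequency rather than adding to it: the cochlea is laid out logarithmically, like the octaves on a keyboard.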
From the Ear to the Brain
When pressure wave frequencies are detected along the tube, neurotransmitters are released into the synapses of the auditory nerve fibers connected to the tube; neurotransmitters are chemicals acting as messengers between neurons. It is like a row of fireworks: depending on how long and how dry the wicks are, fireworks will go off at different times at different points along the row.
Those synapses in turn produce electrical signals called action potentials, which travel through the brainstem (found at the base of the brain, at the top of the spinal cord) to the thalamus (found at the top of the brainstem) and on to the primary auditory cortex (the part of the outer brain whose neurons mainly deal with auditory processing) in the temporal lobe (on the side of your head, where people aim their guns when their 10-hour mix session crashed and they never backed it up).
What is interesting is that the frequency specialization found in the regions of the tube inside the inner ear is also found in neurons connecting it to the auditory cortex: some neurons transmit certain frequencies better than others, depending on how they are connected to the tube. The common word for this transmission is “fire”: neurons fire when they transmit information, making the fireworks image spot on.
Information such as pitch, timing and harmony is apparently generated in the auditory cortex. What happens after that is still largely a mystery. What is currently known is that information generated by the auditory cortex is transmitted to other parts of the brain for further processing. Of note is the fact that very low frequencies (below 20 Hz) can be sensed through touch rather than through hearing.
Once the brain receives the encoded sound information, it processes it with cognitive brain functions studied in psychology and other human sciences. Thus, studying how sound is perceived is a multidisciplinary science at the crossroads between physics, biology, psychology and sociology called psychoacoustics.
Psychoacoustics is interesting because it shows us that hearing is complex not only from a biological standpoint, but also from a psychological (perception) standpoint. For example, if I listen to a sound missing its fundamental frequency but not its overtones, my brain will automatically extrapolate the fundamental from the overtones: this is the missing fundamental effect. In this case, our brain generates information that is not present in the incoming signal! Similar fill-in effects also happen with vision.
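As a toy illustration of the missing fundamental: for a neatly harmonic series, the pitch the brain infers is simply the greatest common divisor of the overtone frequencies (integer Hz values assumed so the gcd works cleanly).

```python
# Missing fundamental, toy version: the implied pitch of a set of harmonic
# overtones is the greatest common divisor of their frequencies.
from functools import reduce
from math import gcd

def implied_fundamental(overtones_hz):
    """Return the pitch (Hz) implied by a list of harmonic overtones."""
    return reduce(gcd, overtones_hz)

# A 200 Hz tone with its fundamental removed: only 400, 600, 800 Hz remain,
# yet we still perceive a 200 Hz pitch.
print(implied_fundamental([400, 600, 800]))  # -> 200
```

The real mechanism in the brain is of course far subtler than a gcd, but the arithmetic captures why the extrapolated pitch is unambiguous: 400, 600 and 800 Hz can only be harmonics 2, 3 and 4 of a 200 Hz fundamental.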
Sound localization is also calculated in the brain, from very small differences in timing and volume between the two ears. Another interesting psychoacoustical effect is masking, used to compress sound (see the Audio Bandwidth paragraph on page 40): when a louder sound source is heard simultaneously with a quieter one close in frequency, the quieter source is masked and not perceived; as a result, the information encoding the frequencies of the quieter source can be omitted from the signal.
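Here is a deliberately simplified sketch of how a codec might exploit masking; the 30 dB threshold and the one-third-octave neighborhood are made-up illustration values, not what real codecs like MP3 actually use.

```python
# Toy simultaneous-masking model: a spectral component is dropped when a much
# louder component sits close to it in frequency. Threshold (30 dB) and
# neighborhood (1/3 octave) are illustrative assumptions only.
def audible(components, masking_db=30.0, bandwidth_ratio=2 ** (1 / 3)):
    """components: list of (freq_hz, level_db). Return the unmasked ones."""
    kept = []
    for f, level in components:
        masked = any(
            other_level - level > masking_db            # much louder, and
            and f / bandwidth_ratio <= other_f <= f * bandwidth_ratio  # nearby
            for other_f, other_level in components
        )
        if not masked:
            kept.append((f, level))
    return kept

# An 80 dB tone at 1000 Hz masks a 45 dB tone at 1100 Hz;
# the 50 dB tone at 3000 Hz is too far away in frequency to be masked.
tones = [(1000, 80), (1100, 45), (3000, 50)]
print(audible(tones))
```

A perceptual encoder runs a (much more sophisticated) version of this test on every block of audio and simply spends no bits on the components it decides you cannot hear.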
The king of all psychoacoustical effects is actually not an effect but a set of curves: the Fletcher-Munson curves describe the fact that we do not hear all frequencies at the same loudness, even when the sound intensity is the same (see below (2)).
In other words, the curve describing the relationship between frequency (Hz) and loudness (dB) is not flat at all.
In the previous picture, equal-loudness curves are shown for different perceived loudness levels (measured in phons). Take the 40-phon curve: a 20 Hz tone must be played at almost 90 dB to sound as loud as a 1000 Hz tone played at only 40 dB. Remember, a bump of 6 dB means twice the sound pressure. What is interesting is the region between 1000 and 2000 Hz: an evolutionary biologist might say that our hearing system was selected because it could perceive (and discriminate) the human voice better than other systems, since that frequency range is precisely where the human voice is centered. Note how, in the 1000-2000 Hz region, loudness is pretty much on par with the corresponding sound intensity: 20 phons is about 20 dB loud, 40 phons is about 40 dB loud, and so on; this is no accident, since the phon is defined so that phons and dB coincide at 1000 Hz.
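This non-flat response can be approximated in code with the standard A-weighting curve, which was loosely derived from the 40-phon equal-loudness contour; the formula below comes from the IEC 61672 standard, not from this chapter.

```python
# A-weighting: the standard approximation of the ear's non-flat frequency
# response (formula and constants from IEC 61672, assumed here as-is).
import math

def a_weighting_db(f):
    """A-weighting correction in dB for a frequency f in Hz
    (0 dB at 1000 Hz by definition)."""
    ra = (12194**2 * f**4) / (
        (f**2 + 20.6**2)
        * math.sqrt((f**2 + 107.7**2) * (f**2 + 737.9**2))
        * (f**2 + 12194**2)
    )
    return 20 * math.log10(ra) + 2.00

for f in (20, 100, 1000, 2000, 10000):
    print(f"{f:5d} Hz: {a_weighting_db(f):+6.1f} dB")
```

Run this and you will see the Fletcher-Munson story in numbers: a huge penalty at 20 Hz, nothing at 1000 Hz, and even a slight boost around 2000 Hz where our ears are at their most sensitive. This is also why sound level meters report "dB(A)".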
The Fletcher-Munson curves also explain why certain music sounds better when it is loud: at higher volumes, even the frequencies we are least sensitive to (especially the lows) rise above their minimum audible loudness, giving the sound a fuller quality.
(1) Reproduced with permission from reference 
(2) Legally reproduced from here