
Chapter 3 Contents

Audio Theory Fundamentals

Understanding basic audio theory is an important part of understanding how MIDI and audio integrate.  The following concepts and terms are required to make proper use of the technologies discussed in the chapters that follow.



Sound is a disturbance of molecules in an elastic medium such as air, and sound waves are periodic travelling waves.  When molecules are pushed together, it is referred to as compression.  Conversely, when molecules are pulled apart, it is referred to as rarefaction.  The speed of sound, or propagation speed, is approximated by V ≈ 331.4 m/s + 0.6 × Tc, where V = velocity (metres per second) and Tc = temperature in Celsius.  The speed of sound increases as temperature increases; humidity and, to a much smaller degree, atmospheric pressure also have an effect.  Under 68° Fahrenheit (20° Celsius) conditions with no pressure anomalies, sound travels at approximately 1,125 ft/s, or 767 mph.
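As a quick sketch (not one of the chapter's exercises), the propagation formula can be expressed in a few lines of Python; the function name is ours:

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in air (m/s) at a temperature in Celsius."""
    return 331.4 + 0.6 * temp_c

# At 20 °C (68 °F), sound travels roughly 343 m/s.
print(round(speed_of_sound(20), 1))  # 343.4
print(round(speed_of_sound(0), 1))   # 331.4
```

Note how each additional degree Celsius adds about 0.6 m/s to the propagation speed.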


Sound is measured by the frequency of its waves: the higher the pitch, the higher the frequency, and the lower the pitch, the lower the frequency.  The unit of frequency, the hertz (Hz), is named after Heinrich Hertz, a 19th-century physicist, and identifies the number of cycles per second of a waveform.  Human hearing ranges from 20 Hz to 20 kHz, in other words, 20 cycles per second to 20,000 cycles per second (cps).  However, 16 kHz is considered the ceiling for most individuals' hearing ability.
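The relationship between frequency and the duration of one cycle is a simple reciprocal, which a short Python sketch (the function name is ours) makes concrete:

```python
def period_ms(frequency_hz):
    """Duration of one cycle, in milliseconds, for a wave of the given frequency."""
    return 1000.0 / frequency_hz

# The limits of human hearing, expressed as cycle durations:
print(period_ms(20))     # 50.0  -> a 20 Hz wave completes one cycle every 50 ms
print(period_ms(20000))  # 0.05  -> a 20 kHz wave completes one cycle every 0.05 ms
```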


Illustration 3.1: Chart of FQ Ranges for Human Hearing.




Note that FQs below 20 Hz are infrasonic while FQs above 20 kHz are ultrasonic.



The ear is a transducer which changes acoustical energy into electrical energy, much as a speaker does in reverse, changing electrical energy into acoustic energy.  Our binaural hearing (bi meaning two) is omnidirectional: we can hear spatialized sounds all around us.  The ear is also connected to our auditory centre and nervous system; thus, low FQs can cause bodily discomfort, while loud high FQs can cause temporary threshold shift, also known as cotton ears.  Too much high-volume listening can also cause tinnitus, a ringing in the ear.  Both of these can lead to permanent hearing damage, and it is therefore wise to monitor at a comfortable listening level, whether using studio monitors or headphones.


Temporal fusion occurs when direct and reflected sounds reach the ear within roughly 1-30 milliseconds of each other; these sounds are heard as one.  When the direct sound reaches the ear first and the reflection arrives shortly after, the sound is localized toward the direct source; this is referred to as the Haas effect or precedence effect.  Should a reflected sound arrive 50 milliseconds or more after the direct sound, it is perceived as an echo.  Acoustics refers to the quantifiable study of sound, while psychoacoustics is the study of our perception of sound based on our binaural hearing, its capabilities, and its limitations.
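These millisecond thresholds correspond to physical distances a reflection must travel.  A small Python sketch (the function name and the assumed 343 m/s figure are ours) converts an arrival delay into the extra path length of the reflected sound:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def path_difference_m(delay_ms):
    """Extra distance a reflection travels, given its arrival delay in milliseconds."""
    return SPEED_OF_SOUND * (delay_ms / 1000.0)

# Reflections within ~30 ms fuse with the direct sound;
# beyond ~50 ms they are heard as a distinct echo.
print(round(path_difference_m(30), 2))  # 10.29 m
print(round(path_difference_m(50), 2))  # 17.15 m
```

In other words, a reflection whose path is more than about 17 metres longer than the direct path will be heard as an echo rather than fusing with the source.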


For example, the correlation between pitch and amplitude is a perceptual one: the perceived pitch of a tone can shift slightly as its amplitude (loudness) changes, even though the physical frequency of the wave remains the same.  This phenomenon may or may not be noticeable under normal circumstances because the pitch shift is very small in most cases.


Additionally, the equal loudness contours and the Doppler effect further show how our hearing functions differently across the audio spectrum, and the distance between our ears contributes to how we perceive sound.


Illustration 3.2:  Fletcher-Munson–Equal Loudness Contours



Wiki Images


What this graph illustrates is that perceived loudness changes at a different rate depending on the FQ.  When listening at low levels, midrange FQs sound more prominent while other FQs appear to fall into the background.  At high listening levels, the lows and highs become more prominent while the midrange seems quieter.  The physical balance of the sounds remains the same no matter what the listening level is; it is our perception of that balance that changes.


Doppler Effect

The Doppler effect is a change in the wavelength, and therefore the perceived pitch, of a wave caused by a change in distance between the source and the listener.  Think of an emergency vehicle with sirens travelling past you: the pitch rises as it approaches and falls as it moves away, while the amplitude grows and fades with the changing distance.
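The shift for a moving source and stationary listener follows a standard formula, f' = f × v / (v − vs).  A short Python sketch (the function name and the 700 Hz siren figure are ours) applies it to the passing-siren example:

```python
def doppler_frequency(f_source, v_source, speed_of_sound=343.0):
    """Observed frequency (Hz) for a source moving at v_source m/s toward a
    stationary listener; use a negative v_source for a receding source."""
    return f_source * speed_of_sound / (speed_of_sound - v_source)

siren = 700.0  # Hz, an assumed siren pitch
print(round(doppler_frequency(siren, 25.0), 1))   # 755.0 -> higher pitch approaching
print(round(doppler_frequency(siren, -25.0), 1))  # 652.4 -> lower pitch receding
```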


Sound Waves

There are three basic types of waveforms: sinusoidal, square or pulse, and triangle or sawtooth.  Each creates a unique, quantifiable timbre or tone color.  For example, sinusoidal waves can be equated to the color of flutes, pulse waves to the color of clarinets, and triangle waves to the color of strings.  Combining any of these will create different and far more complex timbres, called multi-component waves.  Any complex wave can be deconstructed into a sum of simple sine waves.  This is referred to as Fourier analysis, and it can be performed in some software programs.
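To make Fourier analysis concrete, here is a small Python sketch (using NumPy; the sample rate and component amplitudes are our assumptions).  It builds a multi-component wave from three sines and then recovers the individual components from the mix:

```python
import numpy as np

rate = 8192                      # samples per second (assumed)
t = np.arange(rate) / rate       # one second of time points

# Build a multi-component wave: a 440 Hz sine plus two weaker partials.
wave = (np.sin(2 * np.pi * 440 * t)
        + 0.5 * np.sin(2 * np.pi * 880 * t)
        + 0.25 * np.sin(2 * np.pi * 1320 * t))

# Fourier analysis decomposes the mix back into its sine components.
spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), d=1 / rate)
peaks = freqs[spectrum > 100]    # keep only bins with significant energy
print(peaks)  # the component frequencies: 440, 880, 1320 Hz
```

Even though the summed waveform looks nothing like a sine wave, the analysis shows it is exactly the three sines we started with.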


Illustration 3.3.1:  A multi-component wave comprised of three different waves at 440Hz.


3 waves (sinusoidal, square, and sawtooth), all at 440Hz.


Duncan Metcalfe


Illustration 3.3.2   Resulting mix–note the added complexity to the waveform.


Duncan Metcalfe


Furthermore, each wave has an amplitude (loudness), a frequency (cps) and corresponding wavelength, a compression or crest, and a rarefaction or trough.  The average resting place is zero on the amplitude axis and represents silence.
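A short Python sketch (using NumPy; the variable names are ours) generates one cycle of a sine wave and picks out these parts numerically:

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)  # one cycle of a 1 Hz sine
wave = np.sin(2 * np.pi * t)

crest = wave.max()    # the compression peak, at full positive amplitude
trough = wave.min()   # the rarefaction, at full negative amplitude
print(round(crest, 3), round(trough, 3))  # 1.0 -1.0

# The wave begins and ends at 0, the average resting place (silence).
print(wave[0])  # 0.0
```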


Illustration 3.4:  Parts of the wave.






Duncan Metcalfe

Acoustical Phase

When two waves are in phase, that is, they both begin and end at the same time, they reinforce one another, increasing the resultant amplitude of the combination.  When two waves of equal amplitude are 180° out of phase, they create phase cancellation, resulting in silence.


Illustration 3.5:  Example of two sinusoidal waves creating phase cancellation.



Duncan Metcalfe


Additionally, there are two other possibilities with wave phase.  First, constructive interference occurs when two waves are slightly out of phase, up to 90°, creating an increase in amplitude.  Second, destructive interference occurs when two waves are between 91° and 180° out of phase, causing a decrease in amplitude.
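All three phase relationships can be demonstrated in a few lines of Python (using NumPy; the function name is ours).  Mixing two unit sine waves at different phase offsets shows the resultant amplitude growing, shrinking, and finally cancelling:

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)

def mix_amplitude(phase_deg):
    """Peak amplitude of two unit sine waves mixed at a given phase offset."""
    a = np.sin(2 * np.pi * t)
    b = np.sin(2 * np.pi * t + np.radians(phase_deg))
    return (a + b).max()

print(round(mix_amplitude(0), 2))    # 2.0  -> in phase: amplitudes add fully
print(round(mix_amplitude(90), 2))   # 1.41 -> constructive interference
print(round(mix_amplitude(180), 2))  # 0.0  -> complete phase cancellation
```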


Illustration 3.6.1:  Constructive Interference


Two sinusoidal waves, out of phase



Illustration 3.6.2:  Resulting mix–note that the amplitude has increased


Duncan Metcalfe



Illustration 3.7.1:  Destructive Interference


2 sinusoidal waves, further out of phase



Illustration 3.7.2:  Resulting mix–note the amplitude has decreased


Duncan Metcalfe


Exercise 3.1

In Audacity, students will generate multi-component waves using different wave types and create the phenomena of constructive and destructive interference and phase cancellation.


Digital Domain

The digital domain is less forgiving than analog recording when it comes to headroom, that is, how much signal the system can tolerate before distortion occurs.  Digital recordings can create digital artifacts when signal levels are too high, which can happen as one keeps adding audio tracks, digital signal processing, and other audio sources to the mix.  Simply put, any signal beyond full scale is clipped, creating transients that are heard as pops and distortion.  That is also why, when editing, the cut should be made at the average resting place; otherwise the waveform jumps abruptly and creates an artifact.  Most software programs have scrubbers and expandable views to allow one to find the proper edit point.
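Clipping at full scale is easy to demonstrate in Python (using NumPy; the ±1.0 full-scale range and signal level are our assumptions).  A signal that is 50% too hot simply has its peaks flattened:

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)
signal = 1.5 * np.sin(2 * np.pi * 5 * t)   # too hot: peaks exceed full scale

# A digital system cannot represent values beyond full scale (±1.0 here),
# so anything louder is flattened -- hard clipping, heard as distortion.
clipped = np.clip(signal, -1.0, 1.0)

print(round(signal.max(), 2), round(clipped.max(), 2))  # 1.5 1.0
```

The flattened tops of the clipped wave are the digital artifacts illustrated below.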


Illustration 3.8  Example of a Digital Artifact-too much amplitude.



Duncan Metcalfe


Illustration 3.9  Example of a Digital Artifact-improper editing.



Duncan Metcalfe


Exercise 3.2

In Audacity, students will generate multi-component waves using different wave types and edit the wave at a point other than the average resting place creating a digital artifact.  Be sure to zoom in on the wave until you see the sample points before you edit.



There is more to the shape of a sound than its FQs.  Sound occurs over time, which includes its dynamic contour.  This is known as a sound envelope, which is comprised of the attack, decay, sustain, and release (ADSR).  Imagine a percussion instrument with a sudden attack and quick decay, while a string instrument can have either a sudden attack or a subtle one, with a long decay and sustain as required by the score.  Strings, winds, and brass can articulate an increase in dynamics with no decay and a longer sustain.
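The four ADSR stages can be sketched as a simple amplitude curve in Python (using NumPy; the function name and the timing values are our illustrative assumptions):

```python
import numpy as np

def adsr(attack, decay, sustain_level, sustain, release, rate=1000):
    """Build an ADSR amplitude envelope; times in seconds, levels 0..1."""
    return np.concatenate([
        np.linspace(0, 1, int(attack * rate)),               # attack: rise to peak
        np.linspace(1, sustain_level, int(decay * rate)),    # decay: fall to sustain level
        np.full(int(sustain * rate), sustain_level),         # sustain: hold steady
        np.linspace(sustain_level, 0, int(release * rate)),  # release: fade to silence
    ])

# A percussive shape: near-instant attack, quick decay, no sustain.
pluck = adsr(0.005, 0.3, 0.0, 0.0, 0.0)
# A bowed-string shape: slower attack, long sustain, gentle release.
bowed = adsr(0.4, 0.1, 0.8, 1.0, 0.5)
print(pluck.max(), bowed.max())  # both peak at 1.0
```

Multiplying a waveform by such an envelope gives it the dynamic contour of the corresponding instrument.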


Illustration 3.10  Envelope of a percussive instrument.



Duncan Metcalfe


Illustration 3.11  Envelope of a flute sustaining a pitch.











Duncan Metcalfe

Envelopes of sounds add variance and dynamics to the music and are crucial in creating realistic and human sounding scores.  Much of the dynamics will be part of the score or manipulated with MIDI data.


Overtone Series

Electronic instruments can generate pure FQs with no overtones.  They can also combine several waves to approximate an acoustic instrument's timbre and character by adding harmonics.  Such a complex tone can be described as a series of periodic waves built in order of the overtone series, which begins with the fundamental or carrier wave.  One can add partials that may or may not be integer multiples of the fundamental.  Adding to the fundamental a series of partials that are positive integer multiples generates a harmonic series, also known as the overtone series.
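Because the harmonic series is just the positive integer multiples of the fundamental, it takes one line of Python to generate (the function name and the 110 Hz fundamental are ours):

```python
def harmonic_series(fundamental, count):
    """Overtone series: positive integer multiples of the fundamental (Hz)."""
    return [fundamental * n for n in range(1, count + 1)]

# A fundamental of 110 Hz (A2) with its first five overtones.
print(harmonic_series(110, 6))  # [110, 220, 330, 440, 550, 660]
```

Partials that are not integer multiples (e.g. 110 × 2.3 = 253 Hz) would be inharmonic and are what give bells and drums their clangorous character.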


Illustration 3.12  A fundamental with five overtones.



Duncan Metcalfe


Illustration 3.13  A sixteen note harmonic series described in intervals.
