
Chapter 3 Contents

Digital Representation of Sound

Digital technologies for audio were developed in the late 1970s.  In the digital domain, sounds are described as numbers.  Because only the numbers are stored, digital recordings are typically noise free compared to analog recordings.  The digital snapshots, or samples, are numeric codes assigned to each pulse.  Pulse code modulation (PCM) takes the samples at a prescribed rate (the pulse).  Industry sampling rates include 44.1kHz, 48kHz, 96kHz, and 192kHz.  This means that if one is sampling at the industry standard rate of 44.1kHz, then 44,100 snapshots of the audio are taken every second.  Note that sampling at 192kHz requires a much greater amount of memory for each recording.
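To see concretely why higher sampling rates require more memory, the raw PCM data rate can be computed directly.  The following Python sketch is illustrative only (the function name is an assumption, not part of any audio tool mentioned in this chapter):

```python
def pcm_data_rate(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM data rate in bytes per second:
    samples per second x bytes per sample x number of channels."""
    return sample_rate_hz * (bit_depth // 8) * channels

# CD-quality stereo: 44,100 samples/s x 2 bytes x 2 channels
cd = pcm_data_rate(44_100, 16, 2)        # 176,400 bytes per second
hi_res = pcm_data_rate(192_000, 24, 2)   # 1,152,000 bytes per second
```

At 192kHz/24-bit, each second of stereo audio takes more than six times the storage of CD-quality audio.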


Recall that human hearing has an upper limit of 20kHz.  If we sampled a 20kHz wave at 20kHz, we would capture only one snapshot per cycle, meaning only one point of each wave would be captured.  Therefore, we need to sample at a rate at least double the upper limit of human hearing, or 40kHz.  This is referred to as the Nyquist Theorem.  The additional margin above 40kHz ensures that successive samples fall at different points on the wave, providing a more accurate reproduction.  Audio rarely reaches this upper range, but the same sample-point shifting benefits other frequencies when using a 44.1kHz sampling rate or higher.  If one samples at too low a rate, a phenomenon occurs that creates random pitches or popping.  This is called aliasing.  However, this should not occur with a sampling rate of 44.1kHz.  Also, most sampling technologies include an anti-aliasing filter that removes frequencies that cannot be sampled.
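Aliasing follows a predictable folding rule: any frequency above the Nyquist limit (half the sampling rate) reappears "folded" back into the audible range.  A small Python sketch can show this (the function name is an illustrative assumption):

```python
def alias_frequency(f, fs):
    """Apparent frequency after sampling a tone of f Hz at fs Hz.
    Tones above the Nyquist limit (fs/2) fold back down into range."""
    f = f % fs                       # sampling cannot distinguish f from f + fs
    return f if f <= fs / 2 else fs - f

alias_frequency(10_000, 44_100)      # 10,000 Hz -- below Nyquist, unchanged
alias_frequency(30_000, 44_100)      # folds down to 14,100 Hz
```

This is exactly the "random pitch" effect described above, and why the anti-aliasing filter removes such frequencies before sampling.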


Illustration 3.14  An A440 sine wave sampled at 44.1kHz.





Duncan Metcalfe


However, there is more to sampling a wave than the rate of the samples.  Each sample is stored as a binary word (e.g., 01101011), the bytes that describe the audio.  These words represent additional information about the sample such as loudness, envelope, color, and other characteristics of the event.  The precision of these word descriptors is set by the bit rate resolution, also referred to as quantization.


Illustration 3.15  Words Transmitted in Series



Duncan Metcalfe


Bit depth or bit rate resolution possibilities are 8, 16, 24, and 32 bits.  For example, 8-bit resolution (2×2×2×2×2×2×2×2) allows 256 permutations to describe the sound.  This bit depth offers about a 48dB dynamic range (roughly 6dB per bit), which is not suitable for professional audio.  A bit depth of 16 offers more detail, with 65,536 word permutations and about 96dB of dynamic range, and is considered the professional audio standard.  A bit depth of 24 offers 16,777,216 permutations and 32-bit offers 4,294,967,296; both require much more memory for only a marginal gain in audible dynamic range and are typically used only for editing.  To reduce from a higher bit depth, dithering is used.  One can reduce from 24-bit to 16-bit, but some resolution is lost.
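The permutation counts and dynamic-range figures above come from two simple formulas: the number of words is 2 raised to the bit depth, and dynamic range is roughly 6dB per bit.  A Python sketch of the arithmetic (function names are illustrative):

```python
def quantization_levels(bits):
    """Number of distinct words a given bit depth can express."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Rule of thumb for linear PCM: ~6.02 dB of dynamic range per bit."""
    return 6.02 * bits

quantization_levels(8)        # 256 words
quantization_levels(16)       # 65,536 words
round(dynamic_range_db(16))   # about 96 dB -- the professional standard
```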


Illustration 3.16 Sampling Resolution



Duncan Metcalfe

Pulse Code Modulation Process

PCM is the process of taking an electrical analog signal and changing it to an electrical digital signal.  It takes the + and – of a signal and generates a series of 1s and 0s.  The process begins with an analog source such as a microphone, synthesizer, tape, or other real, non-digital format, and converts it using the following processors:


• First, an anti-aliasing filter ensures that FQs that cannot be sampled are removed.  It functions as a low-pass filter, removing higher FQs from the process.  This prevents popping or digital distortions.

• Second is the sample and hold, where the picture of the sample (as per sample rate and bit rate resolution) is taken.

• Third, the samples are converted to digital binary words comprising 1s and 0s, dependent on the settings in the second step.  These are analogous to the original analog source.

• Fourth, data coding identifies the spaces between words and is the continuation of word processing.

• Fifth, error correction fixes any errors introduced during storage by averaging the data anomalies.

• Sixth, record modulation stores the digital data not as 1s and 0s but as pulses of magnetic energy.  Hence, pulse code modulation.

• Seventh, the data can be stored on many types of media such as CDs, DAT, tape, and audio files on a computer.
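The second and third steps above — sample and hold, then conversion to binary words — can be sketched in a few lines of Python.  This is a simplified illustration, not how any real converter is implemented, and the function names are assumptions:

```python
import math

def sample_and_quantize(signal, fs, duration, bits):
    """Sketch of PCM steps two and three: sample a continuous signal at
    fs Hz, then quantize each sample to a signed integer word of the
    given bit depth.  `signal` is a function of time in [-1.0, 1.0]."""
    max_code = 2 ** (bits - 1) - 1          # largest positive word value
    n = int(fs * duration)                  # number of snapshots taken
    return [round(signal(i / fs) * max_code) for i in range(n)]

# One millisecond of an A440 sine wave at CD settings (44.1kHz, 16-bit):
words = sample_and_quantize(lambda t: math.sin(2 * math.pi * 440 * t),
                            44_100, 0.001, 16)
```

Each entry of `words` is one of the binary words described above, ready for data coding and storage.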


Illustration 3.17  PCM Process





Digital processes are identified on recordings by three-letter codes such as ADD, AAD, or DDD.  ADD indicates that the original source was analog, then digitally recorded, and finally digitally mastered.  AAD indicates an analog source that was analog recorded but digitally mastered.  DDD indicates a digital source (synth), recorded digitally (computer), and then mastered digitally through a rendering process in audio software.  AAA is a totally analog engineered recording (a record), while ADA indicates an analog source, digitally recorded, with analog mastering.


Exercise 3.3

Using Audacity, record a sound at 16-bit resolution and then record the same sound at 32-bit resolution.  Then zoom in to see the word structures of both samples.  For discussion: can you hear the difference?


Additive Synthesis

When two or more waveforms are combined, they create a complex waveform whose character depends on the types of waves used and their frequencies.  This is referred to as additive synthesis.  The base wave is called the carrier wave, while the added wave is called the modulating wave.  When combined, they create a very different output signal.
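At its core, combining waveforms is simple addition: at every instant, the output is the sum of the component waves.  A minimal Python sketch (the names `carrier` and `modulator` are illustrative):

```python
import math

def combine(waves, t):
    """Additive synthesis sketch: the output at time t is the sum of
    every component wave evaluated at t."""
    return sum(w(t) for w in waves)

carrier = lambda t: math.sin(2 * math.pi * 440 * t)          # A440 base wave
modulator = lambda t: 0.5 * math.sin(2 * math.pi * 880 * t)  # quieter octave above
combine([carrier, modulator], 0.0)    # both sines start at zero, so the sum is 0.0
```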


FM Synthesis

When two or more waves at very different frequencies are combined, it is referred to as frequency modulation.


It is important to note that when combining two or more audio waves, the overall amplitude is increased.  This can cause digital distortions, known as digital artifacts, to occur.  Therefore, it is best to start the carrier and subsequent modulating waves at 75% of their amplitudes to avoid this phenomenon.
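The 75% guideline amounts to scaling the combined signal so its peak sits below full scale.  A short Python sketch of that headroom adjustment (function name is an illustrative assumption):

```python
def leave_headroom(samples, target=0.75):
    """Scale a combined signal so its loudest peak sits at `target`
    (75% of full scale here), preventing clipping and digital artifacts."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return samples                       # silence needs no scaling
    gain = target / peak
    return [s * gain for s in samples]

mixed = [0.4, -1.3, 0.9]        # sum of two waves whose peak exceeds full scale
safe = leave_headroom(mixed)    # peak now sits at 0.75
```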


*Click play on the icons next to the following images to hear the accompanying examples.


Illustration 3.18.1  Two sinusoidal waves before being combined.


Duncan Metcalfe


Illustration 3.18.2  The same two sinusoidal waves after being combined.

Duncan Metcalfe

AM Synthesis

When two or more waveforms with different amplitudes (loudness) are combined, it is referred to as amplitude modulation.
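In classic amplitude modulation, the modulating wave varies the loudness of the carrier over time.  A minimal Python sketch of that relationship (function name and parameter choices are illustrative):

```python
import math

def am_signal(carrier_hz, modulator_hz, depth, t):
    """Amplitude modulation sketch: the modulator sweeps the carrier's
    amplitude up and down; `depth` controls how strongly (0 to 1)."""
    envelope = 1.0 + depth * math.sin(2 * math.pi * modulator_hz * t)
    return envelope * math.sin(2 * math.pi * carrier_hz * t)

# A440 carrier whose loudness pulses five times per second:
am_signal(440, 5, 0.5, 0.0)   # 0.0 at t = 0 (the carrier sine starts at zero)
```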


Illustration 3.19  Two sinusoidal waves with different amplitudes combined.




Duncan Metcalfe


One can also combine both FM and AM additive synthesis to generate complex waveforms and interesting timbres.



 Illustration 3.20  Three waves of varying types and amplitudes. From lowest to highest amplitude–a 440Hz sawtooth wave, a 1650Hz sinusoidal wave, and a 2kHz square wave.








Duncan Metcalfe


Subtractive Synthesis

Subtractive synthesis is the process of adding filters to a sound chain that affect the harmonics, and sometimes the fundamental, using filters such as equalizers, low-pass, high-pass, and notch filters, and envelope controls.  Using a subtractive synthesis chain, one can generate infinite possibilities of sonic sounds and effects.  Ring modulation, which generates oscillating output from two input signals, uses both additive and subtractive synthesis.


Noise Generators

Noise generators create a continuum of FQs distributed over the whole hearing spectrum.  There are three variants of noise: white, pink, and brown (also referred to as Brownian).  In white noise the FQs are produced with equal intensity, so white noise sounds much brighter than pink or brown.  Pink noise emphasizes lower FQs, while brown noise emphasizes lower FQs even more than pink noise.
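White and brown noise are both easy to sketch in code: white noise is a stream of independent random values, while brown (Brownian) noise is a random walk — each value drifts slightly from the last, which concentrates energy in the low FQs.  Pink noise falls between the two.  The function names below are illustrative assumptions:

```python
import random

def white_noise(n, seed=0):
    """Independent random samples: equal expected intensity at every FQ."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def brown_noise(n, seed=0):
    """Brownian noise as a random walk: each sample drifts from the last,
    emphasizing low FQs far more than white noise does."""
    rng = random.Random(seed)
    out, level = [], 0.0
    for _ in range(n):
        level += rng.uniform(-0.02, 0.02)
        level = max(-1.0, min(1.0, level))   # keep within full scale
        out.append(level)
    return out
```

Listening to both (as in Exercise 3.4 below) makes the brightness difference obvious.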


Exercise 3.4

Using Audacity, generate the three noise types on different tracks.  Make A-B listening comparisons and then combine them into one track.  Describe the resulting noise effect.
