What is sound?
Sound is the term used to describe vibrations within
the range of human hearing. It travels through
objects (the air, walls, the human body) as a series
of compressions and expansions of the material,
and as such it travels much faster through
dense objects than through the air -
sound travels faster in water, for example, while in air
it manages only about 340 metres per second (the "sound barrier",
or Mach 1).
The vibrations can be represented
on a graph of displacement against time - time
is traditionally on the X axis (horizontal),
while the displacement from zero is on the vertical
axis. Since the displacement corresponds to the signal
level, an audio signal can be graphed as voltage against time.
A simple tone at a constant pitch - someone
whistling a continuous note, or holding a note
on a pan flute, for example - is represented
by a smooth curve that moves from the centre upwards,
turns back down through zero, continues down, and
finally returns to zero. This is called a sine
wave - and it is a very important component of
sound, or indeed of any other signal.
The sine wave is cyclical - the
sequence repeats perfectly as you move along
in time. The length of one repetition (in seconds)
is called the period of the wave, and the number
of repetitions of the wave you can have in one
second is called the frequency. Frequency in
this case follows the dictionary definition
- how often something happens... and it's measured
in occurrences ("cycles") per second, or Hertz
(Hz). Frequency is the reciprocal of period
- frequency = 1/period. How high the wave reaches
on the graph is called amplitude, and represents
the strength of the signal (which in the case
of sound, we perceive as volume - more on volume
in a future tutorial).
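To make this concrete, here is a minimal sketch in Python
(the language choice, and names like make_sine, are mine for
illustration, not part of any particular tool) that generates
the samples of a sine wave from a frequency and an amplitude:

    import math

    SAMPLE_RATE = 44100          # snapshots per second (see sampling, later)

    def make_sine(freq_hz, amplitude=1.0, seconds=1.0):
        """Samples of a sine wave: amplitude * sin(2*pi*frequency*time)."""
        n = int(SAMPLE_RATE * seconds)
        return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
                for i in range(n)]

    tone = make_sine(440.0)      # one second of a 440 Hz tone; period = 1/440 s

Doubling the amplitude makes the same pitch louder; doubling
freq_hz halves the period and raises the pitch.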
This sine wave, by the way, is
normally produced by an "oscillator" - which
is something that generates a waveform at a
certain frequency... it can usually be any waveform,
not just a sine wave.
As with most units in the technical/scientific
world, shorthand notation has been introduced
for very large or very small frequencies - the standard
metric prefixes apply, so 1 kHz (kilohertz) = 1,000 Hz
and 1 MHz (megahertz) = 1,000,000 Hz.
Human hearing ranges from about
20 Hz to 20,000 Hz (or 20 kHz). Most people
cannot hear sounds outside of this range, although
to some extent we may be able to sense them
- since all sounds are vibrations, they can
be felt as well as heard (but much more energy
is required before our sense of touch perceives
a wave). Vibrations below that range are referred
to as infrasonic; vibrations above that range
are called ultrasonic. Most digital sounds limit
themselves to the audible range; analogue equipment
has a limited bandwidth as well, but often exceeds
the range of human hearing.
So... a sine wave is a tone a
bit like a whistle or a flute - soft (as opposed
to "harsh" or "strident"), and composed of
only one pitch. You can fade the volume (amplitude)
of this sine wave up and down, and it will still
be the same pitch, by the way. What about more
interesting sounds? Essentially, different sounds
(of the same pitch as our sine wave) are shaped
differently from our smooth up-and-down sine wave.
A typical "note" is usually a
short burst of sound at a particular pitch.
There may be many other frequencies in the sound
but the pitch of the note will usually be dominant.
The non-dominant parts of the sound are called
the harmonics - while they aren't the foundation
of the sound, they are important in giving it
its sonic characteristics. It should be noted
that a perfect sine wave contains no harmonics -
it is only a fundamental, "pure" note. Other
waveforms, such as square, saw, etc. contain
various mixtures of fundamental and harmonic
components. More complicated sounds - such as
the human voice - are composed of many different
waves, each of a different frequency, and of
different volume. You can get "unpitched" sounds,
too, such as percussion.
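As a small illustration of how harmonics shape a sound, here is a
hedged sketch (Python again, names my own) that builds a square-ish
wave by adding odd harmonics to the fundamental - the classic
Fourier-series recipe, with each harmonic n scaled by 1/n:

    import math

    def square_approx(freq_hz, t, harmonics=9):
        """Add odd harmonics (1, 3, 5, ...) of the fundamental, scaled by 1/n."""
        return sum(math.sin(2 * math.pi * n * freq_hz * t) / n
                   for n in range(1, 2 * harmonics, 2))

With harmonics=1 this is just the pure sine - the fundamental alone;
each extra odd harmonic squares the shape off further, and the tone
gets correspondingly harsher.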
In simplified terms, the fundamental
is the frequency of the waveform. Any waveform
other than a perfect sine will produce partials.
Harmonics are musically related partials that
are a ratio of the fundamental. A note is a
sound having a single pitch. If the listener
doesn't perceive a single pitch, it isn't a
note. With acoustic instruments, the perceived
pitch and the fundamental are almost always the
same; with electronic instruments, however, a
note may have a perceived pitch different from
the fundamental, or be dominated by non-harmonic
partials. If the dominant frequency is related
to a fundamental lying below it, that
fundamental may then be considered a "sub-harmonic".
Before we get ahead of ourselves,
let's sort out this frequency business. Frequencies
are all over the place in music and they are
central to the whole concept of "sound" as opposed
to formless "vibration".
The lower the frequency (in other
words, the less often it cycles per second),
the lower the pitch of the note. Deep bass sounds
are in the low-frequency range, the whine you
hear when a TV set is switched on is a high-frequency
sound (between 15 and 16 kHz), and a big crashing
snare usually has elements right the way across
the spectrum. Here are a few other common ranges:
0 Hz -> 30 Hz = Biofeedback
range. At high volumes it can induce hallucinations,
cause all sorts of nasty things to happen
to the digestive system, even stop you breathing...
you'll be wanting this at max for psy-trance!
30 Hz -> 60 Hz = Sub-bass.
Mains hum is at 50 Hz or 60 Hz - this is the
frequency of the mains power supply in most
countries, and if you can hear it when nothing
is supposed to be playing then you're either
about to be visited by fractal beings from
last Tuesday or there's an earthing problem
with your equipment. Look up "mains hum",
"ground hum" and "earthing" in the manual
or on the internet.
60 Hz -> 800 Hz = Bass. From
here up to 800 Hz is audible bass on most
speakers and headphones.
0.2 kHz -> 1.4 kHz = Main
part of the human voice - this is just about
the bare minimum for recognisable vocal samples.
For a low-quality but recognisable voice you need
up to about 3 kHz (male) or about 6 kHz (female).
800 Hz -> 5-9 kHz = Mid-range
(or mids) - human hearing is more sensitive
in this range, so it should be slightly lower
in volume (down to about -6dB or so) than
the bass and high end. Years of human speech
(in all its various incarnations) as an evolutionary
asset must've led to increased numbers of
neurons/ganglia/whatever they are sensitive
to this particular frequency range. It's more
likely that aliens landed one day and their
spacecraft was playing tunes so intense, focused
just in this area of the sound spectrum, that
they left a permanent impact and people have
been extra-sensitive to those sounds ever
since, in the hope of one day hearing them again.
7kHz -> 22kHz = High-end
- it might appear to be a wide range,
but due to the logarithmic scale, doubling
the frequency only raises the note by a fixed
interval (one octave). So a step of one octave is on the
order of 100 Hz in the bass, but 10 kHz at
the high end. This is the high end, by the
way - up to whatever the top end of your hearing is.
We'll just take a quick detour
here - it's time to mention that there are two
different methods for dealing with sounds, and
they are somewhat different in terms of features...
What's the difference between digital and analogue?
The digital world has finite limits
and definite boundaries. Something that is "digital"
operates from a finite set of data - "analogue",
on the other hand, is the term used to describe
something that is infinitely variable.
A simple example of digital is
the whole numbers from 1 to 10. You can list
them in order, and you can store an exact representation
of the number. Analogue represents *all* the
numbers from 1 to 10 - fractions, whole numbers,
recurring decimals, irrational numbers, *anything*
that fits between 1 and 10... and you guessed
it, there are an infinite number of them (you
can always put another digit on the end - from
0.01 you can go to 0.012, 0.0123, etc.). If you
can list all the possible cases, then you are
dealing with something digital. If there's no
possible way of doing this, then it's analogue.
So, digital uses discrete values to represent
data, while analogue uses continuous values.
So analogue provides us with a limitless
playing field, but digital will only go so far
and no further. Why do we bother with digital,
then? Musicians should be at home with infinity,
surely? The answer lies in how you can process
the data. Analogue information in the audio
world is (most of the time) represented by electrical
signals. Instead of numbers from 1 to 10, you
have an analogue voltage between -5v and +5v
(for example). Again, this range contains an
infinity of subdivisions. Using analogue circuitry,
you can do all sorts of things to this - but
since we cannot measure any value with perfect
precision (you'd need equipment with infinite
precision, which doesn't exist), we can't make
a perfect digital copy of an analogue signal. Sooner
or later, by using discrete values, you'll have
to round off - i.e. approximate.
Strictly speaking, it isn't quite true that
one can't store perfect data using analogue;
suffice to say that it's more trouble and expense
than it's worth. So digital comes into its own
when we're not just doing everything on the
fly, but want to refer to what happened previously
- and the classic example here is an echo (see
the delay section, below).
Digital has limitations, too -
you have to break the sound down into a series
of snapshots, so instead of a continuous wave
representing the movements of the speaker cone,
we have a series of numbers which define positions
at exact moments in time. The sound card or
CD player will normally interpolate (fill in)
between these positions to move the cone smoothly
rather than in steps.
This process of taking "snapshots"
is called sampling, and is (almost always) done
at a fixed rate. The rate at which one should
sample in order to ensure accurate recording
depends on the highest frequency in the analogue
original. You can't capture every
frequency with digital, unfortunately: the highest
frequency that can be recorded digitally is
half the rate (or frequency) at which you sample -
a limit named after a Mr Nyquist. This
"Nyquist frequency" works on the basis that
the simplest representation of a sine wave (up,
then down, and back up) is one snapshot at the
top and one at the bottom - two samples per cycle. CD runs
at 44.1 kHz - giving sounds up to 22.05 kHz (which,
as we remember, is above the upper threshold
for human hearing). MP3 is usually compressed
to yield about a 16 kHz range. DAT can go up to
96 kHz, although the standard is 48 kHz. Any
attempt at recording anything above the Nyquist
frequency will yield an ugly form of distortion
(called "aliasing") and must be avoided. Typically, digital recorders
will filter out any frequencies above this limit before
recording the signal, ensuring that only what
can be represented with some degree of accuracy gets through.
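A quick sketch of why (Python; the specific figures are mine for
illustration): a 30 kHz sine sampled at 44.1 kHz produces exactly
the same samples as a sine at 30000 - 44100 = -14100 Hz, which is
heard as a spurious 14.1 kHz tone. That fold-back is aliasing:

    import math

    RATE = 44100                 # samples per second; Nyquist limit = 22050 Hz

    def sample(freq_hz, i):
        return math.sin(2 * math.pi * freq_hz * i / RATE)

    for i in range(4):
        # identical values: the sampler cannot tell these two tones apart
        print(sample(30000, i), sample(30000 - RATE, i))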
There's another number involved
with sampling - the sample size, in bits. The
sample rate (or sampling frequency) is how often
you take a snapshot, while the sample size defines
how much information you pack into each one
of those snapshots. The higher the sample size,
the more precise each snapshot will be. This
information is stored as ones and zeroes (binary)
and the precision of these snapshots depends
on how many bits are used to express each snapshot
- so precision and dynamic range are dependent
on the bit depth. To cut short the extensive
tutorial on binary arithmetic involved in this,
let's just say that you'll typically find either
8 bits (giving 256 separate positions), 16 bits
(65536), or 24 bits (16777216) sample sizes.
The formula is "2 to the power of N" - i.e.
multiply 2 by itself N times. CD is 16-bit.
Soundcards are usually 16-bit or 24-bit, sometimes
8-bit. MP3s vary wildly.
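Here is a hedged sketch of what the bit depth means in practice
(Python, names mine; each snapshot is assumed to lie between -1
and 1 and gets rounded to the nearest of 2^N levels):

    def quantise(sample, bits=16):
        levels = 2 ** bits                   # 256, 65536, 16777216, ...
        step = 2.0 / levels                  # spacing between adjacent levels
        return round(sample / step) * step

    print(quantise(0.12345, bits=8))         # 0.125    - coarse
    print(quantise(0.12345, bits=16))        # ~0.12344 - much closer

The rounding error here is exactly the "approximation" mentioned
earlier: more bits means smaller errors (and bigger files).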
Digital recording involves an
important trade-off. The higher the sampling
frequency and the higher the bit rate (sample
size) you want to use, the more precise the
representation of the recorded signal you'll
achieve - at the expense of needing to record
more data. This is why, for a given length of
music, an MP3 over 10 MB sounds better than
one that is 5 MB - the larger file contains
more information about the original sound. Unfortunately
that better sound takes longer to download and
occupies more disk space...
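The arithmetic behind the trade-off is straightforward - a
back-of-envelope sketch for CD audio (stereo, i.e. 2 channels,
is my added assumption; the other figures come from above):

    rate_hz = 44100
    bits_per_sample = 16
    channels = 2

    bits_per_sec = rate_hz * bits_per_sample * channels
    print(bits_per_sec)                           # 1411200 - about 1.4 Mbit/s
    print(bits_per_sec / 8 * 60 / (1024 * 1024))  # ~10.1 MB per minute, uncompressed

MP3 gets its small files by throwing information away - which is
exactly why the 5 MB version sounds worse.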
Why does analogue have a better
reputation? At the moment, analogue equipment
generally delivers what is perceived as a "warmer",
richer sound than digital. Analogue oscillators
tend to drift in pitch by fractional amounts, and since
there is only a limited amount of processing
power in a digital system, it can't always keep
up with the extra calculations required to simulate
that analogue character.
The terms "pitch" and "frequency"
are largely interchangeable, by the way. "Pitch"
usually implies a musical note, whereas "frequency"
is less likely to be tied to our twelve-tone
scale. Twelve-tone scale, you ask?
Scales, keys and stuff
A huge chunk of musical theory
(and I'm talking Western theory here, although
of course it's only a drop in the auditory ocean)
is concerned with notes. And well it should
be - up until very recently, instruments didn't
have a huge amount of flexibility. A piano has
no cutoff control, a harpsichord doesn't even
provide for dynamics (changes in volume). So,
the composers of the classical times had to
turn to melody and rhythm to make music, and
work with the sonic textures available - which
led to such great artists as Bach, and... well,
Bach in particular, but also Beethoven, Chopin,
Tchaikovsky, Mozart, in fact there's a lot of
them so I'll stop there and direct you to the
internet or your nearest music library to find
out more. Anyway, these people had to develop
the talent of coaxing beauty from instruments
of imperfection, and we could learn a few things from them.
So, "classical" music was mainly
note-based. What's a note, then? A note is simply
a letter representing a fixed frequency, or
a key on a piano or synthesizer (or whatever!).
Following the cyclical nature of music, the
frequency spectrum is divided into repeating
sets of notes called "octaves" - so named because,
counting up the "natural" notes (the white notes on
a keyboard), the eighth note starts the next cycle.
There are "accidental" notes as well (the black notes
on a keyboard). So, if you start on a white key on a keyboard
and count up eight adjacent white keys, you
are going up one octave.
The relationship between notes
and frequency is quite simple - if you go up
an octave, you are doubling the frequency. Middle
A, for example (that's the note A in the fourth
octave, written as A4) is defined as 440 Hz.
That's a useful one to remember - we'll get
to the "why" later. If you go up an octave,
to A5, you're now at 880 Hz. This is called
a logarithmic scale - there's a lot of logarithmic
stuff in music, so if you're unfamiliar with
the concept then it might help to learn a bit
more about it.
So we've got twelve notes, one
of which is A and happens to be at 440 Hz when
it's in the middle (fourth) octave. What about
the rest? As we mentioned, there are seven naturals
and another five accidentals. An accidental
is represented by the letter of the closest
tone and either a "sharp" (#) or "flat" (b)
to indicate that it has been raised or lowered
(respectively). G#, for example, is somewhere
above G and below A, and is the same as Ab.
The accidentals are the black notes on a keyboard
as mentioned before... but only if you start
on C. Read on:
Read the first column of this
table from top to bottom to go through the 12
tones of the Western scale - each row down represents
one halftone. Note that an octave is made up
of 12 halftones (also called semitones). The
second column is an alternate naming system
for the notes (remember the Sound of Music's
"Do, a deer, a female deer; Re, a drop of golden
sun..."? Yup, you guessed it - Julie Andrews
was teaching us the scale! :-)
A          la
A# or Bb
B          ti or si
C          do
C# or Db
D          re
D# or Eb
E          mi
F          fa
F# or Gb
G          sol
G# or Ab
Note that by definition, B and
C are only separated by a semitone, same with
E to F. When you sing the scale "do re mi fa
sol la ti do", you are singing the major scale.
The spacing between notes in a major scale always
follows the same pattern:
2 full tones -
one semitone - 3 full tones - one semitone.
Starting from C, this spacing
is achieved naturally by going from one letter
to the next ("C D E F G A B C"). But the scale
doesn't have to start on C - you can start anywhere.
Start on any other note, though, and you'll need to
use sharps and flats to get this 2-1-3-1
spacing. Start on G, for example, and you get
"G A B C D E F# G". Start from Eb, and
you get "Eb F G Ab Bb C D
Eb". As if all this wasn't confusing enough
all ready, there are other kinds of scales that
can be drawn up using our 12 tones that don't
follow the major scale's 2-1-3-1
layout, the minor scale being the most well-known.
But we'll skip that for now - lucky you.
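Since the 2-1-3-1 spacing is just a fixed pattern of semitone
steps (2-2-1-2-2-2-1, writing each full tone as two semitones),
it's easy to sketch in code - Python here, with sharps only for
simplicity, so Eb appears as D#:

    NOTES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
    MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]      # semitones between scale degrees

    def major_scale(root):
        i = NOTES.index(root)
        scale = [NOTES[i]]
        for step in MAJOR_STEPS:
            i = (i + step) % 12              # wrap around the 12 tones
            scale.append(NOTES[i])
        return scale

    print(major_scale("C"))   # ['C', 'D', 'E', 'F', 'G', 'A', 'B', 'C']
    print(major_scale("G"))   # ['G', 'A', 'B', 'C', 'D', 'E', 'F#', 'G']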
So, back to frequency. How scientific
do you want to get at this point? If you're
likely to be scared by mathematical formulae,
then look away *now*... In order to calculate
the frequency of a note, use the following formula:

Frequency, in Hertz = 440 * 2^[(octave - 4) + (note/12)]

where "note" starts at A=0, A#=1,
B=2, C=3, etc. For example, C6 => 440 * 2^[(6-4)
+ (3/12)] = 2093 Hz, or almost 2.1 kHz.
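The formula translates directly into code (Python; note numbering
follows this text's own convention of octaves starting at A):

    def note_freq(note, octave):
        """Frequency in Hz, with note 0-11 starting at A and A4 = 440 Hz."""
        return 440.0 * 2 ** ((octave - 4) + note / 12.0)

    print(note_freq(0, 4))    # A4 -> 440.0
    print(note_freq(0, 5))    # A5 -> 880.0 (one octave up = double)
    print(note_freq(3, 6))    # C6 -> ~2093.0, matching the example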
Okay, ye who feareth math, you
can look back again. You didn't miss much.
Notes are only part of the picture; the real
trick to making a track work is to give it life.
Life is change, and in music "change" is largely
determined by the rhythm. Rhythm is the pattern
of changes as the track or song progresses -
from something as simple as psy-trance's on-the-beat
kick drum, up to cascading polyrhythms and the
total randomness of white noise.
The first part of the rhythm of
a track is the "time signature". This takes
the form of two numbers, normally placed one
above the other in musical notation but for
now we'll write them like this: 4/4, 3/4, 6/8.
The first number represents the
number of beats in a bar. 99% of psy-trance
uses four beats: bom -ts- bom -ts- bom -ts-
bom -ts-, which is why tracks seem to be
split into groups of 4/8/16/32 beats. A waltz
runs at 3 beats per bar: bom-cha-cha bom-cha-cha
etc. which gives a bit of a limping feel when
played back-to-back with psytrance but can be
used to surprising effect in a track (rules?
What rules? Ain't gonna play by no stinkin'
*rulez*...). You can double/halve these numbers
and obtain the same result - two beats instead
of four, six beats instead of three, and so
on. There's even some tracks with 5 beats in
a bar (Logic Bomb?). Dance music is predominantly
based on the four-beat bar, however - even twostep/jungle/drum&bass,
which split the bar in half for that jerky, swaying feel.
The second number is the unit
in which the beats are measured - this is largely
irrelevant for dance music unless you're entering
stuff in musical notation, but is worth understanding.
Normally, it's 4 - a quarter note. That's why
the time signature looks suspiciously like a
fraction - it *is* a fraction in essence, the
top number multiplied by (1/something). The number
at the bottom (the unit or denominator) is virtually
always a power of two: 2, 4, 8, etc. (the waltz
mentioned above is no exception - its top number
is 3, but its denominator is still 4). Theoretically you should
be able to have *any* whole number as the denominator, but
it's not very common to see things like 8/5
or 2/19, for example.
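To see the fraction at work, here is a small sketch (Python; the
common convention that BPM counts quarter notes is my assumption)
computing how long one bar lasts:

    def bar_seconds(top, bottom, bpm):
        whole_note = 4 * 60.0 / bpm          # 4 quarter notes per whole note
        return top * whole_note / bottom     # the time signature as a fraction

    print(bar_seconds(4, 4, 145))   # one 4/4 bar at 145 BPM: ~1.655 s
    print(bar_seconds(3, 4, 145))   # a waltz bar: 3/4 as long
    print(bar_seconds(7, 8, 145))   # 7/8: seven eighth notes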
Time signatures are usually something
like 4/4, 3/4, 6/8, 12/8, 5/4, 7/8 etc. Of course,
time is change and there's nothing stopping
you from switching from 4/4 to 5/4 and across
to 7/8 if you feel like it - in fact, if you
can bring it off, a shifting time signature
can be very psychedelic (or just a total floor-clearing disaster).
The original Goa-trance beat is
usually based on this pattern (one kick per beat;
K = kick (bassdrum), SN = snare, TS = hihat; each
row is one bar of sixteen 16th-note steps, x = hit):

 K : x...x...x...x...
TS : ..x...x...x...x.
SN : ....x.......x...

Two-step (jungle, etc.) looks more like this
(kick every other beat, with the second
kick delayed by a half-beat):

 K : x.........x.....
SN : ....x.......x...
More interesting, "natural" rhythms
can be achieved by shuffling the notes slightly
- pulling the hihats forward, ahead of the beat
slightly (by playing them a few milliseconds
before the beat), or shifting the snare drum
slightly so that it causes different interference
patterns with the kick (adjust the snare drum
timing on a basic Goa-ish kick-kicksnare-kick-kicksnare
pattern) are examples of this. Trance can take
a surprising amount of variance in the kick
timing as the track moves along - pushing the
main kick back in preparation for a change in
direction, etc - used carefully, it is helpful
in reducing the "loop" effect a sampled percussion
kit can have. Dynamics (loudness vs. softness)
play a very important part here too - they have
a strong effect on the direction of the beat.
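A hedged sketch of that shuffling idea (Python; the event format
and the exact numbers are just for illustration): place hats on
the offbeats, then pull each one a few milliseconds ahead of the grid:

    BPM = 145
    BEAT = 60.0 / BPM                        # seconds per beat
    PUSH = 0.004                             # 4 ms ahead of the beat

    kicks = [(i * BEAT, "kick") for i in range(4)]
    hats = [((i + 0.5) * BEAT - PUSH, "hat") for i in range(4)]

    for t, name in sorted(kicks + hats):
        print(round(t, 4), name)             # hats land just before the gridline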
Digital technology comes into
its own with the delay effect. Echo, reverb, even
filters can all be obtained with a delay function.
A delay does just that - passes through a signal
after a pause. Think of it as a time-shifting
function - it can pull something from a timestream
and reinsert it at a different point. Uh... you
can think of it differently, if you want... :-)
Delay usually works by storing the incoming
sound in a buffer, and reading back the sound
from a different point in the buffer for the output.
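That buffer mechanism is simple enough to sketch directly (Python,
class name mine): write each incoming sample into a circular buffer
and read back from a point `delay` samples behind the write position:

    class Delay:
        def __init__(self, delay_samples, size=44100):
            self.buf = [0.0] * size          # one second of buffer at 44.1 kHz
            self.pos = 0                     # current write position
            self.delay = delay_samples

        def process(self, sample):
            read = (self.pos - self.delay) % len(self.buf)
            out = self.buf[read]             # what came in `delay` samples ago
            self.buf[self.pos] = sample
            self.pos = (self.pos + 1) % len(self.buf)
            return out

    d = Delay(delay_samples=22050)           # half a second at 44.1 kHz

Feed the output back into the input (attenuated each pass) and you
have an echo - the "extra assistance" mentioned below.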
A delay is perhaps the most useful
component of any digital sound-manipulation
system, and is of extreme importance in achieving
many of the outer-space effects heard in trance.
With some extra assistance, a delay can act as
an echo, a reverb, or even a filter.
In fact, virtually every digital
effect uses delay to some extent - it's worth
looking up the effects mentioned above to learn more.
What is a filter?
Generally speaking, a "filter"
takes a signal and changes it in some way -
what goes in the filter is not the same as what
comes out. In audio, we normally (but not
always!) mean a "frequency filter" - you pass
in a sound, and receive a sound out in which
the levels of some or all of the original
frequencies have been altered. (In more technical terms, in analogue
electronics a filter is a circuit whose impedance
is dependent on frequency. In electronic musical
instruments, a filter is usually more complicated
and contains several such circuits, often
controllable by external voltages. You can look
at a simple pass filter as a frequency-dependent
voltage divider that changes amplitudes based
on frequency. A filter can have other effects too,
such as altering the phase of the signal - analogue
phase shifters, for example, use all-pass filters.) And how
is all this useful to us? It offers an approach
to modelling sound much like carving - you start
with a block of sound, and chisel away parts
of it to reveal a structure underneath. This
approach is called "subtractive synthesis" (a
good place for more info about this is "Analogue
synthesis for beginners", with a hands-on tutorial
using free softsynths).
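As a hedged taste of that carving idea, here is about the simplest
possible frequency filter in code (Python, a one-pole low-pass;
the coefficient name is mine): the output "chases" the input, and
fast, high-frequency wiggles get smoothed away:

    def lowpass(samples, a=0.1):
        """One-pole low-pass: smaller `a` = lower cutoff = duller sound."""
        out, y = [], 0.0
        for x in samples:
            y += a * (x - y)                 # output drifts towards the input
            out.append(y)
        return out

Subtract the result from the original signal and you're left with
the highs - a crude high-pass, carved from the same block of sound.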
What are dynamics and compression?
Good questions - and there are some very good
answers out there: look up a compression tutorial
or two on the internet, as dynamics deserve
(and get) tutorials of their own.
© Tom Molesworth, 2001.
Converted from html by Plamen Todorov.