Table of contents:

What is sound?
What's the difference between digital and analogue?
Scales, keys and stuff
Delay
What is a filter, then?
What are dynamics and compression?
Standard effects
Creating a sonic environment
DSP effects
Part One - Sound Tutorial by Tom Molesworth


What is sound?

Sound is the term used to describe vibrations within the range of human hearing. It travels through objects (the air, walls, the human body) as a series of compressions and expansions of the material, and as such travels much faster through dense materials than through air - sound moves faster in water, but only about 340 metres per second in air (the speed known as Mach 1, the "sound barrier").

The vibrations can be represented on a graph of displacement against time - time is traditionally on the X axis (horizontal), while the displacement from zero is on the vertical axis (and since displacement corresponds to signal strength, an audio signal can be graphed as voltage against time). A simple tone at a constant pitch - someone whistling a continuous note, or holding a note on a pan flute, for example - is represented by a smooth curve that rises from the centre, turns back down through zero, continues to a trough, and finally returns to zero. This is called a sine wave - and it is a very important component in sound, or indeed any other signal.

[Figure: a sine wave]

The sine wave is cyclical - the sequence repeats perfectly as you move along in time. The length of one repetition (in seconds) is called the period of the wave, and the number of repetitions of the wave you can have in one second is called the frequency. Frequency in this case follows the dictionary definition - how often something happens... and it's measured in occurrences ("cycles") per second, or Hertz (Hz). Frequency is the reciprocal of period - frequency = 1/period. How high the wave reaches on the graph is called amplitude, and represents the strength of the signal (which in the case of sound, we perceive as volume - more on volume in a future tutorial).
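If it helps to pin these terms down, here's a tiny Python sketch of a sine wave - the sample rate, frequency and amplitude values are purely illustrative numbers:

    import math

    sample_rate = 44100        # snapshots per second (more on sampling later)
    frequency = 440.0          # cycles per second (Hz)
    amplitude = 0.8            # peak displacement, on a -1..+1 scale
    period = 1.0 / frequency   # length of one cycle, in seconds

    # One second of a sine wave: the displacement at each moment in time.
    samples = [amplitude * math.sin(2 * math.pi * frequency * t / sample_rate)
               for t in range(sample_rate)]

    print(f"period = {period * 1000:.3f} ms")   # ~2.273 ms for 440 Hz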

This sine wave, by the way, is normally produced by an "oscillator" - which is something that generates a waveform at a certain frequency... it can usually be any waveform, not just a sine wave.

As with most units in the technical/scientific world, shorthand notation has been introduced for very large or very small frequencies:

1 Hz  = 1 cycle per second
1 kHz = 1,000 Hz
1 MHz = 1,000,000 Hz
1 GHz = 1,000,000,000 Hz

Human hearing ranges from about 20 Hz to 20,000 Hz (or 20 kHz). Most people cannot hear sounds outside of this range, although to some extent we may be able to sense them - since all sounds are vibrations, they can be felt as well as heard (but much more energy is required before our sense of touch perceives a wave). Vibrations below that range are referred to as infrasonic; vibrations above it are called ultrasonic. Most digital systems limit themselves to the audible range; analogue equipment has a limited bandwidth as well, but often exceeds the range of human hearing.

So... a sine wave is a tone a bit like a whistle or a flute - soft (as opposed to "harsh" or "strident"), and only composed of one pitch. You can fade the volume (amplitude) of this sine wave up and down, and it will still be the same pitch, by the way. What about more interesting sounds? Essentially, different sounds (of the same pitch as our sine wave) are shaped differently from our smooth up-and-down sine curve.

A typical "note" is usually a short burst of sound at a particular pitch. There may be many other frequencies in the sound, but the pitch of the note will usually be dominant. The non-dominant parts of the sound are called the harmonics - while they aren't the foundation of the sound, they are important in giving it its sonic character. It should be noted that a perfect sine wave contains no harmonics; it is only a fundamental, "pure" note. Other waveforms, such as square, saw, etc. contain various mixtures of fundamental and harmonic components. More complicated sounds - such as the human voice - are composed of many different waves, each of a different frequency and a different volume. You can get "unpitched" sounds, too, such as percussion.

[Figures: square wave, sawtooth, triangle, noise]

In simplified terms, the fundamental is the frequency of the waveform. Any waveform other than a perfect sine will produce partials. Harmonics are musically related partials that sit at whole-number multiples of the fundamental. A note is a sound having a single pitch - if the listener doesn't perceive a single pitch, it isn't a note. With acoustic instruments, the perceived pitch and fundamental are almost always the same; with electronic instruments, however, a note may have a perceived pitch different from the fundamental, or be dominated by non-harmonic partials. If the dominant frequency is related to a fundamental which lies below it, that fundamental may then be considered a "sub-harmonic".
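Since waveforms like the square really are just particular mixtures of fundamental and harmonics, you can build one up by addition. A quick Python sketch using the standard recipe of odd harmonics at 1/n amplitude (the fundamental and harmonic count here are arbitrary choices):

    import math

    def square_ish(t, fundamental=110.0, harmonics=9):
        """Sum the fundamental plus odd harmonics at 1/n amplitude -
        this converges towards a square wave as more are added."""
        value = 0.0
        for n in range(1, harmonics + 1, 2):   # n = 1, 3, 5, ...
            value += math.sin(2 * math.pi * n * fundamental * t) / n
        return value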

Before we get ahead of ourselves, let's sort out this frequency business. Frequencies are all over the place in music and they are central to the whole concept of "sound" as opposed to formless "vibration".

The lower the frequency (in other words, the fewer times it cycles per second), the lower the pitch of the note. Deep bass sounds are in the low-frequency range, the whine you hear when a TV set is switched on is a high-frequency sound (between 15 and 16 kHz), and a big crashing snare usually has elements right the way across the spectrum. Here are a few other common ranges:

0 Hz -> 30 Hz = Biofeedback range. At high volumes can induce hallucinations, cause all sorts of nasty things to happen to the digestive system, even stop you breathing... you'll be wanting this at max for psy-trance!

30 Hz -> 60 Hz = Sub-bass. Mains hum is at 50 Hz or 60 Hz - this is the frequency of the mains power supply in most countries, and if you can hear this when nothing is supposed to be playing then you're either about to be visited by fractal beings from last Tuesday or there's an earthing problem with your equipment. Look up "mains hum", "ground hum" and "earthing" in the manual or on the internet.

60 Hz -> 800 Hz = Bass. From here up to 800 Hz is audible bass on most speakers and headphones.

0.2 kHz -> 1.4 kHz = Main part of the human voice - this is just about the bare minimum for recognisable vocal samples.
  -> 3 kHz = Male voice (low quality level)
  -> 6 kHz = Female voice (low quality level)

800 Hz -> 5-9 kHz = Mid-range (or Mids) - human hearing is more sensitive in this range, so it should be slightly lower in volume (down to about -6dB or so) than the bass and high end. Years of human speech (in all its various incarnations) as an evolutionary asset must've led to increased numbers of neurons/ganglia/whatever they are sensitive to this particular frequency range. It's more likely that aliens landed one day and their spacecraft was playing tunes so intense, focused just in this area of the sound spectrum, that they left a permanent impact and people have been extra-sensitive to those sounds ever since, in the hope of one day hearing the magic again.

7 kHz -> 22 kHz = High-end. This might appear to be a wide range, but remember the logarithmic scale: doubling the frequency raises the pitch by a fixed interval (one octave), so a step of one octave is on the order of 100 Hz in the bass but 10 kHz at the high end. This is the high end, by the way - up to whatever the top end of your hearing is.

We'll just take a quick detour here - it's time to mention that there are two different methods for dealing with sounds, and they are somewhat different in terms of features...

What's the difference between digital and analogue?

The digital world has finite limits and definite boundaries. Something that is "digital" operates from a finite set of data - "analogue", on the other hand, is the term used to describe something that is infinitely variable.

A simple example of digital is the whole numbers from 1 to 10. You can list them in order, and you can store an exact representation of each number. Analogue represents *all* the numbers from 1 to 10 - fractions, whole numbers, recurring decimals, irrational numbers, *anything* that fits between 1 and 10... and you guessed it, there is an infinite number of them (you can always put another digit on the end - from 0.01 you can go to 0.012, 0.0123, etc). If you can list all the possible cases, then you are dealing with something digital. If there's no possible way of doing this, then it's analogue. So, digital uses discrete values to represent data, while analogue uses continuous values.

So analogue provides us a limitless playing field, but digital will only go so far and no further. Why do we bother with digital, then? Musicians should be at home with infinity, surely? The answer lies in how you can process the data. Analogue information in the audio world is (most of the time) represented by electrical signals. Instead of numbers from 1 to 10, you have an analogue voltage between -5v and +5v (for example). Again, this range contains an infinity of subdivisions. Using analogue circuitry, you can do all sorts of things to this - but since we cannot measure any value with perfect precision (that would require equipment with infinite precision, which doesn't exist at the moment), we can't capture it as perfect digital data. Sooner or later, by using discrete values, you'll have to round off - i.e. approximate.
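Here's what that rounding looks like as a minimal Python sketch, using the -5v to +5v example above (the choice of 256 levels is arbitrary - more on that number shortly):

    def quantize(value, steps=256, lo=-5.0, hi=5.0):
        """Snap a voltage in [lo, hi] to the nearest of `steps` discrete levels."""
        step_size = (hi - lo) / (steps - 1)
        index = round((value - lo) / step_size)
        return lo + index * step_size

    print(quantize(1.2345))   # ~1.2353 - close, but not the original value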

It isn't strictly true that one can't record near-perfect data using analogue - suffice to say that it's more trouble and expense than it's worth. So digital comes into its own when we're not just doing everything on the fly, but want to refer to what happened previously - and the classic example here is an echo (see the delay section, below).

Digital has limitations, too - you have to break the sound down into a series of snapshots, so instead of a continuous wave representing the movements of the speaker cone, we have a series of numbers which define positions at exact moments in time. The sound card or CD player will normally interpolate (fill in) between these positions to move the cone smoothly rather than in steps.

This process of taking "snapshots" is called sampling, and is (almost always) done at a fixed rate. The rate at which one should sample in order to ensure accurate recording depends on the highest frequency in the analogue original. You can't get a sound containing every frequency with digital, unfortunately: the highest frequency that can be recorded digitally is half the rate (or frequency) at which you sample, a limit named after a Mr Nyquist. This "Nyquist frequency" works on the basis that the simplest representation of a sine wave (up, then down, and back up) is one snapshot at the top and one at the bottom - two samples. CD runs at 44.1 kHz - giving sounds up to 22.05 kHz (which, as we remember, is above the upper threshold of human hearing). MP3 is usually compressed to yield about a 16 kHz range. DAT can go up to 96 kHz, although the standard is 48 kHz. Any attempt at recording anything above the Nyquist frequency will yield an ugly form of distortion called aliasing, and must be avoided. Typically, digital recorders will filter out any frequencies above this limit before recording the signal, ensuring that only what can be represented with some degree of accuracy is recorded.
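The arithmetic is simple enough to sketch in a couple of lines of Python:

    # The highest recordable frequency is half the sampling rate.
    for rate in (44100, 48000, 96000):
        print(f"{rate} Hz sampling -> frequencies up to {rate / 2:.0f} Hz")

    # A 30 kHz tone at CD rate would exceed 22050 Hz and alias,
    # which is why it gets filtered out before recording.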

There's another number involved with sampling - the sample size, in bits. The sample rate (or sampling frequency) is how often you take a snapshot, while the sample size defines how much information you pack into each of those snapshots. The higher the sample size, the more precise each snapshot will be. This information is stored as ones and zeroes (binary), and the precision of each snapshot depends on how many bits are used to express it - so precision and dynamic range are dependent on the bit depth. To cut short the extensive tutorial on binary arithmetic involved in this, let's just say that you'll typically find sample sizes of either 8 bits (giving 256 separate positions), 16 bits (65,536) or 24 bits (16,777,216). The formula is "2 to the power of N" - i.e. multiply 2 by itself N times. CD is 16-bit. Soundcards are usually 16-bit or 24-bit, sometimes 8-bit. MP3s vary wildly.
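In Python, the "2 to the power of N" rule looks like this (the ~6 dB of dynamic range per bit is the usual engineering rule of thumb, not something derived here):

    for bits in (8, 16, 24):
        levels = 2 ** bits
        print(f"{bits}-bit: {levels:>10} levels, roughly {6 * bits} dB dynamic range")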

Digital recording involves an important trade-off. The higher the sampling frequency and the larger the sample size you want to use, the more precise a representation of the recorded signal you'll achieve - at the expense of needing to record more data. This is why, for a given length of music, those MP3s over 10 MB sound better than ones that are 5 MB - the larger files contain more information about the original sound. Unfortunately that better sound takes longer to download and occupies more disk space...
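To put numbers on it, here's the arithmetic for uncompressed CD-quality audio (assuming two channels, i.e. stereo):

    rate, bits, channels = 44100, 16, 2
    bytes_per_second = rate * (bits // 8) * channels
    print(f"{bytes_per_second} bytes per second")                     # 176400
    print(f"{bytes_per_second * 60 / 1_000_000:.1f} MB per minute")   # ~10.6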

Why does analogue have a better reputation? At the moment, analogue equipment generally delivers what is perceived as a "warmer", richer sound than digital. The oscillators often vary in pitch by a fractional amount, and since there is only a limited amount of processing power in a digital system, it can't always keep up with the extra calculations required to simulate the analogue effect.

The terms "pitch" and "frequency" are largely interchangeable, by the way. "Pitch" usually implies a musical note, whereas "frequency" is less likely to be tied to our twelve-tone scale. Twelve-tone scale, you ask?

Scales, keys and stuff

A huge chunk of musical theory (and I'm talking Western theory here, although of course it's only a drop in the auditory ocean) is concerned with notes. And well it should be - up until very recently, instruments didn't have a huge amount of flexibility. A piano has no cutoff control, a harpsichord doesn't even provide for dynamics (changes in volume). So, the composers of the classical times had to turn to melody and rhythm to make music, and work with the sonic textures available - which led to such great artists as Bach, and... well, Bach in particular, but also Beethoven, Chopin, Tchaikovsky, Mozart, in fact there's a lot of them so I'll stop there and direct you to the internet or your nearest music library to find out more. Anyway, these people had to develop the talent of showing beauty from instruments of imperfection, and we could learn a few things from them.

So, "classical" music was mainly note-based. What's a note, then? A note is simply a letter representing a fixed frequency, or a key on a piano or synthesizer (or whatever!). Following the cyclical nature of music, the frequency spectrum is divided into repeating sets of notes called "octaves" - because there are eight "natural" notes (the white notes on a keyboard) in each cycle. There are"accidental" notes as well (the black notes on a keyboard). So, if you start on a white key on a keyboard, and count up eight adjecent white keys, you are going up one octave.

The relationship between notes and frequency is quite simple - if you go up an octave, you are doubling the frequency. Middle A, for example (that's the note A in the fourth octave, written as A4) is defined as 440 Hz. That's a useful one to remember - we'll get to the "why" later. If you go up an octave, to A5, you're now at 880 Hz. This is called a logarithmic scale - there's a lot of logarithmic stuff in music, so if you're unfamiliar with the concept then it might help to learn a bit more about it.

So we've got twelve notes, one of which is A and happens to be at 440 Hz when it's in the middle (fourth) octave. What about the rest? As we mentioned, there are seven naturals, and another five accidentals. An accidental is represented by the letter of the closest natural and either a "sharp" (#) or a "flat" (b) to indicate that it has been raised or lowered (respectively). G#, for example, is somewhere above G and below A, and is the same as Ab. The accidentals are the black notes on a keyboard as mentioned before... but only if you start on C. Read on:

Read the first column of this table from top to bottom to go through the 12 tones of the Western scale - each box down represents one halftone. Note that an octave is made up of 12 halftones (also called semitones). The second column is an alternate naming system for the notes (remember the Sound of Music's "Do, a deer, a female deer; Re, a drop of golden sun..."? Yup, you guessed it - Julie Andrews was teaching us the scale! :-)
A         la
A# or Bb  -
B         ti or si
C         do
C# or Db  -
D         re
D# or Eb  -
E         mi
F         fa
F# or Gb  -
G         sol
G# or Ab  -

Note that by definition, B and C are only separated by a semitone, same with E to F. When you sing the scale "do re mi fa sol la ti do", you are singing the major scale. The spacing between notes in a major scale always follows the same pattern:

2 full tones - one semitone - 3 full tones - one semitone.
Starting from C, this spacing is achieved naturally by going from one letter to the next ("C D E F G A B C"). But the scale doesn't have to start on C - you can start anywhere. Start on any other note, though, and you'll need to use sharps and flats to get this 2-1-3-1 spacing. Start on G, for example, and you get "G A B C D E F# G". Start from Eb, and you get "Eb F G Ab Bb C D Eb". As if all this wasn't confusing enough already, there are other kinds of scales that can be drawn up using our 12 tones that don't follow the major scale's 2-1-3-1 layout, the minor scale being the most well-known. But we'll skip that for now - lucky you.
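Written out as individual semitone steps, the 2-1-3-1 grouping becomes tone-tone-semitone-tone-tone-tone-semitone, which makes for a neat little Python sketch (flats appear as their sharp equivalents here, so Eb shows up as D#):

    NOTES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
    MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]   # semitones between successive notes

    def major_scale(root):
        index = NOTES.index(root)
        scale = [root]
        for step in MAJOR_STEPS:
            index = (index + step) % 12
            scale.append(NOTES[index])
        return scale

    print(major_scale("C"))   # ['C', 'D', 'E', 'F', 'G', 'A', 'B', 'C']
    print(major_scale("G"))   # ['G', 'A', 'B', 'C', 'D', 'E', 'F#', 'G']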

So, back to frequency. How scientific do you want to get at this point? If you're likely to be scared by mathematical formulae, then look away *now*... In order to calculate the frequency of a note, use the following scientific gibberish :-)

Frequency, in Hertz = 440 * 2^[(octave-4) + (note/12)]

where "note" starts at A=0, A#=1, B=2, C=3, etc. For example, C6 => 440 * 2^[(6-4) + 3/12)] = 2093 Hz, or almost 2.1 kHz.

Okay, ye who feareth math, you can look back again. You didn't miss much.
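For anyone still reading, here's the same formula as a small Python function. Note that octaves are counted from A here, following the note numbering above - conventional notation counts octaves from C instead, so the numbers won't match a piano chart exactly:

    NOTES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

    def note_frequency(note, octave):
        """Frequency in Hz, from the formula above (A4 = 440 Hz)."""
        n = NOTES.index(note)
        return 440 * 2 ** ((octave - 4) + n / 12)

    print(round(note_frequency("A", 4)))   # 440
    print(round(note_frequency("C", 6)))   # 2093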

Notes are only part of the picture - the real trick to making a track work is to give it life. Life is change, and in music "change" is largely determined by the rhythm. Rhythm is the pattern of changes as the track or song progresses - from something as simple as psytrance's on-the-beat kick drum, up to cascading polyrhythms and the total randomness of white noise.

The first part of the rhythm of a track is the "time signature". This takes the form of two numbers, normally placed one above the other in musical notation but for now we'll write them like this: 4/4, 3/4, 6/8 etc.

The first number represents the number of beats in a bar. 99% of psy-trance uses four beats: bom -ts- bom -ts- bom -ts- bom -ts-, which is why tracks seem to be split into groups of 4/8/16/32 beats. A waltz runs at 3 beats per bar: bom-cha-cha bom-cha-cha etc., which gives a bit of a limping feel when played back-to-back with psytrance but can be used to surprising effect in a track (rules? What rules? Ain't gonna play by no stinkin' *rulez*...). You can double/halve these numbers and obtain the same result - two beats instead of four, six beats instead of three, and so on. There are even some tracks with 5 beats in a bar (Logic Bomb?). Dance music is predominantly based on the four-beat bar, however - even twostep/jungle/drum&bass, which split the bar in half for that jerky swaying effect.

The second number is the unit in which the beats are measured - this is largely irrelevant for dance music unless you're entering stuff in musical notation, but is worth understanding. Normally, it's 4 - a quarter note. That's why the time signature looks suspiciously like a fraction - it *is* a fraction in essence: the top number multiplied by (1/something). The number at the bottom (the unit or denominator) is almost always a power of two: 2, 4, 8, etc. (the waltz mentioned above is no exception here, by the way - it's the *top* number that's a 3 or a 6, while the bottom stays a 4 or an 8). Theoretically you should be able to have *any* whole number here, but it's not very common to see things like 8/5 or 2/19, for example.

Time signatures are usually something like 4/4, 3/4, 6/8, 12/8, 5/4, 7/8 etc. Of course, time is change and there's nothing stopping you from switching from 4/4 to 5/4 and across to 7/8 if you feel like it - in fact, if you can bring it off, a shifting time signature can be very psychedelic (or just a total floor-clearing headfuck!)

The original Goa-trance beat is usually based on this pattern:
Beat:  1  .  .  .  2  .  .  .  3  .  .  .  4  .  .  .
Snare:             SN                      SN
Kick:  K     TS    K     TS    K     TS    K     TS

(one kick per beat; K = kick (bassdrum), SN = snare, TS = hihat)

Two-step (jungle, etc) looks more like:
Beat:  1  .  .  .  2  .  .  .  3  .  .  .  4  .  .  .
Snare:             SN                      SN
Kick:  K                             K

(kick every other beat; the second kick is delayed by a half-beat)

More interesting, "natural" rhythms can be achieved by shuffling the notes slightly - for example, pulling the hihats forward, ahead of the beat (by playing them a few milliseconds early), or shifting the snare drum slightly so that it causes different interference patterns with the kick (try adjusting the snare drum timing on a basic Goa-ish kick-kicksnare-kick-kicksnare pattern). Trance can take a surprising amount of variance in the kick timing as the track moves along - pushing the main kick back in preparation for a change in direction, etc. Used carefully, this helps reduce the "loop" effect a sampled percussion kit can have. Dynamics (loudness vs. softness) play a very important part here too - they have a strong effect on the direction of the beat.
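Here's a rough Python sketch of that kind of nudging - the tempo, the pattern and the few-millisecond offsets are all illustrative assumptions, not magic numbers:

    import random

    BPM = 145
    SIXTEENTH = 60.0 / BPM / 4          # one step of a 16-step bar, in seconds

    # The Goa pattern above, one string per instrument ('.' = silence).
    PATTERN = {
        "kick":  "K...K...K...K...",
        "snare": "....S.......S...",
        "hihat": "..T...T...T...T.",
    }

    def humanize(pattern, wobble=0.004):
        """Turn steps into (time, instrument) events, nudged by up to 4 ms."""
        events = []
        for name, steps in pattern.items():
            for i, ch in enumerate(steps):
                if ch != ".":
                    t = i * SIXTEENTH + random.uniform(-wobble, wobble)
                    events.append((max(t, 0.0), name))
        return sorted(events)

    for t, name in humanize(PATTERN):
        print(f"{t:7.4f}s  {name}")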

Delay

Digital technology comes into its own with the delay effect. Echo, reverb, even filters can all be obtained with a delay function. A delay does just that - passes a signal through after a pause. Think of it as a time-shifting function - it can pull something from a timestream and reinsert it at a different point. Uh... you can think of it differently, if you want... :-) Delay usually works by storing the incoming sound in a buffer, and reading the sound back from a different point in the buffer for the output.
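That buffer idea fits in a dozen lines of Python - a minimal sketch, with the buffer length (half a second at 44.1 kHz) picked purely for illustration:

    class Delay:
        """Write incoming samples into a circular buffer, and read
        them back delay_samples later."""

        def __init__(self, delay_samples):
            self.buffer = [0.0] * delay_samples
            self.pos = 0

        def process(self, sample):
            delayed = self.buffer[self.pos]   # the sound from delay_samples ago
            self.buffer[self.pos] = sample    # store the incoming sample
            self.pos = (self.pos + 1) % len(self.buffer)
            return delayed

    delay = Delay(delay_samples=22050)        # 0.5 s at a 44.1 kHz sample rate

Mix some of the delayed output back into the input and you've already got an echo - which is exactly why the effects listed below are all within reach.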

A delay is perhaps the most useful component of any digital sound-manipulation system, and is of extreme importance in achieving many of the outer-space effects heard in trance. With some extra assistance, delay can act as

  • phaser
  • chorus
  • flanger
  • echo
  • filter
  • reverb
In fact, virtually every digital effect uses delay to some extent. To learn more about the above-mentioned effects, try looking them up on the web.

What is a filter, then?

Generally speaking, a "filter" takes a signal and changes it in some way - what goes into the filter is not the same as what comes out. In audio, we normally (but not always!) mean a "frequency filter" - you pass in a sound, and receive a sound out which has altered levels for some or all of the original frequencies. (In more technical terms, in analogue electronics a filter is a circuit whose impedance is dependent on frequency. In electronic musical instruments, a filter is usually more complicated and contains several such circuits, often controllable by external voltages. You can look at a simple pass filter as a frequency-dependent voltage divider that changes amplitude based on frequency. A filter can have other effects as well, including changing the phase of the signal - analogue phase shifters, for example, use all-pass filters.) And how is all this useful to us? It offers an approach to modelling sound much like carving - you start with a block of sound, and chisel away parts of it to reveal a structure underneath. This approach is called "subtractive synthesis" (a good place for more info about this is "Analogue synthesis for beginners", with a hands-on tutorial using free softsynths).
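A real synth filter adds resonance, multiple stages and voltage control, but the core "frequency-dependent" behaviour can be sketched with about the simplest low-pass filter there is - one pole, in a few lines of Python (the coefficient value here is illustrative; the closer it is to 1, the lower the cutoff):

    class OnePoleLowPass:
        """y[n] = (1 - a) * x[n] + a * y[n - 1]"""

        def __init__(self, a=0.9):
            self.a = a        # 0 = pass everything; towards 1 = cut highs hard
            self.last = 0.0

        def process(self, sample):
            # Each output leans towards the previous one, so fast
            # (high-frequency) wiggles get smoothed away.
            self.last = (1 - self.a) * sample + self.a * self.last
            return self.last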

What are dynamics and compression?

Good questions - and here are a few very good answers: compression tutorial #1, compression tutorial #2, compression tutorial #3, compression tutorial #4.


Copyright Tom Molesworth, 2001.