This article gives an introduction to the scientific aspects of sound. For information on related topics see Acoustics (for matters connected with rooms, instruments and the human voice), Hearing and psychoacoustics, Psychology of music and Recorded sound; for the history of the study of sound, see Physics of music.
3. Visual representation of sound.
4. Human response and physical measurement.
5. Means of producing musical tones.
6. Origins of quality and tonal differences.
7. The physics of tubes and horns.
8. Methods of analysis and study.
9. Tones in sequence and combination.
10. The effect of acoustic environment.
CHARLES TAYLOR (with MURRAY CAMPBELL)
Greek and Roman sources include numerous references to scientific reflections on the nature and origin of sound, and these seem to be the earliest recorded thoughts indicating any attitude to music other than the purely aesthetic. Many classical observers, however, followed the Aristotelian method of thinking about an experiment and imagining the results, a method which, though of undoubted value as a starting-point, usually led to conflicting conclusions if not checked against real experiments. Also, a great deal of mysticism, especially concerning numerical relationships, tended to obscure more scientific ideas.
There followed a gap of 15–16 centuries during which there was no development in the scientific study of sound. But during the 16th and 17th centuries almost all of the great scientists of the time devoted at least some of their attention to the subject. Galileo made the first serious study of vibrating strings and gave a plausible explanation of the origin of consonance and dissonance, one that remains generally acceptable. He also introduced the idea of demonstration by analogue, including the use of pendula to demonstrate harmonic ratios. Boyle performed the classical experiment to show that a medium is needed for sound transmission; Descartes made studies of resonance; Hooke recognized that a sound of definite pitch can be derived from a rotating wheel; Mersenne formulated laws of vibrating strings (though Galileo had laid firm foundations in unpublished work); and Newton was the first to make a theoretical derivation of the velocity of sound and to compare it with experimental results.
In the 18th and 19th centuries discoveries came rapidly. Young made full studies of the modes of vibration of strings; Chladni studied vibrations of plates; Fourier established the mathematical theories on which all modern wave analysis is based; Wheatstone developed methods of making sound waves visible; Faraday investigated singing flames; the equal-tempered scale appeared; Koenig studied the human ear’s pitch range; and Helmholtz gathered all the studies together in a magnificent volume. Bell produced the telephone and Edison the phonograph; John Tyndall lectured in Britain and the USA, using demonstrations that still have great impact and for which much of the apparatus remains at the Royal Institution in London. During the first half of the 20th century there was a decline in progress, partly because scientists were preoccupied with atomic physics. In the second half, new technological advances, largely deriving from these studies (particularly those concerned with electronic measuring devices), gave the study of sound a new lease of life. (For further material on the history of the science of sound see Miller).
One of the earliest applications of the air pump was to show that sound cannot be heard from a source in an evacuated vessel: intervening air is necessary for transmission. But the air does not have to travel; sound can pass through walls and windows. The idea emerges, then, of transmission by means of waves, that is by transfer of energy from point to point without permanent change in the medium. Sound waves involve tiny disturbances or changes in the pressure of the air. The amount of the disturbance is small; a quiet musical instrument might create changes in the atmospheric pressure of only about one part in a million. Each disturbance, which may be an increase or decrease in pressure but is usually a complicated succession of both, then travels out through the surrounding air creating spherical wave surfaces round the origin of the sound – a three-dimensional counterpart of the circular ripples produced by a pebble striking the surface of a pond. The waves in the air travel outwards at approximately 340 metres (m) per second.
Because the energy associated with a particular sound is spread out over the surface of a sphere, it follows that the fraction of the total energy that falls on a human ear reduces as the square of the distance from the source. Assuming that the area of sound-wave surface picked up by an ear is 12·5 cm2, if the listener is 1 m away from the source, the surface area of the sphere is then just over 125,000 cm2 and so only about one ten-thousandth of the energy is received by one ear; at 5 m the proportion would be one quarter-millionth. In this calculation it is, of course, assumed that the source is far from any objects that would reflect or diffract the sound – in other words that it is in empty space (except for air). Usually there is an environment, even if it is only the ground, and in a room the whole wave pattern is different. The effect of room acoustics is discussed in §10 below and in Acoustics, §I.
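As a rough check on these figures, the fraction of the radiated energy intercepted by an ear of the assumed area can be computed directly; the distances and the free-space assumption are those of the text, and the short sketch below is purely illustrative.

```python
import math

EAR_AREA_CM2 = 12.5  # area of sound-wave surface assumed to be picked up by one ear

def fraction_received(distance_m: float) -> float:
    """Fraction of the total radiated energy falling on the ear, assuming the
    source radiates equally in all directions in otherwise empty space."""
    sphere_area_cm2 = 4 * math.pi * (distance_m * 100) ** 2  # sphere surface in cm^2
    return EAR_AREA_CM2 / sphere_area_cm2

for d in (1, 5):
    f = fraction_received(d)
    print(f"at {d} m: 1 part in {1 / f:,.0f}")
# at 1 m: 1 part in 10,053   (about one ten-thousandth)
# at 5 m: 1 part in 251,327  (about one quarter-million)
```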
Sound, then, arises and is transmitted as tiny pressure changes in the air. When any two hard objects collide they produce a sound that might be described as a click or a crash depending on its loudness. The simplest click corresponds to a sudden rise in the pressure of the air, produced by the air that was between the colliding objects being forcibly squeezed out. The pressure then reduces, usually overshoots the mark and after a few oscillations falls to normal. Clicks may be combined in two ways. If they follow each other in a random fashion, as for example when an audience applauds, the resulting sound is described as ‘noise’. It may be continuous and of uniform loudness, but cannot easily be assigned a pitch. However, if the clicks follow each other regularly they are heard separately if well spaced in time (e.g. the ticks of a clock), but if they are speeded up they begin to produce a sound of definite musical pitch. The most obvious example is the circular saw in which the teeth successively strike the wood: as the speed of rotation rises, so does the pitch of the sound. Any regularly repeated sequence of pressure changes will give rise to the sensation of musical tones of constant pitch if the sequence repeats at a frequency between 18 and 15,000 times a second approximately; the exact limits depend on individual variations in hearing, and especially on the age of the listener (see §4 below).
What has been said concerns steady, unchanging sounds; complications arise in the case of varying sounds. Also, it is the regularity of repetition that gives a sound the musical sensation of pitch; the shape of the repeating unit does not affect the pitch. For example, fig.1 shows the pressure variations (i.e. plots of amplitude against time) in four quite different sorts of wave; all four would give rise to a steady sensation of the same pitch, but the quality of the sound, or timbre, would be quite different in each case. Fig.1d is a sine wave (so called because the mathematical equation from which it is derived is y = a sin πx); a treble recorder playing a note steadily and fairly quietly with no trace of vibrato gives a close approximation to a sine-wave tone. It is important scientifically for two reasons. First, sine-wave oscillation occurs naturally in a large number of systems that are normally balanced in equilibrium and are then slightly displaced. A child’s swing, the pendulum or balance wheel of a clock, the air in a bottle when one blows across its neck and the metal reed of a mouth organ are all examples. Second, any wave, no matter how complicated, can be represented by adding up the effects of a large number of sine waves. This is the basis of Fourier analysis and synthesis (see §8 below).
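By way of illustration of the second point, a sketch along the following lines adds a few sine waves whose frequencies are exact multiples of a fundamental; the particular amplitudes are arbitrary, but any such sum repeats at the fundamental frequency and would therefore give the same sensation of pitch as the fundamental alone.

```python
import numpy as np

fs = 40000                       # samples per second
t = np.arange(0, 0.02, 1 / fs)   # 20 ms of signal

f0 = 440.0                              # fundamental frequency (Hz)
harmonics = {1: 1.0, 2: 0.5, 3: 0.25}   # arbitrary amplitudes, chosen for illustration

# Sum of harmonically related sine waves: a complex but strictly periodic waveform
wave = sum(a * np.sin(2 * np.pi * n * f0 * t) for n, a in harmonics.items())

# The period of the sum equals that of the fundamental, so the pitch is unchanged;
# only the shape of the repeating unit (and hence the timbre) differs.
print("repeats every", 1 / f0, "seconds")
```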
The question arises whether transmission through the air leaves sound waves unchanged. Clearly, if the waves are being created inside a room there are effects (see §10 below); and changes may occur in sounds transmitted through electronic systems (radio, telephone, recording). Here discussion is limited to some of the important effects that can arise in the process of transmission through the air. First, the speed of sound varies with the temperature, humidity and pressure of the air, and with its exact composition (though this last factor is unlikely to vary significantly except in highly artificial conditions, such as those inside a spacecraft or diving bell). But uniform changes in velocity of the magnitudes likely to arise in nature can be detected only by precise measurement, though non-uniform changes may produce quite noticeable effects: the waves may travel along a curved or bent path, that is, they may be ‘refracted’. For example, the velocity of sound is greater at higher temperatures. Suppose one listens to sounds in the open air near noon on a hot summer day. The earth will have heated up and the layers of air next to it will be correspondingly warm; higher up the air will be much cooler. A sound wave travelling towards an observer will thus tend to travel more slowly some distance above the earth and more quickly nearer to the ground, so the whole wave slews round and goes up into the air. Sound cannot therefore be heard at great distances, and this contributes to the muffled and drowsy effect at midday in summer, so often described by poets. On a clear night, however, the earth cools rapidly, the blanket of air remains relatively warm and the effect is reversed: sound waves tend to curve down towards the earth and hence ‘carry’ much further. Similar effects occur over water, and a combination of the down-curving effect and good reflection at the water surface can make audibility over a lake or pond excellent.
If a wave meets an object, various kinds of interaction may occur. If the object is very small compared with the wavelength of sound, the wave is hardly affected at all. (The wavelength corresponding to c' is about 1·25 m or 4 feet.) If the object is approximately the same size as the wavelength, the waves tend to move in towards each other again after passing on either side of it, and so, effectively, go round corners; the sound is said to be ‘diffracted’. If the object is much larger, the main effect is that the waves are reflected.
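The quoted wavelength follows from the relation wavelength = velocity ÷ frequency; taking the speed of sound as roughly 330 m per second (an approximate value) gives the following, purely illustrative, figures.

```python
SPEED_OF_SOUND = 330.0  # m per second (approximate; varies with temperature)

def wavelength_m(frequency_hz: float) -> float:
    """Wavelength in metres of a sound wave of the given frequency."""
    return SPEED_OF_SOUND / frequency_hz

# c' is roughly 262 Hz, giving a wavelength of about 1.25 m
print(round(wavelength_m(262), 2))    # 1.26
# At the extremes of the audible range the wavelengths differ enormously:
print(round(wavelength_m(20), 1))     # 16.5 m
print(round(wavelength_m(15000), 3))  # 0.022 m
```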
Diffraction or reflection can, under certain special circumstances, lead to problems. Suppose, for example, that sound finds its way to an observer by two routes of different lengths. The extreme example is the ‘specific echo’ heard in tunnels or before mountains, in which the sound is repeated one or more times. But if the path difference is not so great and the sound is a continuous musical tone, the net result depends to a great extent on the amount of ‘slide’ between the two waves. If it happens that when the paths join up a peak of one coincides with a peak of the other (i.e. if the waves are ‘in phase’), they merely add to each other; but if a peak of one lies on a trough of the other (i.e. if the waves are ‘out of phase’), the waves effectively neutralize each other and no sound is heard (see fig.2). The easiest way to demonstrate this effect is to listen to a high-pitched, steady note in a room; sound will be received direct from the source and also by reflection from the walls and the relative path lengths will depend on position, so that the sound heard can be made to rise and fall in loudness by moving the head. Phase is important, and one can, for example, make or mar the effect of a stereo system by feeding the loudspeakers in or out of phase. It is essential that compressions received by both microphones are reproduced as compressions by both loudspeakers. If this is not so, the resulting sound is diffuse and difficult to locate in space, because the ears rely on phase differences to help in localizing sound.
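The reinforcement and cancellation described here can be imitated numerically by adding two equal sine waves, once in phase and once half a cycle out of phase; the frequency and durations below are purely illustrative.

```python
import numpy as np

fs = 40000
t = np.arange(0, 0.01, 1 / fs)
f = 1000.0  # a steady 1000 Hz tone

direct = np.sin(2 * np.pi * f * t)

for phase, label in ((0.0, "in phase"), (np.pi, "out of phase")):
    reflected = np.sin(2 * np.pi * f * t + phase)   # the second path, shifted in phase
    combined = direct + reflected
    # Peak amplitude: 2.0 when the paths reinforce, effectively 0 when they cancel
    print(label, round(float(np.max(np.abs(combined))), 3))
```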
The addition or diminution effect of two waves with a phase difference is called ‘interference’. Perhaps the most striking demonstration is that which can be performed with a tuning-fork. If a fork is struck and held about 5–8 cm from one ear, the sound will be found to rise and fall in loudness as the fork is rotated. The following explanation refers to fig.3, which represents a view looking down on to the end of the fork. When the prongs move together a compression moves out along directions A and B but in directions C and D the result is a rarefaction. When the prongs move apart again compressions move out along C and D and rarefactions along A and B. Thus the waves in directions A and B are exactly out of phase with those along C and D, as is shown by the quadrants of circles. If one listens in directions W, X, Y or Z one receives simultaneously two waves exactly out of phase with each other; they effectively neutralize one another, and practically no sound is heard.
‘Diffusion’ is a term sometimes used in discussing the distribution of sound waves in a hall, implying a mixture of reflection and diffraction from specially shaped panels or reflectors so placed that sound waves that would otherwise be ‘wasted’ can be deviated into more useful directions. All the processes discussed above – refraction, reflection, diffraction and interference – affect the direction, distribution and loudness of sounds but have relatively little effect on their quality; the shapes of the waves remain unchanged.
In any serious research it is important to be able to describe the object of study precisely, but in the case of sound this is exceedingly difficult. It is possible to describe sounds in words, in pictures or by association with colours, but none of these representations can be called precise. Musicians have traditionally used a symbolic notation that is satisfactorily specific as far as the pitch and duration of each required sound is concerned but is not nearly good enough for scientific purposes, especially when the quality of sounds is involved. On a musical score quality is determined almost exclusively by giving the name of an instrument; but there are almost as many qualities associated with a particular category of instrument as there are instruments, and it is rare to find a composer specifying even in the most general way the kind of violin, clarinet, bassoon etc. called for. Furthermore, interpretation of a score depends on precise knowledge of the instruments. It is therefore necessary to look for much more exact visual representations.
What is required is a means of portraying the exact pressure at a point in the sound wave at every instant of time. One of the earliest ways of doing this was very direct; it consisted simply of letting the sound fall on a thin diaphragm or membrane in the side of a gas pipe feeding a flame. If the pressure on the membrane increased a little the flame jumped and if it decreased the flame sank. The flame was then viewed by reflection in a set of mirrors arranged on the faces of a rotating block of hexagonal or octagonal section. The effect was to spread the images of the flame out horizontally and the variations in height could be seen. Many elaborations and variations of this device have been used during the last century or so, and the device in current use is merely a sophisticated version. The membrane is replaced by a microphone that converts the pressure variations into variations in an electric current instead of into variations of gas pressure. This varying current is then fed to a cathode-ray oscilloscope to give a graph of pressure against time. Variation of the speed of the trace makes possible the examination of the pressure variations in different degrees of detail.
Fig.4 shows the wave trace of a series of staccato notes (a') on a treble recorder at the rate of six notes per second. In fig.4a the trace lasts two seconds and 12 separate notes can be seen. In fig.4b the trace lasts a third of a second and two notes can be seen. In fig.4c the trace lasts 0·1 seconds and shows the beginning and middle of one note. In fig.4d the trace lasts 0·014 seconds and the regular waveform in the middle of the note can be seen. Many important points are illustrated by these traces, and they will be referred to again.
One of the most difficult problems in scientific study is to devise methods of measuring quantities to which the human senses respond in such a way that the measurements bear some relationship to the subjective response. In sound the first difficulty is the enormous range of pressure variations to which the ear is sensitive. The smallest disturbance of the air that can be detected as sound by the average person involves atmospheric pressure differences of about two parts in ten thousand million; the largest disturbance that can be tolerated without the sensation of sound turning into pain is about a million times larger. A range of a million to one in pressure variation is far beyond the scope of any single physical instrument. The range of audible frequencies is not quite so great – about a thousand to one. For both pressure change and frequency the relationship between stimulus and sensation is complicated. If a pure tone of about 20 cycles per second, or 20 Hertz (Hz), which can just be heard as a very low note by most people, is slowly increased in frequency, the perceived sensation of pitch gradually rises, and there is a sensation of ‘coming to rest’ periodically at certain points during the process. These points are, musically speaking, an octave apart in pitch and turn out always to correspond to an exact doubling of the frequency, at least over the middle range (see below for some complications). If two notes are played together it is easy to adjust them by ear so that one is exactly double the frequency of the other; if the ratio is not quite 2:1 the result is harsh and unpleasant (the phenomenon of the ‘stretched octave’, however, is discussed under Psychology of music, §II, 1(iii)). This logarithmic relationship of doubling the stimulus to give equal increments of sensation is quite common in relating subjective and objective measurements; something like it is found in relating pressure changes with loudness.
It is customary to work not in terms of pressure changes but in terms of the energy associated with a wave. The physical quantity most often used is the sound intensity, and it is measured as the energy flow per second through one square metre in units of watts per square metre. The intensity of a sound is related to the square of the pressure difference involved, and so the range of intensity to which the ear is sensitive is a million million to one. The quietest sound that can be heard has an intensity of one million-millionth of a watt per square metre and the ‘threshold of pain’ is one watt per square metre. Again the law relating stimulus and sensation is roughly logarithmic, and doublings of the intensity give something like equal increments of loudness, though again there are complications (see below). These logarithmic laws are aspects of the Weber–Fechner Law, whose most important result is that to produce a noticeable increase in sensation the extra stimulus required depends on the stimulus already present. The idea is, of course, familiar: in conditions of absolute silence one can hear a pin drop, whereas in a noisy machine shop a hammer might fall unheard.
It is not possible to disentangle intensity and loudness from frequency entirely; the ear’s response to sounds of different intensities depends to a considerable extent on their frequencies. Fig.5 shows a set of graphs that are usually called equal loudness curves. They are produced by asking a wide range of subjects to match in loudness pairs of pure tones of differing pitch. Any one of the curves on the diagram represents the actual intensity that has to be produced as a physical quantity in the sound wave to give the same sensation of loudness to the ear. It is quite clear, for example, that for quiet sounds (the lower curves) it requires a great deal more intensity at low and at high frequencies to produce a given loudness than it does in the middle around 1000 Hz (approximately b''). At higher sound levels the curves are much flatter. This is why uniform amplification in reproducing apparatus is satisfactory when the volume of reproduction is high, but at lower levels bass and treble boost is needed. A special ‘loudness’ control is incorporated in some amplifiers to make this correction automatically.
The curves in fig.5 are labelled in decibels (dB) and phons. The decibel is a measure of level, either of sound energy or of power in an electric circuit, and it relates to the ratio of two quantities. It arises from the logarithmic relationship already discussed and is an attempt to provide a unit which, though based on physical measurement, bears some relationship to perceived sensation. If the ratio of two physically measured sound intensities is I1:I2, then I1 has a level n decibels above I2 if n = 10 log10 (I1/I2). Thus since log10 2 is 0·3010, if the ratio I1:I2 is 2:1, I1 is approximately 3 decibels louder than I2. Decibel levels can, of course, be added, so for example a sound that starts at a level of 10 dB above some fixed standard and is then amplified by a factor of two will finish up 13 dB above the standard. In measuring sound or noise levels it is usual to take the minimum sound that can be just heard – the threshold of audibility already mentioned – as the standard (usually defined as one million-millionth of a watt per square metre). It will be obvious from fig.5 that the frequency of the sound will have an influence, and indeed the threshold is not the same at all frequencies. By convention sound levels are measured by comparing them with a 1000 Hz pure tone. If the sound being measured seems to be as loud as a standard 1000 Hz tone when they are heard in alternate bursts, and if the 1000 Hz tone has an intensity level of n dB, the sound being measured is described as having an equivalent loudness of n phons. Thus the curves of fig.5 show the intensity level at different frequencies required to give a constant equivalent loudness; the dB level at 1000 Hz can be seen to equal the equivalent loudness in phons for each curve.
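The formula can be applied directly; the values below merely rework the examples given in the text (a doubling of intensity, the threshold of pain relative to the threshold of audibility, and the addition of decibel levels).

```python
import math

def level_difference_db(i1: float, i2: float) -> float:
    """Level of intensity i1 above i2 in decibels: n = 10 log10(i1/i2)."""
    return 10 * math.log10(i1 / i2)

THRESHOLD = 1e-12  # threshold of audibility, watts per square metre

print(round(level_difference_db(2, 1), 2))       # 3.01 dB for a doubling of intensity
print(level_difference_db(1.0, THRESHOLD))       # 120 dB: threshold of pain re threshold of hearing

# Decibel levels add: 10 dB above the standard, then amplified by a factor of two,
# gives about 13 dB above the standard
print(round(10 + level_difference_db(2, 1), 1))  # 13.0
```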
Difficulties begin when, instead of relating all measurements to intensity as a physical quantity, one tries to produce an entirely subjective scale (all the measurements so far described, though they involve subjective matching, always end up with intensity being measured on a meter). One might, for example, assume that, if a sound A when heard by only one ear seems to match in loudness a sound B when heard by both ears, then B is half as loud as A. Or one might try to estimate subjectively when one sound is twice as loud as another. Using this sort of strategy yet another quantity has been introduced, the sone. It is a truly subjective unit, and the complexities of trying to relate, for example, the loudness in sones produced when ten violins play together if separately each one has an equivalent loudness of 60 phons are beyond the scope of this article. Fig.6, however, shows the approximate relationship between equivalent loudness of a sound in phons and its loudness in sones. One sone is arbitrarily defined as 40 phons and, roughly, an increase of nine phons is needed to give an increase of one sone.
To return to the impossibility of disentangling frequency and intensity, it is often stated that there is a direct relationship between frequency and pitch, and that the physically measurable frequency completely defines the sensation of pitch. The subjective sensation of pitch can, however, under certain circumstances, depend on the intensity as well as the frequency. Fortunately the effect is strong only when pure tones are involved; real instruments produce much less striking changes. There seems to be confusion over the exact nature of the effect. Some have given quite specific relationships, but Taylor’s experiments with a wide range of audiences produce variable results. If a pure tone of absolutely constant frequency is suddenly increased in intensity then, whatever its frequency, some listeners think that it has risen in pitch, some that it has stayed the same, and others that it has gone down.
The problems of relating pitch to frequency, however, are of greater importance. It is convenient to introduce a system of dividing the octave that takes note of the logarithmic aspect of sensation, and then the various intervals judged subjectively can be translated into this physically measurable quantity – a division analogous to the decibel for loudness measurements; the one in common use is the cent, a 100th part of an equal-tempered semitone. Like the decibel, the cent is a logarithmic measure of a ratio, and intervals in cents may be added together. If the frequency ratio between two notes is f1:f2 then their interval is n cents if n log10 2 = 1200 log10 (f1/f2). Thus if the interval is one octave, f1/f2 is 2 and n is 1200. A perfect 5th has the interval ratio 3:2 and a perfect 4th 4:3, so together they give an octave since 3/2 x 4/3 = 2/1. Expressed in cents the 5th is 702 cents and the 4th 498 cents, and the sum of these is 1200, an octave. (For further remarks on scales and intervals see §9 below.) The cent, then, relates directly to physical measurement of frequency, but it is important to recognize that the system depends on tuning experiments in which two notes are listened to simultaneously.
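The same calculation can be written out explicitly; the formula n log10 2 = 1200 log10 (f1/f2) is equivalent to n = 1200 log2 (f1/f2), and the examples below are those of the text.

```python
import math

def interval_in_cents(f1: float, f2: float) -> float:
    """Interval between two frequencies in cents: n = 1200 log2(f1/f2)."""
    return 1200 * math.log2(f1 / f2)

print(round(interval_in_cents(2, 1)))   # 1200  (octave)
print(round(interval_in_cents(3, 2)))   # 702   (perfect 5th)
print(round(interval_in_cents(4, 3)))   # 498   (perfect 4th)
# Intervals in cents add: a 5th plus a 4th makes an octave
print(round(interval_in_cents(3, 2) + interval_in_cents(4, 3)))  # 1200
```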
As was seen in §2 above, if two pure tones of identical frequency and intensity are added together the net result will depend on their phase difference. If the two waves are just a little different in frequency then, even if the source-to-ear distance remains fixed, the waves are alternately in and out of phase along their lengths, and the loudness rises and falls to give the familiar ‘beat’ phenomenon (fig.7a). The elimination of beats provides a precise method of adjusting two notes to identical frequency. If the notes are exactly an octave apart, they will add to give a steady waveform and the resulting impression is smooth and steady (fig.7b); if they are not quite an octave apart they will again ‘change step’, and the change can be detected by the ear though it is not as marked as the beat effect (fig.7c). If, however, two notes are played successively rather than simultaneously and observers are asked to judge when the pitch of one note is twice or half the pitch of the other, estimates of intervals are considerably different. A scale of pitch based on this melodic judgment is measured in mels. Fig.8 shows the relationship between frequency measured in Hz and corresponding pitch measured in mels. The pitch of a 1000 Hz note is defined as 1000 mels. (For further information on psychoacoustics see Hearing and psychoacoustics and Psychology of music.)
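The beat phenomenon of fig.7a can be imitated by adding two sine tones a few cycles per second apart; the frequencies below are chosen arbitrarily, and the beat rate equals their difference.

```python
import numpy as np

fs = 40000
t = np.arange(0, 1.0, 1 / fs)

f1, f2 = 440.0, 443.0                  # two tones 3 cycles per second apart
combined = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# sin A + sin B = 2 sin(mean) cos(half the difference): the sum behaves like a tone
# at 441.5 Hz whose amplitude swells and fades |f1 - f2| = 3 times every second.
print("beat rate:", abs(f1 - f2), "per second")

def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))

swell = combined[:int(0.02 * fs)]                  # 20 ms slice at t = 0, waves in step
fade_start = int((1 / 6 - 0.01) * fs)              # 20 ms slice centred on t = 1/6 s, waves out of step
fade = combined[fade_start:fade_start + int(0.02 * fs)]
print(round(rms(swell), 2), round(rms(fade), 2))   # about 1.4 and well under 0.1
```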
Since tones of specific frequency have a repetitive waveform, the most obvious way to generate them is from some system that is rotating, so that the same sequence of events occurs in every revolution. Most of the hums and whines associated with machinery arise from this, and it is a familiar fact that as the rotational frequency rises so the pitch of the tone goes up. The only device constructed with the deliberate intention of deriving a tone mechanically from a rotating object is the siren, which in its simplest form is merely a disc with a ring of equally spaced holes near its outer periphery. The wheel is so arranged that a jet of air from a pipe is alternately interrupted and allowed to proceed through one of the holes as the disc is rotated. If the speed of rotation is high enough, a succession of puffs of air at a rate audible as a musical tone can be produced. Such a wheel may be provided with several rings with different numbers of holes in each. If the jet of air is directed at different rows then, even though the rotational speed of the disc remains constant, a sequence of notes can be produced and simple tunes played. For example, if eight rings of 24, 27, 30, 32, 36, 40, 45 and 48 holes are used, a diatonic major scale results. The pitch ratio of a tone to any other tone stays constant at any given rotational speed (i.e. the siren will always produce a diatonic major scale) but the absolute pitch depends on the speed of rotation.
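The hole counts can be turned into ratios and frequencies directly; the rotational speed used below is arbitrary, but the ratios, and hence the scale, are fixed by the numbers of holes alone.

```python
from fractions import Fraction

holes = [24, 27, 30, 32, 36, 40, 45, 48]
rev_per_second = 11.0                     # illustrative rotational speed

ratios = [str(Fraction(n, holes[0])) for n in holes]
print(ratios)   # ['1', '9/8', '5/4', '4/3', '3/2', '5/3', '15/8', '2']: a just-intonation major scale

# Frequency of each note = number of holes passing the air jet per second
for n in holes:
    print(f"{n * rev_per_second:.0f} Hz")   # 264, 297, 330, 352, 396, 440, 495, 528 Hz
```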
Such devices have only rarely been used as musical instruments, but they do give useful frequency standards, as it is relatively easy both to produce and to measure steady rotational speeds. There are, however, several devices that use rotating systems as the basis of their tone-generators but make sound by electrical means, for example the Hammond organ and the Compton Electrone.
Almost anything can be made to vibrate, but the frequency may be outside the audio range, or it may be so heavily damped that the vibration does not persist long enough for it to be heard. It is impossible to separate the idea of vibration from the idea of waves, and the time taken for a wave to travel from one point to another is all-important in discussing vibrations. Consider, for example, an open tube of about 2 cm internal diameter and 37·5 cm in length. If a puff of air is sent in from one end, it will travel along until it reaches the other; there it will suddenly find itself free to expand into the open air and the resultant pressure difference will cause more air from inside the pipe to move out of the end. The result is that an expansion or negative pulse – a momentary lowering of the pressure – travels as a wave back to the front end. As soon as it arrives back at the beginning, air from the outside is pushed in to fill up the low-pressure region, overshoots the mark, and another compression travels outward along the tube as did the first. The total time taken to travel from one end to the other and back again is the distance (75 cm) divided by the velocity (say 330 m per second) and hence the number of double trips in a second is 440, so the tube will produce the note a'. If the palm of the hand is used to strike one open end, a ‘pop’ at this pitch can clearly be heard; and if a tuning-fork producing 440 Hz is held near the open end, the pulses produced by the fork are exactly in time with the pulses travelling up and down the tube, and so the phenomenon of resonance occurs: the fork appears to produce a much louder note.
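The same round-trip arithmetic may be written as a small function, taking the 330 m per second of the text and evaluating it for the pipe length used here and for those discussed in the next paragraph.

```python
SPEED_OF_SOUND = 330.0   # m per second, as assumed in the text

def open_pipe_frequency(length_m: float) -> float:
    """Number of double trips per second for a pulse in an open pipe,
    i.e. the fundamental frequency: v / (2 * L)."""
    round_trip_m = 2 * length_m
    return SPEED_OF_SOUND / round_trip_m

print(open_pipe_frequency(0.375))    # 440.0 Hz -> a'
print(open_pipe_frequency(0.1875))   # 880.0 Hz -> a''
print(open_pipe_frequency(0.75))     # 220.0 Hz -> a
```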
If a vibrating system is to be used as a musical instrument, it must be possible to change its pitch, and therefore to change the time it takes for a pulse to travel through one cycle. This can be done either by changing the dimensions of the object or by changing the velocity of the pulse. To begin with the former, if the air tube had been only 18·75 cm long, the pulse would make the double journey in half the time; if it had been 75 cm long it would take twice as long. The resultant notes would thus be a'' (880 Hz) and a (220 Hz) respectively. Thus by far the simplest way of making a musical instrument is to take a collection of vibrators of different sizes and use one for each required note. The piano, organ, harp, xylophone etc. all follow this principle. The next simplest way is to use one vibrator but to change its length or the velocity of the pulse each time a new note is required. In string and woodwind instruments the effective length of the vibrator is changed, and in the strings the tension also can be changed; the tension alters the velocity of the wave along the string, and hence the pitch of the note.
A difficulty that sometimes arises is that of relating compression waves travelling up and down hollow pipes with transverse waves travelling along a string. The simplest way out of the difficulty is always to think of ‘disturbances’ travelling up and down. A disturbance may be an increase of pressure in the air in a pipe, a decrease of pressure in the air in a pipe, a sideways movement of a stretched string, a longitudinal movement of the coils of a spring, an increase or decrease in voltage or current in an electrical circuit, and so on. It is customary to draw graphs of these disturbances showing time along the direction of travel and the magnitude of the disturbance vertically. Thus fig.1d might represent any of these kinds of wave, with the vertical coordinate representing pressure, voltage, lateral displacement etc. as appropriate. The scientific quantity ‘amplitude’ is simply the amount of the disturbance from the undisturbed state.
Before considering the third common method of pitch changing it is necessary to note a complication in the simple picture of pulses travelling up and down a pipe. If a tuning-fork at a'' (880 Hz) is held to the 37·5 cm pipe, resonance still occurs, because although twice as many wave crests are being sent into the tube, they travel at the same velocity as before, and the first arrives back as the third one goes in. Resonance will also occur at roughly all integral multiples of the basic frequency. These frequencies are usually called ‘harmonics’ of the basic frequency. For many of the long thin vibrators used in real musical instruments (pipes, strings etc.) the sequence of frequencies at which vibrations will easily occur has this harmonic relationship. In more complex shapes – pipes of non-uniform bore, plates, cups, bottles etc. – the times taken for pulses to travel in different directions and to return are not so simply related, that is, the ‘modes of vibration’ are not necessarily harmonic. The third basic way of altering the pitch of an instrument involves changing the mode of vibration. The brass family provides the obvious examples and, to a first approximation, the notes produced by a simple brass instrument without valves (e.g. a bugle) have the harmonic frequency relationships 1 : 2 : 3 : 4 etc. But there are considerable complications (see §6 below).
Vibrational modes can be demonstrated on the piano. If a single note is struck and released while the key corresponding to the octave higher is held down to release the damper, the octave string will be heard resonating strongly: clearly it must be responding to the second harmonic of the original note. Similarly, if the key corresponding to a 12th higher is held down, the third harmonic will be heard, and so on. A second demonstration involves a brass plate, firmly clamped on a pillar at its midpoint and bowed on its edge with the finger placed in various ways round the edge. A large number of modes – each with a precise and characteristic frequency, though not harmonically related to the lowest – can be produced and the pattern of vibration can be revealed by scattering sand on the plate. The sand moves away from the more violently vibrating areas and patterns result (fig.9). In general, the higher the frequency the more complicated and detailed is the pattern. This experiment was originally performed by Chladni in about 1790.
The term ‘mode’ simply refers to a particular pattern in which an object may vibrate. One can refer to vibration in a single mode or to vibration in several modes simultaneously. The term ‘harmonic’ is strictly a mathematical one and should be kept solely to describe modes having frequencies that are exact multiples of some fundamental frequency (the first harmonic), and the number of the harmonic is always the number of the multiple, even if all the harmonics are not present. Again one may speak of a single harmonic or of a complex mixture. The term ‘overtone’ always refers to modes of frequency higher than that of the fundamental; they may be harmonic but are not necessarily so, and they are numbered in sequence as they occur with the one next above the fundamental as the first. ‘Partial’ is almost synonymous with overtone in that it refers to a component of a mixture that may or may not be harmonic but its numbering starts from the fundamental; the fundamental is the first of the partial vibrations but is not an overtone. To illustrate the nomenclature, consider a more-or-less cylindrical pipe closed at one end that can be excited in some way to give a sequence of modes, either separately or simultaneously, that have frequencies 220, 660, 1090 and 1540 Hz. The mode of frequency 220 Hz is the fundamental, the first harmonic and the first partial; the mode of frequency 660 Hz is the first overtone, the second partial but the third harmonic. The mode of frequency 1090 is the second overtone and the third partial – but is not a harmonic (1100 would have been the 5th harmonic if present) unless one sees the whole series in terms of an absent fundamental of 10 Hz.
Two electrical methods were mentioned above in connection with rotating tone-generators; this section concerns methods in which the actual timing is electrical in origin. Two categories will be considered: the first involves electronic processes somewhat analogous to the mechanical oscillations in traditional instruments; the second involves the entirely artificial process of creating waveforms of the required shape by digital computer.
The howl produced when the volume control on a public address system has been turned up too high is produced by oscillations in the electric current that depend on precisely the same phenomena as the kinds of vibration already discussed. Any small sound picked up by the microphone is amplified and passed to the loudspeaker, from which it emerges only to fall on the microphone again. But there is a delay because of the time taken for the electric current to flow and for the sound itself to travel from loudspeaker to microphone. All these times stay constant, however, and so the sound goes on being passed back and forth in a regular way (closely analogous to movement of a compression wave in a pipe); therefore, since the times are short, a tone or howl is produced. The pitch can be varied by altering the distance between the microphone and loudspeaker, or by altering elements in the electrical circuits to change the time delay there. This is not a practical method, but in essence it is exactly the same as that used in an electronic tone generator; there a portion of the output current is effectively fed straight back into the amplifier input instead of through a microphone and loudspeaker. Modern electronic technology makes it possible to produce oscillating systems that are remarkably small and compact.
There are three principal ways in which such tone generators can be used to produce musically usable sounds, though these are now of little more than historical interest. The least complicated, but rather cumbersome, system is to use generators that will produce pure tones and to mix these in various ways to produce the variation in final waveform; quite a few early electronic organs were built on this principle. The second way is to generate much more complex waveforms, by suitable design of the electronic circuits, and to modify these by means of various filters to produce tonal variations. Again this system has been used in electronic organs. The third way is to use both types of generators and a wide variety of modifying circuits all of which can be interconnected in a flexible way, and this is the basis of early synthesizers.
In order to use a digital computer to produce a required waveform, five distinguishable steps are needed. First, the computer must be programmed to calculate the sequence of pressure changes in the required wave at a large number of points, probably 40,000 every second. The next step involves generating a uniformly regular sequence of electrical pulses at the same intervals of time. The third step is to make the height of each successive electrical pulse correspond to the calculated pressure in the sound wave at that point. (These last two steps are usually performed by a single device known as a ‘digital-to-analogue converter’.) The fourth step is then to pass the sequence of pulses through a filter system that effectively smoothes out the steps between the successive pulses and leaves the required waveform. The fifth and final step is to play the waveform through the usual amplifier and loudspeaker system. This technique does, of course, presuppose that the waveform for a given sound is known.
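The first and third of these steps might be sketched as follows; the converter resolution and the choice of a pure tone are illustrative assumptions, and steps 2, 4 and 5 are hardware operations indicated only by comments.

```python
import numpy as np

SAMPLE_RATE = 40000   # calculated points per second, as in step 1

def calculate_samples(frequency_hz: float, duration_s: float) -> np.ndarray:
    """Step 1: calculate the pressure of the required wave at regular instants.
    A pure tone is used here; any computable waveform could be substituted."""
    t = np.arange(0, duration_s, 1 / SAMPLE_RATE)
    return np.sin(2 * np.pi * frequency_hz * t)

def to_converter_codes(samples: np.ndarray, bits: int = 16) -> np.ndarray:
    """Step 3: scale each calculated pressure to the pulse height (integer code)
    that a digital-to-analogue converter would be asked to produce."""
    full_scale = 2 ** (bits - 1) - 1
    return np.round(samples * full_scale).astype(np.int16)

codes = to_converter_codes(calculate_samples(440.0, 1.0))
# Steps 2, 4 and 5 (regular pulse timing, smoothing filter, amplifier and loudspeaker)
# are carried out by hardware once these codes are handed to the converter.
print(len(codes), codes[:5])
```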
In the 1980s and 90s there was a complete revolution in the development and use of electronic devices in music (see Electro-acoustic music). Among many innovations is the system known as MIDI (Musical Instrument Digital Interface), which permits the control of one instrument by another or of a complete set of instruments by a computer. The technique of sampling involves recording in digital form a fragment of real sound which can then be modified, changed in pitch and mixed in an infinite range of ways; it could be said to be the direct descendant of musique concrète.
Most of the simpler kinds of mechanical vibrators tend to produce a waveform not very different from that of a pure tone. Fig.10a shows the waveform produced by a treble recorder sounding c'' (523 Hz) played rather loudly; fig.10b shows the same note bowed on a violin; and fig.10c the same note on a clarinet. The waveform of fig.10d sounds as though it is of the same pitch, though it is produced by mixing a group of high-pitched tones, none of which individually is below about 2000 Hz. This last tone is sometimes described as a tonal complex, and is said to produce a ‘residue’ effect, the apparent c'', in the ear (see §9 below). In quality these notes sound quite different, though basically of the same pitch, and the earliest attempts to account for the variations were based on the idea that each was a different mixture of pure tones with harmonically related frequencies. Since many conventional instruments use vibrators which, as already mentioned, have many modes of vibration with frequencies that are harmonically related, it is reasonable to ask whether vibration in several of these modes simultaneously could give rise to the more complex waveform and richer quality of real instruments as opposed to those of simple vibrators. This turns out to be a reasonable hypothesis, provided attention is confined to steady, continuous tones; fig.11 shows a synthetically produced waveform made by adding three electronically produced pure tones: its resemblance in general form to that of an oboe is obvious. It is not surprising, therefore, that the earliest attempts to synthesize sounds were aimed merely at producing the right harmonic mixture; but the sounds made were quite different and distinctively electronic. The reasons for this arise from the fact that in any real musical performance the notes used are not steady and continuous but have to start, stop, grow louder, decay and change in all kinds of other ways.
No vibration can start instantly, but the total time taken for it to build up depends on a number of factors. Only two will be considered here: the effect of the method of excitation (plucking, bowing, blowing etc.) and the effect of the size and complexity of the vibrating system.
Plucking a string is one of the more rapid methods of setting up a vibration, which, other things being equal, should only take one or two cycles to be properly established; if the note is a' (440 Hz), this happens within 0·005 seconds. But there are two factors that work against this pattern: damping effects due to the air surrounding the string and to the losses of energy that arise from the bending of the material of the string cause the vibrations to die away; and a string on its own is far too quiet to be of any use as a musical instrument, so it is usually connected to an amplifier of some kind that may be mechanical (soundboard or soundbox) or electrical (pickup, amplifier and loudspeaker). This second effect is part of the size and complexity factor to be discussed below.
Bowing leads to a much slower start, and indeed the whole pattern of vibration of a bowed string is different from that of a plucked string. Bowing falls into the category of ‘stick-slip’ motion and permits energy to be fed in continuously so as to produce a continuous note. For a detailed discussion see Acoustics, §II, 7.
The air in wind instruments of the flute family, which includes many kinds of organ pipes, is set in motion when a jet of air is directed at a sharp edge. Crudely speaking, a series of eddies is formed, as when a stick is drawn through water, and these travel down alternate sides of the edge. If the edge is the mouthpiece of a flute or recorder, or part of an organ pipe, the jet of air can be imagined as waving smoothly back and forth sending alternate eddies up the inside and outside of the pipe. The resultant sequence of pressure waves travelling up the inside may match one of the resonant periods of the pipe and so build up a strong vibration pattern. The reflected waves travelling back down the pipe of course interact with the eddies and so the frequency of eddy production and the natural resonant frequency of the pipe are not independent of each other. When the first few eddies are produced, however, the behaviour of the waves in the pipe may be quite erratic and so the starting transient can be very complicated.
In reed instruments (and brass, where the player’s lips form the reed) the basic initiating mechanism is a sequence of puffs of air produced by the opening and closing of the reed. It takes a number of cycles for the natural frequency of the pipe to react back on the behaviour of the reed and to arrive at a steady state, and so reed instruments generally have a rather erratic starting transient that gives the sound a characteristic feature. Fig.12 shows the initial waveforms of a bowed string, a plucked string, a flute and a reed-driven pipe.
Few primary vibrators are loud enough on their own to be used as musical instruments, so amplification is usually needed. For example, the string of a violin without the body can hardly be heard, and the vibrations in the pipe of a brass instrument are muffled without the horn at the end. But when instruments become ‘coupled systems’ in this way, odd effects occur during the starting period. The reed and pipe are examples of this. What happens, in general terms, is that one of the two parts of the system starts to vibrate and passes some of its energy to the other; then there may be a ‘difference of opinion’ as to the frequency at which vibration should take place, and it may be many cycles before the vibration is stably established. The period of ‘argument’ is the starting transient.
The aural effect of the starting transient – whether it is caused by the method of initiation or by the coupling of two systems or, as is usual, by both processes – is pronounced. It can best be demonstrated in a negative manner by listening to a recording of a note from which the first quarter of a second or so has been erased. The whole character of the note is changed. If, therefore, the sound of a particular instrument is to be synthesized it is not sufficient to produce the right steady-state waveform; the right starting transient must be produced as well.
Even after the note has started there are usually further changes: a plucked string may give vibrations that gradually die away, a bowed string may vary in loudness with the pressure and velocity of the bow, a reed instrument may rise and fall slightly in loudness, and a complex mixture of modes of vibration may change the sound’s composition with time. The changes in amplitude of the waves associated with a note are usually called the ‘envelope’. A cathode-ray oscilloscope with its spot moving slowly horizontally compresses the waves so much that the individual vibrations cannot be seen, but the envelope becomes clearly visible. Fig.13 shows the envelope of a harpsichord note (a), a staccato note on a flute (b), and a staccato note on a french horn (c).
In synthesizing sounds electronically the ‘envelope shaper’ is an important element; fig.13d and e show synthetic staccato notes with triangular envelopes. The waveform is the same for both, but the envelope has simply been reversed, and the aural effects are totally different: d sounds vaguely like a plucked or struck instrument; e like some kind of harmonica or harmonium. Trace e could equally well be produced by playing the tape for d in reverse. The well-known trick of recording a piano piece and playing it backwards is a good way of illustrating how important the envelope is: reversing the tape can have no effect on the harmonic content, and yet the tone of the instrument is completely changed.
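A minimal sketch of an envelope shaper, applying a triangular envelope and its reversal to the same steady waveform in the manner of fig.13d and e; all the parameters are arbitrary.

```python
import numpy as np

fs = 40000
t = np.arange(0, 0.5, 1 / fs)
steady = np.sin(2 * np.pi * 440 * t)        # an unchanging waveform

# Triangular envelope: a short rise followed by a long fall
attack = int(0.02 * fs)
envelope = np.concatenate([np.linspace(0.0, 1.0, attack),
                           np.linspace(1.0, 0.0, len(t) - attack)])

shaped = steady * envelope                  # cf. fig.13d: sounds vaguely plucked or struck
reversed_shaped = steady * envelope[::-1]   # cf. fig.13e: same waveform, envelope reversed

print(len(shaped), len(reversed_shaped))
```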
Envelope shapes play an essential part in human speech. The consonants are usually fairly drastic changes in envelope shape. A plosive, like ‘p’, makes a fairly rapid initiation of random noise (air escaping when the lips are opened) leading on to a vowel, a steady note. If the noise is allowed to rise in amplitude more slowly, the result is an ‘f’. Fig.14 shows the sequence of shapes in the word ‘perfection’.
As has been noted, most instrumental sounds involve some kind of source, usually rather weak, and some means of making it louder. Unfortunately, because it complicates matters – or fortunately, because it adds such richness and variety to instrumental tone – this amplification is never done without also changing the waveform to some extent. It is difficult to indicate with any degree of precision the kind of change that is made to the wave, but if the distribution of harmonics contributing to the wave before and after amplification is examined, it is usually possible to find a characteristic that can be specified. If a graph of the degree of amplification at each frequency is plotted the result is sometimes described as the ‘formant characteristic’ of the amplifier or instrument. For example, increasing the treble gain on an electronic amplifier makes any hiss on the recording louder and increases brilliance; turning up the bass gain emphasizes any turntable rumble and muffles the tone. In each instance a different formant is being imposed. For a given setting of the treble and bass controls the formant characteristic is constant, but the frequencies present in the emerging wave still depend on those present beforehand. The amplifier imposes something of its own character on all sounds passing through it. The concept of the formant characteristic is important in many branches of acoustics.
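A formant characteristic can be pictured numerically as a frequency-dependent gain applied to each component of a wave; the ‘treble boost’ curve below is invented purely for illustration.

```python
import numpy as np

fs = 40000
t = np.arange(0, 0.1, 1 / fs)
# A wave containing three harmonics of 440 Hz, all of equal amplitude
wave = sum(np.sin(2 * np.pi * 440 * n * t) for n in (1, 2, 3))

spectrum = np.fft.rfft(wave)
freqs = np.fft.rfftfreq(len(wave), 1 / fs)

# An invented formant characteristic: gain rising from 1 at low frequencies
# towards 3 at the top of the range ('treble boost')
gain = 1 + 2 * freqs / freqs.max()
boosted = np.fft.irfft(spectrum * gain, n=len(wave))
boosted_spectrum = np.fft.rfft(boosted)

# The relative strengths of the harmonics before and after show how the
# amplifier imposes something of its own character on the sound.
for n in (1, 2, 3):
    bin_ = np.argmin(np.abs(freqs - 440 * n))
    print(n, round(abs(spectrum[bin_])), round(abs(boosted_spectrum[bin_])))
```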
In musical instruments the basic vibrator produces the initial set of harmonics, but these are modified by the formants of the amplifier (which may be a horn, the body of a string instrument, the side holes in a woodwind instrument etc.). The net sound emerging is then modified by the formant of the room. If the sound is being recorded or transmitted elsewhere, the microphone, transmitting apparatus or recorder all impose further formants, and then the ears and hearing mechanism in the brain have their own formants. (Deafness over some part of the frequency range is surprisingly common.) The result of all this is, of course, that the wave that is finally perceived by the brain may be very different from the one that started out from the basic vibrator.
Formants are important in all instruments, though, strictly speaking, for some they may be difficult to identify as they may change from note to note. Some would argue that the phenomenon is then no longer properly called a formant effect, but one may speak of constant or variable formants to take both types into account. An example of a constant formant with a powerful influence on tone is that of the body of a string instrument; some changes may occur as the player moves from one string to another, or from changes in the tension of the strings reacting on the body, but these are usually small and the main amplifying characteristic of the body remains the same over the range. An example of a variable formant is that of a clarinet, where the formant comes from a complex mixture of effects controlled by the bore variations, the positions of the finger-holes, the number of holes or keys that are depressed and so on. The art of the clarinet maker is to ensure that the formant characteristic does not change too violently as the player moves from one note to another.
One of the most essential aspects of formants for human beings is their part in the control of the voice. The vocal cords produce a basic tone that can be varied, as already described, in envelope, but the tone can also have many different formants imposed on it by the amplification and resonances of all the various cavities of the nose, throat and mouth. Some of these are not variable and impose several of the characteristics that distinguish one speaker from another, male from female, youth from age and so on. Others are variable and allow the vowel sounds to be produced. It is now usually held that there are four fairly sharply defined peaks in the frequency distribution curve of any vowel, and that it is the position of these peaks that determines the vowel; their positions, for a given vowel, are the same whether the voice is high or low in pitch and whether the vowel is being spoken or sung. Fig.15 shows the generally accepted centres of the three main peaks for some common vowels. The middle formant is probably the most important one, as may be demonstrated if one holds the mouth in the shape required for saying ‘Ooh’, whispers loudly, and then changes the shape to ‘Ah’ and back a few times; there is an apparent change in pitch that may be anything from a 5th to an octave depending on the particular quality of vowel sounded. This change corresponds to the big change in position of the middle formant peak.
The elementary acoustics of pipes introduced in §5 above needs some amplification, since the previous treatment relates only to open cylindrical tubes. In real instruments an end might be partly closed in a number of ways; also, a pipe might have a succession of conical bores with different cone angles interspersed with cylindrical sections of different diameters.
To reconsider first the simple picture given earlier of the way in which waves build up to resonance in a pipe, suppose there is some kind of plate, driven like the piston of a steam engine so that it alternately compresses and rarefies the air just outside the end of a pipe in a sinusoidal way (such a device, called a pistonphone, is sometimes used as a source of sound for testing microphones), and suppose that the piston cycle has a frequency n and the pipe, which is open at both ends, has a length equal to half a wavelength for that frequency. The first compression will travel the half wavelength, be reflected as a rarefaction and arrive back at the initial end, where it would usually create a compression ready to start again. Since it has travelled one wavelength altogether it will be exactly in step with the next compression. Suppose, however, that the length is something other than a half wavelength. The initial wave will then arrive back at some other point of the cycle; the effect will be like pushing a swing at the wrong moment, and the wave will die out. If the length is a quarter wavelength (or if the first pipe is excited at ½n), the first wave will arrive back exactly halfway between two compressions and the rarefaction produced by the plate will completely neutralize the wave in the pipe. However, if the excitation is not sinusoidal but consists of a very brief compression pulse, then at 2n there will be a build-up, as there will be at ½n or 1/3n. Sinusoidal excitation excites resonance at only one frequency in a simple system; pulse excitation may excite resonance at a great many multiples and sub-multiples of this frequency.
In a reed instrument such as the clarinet, it can be shown (without going into detail) that the reed, which sets up oscillations in a pipe, is not a linear device. Its behaviour is not symmetrical: relatively small forces in one direction completely close the reed, whereas much larger forces can be applied in the opposite direction and the reed goes on opening. Thus the form of air control exerted by a reed is rather more like a succession of pulses than a sine wave; it is the right sort of excitation to set up resonance in several modes, and so to produce the characteristic tonal complex that would not be possible with pure sinusoidal excitation.
The last matter to be considered is reflection from the end. In woodwind instruments the reflection is not from the end except for the lowest note; it is more likely to be from a side hole, and there will be other, regularly spaced, side holes open beyond this. There may be a bell, and it can easily be shown that this affects the tone colour for only the lowest one or two notes; for the higher notes most of the sound is escaping through the side holes. Finally, in a brass instrument there is a bell that is always operative. The way the wave is reflected is critically dependent on the shape and spacing of the holes and on the shape of any bell. If a high proportion is reflected, good oscillations are set up in the pipe but little sound emerges; if the proportion reflected is low, it may be difficult to set up oscillations, but those that are set up emerge quite strongly. Benade has made a close study of all these phenomena, and has measured the ‘input impedance’ of pipes. In general terms, the higher this is at a given frequency the greater is the tendency for there to be oscillations maintainable at that frequency. Fig.16a shows this property plotted against frequency for a plain cylindrical pipe closed at one end only. The peaks are all at odd multiples of 63 Hz and correspond to the harmonics predicted by simple theory. For fig.16b a trumpet horn has been added to the open end. Two obvious things happen: the sequence of frequencies changes to become quite different from the odd-harmonic sequence; and there are practically no peaks above 1500 Hz. This is because the horn-shaped end ceases to act as a reflector above this frequency, and nearly all the energy leaks out into the air instead of maintaining oscillations within the pipe. The cut-off frequency above which waves are not properly reflected also occurs in woodwind instruments and is related to the spacing and size of the open finger-holes below the one defining the note. It is possible to change the frequency of one particular peak independently of the others by changing the bore diameter at certain critical points. Instrument makers need all the variables of hole position, hole size, bore size etc. in order to produce instruments that play in tune, give the required harmonic mixture and produce components that are in tune and cooperate well. (For further information on wind instrument sounds see Acoustics and articles on individual instruments.)
The measurement and analysis of musical tones is not easy. For a steady, unchanging tone the quantities that are most useful are the predominant frequency associated with it, the intensity or loudness, and the relative amplitudes and frequencies of the other components of the complex. If, however, the note is changing with time, then the way in which all these separate quantities change must also be recorded.
Intensity, or loudness, is usually measured by means of a microphone, amplifier and meter, but careful calibration is necessary and the relative positions of the microphone and source, the surroundings and many other factors affect the result. Sound level meters are available with built-in filters that have a frequency characteristic resembling that of the average human ear, but for an accurate estimation of the loudness of a sound it is necessary to measure the amplitude at a series of frequencies over the audio spectrum.
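The relation between the measured pressure amplitude and the decibel scale read from such a meter can be sketched as follows (a minimal example: the reference pressure of 20 micropascals is the standard one, but the ear-like frequency weighting built into real sound level meters is omitted here).

```python
import numpy as np

def sound_pressure_level(pressure_samples):
    """Sound pressure level in dB re 20 micropascals, from pressure samples in pascals."""
    p_ref = 20e-6                                    # standard reference pressure
    p_rms = np.sqrt(np.mean(np.square(pressure_samples)))
    return 20 * np.log10(p_rms / p_ref)

# A pure tone with an rms pressure of 0.02 Pa gives about 60 dB,
# roughly the level of ordinary conversation.
t = np.arange(0, 1, 1 / 44100)
tone = 0.02 * np.sqrt(2) * np.sin(2 * np.pi * 440 * t)
print(round(sound_pressure_level(tone)))             # 60
```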
The ready availability of cheap and powerful computers and microprocessors has led to the almost universal adoption of digital techniques for the analysis of rapidly varying waveforms. An analogue-to-digital converter samples the magnitude of the disturbance at regular intervals that may be as short as desired (typically around 40,000 samples per second), yielding a sequence of numbers. Once in digital form, the signal can be processed by a mathematical technique known as Fourier analysis (see §8(ii)) to show how the amplitude of each frequency component changes over the duration of the signal. Information about the frequency content of a signal can be displayed in a number of ways. One is the sonagram, a two-dimensional diagram with frequency on the vertical scale, time on the horizontal scale, and intensity represented either by colour or by a grey scale. Fig.17a shows a sonagram of the first five seconds of a harpsichord note. Each of the vertically equidistant horizontal bars represents one of the almost exactly harmonic frequency components of the harpsichord sound; the different rates of decay of the components can clearly be seen.
The sonagram is generated by dividing the digital sound sample into a series of short time slices, on each of which the Fourier analysis is performed. The frequency spectrum of each slice can be individually displayed if desired, and the information contained in the frequency spectrum can be used to compute the loudness of the sound or the predominant frequency at the chosen time. Fig.17b, c and d show the frequency spectra at the beginning, middle and end of the harpsichord sound sample in Fig.17a.
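The procedure can be sketched in a few lines (Python with numpy; the window length, overlap and the imitation ‘harpsichord’ signal are arbitrary choices for illustration): the signal is cut into overlapping slices, each slice is windowed and Fourier-analysed, and the resulting array of spectra is what the sonagram displays.

```python
import numpy as np

def sonagram(samples, rate, slice_len=2048, hop=512):
    """Short-time Fourier analysis: returns times, frequencies and a 2-D array
    of amplitudes, the raw material of a sonagram display."""
    window = np.hanning(slice_len)
    starts = range(0, len(samples) - slice_len, hop)
    spectra = [np.abs(np.fft.rfft(samples[s:s + slice_len] * window)) for s in starts]
    times = np.array([s / rate for s in starts])
    freqs = np.fft.rfftfreq(slice_len, 1 / rate)
    return times, freqs, np.array(spectra)

# A decaying 220 Hz 'pluck' whose harmonics die away at different rates,
# loosely imitating the harpsichord example above.
rate = 44100
t = np.arange(0, 5, 1 / rate)
note = sum(np.exp(-k * t) * np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))
times, freqs, amps = sonagram(note, rate)
print(amps.shape)   # one spectrum per time slice
```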
With a suitably fast processor, the frequency spectrum can be displayed on a screen within a small fraction of a second of the data capture, and the display can be updated several times a second. The system is then described as a real-time analyser. Using a real-time analyser, a performer can see immediately how a change in the method of sound production affects the frequency spectrum of the sound.
Other modern but non-electronic techniques are also used in studies of musical sounds. High-speed cinematography can reveal a great deal of useful information and has played an important part, particularly in understanding the behaviour of reeds and of vibrating strings. The modern optical technique of holography is playing a part in revealing the way in which the body of a violin or other string instrument is vibrating. The patterns produced are something like those of the Chladni plate, but to produce sand figures on violin back plates large vibrators are needed and, though useful, such measurements probably do not correspond to the behaviour of the instrument when it is played normally. Holographic techniques, by contrast, show up the vibration patterns even when the notes being played are extremely quiet. Fig.18 shows holographically produced vibration patterns for a violin back plate.
No discussion of the theoretical aspects of sound would be complete without some mention of the ideas of Fourier analysis and synthesis, though it is not easy to discuss these topics in any detail without fairly complicated mathematics. The basic notion, first formulated by Fourier in the 1820s in relation to his studies of heat flow, is that any periodic variation in a quantity, no matter how complicated, may always be represented as the sum of a number of simple sine waves with frequencies that are multiples of the basic repeat frequency of the wave (the fundamental). The components (or harmonics) have different amplitudes and phase relationships, and there may be an infinite number of them. The basic notion is not difficult to accept; fig.19 shows some examples of summations. The point that is difficult to accept, and indeed for which there is no formal proof though it is clearly true in practice, is that for any given wave there is only one combination of amplitudes and phases. The consequence of this is that it is possible in principle to take any complex periodic wave and to analyse it into a specific set of components, though it is a process that has only really become practicable for complex waves since the introduction of computer analysis. Fig.20 shows the result of summing three components that are the 2nd, 4th and 5th harmonics of the same fundamental. If the signal shown lasts for one second, the three components have frequencies of 6, 12 and 15 Hz respectively. Though no fundamental component is present, the combined wave repeats at intervals corresponding to the fundamental frequency of 3 Hz. This is an important point to which reference will be made in §9 below. The essence of this kind of analysis, however, is that the basic wave is periodic.
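The repetition at the absent fundamental can be checked numerically; the short sketch below sums the three components of the fig.20 example (with equal amplitudes and zero phases, an arbitrary choice) and confirms that the combined wave repeats every third of a second.

```python
import numpy as np

# The 2nd, 4th and 5th harmonics of a 3 Hz fundamental (6, 12 and 15 Hz) summed
# over one second. Although no 3 Hz component is present, the combined waveform
# repeats every 1/3 second.
rate = 999                                  # samples per second (a multiple of 3 for an exact test)
t = np.arange(rate) / rate
wave = sum(np.sin(2 * np.pi * f * t) for f in (6.0, 12.0, 15.0))

period = rate // 3                          # samples in one period of the 3 Hz fundamental
print(np.allclose(wave[:2 * period], wave[period:3 * period]))   # True: the wave repeats
```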
But a single note from a piano or harpsichord, for example, has no part that is strictly periodic, since the amplitude after the initial transient section is decaying all the time, and indeed different components, as has been observed, decay at different rates. Fig.20 may help to show how the analysis can be extended to cover this problem. If the diagram were drawn with three components which were the 200th, 201st and 202nd harmonics, and if the same frequency were used for the first component (i.e. 6·00 Hz), the second component would have a frequency of 6·03 Hz and the third 6·06 Hz. It is not difficult to see that the waveform would now repeat with a fundamental of 0·03 Hz. Thus by making the harmonics very close together it is possible to take care of a wave that repeats only after long periods; and making the harmonics infinitesimally close will enable one to deal with a wave that never repeats precisely. So the same technique of analysis can be used for non-periodic waves, provided one takes harmonics that are so close together that they form a continuous sequence. This kind of analysis, using the digital techniques already mentioned, is yielding important information about the transient behaviour of real instruments. In this form it is usually termed ‘Fourier transform’ or ‘Fourier integral’ analysis.
It has often been implied that the reason why some sequences or combinations of notes sound pleasant and acceptable whereas others are disturbing or unpleasant is simply that the brain ‘likes’ simple frequency ratios, such as the octave (1:2), the 5th (2:3) and the 4th (3:4). The numbers themselves, of course, cannot have any significance, but a study of the combined waveforms produced by adding two pure tones shows that the combination itself changes at a rate that depends on how close the component frequencies are to each other. For the 5th, as an example, with tones of 400 and 600 Hz the combined wave repeats at a frequency of 200 Hz: exactly half the frequency of the lower tone. For the 400:413 ratio (just under a semitone) the repetition frequency of the combined waveform is 13 Hz. In other words, the combined wave is far more complicated and goes through a complex sequence of different patterns, taking nearly 16 times as long to repeat as does that for the 5th. It may be that the ear and brain find this complicated sequence much more difficult to cope with than the simple, rapidly repeating pattern that occurs with the 5th.
If the tones are close to each other in frequency the phenomenon of beats can be heard clearly. Tones of 400 and 402 Hz, for example, give a combined pattern that waxes and wanes twice every second. The beat effect is identical with the sequential effect described above for the 5th or the 400:413 ratio, but, because for those intervals the repetition is rather rapid, it is not heard as a beat. Helmholtz suggested that beats cause the unpleasantness of dissonant intervals, and he went on to show that if two tones, themselves a long way apart in frequency, have upper partials that happen to be close enough to give beats, a ‘roughness’ in the sound is still heard. For example, 400:600 is the perfect 5th, and the 3rd harmonic of the lower tone and the 2nd harmonic of the upper tone are both 1200 Hz. If the 600 Hz is raised to 605 Hz, those harmonics become 1200 and 1210 Hz, and they give rise to beats at 10 Hz that would be quite unpleasant.
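The beat rate is simply the difference between the two frequencies; a brief numerical check (a sketch with an arbitrary sampling rate) shows the combined waveform of 400 and 402 Hz alternately reinforcing and cancelling twice a second.

```python
import numpy as np

# Beats: two tones of nearly equal frequency alternately reinforce and cancel
# at their difference frequency. For 400 Hz + 402 Hz the envelope collapses
# to almost nothing every half second, giving two beats per second.
rate = 44100
t = np.arange(0, 1, 1 / rate)
pair = np.sin(2 * np.pi * 400 * t) + np.sin(2 * np.pi * 402 * t)

null_point = int(0.25 * rate)    # halfway through a beat cycle the tones are half a cycle out of step
print(round(np.max(np.abs(pair[:200])), 2))                        # close to 2: the tones reinforce
print(round(np.max(np.abs(pair[null_point:null_point + 200])), 2)) # close to 0: the tones cancel
```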
In practice it is found that even when two pure tones are added a harsh effect can result, though there are no upper partials to beat with each other. In such cases a great many other tones can be heard as well, especially if the basic tones are loud. The standard experiment demonstrating this is to sound one tone (say, for example, 1320 Hz) steadily, and to sound a second tone (say 880 Hz) and allow its frequency to glide slowly up until it reaches 1320 Hz. A strong ‘difference tone’ that descends in pitch from 440 Hz to zero is clearly heard. These additional tones are known collectively as ‘combination tones’ and, for basic tones of frequencies f1 and f2, they have frequencies such as f1 + f2, f1 + 2f2, 2f1 + f2, etc., and f1 – f2, f1 – 2f2, 2f1 – f2, etc. It can be shown mathematically that they can arise from non-linearity in any part of the system, and it is now accepted that very loud tones produce non-linear effects in the ear itself. Perhaps consonance is perceived because the number of combination tones is small, whereas for a dissonant interval a vast array of combination tones arises. This can be shown simply by making the calculations for combination tones with the ratios 400:600 and 400:413 (see Table 1). For the perfect 5th they form a series neatly spaced 200 Hz apart; but for the second pair the collection is a motley one, and more and more unrelated tones arise as the series is developed. However, dissonance is still perceived when the notes are sounded quietly, so explanations other than non-linearity must be sought for the additional tones that are heard.
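The comparison can be sketched as follows (the published Table 1 may include different orders of combination tone; the calculation below, limited to low orders, is offered only as an illustration of the contrast).

```python
# Combination tones of the forms m*f1 + n*f2 and |m*f1 - n*f2| for small m and n.
# For 400:600 every combination tone falls on a multiple of 200 Hz; for 400:413
# the collection is scattered and largely unrelated.
def combination_tones(f1, f2, max_order=2):
    tones = set()
    for m in range(1, max_order + 1):
        for n in range(1, max_order + 1):
            tones.add(m * f1 + n * f2)
            tones.add(abs(m * f1 - n * f2))
    return sorted(tones)

print(combination_tones(400, 600))   # [200, 400, 800, 1000, 1400, 1600, 2000]
print(combination_tones(400, 413))   # [13, 26, 387, 426, 813, 1213, 1226, 1626]
```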
This is an area of considerable controversy, but one fact makes it obvious that combination tones do not provide the answer to the problem of dissonance. If three tones of 400, 600 and 800 Hz are combined, a difference tone of 200 Hz is heard whether the tones are loud or quiet. If the frequencies are 430, 630 and 830 Hz, the difference tone is still 200 Hz and is heard when the tones are sounded loudly; but if they are sounded quietly a higher tone is heard: about 210 Hz in this example. This clearly is not a difference tone; it is usually called a ‘residue tone’. The fact is unquestioned; but the origin of the tone and its contribution to the consonance–dissonance problem are still matters of dispute.
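One way of picturing the residue, offered here purely as an illustration and not as a settled explanation, is that the ear seeks the fundamental whose harmonic series best fits the components heard: 430, 630 and 830 Hz lie close to the 2nd, 3rd and 4th harmonics of about 210 Hz. A crude pattern-matching sketch along these lines (with an arbitrary search range and scoring rule) reproduces the observation.

```python
# A crude pattern-matching estimate of residue pitch: try candidate fundamentals
# and score how closely the heard components fit each harmonic series.
# This is an illustrative model only, not an account of what the ear actually does.
def misfit(components, fundamental):
    return sum(abs(f / fundamental - round(f / fundamental)) for f in components)

def residue_estimate(components, lo=150.0, hi=300.0, step=0.5):
    candidates = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    return min(candidates, key=lambda f0: misfit(components, f0))

print(residue_estimate([400, 600, 800]))   # 200.0: the components form an exact harmonic series
print(residue_estimate([430, 630, 830]))   # 210.0: close to the pitch actually heard
```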
As for musical scales, it is enough to note that they contain many possible combinations that blend together in a consonant way. It is not surprising, therefore, that the intervals involved in scales tend to be the rather simple ones and that, at least from the standpoint of physics, there is a close link between the sequence of ratios in a scale and the ratios for consonant intervals. (For further details see Scale and Temperaments.)
If two people were to try to conduct a conversation while suspended by some hypothetical device in a region far removed from all solid objects, they would find difficulty unless they were quite close together. Fortunately, people normally at least stand on solid ground when they converse, and the problem is immediately reduced: some of the sound waves strike the ground and are reflected – not so precisely as is light from a mirror, but nevertheless in broadly the same way – and so the hearer receives two sets of waves, direct and reflected. Provided the total distances travelled by each are not too different, this leads to a louder sound. If the difference in distance is great, the brain recognizes the time difference and the result is an echo.
If a single wall is added behind the speaker, some waves will still travel direct to the hearer, some will be reflected from the floor, some from the wall, and some first from one and then the other; the result can be up to four times as much energy reaching the hearer. This process goes on as surfaces are added. If the reflection is good, as it is when the walls are smooth and hard, the result may be quite intolerable because any sound created is reflected round and round from one surface to another and takes a long time to die away; each syllable spoken is blurred by those immediately before, and all intelligibility is lost. Some swimming baths, in which large areas of glass and tile, as well as the water surface itself, act as good reflectors, demonstrate this effect of ‘reverberation’ well. It is usually measured in terms of the ‘reverberation time’, roughly the time taken for a loud sound to become inaudible.
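More precisely, reverberation time is conventionally defined as the time taken for the sound level to fall by 60 decibels, and Sabine (whose collected papers appear in the bibliography) showed that it is roughly proportional to the room volume divided by the total absorption of its surfaces. A rough sketch, with purely illustrative dimensions and absorption coefficients:

```python
# Sabine's approximation for reverberation time: T60 ~ 0.161 * V / A (SI units),
# where V is the room volume in cubic metres and A the total absorption in
# square metres of 'open window' equivalent. All figures below are illustrative.
def reverberation_time(volume, surfaces):
    """surfaces: list of (area_m2, absorption_coefficient) pairs."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume / total_absorption

# A tiled swimming bath reverberates far longer than the same space
# treated with absorbent panels.
hall = 20 * 10 * 5                                    # 1000 cubic metres
wall_area = 2 * (20 * 10 + 20 * 5 + 10 * 5)           # 700 square metres
tiled = [(wall_area, 0.02)]                           # hard tile everywhere
treated = [(wall_area, 0.3)]                          # absorbent treatment
print(round(reverberation_time(hall, tiled), 1))      # roughly 11.5 s
print(round(reverberation_time(hall, treated), 1))    # roughly 0.8 s
```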
The first essential scientific problem in acoustic design is thus to achieve a compromise between the need to introduce reflecting surfaces to strengthen the sound produced and the need to keep reflection within bounds to maintain intelligibility. The way in which this can be done is discussed under Acoustics, §I. Scientifically the question is not difficult: the problem is to agree on the characteristics that one is trying to achieve, and also to design a hall that will perform many different functions, each of whose acoustic requirements may be quite different.
Just as the body of a violin amplifies non-uniformly and so ‘colours’ the sound produced by the string as well as merely making it louder, so the resonances in a room can colour musical tones. It has been shown how the Chladni plate demonstrates modes of vibration for two-dimensional devices, and that as the frequency of the mode goes higher so the size of the regions between the nodal lines becomes smaller and the number of nodes increases. The same kind of thing happens in three-dimensional boxes, and nodal surfaces exist. As the frequency goes up, so the spacings between these surfaces shrink. Thus even a large room may break up into a large number of regions and hence provide resonances at frequencies well within the audio range. A classic example of this can be heard by listening to a high note while moving the head sideways rather slowly. The nodal surfaces are close together and the loudness goes up and down quite rapidly as one moves through them. Thus the frequencies at which resonant modes are present will be amplified and a formant effect arises. The pleasure of singing in the bath is largely caused by the fact that the room is small and has hard surfaces, and hence has a number of well-separated resonances in the audio region; quite a modest singer can produce a fine ringing tone to his own satisfaction as a result of modification by the formant. Clearly this factor is of great importance in studios from which recordings or radio transmissions are produced. The placing of the microphones and performers in relation to the walls and other surfaces changes the particular modes excited and provides ways in which the sound engineer can vary the coloration to achieve a desired effect.
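For an idealized rectangular room with rigid walls the mode frequencies follow a standard standing-wave formula; the sketch below (Python, with a hypothetical bathroom-sized space and the 340 m/s speed of sound used earlier) shows how even a small room possesses a set of well-separated resonances low in the audio range.

```python
import numpy as np

# Mode frequencies of an idealized rectangular room with rigid walls:
# f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2).
# The dimensions below are a hypothetical small bathroom.
c = 340.0
Lx, Ly, Lz = 2.5, 2.0, 2.4          # metres

modes = []
for nx in range(0, 5):
    for ny in range(0, 5):
        for nz in range(0, 5):
            if nx == ny == nz == 0:
                continue
            f = (c / 2) * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)
            modes.append(round(f, 1))

print(sorted(modes)[:10])   # the lowest, well-separated resonances (about 68, 71, 85 Hz, ...)
```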
Various techniques have been developed for artificially changing the acoustic environment in a room. These are described under Acoustics, but the essence of them all is either to modify the way in which the reflections occur (decreasing them by covering surfaces with absorbent material, or increasing them by providing microphones and loudspeakers and introducing artificial time delays to simulate the acoustic path differences), or to modify the formant characteristics. The latter method involves artificially amplifying certain frequencies, corresponding either to specific modes that are not being stimulated or to modes of desirable frequencies that do not occur because of the particular disposition of the elements of the hall. All these techniques are fraught with difficulties, mainly because it is not easy to avoid the feed-back howl previously described, but also because, again, it is hard to decide on the required features. A formant characteristic and reverberation time that suit a solo performer may not necessarily suit the audience, and vice versa. However, some fascinating results have been achieved.
The question often arises whether it will ever be possible to synthesize precisely the tone of a given instrument, or even of a complete orchestra. The answer is that it is possible now; given the necessary time and a large enough computer one can match exactly the required waveform of any instrument or combination of instruments. But it can take a long time, even with the biggest computers, to produce even a few seconds of complicated music, and so in practice such an operation is of limited use. The relative success of synthesizers, as opposed to general-purpose computers, is due to the speed at which they can operate.
The biggest problem in the production of synthetic sounds is that of devising methods of control, and methods of scoring, that permit the techniques to be used with the same flexibility as traditional instruments; the use of MIDI and other techniques led to enormous advances in the 1980s and 90s. Computers and synthesizers also provide a great deal of information about the important features of the waves produced by traditional instruments, and a fruitful collaboration between instrument makers and scientists is possible. Physics is beginning to produce much more realistic explanations of the behaviour of real instruments, and in many cases these give the instrument maker ways of predicting with much greater precision the modifications needed to improve tone quality.
Developments in material science may possibly have something to offer the instrument maker. Materials such as cane for reeds, the various woods used for the bodies of string instruments etc. are not susceptible to control. The range of naturally available material must be scanned and selections made on the basis of experience. If it becomes possible to manufacture materials with the desired properties, predictable in advance and liable to much less change with time than natural materials, this would be a great boon. It seems that costs might be prohibitive for all but the simplest mass-produced instruments, but there may well be rapid advances in the near future.
There remain many problems in understanding the mechanism of hearing, the origins of consonance and dissonance, the precise way in which the ear and brain respond to transients, and the phenomena of aural illusions. In the last category the rapid developments in stereophony, quadraphony and the creation of complete sound environments are uncovering almost as many fascinating problems as they solve, and psychoacoustics is again an area where many new insights are appearing.
H.L.F. Helmholtz: Die Lehre von den Tonempfindungen (Brunswick, 1863, 4/1877; Eng. trans., 1875, 2/1885/R, 6/1948, as On the Sensations of Tone)
G.A. Audsley: The Art of Organ-Building (New York, 1905/R)
W.C. Sabine: Collected Papers on Acoustics (Cambridge, MA, 1922/R)
W.H. Barnes: The Contemporary American Organ (New York, 1930, 9/1971)
D.C. Miller: Anecdotal History of the Science of Sound to the Beginning of the 20th Century (New York, 1935)
J. Jeans: Science and Music (Cambridge, 1937/R)
C.E. Seashore: Psychology of Music (New York, 1938/R)
A. Wood: The Physics of Music (London, 1944, 6/1962, rev. 7/1975 by J.M. Bowsher)
J.M. Barbour: Tuning and Temperament: a Historical Survey (East Lansing, MI, 1951/R, 2/1953)
H.F. Olson: Musical Engineering (New York, 1952, rev. 2/1967 as Music, Physics and Engineering)
R.W. Young: Table Relating Frequency to Cents (Elkhart, IN, 1952/R, rev. 1976 as Making Sense out of Cents)
J.B. Keller: ‘Bowing of Violin Strings’, Communications on Pure and Applied Mathematics, vi (1953), 483–95
J.F. Corso: ‘Absolute Judgements of Musical Tonality’, JASA, xxix (1957), 138–44
A.H. Benade: ‘On Woodwind Instrument Bores’, JASA, xxxi (1959), 137–46
A.H. Benade: Horns, Strings, and Harmony (New York, 1960/R)
W.A. Van Bergeijk, J.R. Pierce and E.E. David: Waves and the Ear (New York, 1960)
F. Winckel: Phänomene des musikalischen Hörens (Tutzing, 1960; Eng. trans., 1967, as Music, Sound and Sensation)
J. Backus: ‘Vibrations of the Reed and Air Columns in the Clarinet’, JASA, xxxiii (1961), 806–9
C.M. Hutchins: ‘The Physics of Violins’, Scientific American, ccvii/5 (1962), 78–84, 87–93
D.W. Martin: ‘Musical Scales since Pythagoras’, Sound, i/3 (1962), 22
M.V. Mathews: ‘The Digital Computer as a Musical Instrument’, Science, cxlii (1963), 553–7
W.D. Ward: ‘Absolute Pitch’, Sound, ii (1963), no.3, p.14; no.4, p.33
P. Lehman: ‘Harmonic Structure of the Tone of the Bassoon’, JASA, xxxvi (1964), 1649–53
R. Plomp: ‘The Ear as a Frequency Analyzer’, JASA, xxxvi (1964), 1628–36
E.D. Blackham: ‘The Physics of the Piano’, Scientific American, ccxiii/6 (1965), 88–99
R. Plomp and W.J.M. Levelt: ‘Tonal Consonance and Critical Bandwidth’, JASA, xxxviii (1965), 548–60
C.A. Taylor: The Physics of Musical Sounds (London, 1965)
J.R. Pierce, M.V. Mathews and J.C. Risset: ‘Further Experiments on the Use of a Computer in Connection with Music’, Gravesaner Blätter, nos. 27–8 (1966), 92–7
M.D. Freedman: ‘Analysis of Musical Instrument Tones’, JASA, xli (1967), 793–806
C.M. Hutchins: ‘Founding a Family of Fiddles’, Physics Today, xx (1967), 23
J.J. Josephs: The Physics of Musical Sound (Princeton, NJ, 1967)
J.C. Schelleng: ‘Physical Effects of Violin Varnish’, Catgut Acoustical Society Newsletters (1967), nos.6–8
C.M. Hutchins and F.M. Fielding: ‘Acoustical Measurement of Violins’, Physics Today, xxi (1968), 34
J. Backus: The Acoustical Foundations of Music (New York, 1969, 2/1977)
W. Reinicke and L. Cremer: ‘Application of Holographic Interferometry to Vibrations of Bodies of Stringed Instruments’, JASA, xlvii (1970), 988–92
W.D. Ward: ‘Musical Perception’, Foundations of Modern Auditory Theory, ed. J.V. Tobias, i (New York, 1970)
Music and Technology: Stockholm 1970
J. Backus and T.C. Hundley: ‘Harmonic Generation in the Trumpet’, JASA, xlix (1971), 509–19
N. Geschwind: ‘Language and the Brain’, Scientific American, ccxxvi/4 (1972), 76–8
A.H. Benade: ‘The Physics of Brasses’, Scientific American, ccxxix/1 (1973), 24–35
C.M. Hutchins: ‘Instrumentation and Methods for Violin Testing’, Journal of the Audio Engineering Society, xxi (1973), 563
J.G. Roederer: Introduction to the Physics and Psychophysics of Music (London and New York, 1973, 3/1995)
N.H. Fletcher: ‘Some Acoustical Principles of Flute Technique’, The Instrumentalist, xxviii/7 (1973–4), 57–61
A.H. Benade: Fundamentals of Musical Acoustics (New York, 1976, 2/1990)
C.M. Hutchins, ed.: Musical Acoustics (New York, 1975–6)
R. Plomp: Aspects of Tone Sensation (London, 1976)
C.A. Taylor: Sounds of Music (London, 1976)
C.A. Taylor: ‘Musical Acoustics’, Contemporary Physics, xx (1979), 515–34
L.E. Kinsler and others: Fundamentals of Acoustics (New York, 3/1982)
J.R. Pierce: The Science of Musical Sounds (New York, 1983)
L. Cremer: The Physics of the Violin (Cambridge, MA, 1984)
C.A. Taylor: ‘A Scientist in the World of Music’, Science and Public Affairs, iii (1988), 73–88
I. Johnston: Measured Tones (Bristol, 1989)
C.A. Taylor: ‘Physics and Music Twenty Years On’, Proceedings of the Royal Institution of Great Britain, lxi (1989)
C.A. Taylor: Exploring Music (Bristol, 1992)