Update: 2016-09-16 04:33 AM -0400

TIL

The Human Voice

snd-hear.htm
formerly: hv6.htm

by U Kyaw Tun (UKT) (M.S., I.P.S.T., USA) and staff of Tun Institute of Learning (TIL) . Not for sale. No copyright. Free for everyone. Prepared for students and staff of TIL Research Station, Yangon, MYANMAR :  http://www.tuninst.net , www.romabama.blogspot.com


Contents of this page

01. Sensitivity of the Human Ear
02. Pitch - what your ear hears
03. Loudness
04. Timbre (sound quality)
05. Spectrum diagrams
06. Source-filter model of speech production
07. Harmonics
08. Resonance
09. Resonance in a half-open tube
10. Same vowel at different pitches
  vowel-a  vowel-i  vowel-u
11. Formants
12. Canadian-English vowels
13. Phonation types
14. IPA vowel diagram and cardinal vowels
15. Relating formants to articulation

Passages worthy of note:
No matter how the pitch (?) changes, you can still recognize it as the same vowel /i/ or { i }.

 

UKT notes

Body-size effects on voice acoustics : Pitch and F0 need not be the same
British-English vowels
Lax and Tense vowels
quantal vowel
Source-Filter Theory
timbre

Temporary index of figures (in order of insertion):
EGG electrodes -- Fig.5.17
Laryngograph processor -- Fig.5.18
Fig 15 Audio and EGG signal  -- Fig.5.19
German and Russian monophthongal vowels -- Fig.5.20
F2/F1 vowel space -- Fig. 5.21
Bangla vowels -- Fig.5.22

Contents of this page

1. Sensitivity of the Human Ear

We have heard, "In the Country of the Blind, the One-eyed Man is King." However, in H. G. Wells' 1904 short story by that title, it is not so. Now what about "In the Country of the Deaf"? Of course, "Sound" would be just a form of energy that is transmitted through a medium such as air, and we can "feel" it vibrating our skin. However, we will not let our imagination get out of hand, and will come back to the reality of how sound in the form of language is produced and received. We have already dealt with how the sounds forming a language are produced in earlier chapters. But we must not forget that unless we can hear the consonants, vowels, and rimes produced by the speaker, we as hearers would not be able to make any sense out of them. So, many of the terms we have covered, such as "consonants (C)", "vowels (V)", "rimes (V)", "syllables (CV)", "phonation", etc., must be reconsidered, taking into consideration our hearing and how we interpret what we hear.

UKT: What we hear are the "syllables (CV)", not the individual parts: not "consonants (C)", "vowels (V)", or "rimes (V)" individually. This is the basis of the abugida or akshara system of writing (script), where each grapheme corresponding to a phoneme has an inherent vowel which sometimes must be killed with an {a.tht}. We must remember that the akshara system is entirely different from the alphabetic system which is used for writing English-Latin (Eng-Latin). The following presentation is written from the point of view of the alphabetic system, and must be re-interpreted from the akshara point of view. Remember that in the land of Myanmar, we are writing Burmese-Myanmar (Bur-Myan) and Pali-Myanmar (Pal-Myan), which are akshara systems. To help us with our understanding, we also refer to Sanskrit-Devanagari (Skt-Devan), which is also an akshara system. - UKT101106

From: HyperPhysics: http://hyperphysics.phy-astr.gsu.edu/hbase/sound/earsens.html#c1 080330

The human ear can respond to minute pressure variations in the air if they are in the audible frequency range, roughly 20 Hz - 20 kHz.

To help us with the physics, when we are thinking about Sound (pitch), try to interpret it from the point of view of Light (colour), because both Sound and Light are studied as waves.

Do not forget that the sine wave shown is a graphical representation in terms of pressure variations, even though the actual sound wave itself is a longitudinal wave. Remember also that actual sound waves are more complex. Now, listen again to simple sine waves, <)) 300 Hz and <)) 500 Hz. Now listen to a complex wave resulting from the addition of these two, <)) 300Hz + 500Hz.

Sound files from: Kevin Russell www.umanitoba.ca/faculties/arts/linguistics/russell/138/sec4/acoust1.htm 080331

A vowel sound is a continuous sound. Listen to a faked vowel { i } / i / sound: <)) 300 Hz and 2000 Hz added
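
For readers who want to experiment, the following is a minimal Python sketch (my own illustration, not from the source pages) of the wave addition described above; the sampling rate and duration are assumed values. Writing the arrays to a .wav file (for example with the standard-library wave module) lets you hear the difference between the simple and complex waves.

import numpy as np

sample_rate = 8000                           # samples per second (assumed)
t = np.arange(0, 0.02, 1.0 / sample_rate)    # 20 ms of signal

wave_300 = np.sin(2 * np.pi * 300 * t)       # simple 300 Hz sine wave
wave_500 = np.sin(2 * np.pi * 500 * t)       # simple 500 Hz sine wave
complex_wave = wave_300 + wave_500           # their sum is the "complex" wave

# The sum repeats at the greatest common divisor of the two frequencies
# (100 Hz here), so its period is 10 ms.
print(complex_wave[:5])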

Sounds that we hear may be generally characterized by pitch, loudness, and timbre (aka sound quality).

Physical scientists like me would like to split up the human voice, in the form of sound waves, into simpler waves with quantities which we can measure -- not in terms of pitch, loudness, and timbre. Then we would be able to describe the sound in terms of measurable quantities instead of talking about POAs and manners of articulation. And then we could forget about the phoneticians and their subjects and their L1s (the very first language a person is exposed to as a newborn), and their vowel diagrams. However, in actual practice the matter is not as simple as you would like it to be.

Contents of this page

2. Pitch

- UKT 101106, 160113: What we hear as "pitch" (frequency) is comparable to what we see as "colour" (frequency). Pitch is sometimes described as F0, yet the two are not the same. See body size and voice-acoustics .
Formants, including F0, are measurable physical quantities. Pitch is a perception: it is described subjectively and cannot be measured directly. Generally, perceived pitch is proportional to frequency, but not always. For example, the perceived pitch tends to change slightly as the loudness/intensity of the sound increases.

From: http://hyperphysics.phy-astr.gsu.edu/Hbase/hframe.html 080330

What your ear hears is the frequency of the sound. This is what we call pitch. For example, middle C in equal temperament = 261.6 Hz. Listen to a flute playing middle C:  <)). (from Kevin Russell's website).

The perceived pitch of a sound is just the ear's response to frequency, i.e., for most practical purposes the pitch is just the frequency. The pitch perception of the human ear is understood to operate basically by the Place theory, with some sharpening mechanism necessary to explain the remarkably high resolution of human pitch perception.

The place theory and its refinements provide plausible models for the perception of the relative pitch of two tones, but do not explain the phenomenon of perfect pitch. (This phenomenon is found in some people, probably less than 0.01% of the population, who appear to be able to recognize absolute pitches without any reference.).

The just noticeable difference in pitch is conveniently expressed in cents, and the standard figure for the human ear is 5 cents.
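
Since the cent is simply a logarithmic unit (1200 cents to the octave), the just noticeable difference quoted above can be turned into a frequency with a line of arithmetic. A minimal sketch, using the middle C value from the previous section:

import math

def cents_between(f1_hz, f2_hz):
    # interval between two frequencies in cents (one octave = 1200 cents)
    return 1200.0 * math.log2(f2_hz / f1_hz)

middle_c = 261.6      # Hz, equal temperament (from the text above)
jnd = 5.0             # cents, the just noticeable difference quoted above

# The nearest frequency a typical listener can tell apart from middle C:
just_noticeable = middle_c * 2 ** (jnd / 1200.0)
print(round(just_noticeable, 2))                            # about 262.36 Hz
print(round(cents_between(middle_c, just_noticeable), 2))   # 5.0 cents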

Contents of this page

3. Loudness

UKT: What we hear as "loudness" (say, a loud C note <)), or a faint C note) is comparable to what we see as "brightness" (a bright yellow light, or a dim yellow light) - UKT101106

From: http://hyperphysics.phy-astr.gsu.edu/Hbase/hframe.html 080330

Loudness is sometimes referred to as sound intensity, but the two are not the same. By "intensity" we mean the sound energy received by the ear per unit area, and it can be measured accurately in decibels. Loudness, on the other hand, is a perceived quality (a subjective term), and cannot be measured exactly.

Sound loudness is a subjective term describing the strength of the ear's perception of a sound. It is intimately related to sound intensity but can by no means be considered identical to intensity. The sound intensity must be factored by the ear's sensitivity to the particular frequencies contained in the sound. This is the kind of information contained in equal loudness curves for the human ear. It must also be considered that the ear's response to increasing sound intensity is a "power of ten" or logarithmic relationship. This is one of the motivations for using the decibel scale to measure sound intensity. A general "rule of thumb" for loudness is that the power must be increased by about a factor of ten to sound twice as loud. To more realistically assess sound loudness, the ear's sensitivity curves are factored in to produce a phon scale for loudness. The factor of ten rule of thumb can then be used to produce the sone scale of loudness. In practical sound level measurement, the measuring instrument is made to approximate the human ear by use of what are known as filter contours such as A, B, and C.

A widely used "rule of thumb" for the loudness of a particular sound is that the sound must be increased in intensity (aka energy received/area) by a factor of ten for the sound to be perceived as twice as loud. A common way of stating it is that it takes 10 violins to sound twice as loud as one violin. Another way to state the rule is to say that the loudness doubles for every 10 phon increase in the sound loudness level. Although this rule is widely used, it must be emphasized that it is an approximate general statement based upon a great deal of investigation of average human hearing but it is not to be taken as a hard and fast rule.
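
The two rules of thumb above (a tenfold increase in power is a 10 dB increase, and every 10 phon roughly doubles the loudness in sones) can be checked with a short calculation. This is only a sketch of the approximate relationships quoted above, not a full loudness model:

import math

def intensity_ratio_to_db(ratio):
    # sound level difference in decibels for a given intensity (power) ratio
    return 10.0 * math.log10(ratio)

def phon_to_sone(phon):
    # approximate sone value for a loudness level in phon (valid roughly above 40 phon)
    return 2.0 ** ((phon - 40.0) / 10.0)

# Ten violins deliver roughly ten times the acoustic power of one violin:
print(intensity_ratio_to_db(10))              # 10.0 dB increase
# A 10 phon increase doubles the perceived loudness on the sone scale:
print(phon_to_sone(50) / phon_to_sone(40))    # 2.0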

Contents of this page

4. Timbre (aka sound quality)

Sounds may be generally characterized by pitch, loudness, and quality. However, since the term "quality" can mean things other than quality of sound, I prefer the term "timbre". Sound "quality" or timbre describes those characteristics of sound which allow the ear to distinguish sounds which have the same pitch and loudness. Timbre is then a general term for the distinguishable characteristics of a tone. Note that "tone" as used here is not the same as tone in a tonal language. Since Burmese is sometimes described as a tonal language (it is now called a "pitch-register" language), this distinction is important. Timbre is mainly determined by the harmonic content of a sound (remember the Fourier transforms, time domain and frequency domain?), and by the dynamic characteristics of the sound, such as vibrato and the attack-decay envelope of the sound.

UKT: Of course we would like to know what every term, such as vibrato and attack-decay envelope, means. But we must draw a line and take care not to get side-tracked.

Some investigators report that it takes a duration of about 60 ms to recognize the timbre of a tone, and that any tone shorter than about 4 ms is perceived as an atonal click. It is suggested that it takes about a 4 dB change in mid or high harmonics to be perceived as a change in timbre, whereas about 10 dB of change in one of the lower harmonics is required.

Contents of this page

5. Spectrum diagrams

UKT: My curiosity to know more than what Kevin Russell has given has led me to: http://www.sjsu.edu/faculty/fry/123/acoustics.pdf  071222

Complex waves can be split up into the simpler waves that make them up. Let's suppose we are to split up the complex wave (black) shown on the right, which has been formed from 3 simple waves: the red, the blue, and the green. And let's say we have managed, in one way or another, using mathematics, to split it into its components. Now we can describe the 3 simple waves in terms of their amplitudes and frequencies. From these, we can draw a spectrum, where the Y-axis is the amplitude in decibels, and the X-axis is the frequency in Hertz (Hz).
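
The "one way or another, using mathematics" is in practice the Fourier transform. A minimal sketch (the frequencies and amplitudes here are illustrative, not the values in the figure): build a complex wave from three sine waves and recover their frequencies and amplitudes with an FFT. Plotting amplitudes against freqs gives exactly the kind of spectrum diagram described in this section.

import numpy as np

sample_rate = 8000
t = np.arange(0, 1.0, 1.0 / sample_rate)             # 1 second of signal

# a complex wave built from three simple waves of different amplitudes
complex_wave = (1.0 * np.sin(2 * np.pi * 300 * t) +
                0.5 * np.sin(2 * np.pi * 500 * t) +
                0.25 * np.sin(2 * np.pi * 700 * t))

spectrum = np.fft.rfft(complex_wave)                  # split into component frequencies
freqs = np.fft.rfftfreq(len(complex_wave), 1.0 / sample_rate)
amplitudes = np.abs(spectrum) * 2.0 / len(complex_wave)

# print only the components with non-negligible amplitude: the three sine waves
for f, a in zip(freqs, amplitudes):
    if a > 0.01:
        print(f, round(a, 2))                         # 300 Hz 1.0, 500 Hz 0.5, 700 Hz 0.25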

   

This kind of spectrum diagram is especially convenient for sound waves from a musical instrument, such as a flute. We have reproduced here a flute playing the middle C note.

[{UKT: Don't ask me what a "middle C note" is. I simply don't know. You will have to ask a music teacher. Since, the Burmese musical scale is different from the Western scale, make sure that your music teacher knows the Western scale.}]

The wave is made up of many, many simple waves as shown in the spectrum on the left. I hope you can hear what it sounds like by going online and clicking on A flute playing middle C:  <)). (from Kevin Russell's website).

(Sounds and plots for the musical instruments come from Geoffrey Sandell's SHARC Timbre database at Loyola University Chicago.)

 

Don't confuse these spectrum diagrams with spectrograms (which we'll cover later). Perhaps we are more familiar with the light spectrum, because we are used to seeing rainbows. The coloured diagram shows how white light is split by a glass prism into its respective colours. There is only one major difference between sound waves and light waves: sound waves are longitudinal waves, whereas light waves are transverse waves. The mathematical treatment of the two kinds of waves is the same. The set of frequencies in a light wave (as separated by a prism) is called its spectrum.

The situation is similar with sound. The complex wave for the {i} / i / vowel sound will be made up of one set of frequencies, which is different from the set of frequencies for the vowel {a} /a/.

We need a way to separate a complex sound wave out into its component frequencies (and their amplitudes) so that we can see what makes vowels different.

Contents of this page

6. Source-Filter model of speech production

Sound [of the language] is produced in the larynx. That is where the pitch and volume are manipulated. The strength of expiration [air breathed out]  from the lungs also contributes to loudness, and is necessary for the vocal folds to produce speech.
-- Wikipedia http://en.wikipedia.org/wiki/Larynx 070909

The complex waves produced during voiced periods of speech depend on two things:
1. the waves produced by the vocal fold vibrations (the source), and
2. the way those waves are modified by the higher parts of the vocal tract (the filter).

An important feature of the source is its harmonics. One of the most important ideas in understanding the filter is resonance.

We shall describe the Source-Filter Theory in the following steps:
Harmonics
Resonance | Resonance in a half-open tube
Source and filter

Contents of this page

7. Harmonics

Consider again the waves produced by the bass clarinet and the flute. The lines in these spectra look suspiciously evenly-spaced. This is a typical property of naturally occurring waves.

Consider a guitar string. It can vibrate in a simple back-and-forth motion. But it can also vibrate in more complex ways, where each half or third of the string moves in the opposite direction from its neighbours. These modes of vibration are known as harmonics. Naturally occurring vibrations of a guitar string involve an infinite number of these motions occurring simultaneously.

Each mode of vibration produces a simple wave with its own frequency and amplitude:

1. The frequency of the simple wave produced by the simplest back-and-forth motion is called the fundamental frequency.

2. The frequency of the simple wave produced by the second mode of vibration (where the string is vibrating in halves) is twice the fundamental frequency, or exactly one octave higher.

3. The frequency produced by the third mode (where the string is vibrating in thirds) is three times the fundamental frequency. etc.

Each of the higher-frequency simple waves is called a harmonic. In naturally occurring vibrations, there is a harmonic at each multiple of the fundamental frequency -- theoretically all the way up to infinity, though the harmonics decrease in amplitude as the frequency rises.

The spectrum (frequency) of the wave produced by the guitar string would look like Fig.6.05.

The wave produced by the vibration of the vocal cords also has this kind of structure. The wave produced by the vocal cords (before it is modified by the vocal tract) is often called the glottal wave. The fundamental frequency (the frequency of the lowest simple wave) is perceived as the pitch.
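
The harmonic structure just described is easy to tabulate: a harmonic at every whole-number multiple of the fundamental, with amplitude falling off as frequency rises. A minimal sketch (the fundamental and the 1/n roll-off are assumptions chosen only for illustration; a real glottal wave rolls off at roughly 12 dB per octave):

fundamental = 110.0                      # Hz, an assumed fundamental frequency

for n in range(1, 9):
    frequency = n * fundamental          # the n-th harmonic
    amplitude = 1.0 / n                  # illustrative roll-off only
    print(f"harmonic {n}: {frequency:.0f} Hz, relative amplitude {amplitude:.2f}")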

Contents of this page

8. Resonance

Objects have frequencies that they prefer to vibrate at. If you try to vibrate an object at a different frequency, the vibrations will be damped and eventually die out. If you vibrate it at its preferred frequency, the vibrations will be reinforced and the object will resonate.

Some examples of resonance:

a standing wave on a skipping rope.
the note you get when you blow into a half-full glass bottle
the vibrations in the sounding board of a violin
a swing swinging higher when you push it just right, or you "pump" just right while you're sitting in it

As children learning Physics, when we came to 'resonance' we were often asked: "Why do soldiers break step when marching over a bridge?" I am sure you know the answer. If you don't, here it is:

"To avoid stressing the bridge excessively... if they march in step, there's a chance that their steps will coincide with the resonant frequency of the bridge and cause possibly dangerous amplified shaking of the whole structure."

The most famous case of a bridge collapse (due to resonance caused by the wind), that of the Tacoma Narrows bridge (nicknamed Galloping Gertie) in 1940, has been recorded on film. Various videos are available online. Use the search string "Tacoma Narrows bridge" to find one. (The most recent one I have watched was on 080311.)

Contents of this page

9. Resonance in a half-open tube

A tube that vibrates at one end and is open at the other (e.g., a clarinet, the vocal tract) also has preferred frequencies.

You can get a standing wave in a half-open tube if the area of high-pressure reaches the open end at exactly the same time the closed end returns to normal pressure.

When this happens, the "reflected" waves travelling back from the open end will exactly coincide with the waves travelling forward from the closed end and they will reinforce each other. The tube will resonate. (At a non-preferred frequency the backward-moving waves will sometimes reinforce, sometimes cancel out, the forward-moving waves, and you won't get a standing wave.)

 

The preferred frequencies for a half-open tube will be all those frequencies (call them X) such that: the length of the tube is 1/4 the wavelength of X, or the length of the tube is 3/4 the wavelength of X, or the length of the tube is 5/4 the wavelength of X, and so on. (This is often called the "odd-quarters law".) This means the second resonating frequency will be three times higher than the first, the next will be five times higher, and so on.

 

 

 

For a half-open tube that is 17 cm long (a typical length for an adult male's vocal tract), the preferred frequencies are 500 Hz, 1500 Hz, 2500 Hz, 3500 Hz, and so on.
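
A quick check of the odd-quarters law, assuming a speed of sound of about 340 m/s, reproduces the 500, 1500, 2500, 3500 Hz series quoted above for a 17 cm tube:

speed_of_sound = 340.0     # m/s in air (approximate)
tube_length = 0.17         # m, a typical length for an adult male vocal tract (from the text)

def half_open_tube_resonances(length_m, how_many=4, c=speed_of_sound):
    # f_n = (2n - 1) * c / (4 L) for n = 1, 2, 3, ...
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, how_many + 1)]

print(half_open_tube_resonances(tube_length))   # [500.0, 1500.0, 2500.0, 3500.0]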

We often diagram the frequency response curve of a tube. This shows for each frequency how much a tube would resonate if  you gave it vibrations at that frequency. The frequency response curve for a 17 cm long vocal tract held in neutral position (i.e., the position for schwa) looks like:

The frequency response curve shows how the vocal tract in neutral position would respond if you gave it various frequencies:

Contents of this page

10. Same vowel at different pitches

See
Source-Filter Theory of speech production in my notes.
(UKT: This section needs a rewrite.)

You might be asking what exactly is the source, and what are the filters.

The source is the voice-producing air-stream coming out of the glottis. The flow can be laminar or turbulent. For simplicity's sake, think of it as carrying vibration signals which we have been describing in terms of frequency.

The filters are to be found in the oral and nasal tracts. They modify the sound signal.
Here we will take the vowel schwa /ə/ [as in {a.ni}] to describe the model.

The frequency response curve shows how the vocal tract in neutral position would respond if you gave it various frequencies (remember it is the air coming out of the glottis).

The spectrum of the glottal wave (source) shows what frequencies you're actually giving it:

Putting these together gives you the spectrum of the wave that comes out of the mouth (filter - tongue in neutral position) for a schwa /ə/:

The two aspects of the source/filter model are independent of each other.
You can speak different vowels with the same pitch. (The harmonics will remain the same distance apart, but the bumps will be in different places.)
You can speak the same vowel with different pitches. (The bumps and the overall shape of the spectrum remain the same, but the harmonics will be spaced differently, as shown below. The "schwa spectrum" is the one your ear will hear when someone else is "singing" /ə/.) A small numerical illustration of this independence follows.
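
The sketch below (my own, with an illustrative 1/n source roll-off and crude Lorentzian formant bumps, not a real vocal-tract model) multiplies a glottal harmonic series by a schwa-like frequency response at two different pitches; the strongest harmonic stays near the 500 Hz first formant whichever pitch is used.

import numpy as np

def glottal_source(f0, max_freq=4000.0):
    # harmonics of a glottal wave: multiples of f0 with falling amplitude
    freqs = np.arange(f0, max_freq, f0)
    amps = 1.0 / np.arange(1, len(freqs) + 1)     # simple roll-off, illustrative only
    return freqs, amps

def schwa_filter(freqs, formants=(500.0, 1500.0, 2500.0), bandwidth=100.0):
    # crude frequency response of a neutral vocal tract: a bump at each formant
    response = np.zeros_like(freqs)
    for f in formants:
        response += 1.0 / (1.0 + ((freqs - f) / bandwidth) ** 2)
    return response

for f0 in (100.0, 200.0):        # same vowel (schwa), two different pitches
    freqs, source_amps = glottal_source(f0)
    output_amps = source_amps * schwa_filter(freqs)    # source spectrum shaped by the filter
    print(f0, freqs[np.argmax(output_amps)])           # strongest harmonic sits near 500 Hz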

 We will take two examples, one from Hyperphysics
- http://hyperphysics.phy-astr.gsu.edu/Hbase/music/vowel2.html#c1 080103,
for American vowel <a> as in <father> (US <a> is similar to {a}; whereas British <a> is similar to {au}); and Canadian <i> from Kevin Russell ,
- http://www.umanitoba.ca/faculties/arts/linguistics/russell/138/sec4/diffpich.htm 080103

 

Contents of this page

The vowel /a/

UKT: Click on vow-a<)) to hear the sound. Both pix and sound from:
http://sail.usc.edu/~lgoldste/General_Phonetics/Index.html 101106

To explain how the ear can recognize a vowel sound as the same vowel, even though it is sounded at different pitches, the idea of vocal formants is invoked. This is data from Benade showing that an "Ah" vowel {a} involves a similar envelope of harmonics when sounded at different frequencies.

Stemple, et al., report a mean fundamental frequency for male voices of 106 Hz with a range from 77 Hz to 482 Hz. For female voices the mean was 193 Hz with a range from 137 Hz to 634 Hz. These averages were based on the production of a sustained vowel /a/ .

Contents of this page

The vowel /i/

UKT: Click on vow-i<)) to hear the sound. Both pix and sound from:
http://sail.usc.edu/~lgoldste/General_Phonetics/Index.html 101106

Now, let's take the most prominent vowel, the front vowel / i / corresponding to { i } (register #2 of {i.} {i} {i:}). The source is the same glottal wave but at different frequencies.

The 7 diagrams on the right are computer-generated spectrum diagrams of the vowel /i/ you will hear. Each shows the vowel / i / ( {i}) sung at successively higher pitches.

Note how the distance between the harmonics increases as the pitch does, but the preferred resonating frequencies stay the same. The pitch has no effect on the preferred resonating frequencies of the vowel, though we can see that if the pitch gets high enough and the harmonics far enough apart, it can become very difficult to tell where the "bumps" are.

UKT: No matter how the pitch changes, you can still recognize it as the same vowel /i/ or { i }.

 

Contents of this page

The vowel /u/

UKT: Click on vow-u<)) to hear the sound. Both pix and sound from:
http://sail.usc.edu/~lgoldste/General_Phonetics/Index.html 101106

 

 

Contents of this page

11. Formants

For the purposes of distinguishing vowels from each other, we are more interested in the frequency response curves (indicating the preferred resonating frequencies of the vocal tract) rather than in the raw spectrum of the wave.

Each of the preferred resonating frequencies of the vocal tract (each bump in the frequency response curve) is known as a formant. They are usually referred to as F1, F2, F3, etc. For example, the formants for a typical adult male saying a schwa:

F1, first formant -- 500 Hz
F2, second formant -- 1500 Hz
F3, third formant -- 2500 Hz
...

By changing the vocal tract away from a perfect tube, you can change the frequencies that it prefers to vibrate at. That is, by moving around your tongue body and your lips, you can change the position of the formants.

Formants can be used to differentiate vowels such as {o} and {au:}. These two vowels are of interest to my friend U Tun Tint and me, because MLC transcribes the Bur-Myan {au:} as /[o]/ and {o} as /[ou]/.

When I told him that in Romabama, the transliteration for is {o}, he said "that's how a man on the street {lam:pau-ka. lu} would do it." And he is right! It is usual for male Burmese friends of the same age to address each other using the prefix {ko} (such as how I address him -- {ko htwan: ting.}). If I were to write to him in English, I would address him as Ko Tun Tint. The explanation for how this confusion came about lies in the way the English vowels /o/ and /ɑ/ are generally pronounced. The first three formants for /o/ and /ɑ/ are quite similar, and when we pronounce {au:} or {AU:}, foreigners might hear it as /o/. But to us, it sounds like /ɑ/, and hence the Romabama transcription is {au:}. The reason is given below:

The following is from: LING 520 Introduction to Phonetics I, Liberman & Yuan
html form of file http://www.ling.upenn.edu/courses/Fall_2005/ling520/lectures/lecture7/lecture7.ppt. (download date: not recorded)

Quantal theory:
The mapping between articulation and acoustics is nonlinear: certain, relatively large changes in articulation will cause little change in the acoustic signal, while other, relatively small changes in articulation will cause large changes in the acoustic signal.
The point vowels /i/, /a/ and /u/ are produced at places in the vocal tract where small perturbations in articulation produce only minimal changes in the resulting formant frequencies. That is, these vowels are assumed to be quantal vowels.

Theory of adaptive dispersion:
[i, a, u] are at the extremes of the physiologically possible vowel space. So they are maximally acoustically distinct and are unlikely to be confused by a listener.
Listeners' abilities to hear vowel distinctions provide a selectional pressure on segment inventories.

The following is from The Evolution of Human Speech: Its Anatomical and Neural Bases, by P. Lieberman, in Current Anthropology, vol. 48, no. 1, Feb 2007.

Abstract: Human speech involves species-specific anatomy deriving from the descent of the tongue into the pharynx. The human tongue's shape and position yields the 1:1 oral-to-pharyngeal proportions of the supralaryngeal vocal tract. Speech also requires a brain that can reiterate (freely reorder) a finite set of motor gestures to form a potentially infinite number of words and sentences. The end points of the evolutionary process are clear. The chimpanzee lacks a supralaryngeal vocal tract capable of producing the quantal sounds which facilitate both speech production and perception and a brain that can reiterate the phonetic contrasts apparent in its fixed vocalizations. The traditional Broca-Wernicke brain-language theory is incorrect; neural circuits linking regions of the cortex with the basal ganglia and other subcortical structures regulate motor control, including speech production, as well as cognitive processes including syntax. The dating of the FOXP2 gene, which governs the embryonic development of these subcortical structures, provides an insight on the evolution of speech and language. The starting points for human speech and language were perhaps walking and running. However, fully human speech anatomy first appears in the fossil record in the Upper Paleolithic (about 50,000 years ago) and is absent in both Neanderthals and earlier humans.

This is the reason why when we are comparing languages, we usually look at the three vowels: //, /i/, and /u/.

Contents of this page

12. Canadian-English vowels

Dictionaries usually tell us how to pronounce a word, a vowel or a consonant. For instance, DJPD16, on the inside of the front cover, states for the British accent: "e as in 'pet'; as in 'pat'". For the American accent, it gives the same statements. Obviously, the authors have in mind a reader who is either a British or an American person who knows what the RP (Received Pronunciation) and/or Standard American accents are. Though I am bilingual, I am neither British-born nor American-born, and I am at a loss as to what the dictionary means. I know how to pronounce 'pet' and 'pat' in English as it is spoken in Myanmar. Shall we call it Burmese-English? We pronounce the two words in the same way, with the result that I do not know how to differentiate them. I am what one might call "phoneme deaf": parallel to a 'colour-blind' person who cannot differentiate the red, yellow, and green of a traffic light. When the top light lights up, even though it appears a kind of gray, he "knows" the colour is what others call 'red' and he has to stop. When the bottom light lights up, he "knows" the colour is 'green', and he can go. So my recourse is to rely on vowel diagrams and consonant charts, and to use my knowledge of Burmese to pronounce the English words. Luckily, Burmese-Myanmar is based on phonemic principles, and I could use it as a phonetic language even though I did not know what IPA was. So, let me give you once again the vowel charts: on the right, the vowel quadrilateral of Daniel Jones, and on the left, the vowel rectangle of the American tradition.

Before we proceed, I would like to remind you, what I have found so far on the equivalence of Burmese to English vowels:

{a} => /a/, // and /ə/
{i} => /i/
{u} => /u/
{au} => /ɔ/ and /ɑ/

Please note that Romabama {o} is /o/ -- not the o of Pali-Latin (International Pali). What I have found should be checked against the values of F2/F1, when these become available.

In the rectangular vowel diagram, I have marked out the Lax and Tense vowels. To be in conformity with Romabama requirements, I have replaced the term "lax" with "checked", and "tense" with "free". The checked vowels (e.g. /ɪ/ in <bit> /bɪt/ -- proposed transliteration {bt}, rhyming with {hkt} -- meaning: "times, era" MEDict064) are inside the red rectangle, and the free vowels (e.g. /i/ in <beat> /biːt/ {bi:t}) outside it. You might also note that the "close" of the quadrilateral is the "high" of the rectangular diagram. And German linguistics calls the tense-lax distinction fortis and lenis. See what the DJPD16 Info-panel 38 has to say. No wonder I was confused! For use in Romabama, neither pair (lax-tense nor fortis-lenis) makes sense. Instead, I prefer to use checked vowels (vowels followed by {a.tht} consonants) and free vowels. Since checked vowels are the equivalents of Russell's lax vowels, and free vowels the equivalents of his tense vowels, I have changed Russell's terms to Romabama's.

DJPD16-310 Info-panel 38: It is mainly American phonologists who use the terms lax and tense in describing English vowels; the short vowels /ɪ e ʌ ɒ ʊ ə/ are classed as lax, while what are referred to in our description of BBC pronunciation as the long vowels and the diphthongs are tense. The terms can also be used of consonants as equivalent to FORTIS (tense) and LENIS (lax), though this is not commonly done in present-day descriptions.

[ i ] in <beat> /biːt/ DJPD16-052; approximating {bi:t}
[ɪ] in <bit> /bɪt/ -- DJPD16-060; approximating {bt} (from the spelling of {hkt} - MEDict064)

Note: Spellings like {hkt} /kʰɪt/ (meaning "age, time, period") and {t~ta} (meaning "box") are becoming rare. We should consider whether they should be revived for new and imported words, because split vowels are not easy to write. I am not suggesting that well established words like {hkt} should be written as {hkic} /kʰɪc/, because that would give another meaning. We should also note the possibility of mis-spelling in Romabama as {hkis}, which would imply an /s/ sound at the syllable end. To prevent it, we might use a double killed consonant in the coda, which is not commonly allowed by Burmese-Myanmar phonotactics. However, according to my good friend U Tun Tint it may be allowed. Therefore, spellings like {hkict} (for non-fricatives) and {kiss}/{kiS} will be provisionally used in Romabama.

Spectrum of free (tense) vowels (right):
http://www.umanitoba.ca/faculties/arts/linguistics/russell/138/sec4/formants.htm 080103

You will notice that I have included // in the "free vowels". This is not exactly right because it is usually followed by a consonant, and is therefore a checked vowel.

Each of the figures on the upper-right shows a computer-generated spectrum and response curve for a particular utterance of a Canadian English vowel by an adult male. The jagged lines show the harmonics. The curved line is the computer's guess, based on the harmonics in the spectrum, as to what the frequency response curve of the vocal tract must have been. The frequencies of the first two formants (as guessed by the computer) have been given for each vowel.

(If you have noticed that the Frequency axis for /o/ is 0 to 5000 Hz, please be assured that it was 0-5000 in the original given by Kevin Russell.)

For comparing vowels across languages, it is now accepted to take the F2/F1 of a minimum of three vowels: [a i u]. The values of F2/F1 taken from the figure on the right are (plotted in a small sketch after this list):
[] = 1550/860
[ i ] = 2230/280
[u] = 1260/330
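
With just these three pairs of numbers we can already draw a miniature vowel space. The sketch below (my own, not Russell's figure) plots the values quoted above with both axes reversed, so that the picture resembles the familiar vowel quadrilateral; the symbol "a" stands in for the open vowel whose glyph is missing above.

import matplotlib.pyplot as plt

vowels = {"a": (1550, 860), "i": (2230, 280), "u": (1260, 330)}   # (F2, F1) in Hz

fig, ax = plt.subplots()
for symbol, (f2, f1) in vowels.items():
    ax.plot(f2, f1, "o")
    ax.annotate(symbol, (f2, f1))

ax.invert_xaxis()              # F2 decreases to the right (back vowels on the right)
ax.invert_yaxis()              # F1 decreases upward (close/high vowels at the top)
ax.set_xlabel("F2 (Hz)")
ax.set_ylabel("F1 (Hz)")
plt.show()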

Spectrum of checked (lax) vowels - left:
(checked vowels are those that are followed by  consonants or {a.t}-consonants. It is unfortunate that Russell does not mention the consonant following his "lax" vowels.)

Kevin Russell gives a total of 11 spectra: 6 free vowels, and 4 checked vowels.

When you compare the vowels (in terms of F2/F1), the vowel /ʌ/ (1310/680) seems to be out of place when I include it among the checked vowels. The vowel /u/ (1260/330) behaves very well in Burmese-Myanmar when it is not followed by a killed consonant. However, when it is followed by a killed consonant it changes into /ʊ/ (F-400-110) or /ʌ/ (F-680-1310), and the pronunciation of the word changes:

   <put> /pʊt/ -- DJPD16-436 compare with {pwat}
   <but> /bʌt/ -- DJPD16-075 {bt}

We note that the killed {ta.} is of row 4 of the akshara. With others:

   <piss> /pɪs/ -- DJPD16-413 compare with {piS}
   <pack> /pæk/ -- DJPD16-392 compare with {pak}

 

Contents of this page

13. Phonation types or voice types

From: EGG (electroglottography) and Voice Quality, 4. Labelling of voice quality, in
- http://www.ims.uni-stuttgart.de/phonetik/EGG/frmst1.htm (latest accessed 160111).
You will hear some samples in .wav format.

EGG is a noninvasive method of investigating laryngeal behaviour; it conveys essential information about glottal activity. This study provides an objective, computer-supported method of EGG signal description which can be used for the automatic determination of voice quality for normal and pathological speakers, and in the determination of laryngeal settings used for linguistic purposes.

Voice quality can be judged from:
the degree of hoarseness (G) or (H), the amount of noise in the produced sound
the grade of roughness (R), relating to the irregular fluctuation of the fundamental frequency
the grade of breathiness (B), the fraction of non-modulated turbulence noise in the produced sound
asthenicity (A), the overall weakness of the voice
the "strained quality" (tenseness of voice, overall muscular tension) (S)

Using the above subjective qualities on a scale of 0 to 3, phoneticians have devised 2 systems of classification: GRBAS and RBH. GRBAS is widely used in the US and Japan, whilst RBH is used in Europe. Listen to a voice graded as R3B2H3 <))

UKT: The following is from II Electroglottography in
- http://www.ims.uni-stuttgart.de/phonetik/EGG/frmst2.htm (160111)

Electroglottography (EGG) is a technique used to register laryngeal behavior indirectly by measuring the change in electrical impedance across the throat during speaking. The method was first developed by Fabre (1957), and influential contributions are credited to Fourcin (1971, with Abberton) and Frokjaer-Jensen (1968, with Thorvaldsen). Commercially available devices are produced by Laryngograph Ltd., Synchrovoice and F-J Electronics.

Pix right: The Laryngograph Processor

A portable electro-laryngograph, microphone pre-amplifier, and speech- or Laryngograph-based fundamental frequency ("pitch") extractor: www.laryngograph.com/pdfdocs/lxprocfsheetusb.pdf . The unit consists of:
1. A single pair of gold-plated, guard-ring electrodes and three differently sized neck bands. For work with either voice or swallowing, the electrodes are lightly held on the speaker's neck, on either side of the thyroid cartilage. They enable the Processor to detect the small, relatively rapid variations in the conductance of the tissue separating them, produced by changes in the nature and area of vocal fold and other tissue contact. (Three different sizes of electrodes are optionally available, for special applications.)
2. A miniature high-quality electret microphone responding to the speech pressure waveform.
3. A power supply/battery charger.

 The amplitude of the signal changes because of permanently varying vocal fold contacts. It depends on:
the configuration and placement of the electrodes
the electrical contact between the electrodes and the skin
the position of the larynx and the vocal folds within the throat
the structure of the thyroid cartilage
the amount and proportion of muscular, glandular and fatty tissue around the larynx
the distance between the electrodes.

Contents of this page

14. IPA vowel diagram and cardinal vowels

In the beginning of our forays into phonetics and linguistics, the more my wife and I (we were both chemists by training) looked at the IPA vowel diagram with the cardinal vowels, the more curious we became about the experimental procedures that must have been carried out to place a person's vowels (i.e. vowels uttered by a particular human subject, male or female) in the diagram. How was it done by Daniel Jones? Were experiments carried out in the field of acoustic phonetics? And would I be able to understand the mathematics involved? The physical scientist in me wouldn't let me rest until I had had at least a cursory look into what I wanted to know. Browsing the internet, I came across An IPA vowel diagram approach to analysing L1 effects on vowel production and perception, by O. I. Dioubina & H. R. Pfitzinger, Univ. of Munich, 2002. www.phonetik.uni-muenchen.de/~hpt/pub/DioubinaPfitzinger_ICSLP02.pdf . 071231

The IPA vowel diagram represents an abstract space, which in its layout and proportions is derived from the one which had been used in the cardinal vowel system of Daniel Jones. It is a trapezium, right angles at top and bottom back and ratio 2:3:4 (base:back:top). This is the most simplified version of the figure developed by Jones through a number of stages, in which articulatory accuracy was progressively sacrificed for practical convenience in drawing the diagram.

The vowels are plotted on the diagram with reference to certain fixed points. Daniel Jones proposed a series of 8 (primary) cardinal vowels spaced around the outside of the possible vowel area and designed to act as fixed reference for phoneticians. The space within the diagram represents a continuum of possible vowel qualities which have to be identified by their relationships to the cardinal vowels. According to Daniel Jones a scale of these 8 cardinal vowels forms a convenient basis for describing the vowels of any language.

UKT: At present, it is accepted that a minimum of three vowels, /a/, /i/, /u/ (or in the case of English [, i, u]), known as the vowel triangle, is needed for cross-language comparison. For German (of the sample), imagine a triangle being drawn across the three filled dots: /i/, /a/ and /u/. For Russian, imagine another triangle being drawn across the open dots for the same vowels. (Incidentally, I could not find the one for /a/.) Since the two triangles are different, we can see why a German would not be able to sound like a Russian, or vice versa. Remember there is no perfect way to say a vowel (such as /i/): the IPA pronunciation by an American phonetician such as Ladefoged is as 'good' as the one you say. I have come to this conclusion after listening to the "IPA pronunciations" given by different people (American, British, Canadian, Dutch, etc.). Remember that the aim of the L2 (second language) learner should be to speak English which can be understood by other English speakers. The aim should not be to speak like the so-called native speaker, whatever the word "native" may mean.

The two languages I am interested in, are Burmese and English. I am sure English must have been studied using subjects from different English-speaking countries, but I am doubtful much has been done with Burmese-Myanmar subjects.

The description of vowel qualities with the help of the vowel diagram requires a phonetician to be able to position them as certain points on the diagram. The three basic dimensions, height, backness and rounding, together with the values of the cardinal vowels, are involved in making a decision on the position of the vowel quality within the space of the diagram. Where a vowel is positioned is bound to be influenced by the L1 of the investigating phonetician. Because of this, I doubt the descriptions given by Western phoneticians of the qualities of the Burmese-Myanmar vowels, especially when they insist that Burmese-Myanmar has diphthongs.

Kevin Russell gives the Canadian vowels from a study of formants (values, indicated by the symbol F, observed from a study of sound waves and spectra of a subject -- most probably himself).

The figure on the left is the result of "flipping" the figure given by Kevin Russell. I've redrawn it, adding the blue lines. Click on the figure to see the original graph. The measurement and construction of F2/F1 tell us the positions of the tongue of a particular speaker on a particular occasion, without having to rely on the judgments of phoneticians. Remember, the same person will pronounce his or her vowels slightly differently from time to time (depending on whether he or she is suffering from a cold, etc.). You must remember this when you are speaking about the sounds of vowels in particular, and of languages in general. Yet you, as a member of a particular linguistic group, would be able to identify another member of the same group from the way he or she speaks. This ability to identify another person as the same kind or not is important for the survival of the human species.

The nearest to the Burmese-Myanmar vowels I could find (on the internet, so far, 080103) are the Bangla vowels presented on the right. The reader should note that the F scales of the Bangla vowels are the reverse of Russell's. The nearest Burmese-Myanmar equivalents of the Bangla vowels are:

{a.} = অ ; {a} = আ ; {}  = এ ; {i.} =  ই  ; {u.} = উ ; {au:} = ও

You will notice that the vowel-quadrilateral can be bounded by a rectangle with F1 as the Y-axis (from 900 to 300), and F2 as the X-axis (from 2500 to 500).

Comparing the vowel quadrilateral and F2/F1 diagram shows that:
F1 is influenced by tongue body height
F2 is influenced by tongue body front-ness/back-ness.

It was expected that measurement of F1 and F2 would be sufficient to describe a vowel; however, for finer details, especially when we are taking Burmese and English together, it is found that F3 also has to be taken into consideration. Now, let's find out what the F1's, F2's and F3's are, and we will see how they are used. If you are in a hurry to know, jump to the section Relating formants to articulation (in the same file). However, if you are a newcomer, I would recommend that you read the following sections first.

Contents of this page

15. Relating formants to articulation

UKT: Materials in this section should be read together with Vowel Perception and Production, by B.S. Rosner & J.B. Pickering, Oxford Psychological series 23, Oxford Science Publications, Oxford University Press, published 1997. Available (from Google bookreview) in TIL library in the CD version, for research purposes only.

The positions for the first two formants of a vowel aren't random. Let's look more closely at the formants we saw for Canadian English vowels: (UKT: the values below are from Russell: http://www.umanitoba.ca/faculties/arts/linguistics/russell/138/sec4/form2.htm 080103)

The values of F2/F1 are the same as those on the figures given in the previous section:
[] = 1550/860, [i] = 2230/280, [u] = 1260/330. However, they are different from those given by Wikipedia (below). The reason is that each of us pronounces our vowels slightly differently. But we all do it within a small range, so people in one linguistic group know exactly what their neighbours are saying. If you travel from country to country as I have done (Australia, Britain, Canada, and the US, where I met "native English speakers"), you will find that you need a couple of days to cue in to the way the "locals" speak. It has been observed that people of the same L1 (say, natives of the Indian subcontinent), when they speak English as their L2, speak it in the same way. So an Indian speaking English is perfectly understood by another Indian, but not easily by an American or a Britisher.

UKT: The following values are from: http://en.wikipedia.org/wiki/Formant download 070908

After measuring the F2/F1 of individuals' vowels, and "averaging" them, we can place each vowel on a graph, where the horizontal dimension represents the frequency of the first formant (F1) and the vertical dimension represents the frequency of the second formant (F2). See figure on right.

UKT: I've redrawn the graph, adding the blue lines. Though Kevin Russell had not indicated the formants for a typical adult male saying a schwa, I've entered them in red.

What we get is just an image similar to our familiar vowel chart! (upper-right; click on the fig. to see the downloaded original pix.) If we change the axes of the graph so that the horizontal dimension shows (decreasing) F2 and the vertical dimension shows (decreasing) F1, we get something almost exactly like our vowel chart. See the figure on the left.

The figure on the left is the result of "flipping" the figure above. I've relabeled the blue lines, and the data points. The figure on the lower-right is the fig. downloaded from the source on 070907.

This means that a listener can essentially "hear" the position of the speaker's tongue body.
F1 is influenced by tongue body height
F2 is influenced by tongue body frontness/backness

(An even more accurate indicator of frontness/backness than F2 is the difference between the first two formants, i.e., F2 - F1.)
Instead of doing this, I have reproduced the graph (on British vowels)
from http://www.phon.ucl.ac.uk/home/wells/formants/relamp-uni.htm in my notes. See British-vowels in my notes.

As a concluding remark, after going through all the above, I would like to add:
From measurements of F2/F1, across languages, humans (Americans, Bengalis, British, Burmese, Canadians, Germans, Russians, etc.) all pronounce the three vowels [a (), i, u] slightly differently. Still we can recognize the vowels produced. Moreover, even among the same linguistic group, men, women, and children produce the same vowels differently, but still members of the same linguistic group understand each other perfectly. This is because, not only the speaker modulate their speech during production, but the hearers cue  what they hear, by sight as well. And, so our understanding of human speech is never complete unless we look into the perception as well. But then, that would take us into a different field of study, where we will come across another theory, the Modulation Theory. As an introduction to the theory, please look into Speech considered as modulated voice, by Hartmut Traunmller , Department of Linguistics, Stockholm University, S-106 91 Stockholm  .
- http://www.ling.su.se/staff/hartmut/speech_considered.pdf (2005, 160111),
parts of which are included in the TIL library. Abstract | pdf 0712116

Contents of this page

UKT notes

Body-size effects on voice acoustics

From: Pitch (F0) and formant profiles of human vowels and vowel-like baboon grunts: The role of vocalizer body size and voice-acoustic allometry, by D. Rendall et al. and P. Lloyd, in J. Acoust. Soc. Am., Vol. 117, No. 2, February 2005, pages 944-955.

Abstract: Key voice features, fundamental frequency (F0) and formant frequencies, can vary extensively between individuals. Much of the variation can be traced to differences in the size of the larynx and vocal-tract cavities, but whether these differences in turn simply reflect differences in speaker body size (i.e., neutral vocal allometry) remains unclear. Quantitative analyses were therefore undertaken to test the relationship between speaker body size and voice F0 and formant frequencies for human vowels. To test the taxonomic generality of the relationships, the same analyses were conducted on the vowel-like grunts of baboons, whose phylogenetic proximity to humans and similar vocal production biology and voice acoustic patterns recommend them for such comparative research. For adults of both species, males were larger than females and had lower mean voice F0 and formant frequencies. However, beyond this, F0 variation did not track body-size variation between the sexes in either species, nor within sexes in humans. In humans, formant variation correlated significantly with speaker height but only in males and not in females. Implications for general vocal allometry are discussed, as are implications for speech origins theories, and challenges to them, related to laryngeal position and vocal tract length. © 2005 Acoustical Society of America. [DOI: 10.1121/1.1848011]

Go back body-size-note-b

Contents of this page

British-English vowels

From: Wells, A study of the formants of the pure vowels of British English -- Submitted in 1962, in partial fulfilment of the requirements for the degree of M.A., University of London.
  http://www.phon.ucl.ac.uk/home/wells/formants/relamp-uni.htm 080101

"As, it seems, with most acoustic vowel parameters, it is possible to manipulate the figures in such a way as to show correlation with tongue height and yield an acoustic triangle similar to the familiar auditory-articulatory triangle -- in this case by plotting as the ordinate the difference in amplitude between F1 and F2, with the abscissa arbitrarily arranged to bring out the similarity (fig. 7). In other words, high tongue position corresponds to an F2 of much less intensity than F1, while low tongue position corresponds to an F2 of intensity similar to that of F1. As far as /ʌ/ is concerned, this amplitude triangle gives a better positioning (that is, a positioning more like the auditory-articulatory positioning) than any other acoustic plot, not excluding the frequency plot of F1 versus F2. "

UKT: Note Wells' usage "auditory-articulatory triangle". I presume he means the familiar vowel quadrilateral of Daniel Jones.

Go back brit-vow-b

Contents of this page

Lax and Tense vowels

From: Wikipedia, http://en.wikipedia.org/wiki/Tenseness download 070906

UKT: German linguistics call the distinction fortis and lenis (online: fortis and lenis) rather than tense and lax.
For Romabama, the terms checked (followed by killed consonants) and free vowels are preferable.

Tenseness: In phonology, tenseness is a particular vowel or consonant quality that is phonemically contrastive in many languages, including English. It has also occasionally been used to describe contrasts in consonants. Unlike most distinctive features, the feature [tense] can be interpreted only relatively, that is, in a language like English that contrasts [ i ] (e.g. <beat>) and [ ɪ ] (e.g. <bit>), the former can be described as a tense vowel while the latter is a lax vowel. Another example is Vietnamese, where the letters ă and â represent lax vowels, and the letters a and ơ the corresponding tense vowels. Some languages like Spanish are often considered as having only tense vowels, but since the quality of tenseness is not a phonemic feature in this language, it cannot be applied to describe its vowels in any meaningful way.

Comparison between tense and lax vowels: In general, tense vowels are more close (and correspondingly have lower first formants) than their lax counterparts. Tense vowels are sometimes claimed to be articulated with a more advanced tongue root than lax vowels, but this varies, and in some languages it is the lax vowels that are more advanced, or a single language may be inconsistent between front and back or high and mid vowels (Ladefoged and Maddieson 1996, 3024). The traditional definition, that tense vowels are produced with more "muscular tension" than lax vowels, has not been confirmed by phonetic experiments. Another hypothesis is that lax vowels are more centralized than tense vowels. There are also linguists who believe that there is no phonetic correlation to the tense-lax opposition.
   In many Germanic languages, such as RP English, standard German, and Dutch, tense vowels are longer in duration than lax vowels; but in other languages, such as Scots, Scottish English, and Icelandic, there is no such correlation.
   Since in Germanic languages, lax vowels generally only occur in closed syllables, they are also called checked vowels, whereas the tense vowels are called free vowels as they can occur at the end of a syllable.

Tenseness in consonants: Occasionally, tenseness has been used to distinguish pairs of contrasting consonants in languages. Korean, for example, has a three-way contrast among stops; the three series are often transcribed as [p t k] - [pʰ tʰ kʰ] - [pʼ tʼ kʼ]. The contrast between the [p] series and the [pʼ] series is sometimes said to be a function of tenseness: the former are lax and the latter tense. In this case the definition of "tense" would have to include greater glottal tension.
   In some dialects of Irish and Scottish Gaelic, contrasts are found between [l, lj, n, nj] on the one hand and [ɫˑ, ʎˑ, nˠˑ, ɲˑ] on the other hand. Here again the former set have sometimes been described as lax and the latter set as tense. It is not clear what phonetic characteristics other than greater duration would be associated with tenseness in this case.
   Some researchers have argued that the contrast in German traditionally described as voicing ([p t k] vs. [b d g]) is in fact better analyzed as tenseness, since the latter set is voiceless in Southern German. German linguistics call the distinction fortis and lenis rather than tense and lax. Tenseness is especially used to explain stop consonants of the Alemannic German dialects because they have two series of them that are identically voiceless and unaspirated. However, it is debated whether the distinction is really a result of different muscular tension, and not of gemination.

Go back lax-tense-vow-b

Contents of this page

Place Theory of Hearing

From: http://en.wikipedia.org/wiki/Place_theory 101106

Place theory is a theory of hearing which states that our perception of sound depends on where each component frequency produces vibrations along the basilar membrane. By this theory, the pitch of a musical tone is determined by the places where the membrane vibrates, based on frequencies corresponding to the tonotopic organization of the primary auditory neurons. [1] [2]

More generally, schemes that base attributes of auditory perception on the neural firing rate as a function of place are known as rate-place schemes. [3]

The main alternative to the place theory is the temporal theory, [2] also known as timing theory. [1] These theories are closely linked with the volley principle or volley theory, [4] a mechanism by which groups of neurons can encode the timing of a sound waveform. In all cases, neural firing patterns in time determine the perception of pitch. The combination known as the place-volley theory uses both mechanisms in combination, primarily coding low pitches by temporal pattern and high pitches by rate-place patterns. [4] It is now generally believed that there is good evidence for both mechanisms. [5]

The place theory is usually attributed to Hermann Helmholtz, though it was widely believed much earlier. [6] [7]

Go back Place-Th-note-b

Contents of this page

quantal vowel

From: A Dictionary of Phonetics and Phonology cited by
http://www.bookrags.com/tandf/quantal-vowel-tf/ 100414

n. A vowel whose acoustic qualities are little affected by variation in its articulation and whose perception is little affected by variation in its acoustic quality: one of /i/ /u/ /a/. Stevens (1972).

Go back quantal-vow-note-b

Contents of this page

Source-Filter Theory

From:
Wikipedia http://en.wikipedia.org/wiki/Source-filter_model_of_speech_production 071224
Robert M. Krauss, http://www.columbia.edu/itc/psychology/rmk/T2/sf_theory.html 071224

From Wikipedia

The source-filter model of speech production models speech as a combination of a sound source, such as the vocal cords, and a filter, the vocal tract (and radiation characteristic).

While only an approximation, the model is widely used in a number of applications because of its relative simplicity. To varying degrees, different phonemes can be distinguished by the properties of their source(s) and their spectral shape. Voiced sounds (e.g., vowels) have (at least) a source due to (mostly) periodic glottal excitation, which can be approximated by an impulse train in the time domain and by harmonics in the frequency domain, and a filter that depends on, e.g., tongue position and lip protrusion. On the other hand, fricatives have (at least) a source due to turbulent noise produced at a constriction in the oral cavity (e.g., the sounds represented orthographically by "s" and "f"). So-called voiced fricatives (such as "z" and "v") have two sources - one at the glottis and one at the supra-glottal constriction.

The source-filter model is used in both speech synthesis and speech analysis, and is related to linear prediction. The development of the model is due, in large part, to the early work of Gunnar Fant, although others, notably Ken Stevens, have also contributed substantially to the models underlying acoustic analysis of speech and speech synthesis.
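
To make the impulse-train-plus-filter idea concrete, here is a minimal sketch (my own, not Fant's or Stevens's implementation) that passes an impulse train through a cascade of second-order resonators placed at assumed schwa-like formant frequencies. The sampling rate, F0, formant frequencies and bandwidths are all assumed values; raising or lowering f0 changes the perceived pitch without moving the formants, which is exactly the separation of source and filter the model describes.

import numpy as np
from scipy.signal import lfilter

fs = 16000                     # sampling rate, Hz (assumed)
f0 = 120                       # fundamental frequency of the source, Hz (assumed)
duration = 0.5                 # seconds

# source: a periodic impulse train approximating glottal excitation
source = np.zeros(int(duration * fs))
source[::int(fs / f0)] = 1.0

def formant_resonator(signal, freq, bandwidth, fs):
    # a two-pole resonator centred on `freq` with the given bandwidth
    r = np.exp(-np.pi * bandwidth / fs)
    theta = 2 * np.pi * freq / fs
    a = [1.0, -2.0 * r * np.cos(theta), r * r]   # poles at r * exp(+/- j*theta)
    return lfilter([1.0], a, signal)

# filter: cascade of resonators at assumed schwa-like formants
speech = source
for freq, bw in [(500, 80), (1500, 100), (2500, 120)]:
    speech = formant_resonator(speech, freq, bw, fs)

speech = speech / np.max(np.abs(speech))   # normalise; write to a .wav file to listen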

From: R.M. Krauss

As the figure below illustrates, the vibrations of the vocal folds are the source of speech. The buzzing produced by these vibrations is passed through the vocal tract, which serves as a resonant filter, damping certain frequencies and intensifying others. The result is the characteristic sound we identify as speech.

To hear what the buzzing of the vocal folds sounds like before it enters the vocal tract, click the icon labeled excitation below or <)).

To hear the filtering action of the vocal tract, click on vocal tract filter <))

To hear the resultant speech, click on speech <)).

You can also click on the following links to online source:
<)) excitation <)) vocal tract filter <)) speech

First proposed by Johannes Müller in the 19th century, source-filter theory accounts for the acoustic properties of what are called "voiced" speech sounds (sounds during whose articulation the vocal cords vibrate). For "unvoiced" sounds (e.g., Shh), the source is air forced through a constriction in the vocal tract. The sentence you hear when you click on the speech icon [{<)) Why were you away a year ago, Roy? (this is what I heard -- UKT}] is composed almost entirely of voiced speech sounds. This is shown in the speech spectrogram at the bottom of the page [{now on left}], which plots the distribution of acoustic energy by frequency over time -- the darker the region, the greater the intensity of the acoustic energy in that region. Notice that the bands of acoustic energy are nearly continuous, especially in the lower frequencies. This is unusual in speech and results from the fact that the utterance contains voiced sounds almost exclusively. The one discontinuity, about two-thirds of the way through the utterance, reflects articulation of the g in <ago>, where the passage of air is momentarily interrupted and then released in a burst.

The dark bands in the spectrogram are called formants, and reflect the acoustic energies that remain after the filtering action of the vocal tract. The three figures below (taken from Miller) illustrate how different configurations of the vocal tract selectively pass certain frequencies and not others. The first shows the configuration of the vocal tract while articulating the phoneme [i] as in the word "beet," the second the phoneme [a], as in <father> [{the <a> in <father> is pronounced in British English as /ɑ/ -- DJPD16-199}], and the third [u] as in <boot>. Note how each configuration uniquely affects the acoustic spectrum -- i.e., the frequencies that are passed.
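
A spectrogram like the one described, acoustic energy by frequency over time, can be computed from any recorded or synthesized signal with a few lines of code. A minimal sketch (the 220 Hz tone is only a stand-in for a recorded utterance):

import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
signal = np.sin(2 * np.pi * 220 * t)          # stand-in for a recorded utterance

freqs, times, power = spectrogram(signal, fs=fs, nperseg=512)
# `power` has one row per frequency bin and one column per time frame;
# the darker regions of a printed spectrogram correspond to the larger values here.
print(power.shape)
print(freqs[np.argmax(power[:, 0])])          # strongest bin, close to 220 Hz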

 

UKT: The minimum three, [a, i, u], to characterize a language (e.g., American English) are given above. For the British, [, i, u] have to be given.

Go back source-filter-th-note-b

Contents of this page

timbre (aka quality of sound)
and harmonic content

From: HyperPhysics http://hyperphysics.phy-astr.gsu.edu/Hbase/hframe.html 080330

Sounds may be generally characterized by pitch, loudness, and timbre (aka quality). Sound "quality" or "timbre" describes those characteristics of sound which allow the ear to distinguish sounds which have the same pitch and loudness. Timbre is then a general term for the distinguishable characteristics of a tone. Timbre is mainly determined by the harmonic content of a sound and the dynamic characteristics of the sound such as vibrato and the attack-decay envelope of the sound.

Some investigators report that it takes a duration of about 60 ms to recognize the timbre of a tone, and that any tone shorter than about 4 ms is perceived as an atonal click. It is suggested that it takes about a 4 dB change in mid or high harmonics to be perceived as a change in timbre, whereas about 10 dB of change in one of the lower harmonics is required.

The primary contributors to the quality or timbre of the sound of a musical instrument are harmonic content, attack and decay, and vibrato. For sustained tones [{such as a vowel sound}], the most important of these is the harmonic content, the number and relative intensity of the upper harmonics present in the sound.

Some musical sound sources have overtones which are not harmonics of the fundamental. While there is some efficiency in characterizing such sources in terms of their overtones, it is always possible to characterize a periodic waveform in terms of harmonics - such an analysis is called Fourier analysis. It is common practice to characterize a sound waveform by the spectrum of harmonics necessary to reproduce the observed waveform.

The recognition of different vowel sounds of the human voice is largely accomplished by analysis of the harmonic content by the inner ear. Their distinctly different quality is attributed to vocal formants, frequency ranges where the harmonics are enhanced.

Go back timbre-note-b .

Contents of this page
End of TIL file.