Update: 2003-07-16 10:11 AM -0400

TIL

SAMPA: Computer-Readable Phonetic Alphabet

Speech Assessment Methods Phonetic Alphabet (SAMPA) www.phon.ucl.ac.uk/home/sampa/home.htm

Downloaded and edited by U Kyaw Tun for students of TIL Computing and Language Centre, Yangon, Myanmar. Not for sale.

This is an extract of the SAMPA index page from TIL database.


SAMPA (Speech Assessment Methods Phonetic Alphabet) is a machine-readable phonetic alphabet. It was originally developed under the ESPRIT project 1541, SAM (Speech Assessment Methods) (Internet link: SAM) in 1987-89 by an international group of phoneticians, and was applied in the first instance to the European Communities languages Danish, Dutch, English, French, German, and Italian (by 1989); later to Norwegian and Swedish (by 1992); and subsequently to Greek, Portuguese, and Spanish (1993). Under the BABEL project, it has now been extended to Bulgarian, Estonian, Hungarian, Polish, and Romanian (1996). Under the aegis of COCOSDA it is hoped to extend it to cover many other languages (and in principle all languages). On the initiative of the OrienTel project, Arabic, Hebrew, and Turkish have been added. Other recent additions: Cantonese, Croatian, Czech, Russian, Slovenian, Thai (Internet link: Thai). Coming shortly: Japanese, Korean.

Unless and until ISO 10646/Unicode is implemented internationally, SAMPA and the proposed X-SAMPA (Extended SAMPA) constitute the best international collaborative basis for a standard machine-readable encoding of phonetic notation.

Note about Unicode: Recent versions of the Internet Explorer and Netscape browsers can handle WGL4, the subset of Unicode needed for the orthography of all the languages of Europe. Test yours by looking at the Unicode Test page (Internet link: this), or download an up-to-date browser and a WGL4 font. For those with this capability, Unicode SAMPA pages with correct local orthography are now available for Bulgarian, Czech, Greek, Hungarian, Polish, Romanian, and Slovenian. See whether your browser can cope with Unicode IPA symbols by looking at this special version of the English SAMPA page (Internet link: English). See IPA in Unicode (Internet link: here).

SAMPA basically consists of a mapping of symbols of the International Phonetic Alphabet onto ASCII codes in the range 33..126, the 7-bit printable ASCII characters other than space (code 127 is the DEL control character, not a printable one). Associated with the coding (mapping) are guidelines for the transcription of the languages to which SAMPA has been applied. Unlike other proposals for mapping the IPA onto ASCII, SAMPA is not one single author's scheme, but represents the outcome of collaboration and consultation among speech researchers in many different countries. The SAMPA transcription symbols have been developed by or in consultation with native speakers of every language to which they have been applied, but are standardized internationally.
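The key property of the mapping is that every SAMPA symbol is a 7-bit printable ASCII character, while its IPA counterpart usually lies outside ASCII altogether. The short Python sketch below checks this for a handful of rows taken from the table further down. The dictionary name SAMPA_TO_IPA and the helper is_printable_ascii are illustrative assumptions, and the dictionary is not the full inventory.

```python
# A few rows from the SAMPA table (SAMPA symbol -> IPA character); not the full inventory.
SAMPA_TO_IPA = {
    "A": "\u0251",  # script a, open back unrounded
    "{": "\u00E6",  # ae ligature, near-open front unrounded
    "@": "\u0259",  # schwa
    "S": "\u0283",  # esh
    "N": "\u014B",  # eng
    '"': "\u02C8",  # primary stress mark
}


def is_printable_ascii(ch: str) -> bool:
    """True if ch is one printable, non-space 7-bit ASCII character (codes 33..126)."""
    return len(ch) == 1 and 33 <= ord(ch) <= 126


for sampa, ipa in SAMPA_TO_IPA.items():
    # Every SAMPA key fits in printable ASCII; its IPA counterpart lies outside 7-bit ASCII.
    assert is_printable_ascii(sampa), sampa
    assert ord(ipa) > 127, ipa

print("all SAMPA keys are printable ASCII")
```

Because the keys are plain ASCII, a SAMPA transcription survives e-mail, plain-text files and old terminals unchanged. The IPA form needs a Unicode-capable font and encoding.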

A SAMPA transcription is designed to be uniquely parsable. As with the ordinary IPA, a string of SAMPA symbols does not require spaces between successive symbols.
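A space-free transcription can therefore be split back into symbols mechanically. The sketch below uses greedy longest-match segmentation, which is one common technique for this and is not necessarily the one used by the SAMPA authors. The inventory is hypothetical: its multi-character units (the affricates tS, dZ and diphthongs such as eI, aI) are assumptions modelled on language-specific SAMPA conventions for English, not entries from the table on this page.

```python
# Hypothetical inventory for illustration: single-character symbols plus a few
# multi-character units of the kind used in language-specific SAMPA.
INVENTORY = {
    "p", "b", "t", "d", "k", "g", "f", "v", "s", "z", "h", "m", "n", "l", "r", "w", "j",
    "S", "Z", "T", "D", "N",
    "I", "e", "{", "Q", "V", "U", "@", "i", "A", "O", "u", "3",
    "tS", "dZ", "eI", "aI", "OI", "@U", "aU",
    ":", '"', "%",
}
MAX_LEN = max(len(sym) for sym in INVENTORY)


def tokenize(transcription: str) -> list[str]:
    """Split a space-free SAMPA string by greedy longest match against INVENTORY."""
    tokens = []
    i = 0
    while i < len(transcription):
        # Try the longest possible candidate first, then progressively shorter ones.
        for size in range(min(MAX_LEN, len(transcription) - i), 0, -1):
            candidate = transcription[i:i + size]
            if candidate in INVENTORY:
                tokens.append(candidate)
                i += size
                break
        else:
            # No inventory symbol starts at this position.
            raise ValueError(f"unknown symbol at position {i}: {transcription[i]!r}")
    return tokens


print(tokenize("tS3:tS"))  # ['tS', '3', ':', 'tS']
```

Greedy matching assumes the inventory never contains a sequence that must sometimes be read as two separate symbols. Where it does (for example t followed by S across a morpheme boundary), some explicit separator would be needed.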

SAMPA has been applied not only by the SAM partners collaborating on EUROM_1 (Internet link: EUROM 1), but also in other speech research projects (e.g. BABEL, Onomastica, OrienTel) and by Oxford University Press. It is included among the resources listed by the Linguistic Data Consortium.

In its basic form SAMPA was seen as catering essentially for segmental transcription, particularly of a traditional phonemic or near-phonemic kind. Prosodic notation was not adequately developed. This shortcoming has now been remedied by a proposed parallel system of prosodic notation, SAMPROSA. It is important that prosodic and segmental transcriptions be kept distinct from one another, on separate representational tiers (because certain symbols have different meanings in SAMPROSA from their meaning in SAMPA: e.g. H denotes a labial-palatal semivowel in SAMPA, but High tone in SAMPROSA).
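One way to honour the "separate tiers" requirement in software is to store segmental and prosodic symbols in different fields and interpret each field with its own table. The following Python sketch does that for the single symbol H discussed above. The Transcription class and both lookup tables are hypothetical and contain only that one symbol.

```python
from dataclasses import dataclass, field

# Only the symbol discussed in the text; real SAMPA/SAMPROSA inventories are much larger.
SAMPA_MEANINGS = {"H": "labial-palatal semivowel"}
SAMPROSA_MEANINGS = {"H": "High tone"}


@dataclass
class Transcription:
    """Hypothetical container that keeps the segmental and prosodic tiers apart."""
    segmental: list[str] = field(default_factory=list)  # SAMPA symbols
    prosodic: list[str] = field(default_factory=list)   # SAMPROSA symbols

    def meanings(self) -> tuple[list[str], list[str]]:
        # Each tier is looked up in its own table; unknown symbols are returned unchanged.
        seg = [SAMPA_MEANINGS.get(s, s) for s in self.segmental]
        pro = [SAMPROSA_MEANINGS.get(s, s) for s in self.prosodic]
        return seg, pro


t = Transcription(segmental=["H"], prosodic=["H"])
print(t.meanings())  # (['labial-palatal semivowel'], ['High tone'])
```

If both tiers were merged into one string, the same H could not be interpreted without extra context. Keeping them in separate fields removes that ambiguity.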

A proposal for an extended version of the segmental alphabet, X-SAMPA, extends the basic agreed conventions so as to make provision for every symbol on the Chart of the International Phonetic Association, including all diacritics. In principle this makes it possible to produce a machine-readable phonetic transcription for every known human language.

The present SAMPA recommendations (as devised for the basic six languages) are set out in the following table. All IPA symbols that coincide with lower-case letters of the Latin alphabet remain the same; all other symbols are recoded within the ASCII range 37..126. In this current WWW document the IPA symbols cannot be shown, but the columns indicate respectively a SAMPA symbol, its ASCII/ANSI number, the shape of the corresponding IPA symbol, the Unicode number (hex, decimal) for the IPA symbol, and the symbol's meaning or use.
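The rule just stated (lower-case Latin letters stay the same, everything else is looked up) translates directly into a character-by-character converter for transcriptions that use only the single-character symbols of this table. The sketch below covers only a subset of the table rows. The function name sampa_to_ipa is hypothetical. Language-specific multi-character units would need a proper tokenizer first.

```python
# Subset of the SAMPA -> IPA rows from the table (single-character symbols only).
SAMPA_TO_IPA = {
    "A": "\u0251", "{": "\u00E6", "@": "\u0259", "I": "\u026A", "O": "\u0254",
    "S": "\u0283", "T": "\u03B8", "N": "\u014B", "D": "\u00F0", "Z": "\u0292",
    ":": "\u02D0", '"': "\u02C8", "%": "\u02CC",
    "~": "\u0303",  # combining tilde: attaches to the preceding character, as in O~
}


def sampa_to_ipa(text: str) -> str:
    """Convert a SAMPA string to IPA, one character at a time."""
    out = []
    for ch in text:
        if ch in SAMPA_TO_IPA:
            out.append(SAMPA_TO_IPA[ch])
        elif "a" <= ch <= "z":
            out.append(ch)  # lower-case Latin letters are identical in SAMPA and IPA
        else:
            raise ValueError(f"no IPA mapping for {ch!r} in this subset")
    return "".join(out)


print(sampa_to_ipa("SIp"))  # Eng. ship
print(sampa_to_ipa("bO~"))  # Fr. bon
```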

UKT note: I have included the IPA symbols (characters) in Arial Unicode MS, size 12, generated from the given Unicode numbers. The table on the original page was in Courier New 10 pt. I have given the decimal input codes with a leading zero (the original page gave them without), but the SAMPA characters can also be typed directly from the keyboard.

SAMPA | Decimal input (Alt+xxxx) | IPA character | IPA description | Unicode hex (Alt+U+xxxx) | HTML code | Remarks

UKT note: Some IPA characters can be input from the keyboard using Alt+xxxx if xxxx is 0255 or less, e.g. Alt+0230 > æ; Alt+0248 > ø; Alt+0240 > ð.

Vowels

A 0065 ɑ script a U+0251 &#593; open back unrounded, Cardinal 5, Eng. start
{ 0123 æ ae ligature U+00E6 &#230; near-open front unrounded, Eng. trap
6 0054 ɐ turned a U+0250 &#592; open schwa, Ger. besser
Q 0081 ɒ turned script a U+0252 &#594; open back rounded, Eng. lot
E 0069 ɛ epsilon U+025B &#603; open-mid front unrounded, C3, Fr. même
@ 0064 ə turned e U+0259 &#601; schwa, Eng. banana
3 0051 ɜ rev. epsilon U+025C &#604; long mid central, Eng. nurse
I 0073 ɪ small cap I U+026A &#618; lax close front unrounded, Eng. kit
O 0079 ɔ turned c U+0254 &#596; open-mid back rounded, Eng. thought
2 0050 ø o-slash U+00F8 &#248; close-mid front rounded, Fr. deux
9 0057 œ oe ligature U+0153 &#339; open-mid front rounded, Fr. neuf
& 0038 ɶ s.c. OE lig. U+0276 &#630; open front rounded
U 0085 ʊ upsilon U+028A &#650; lax close back rounded, Eng. foot
} 0125 ʉ barred u U+0289 &#649; close central rounded, Swedish sju
V 0086 ʌ turned v U+028C &#652; open-mid back unrounded, Eng. strut
Y 0089 ʏ small cap Y U+028F &#655; lax [y], Ger. hübsch

Consonants

B 0066 β beta U+03B2 &#946; voiced bilabial fricative, Sp. cabo
C 0067 ç c-cedilla U+00E7 &#231; voiceless palatal fricative, Ger. ich
D 0068 ð eth U+00F0 &#240; voiced dental fricative, Eng. then
G 0071 ɣ gamma U+0263 &#611; voiced velar fricative, Sp. fuego
L 0076 ʎ turned y U+028E &#654; palatal lateral, It. famiglia
J 0074 ɲ left-tail n U+0272 &#626; palatal nasal, Sp. año
N 0078 ŋ eng U+014B &#331; velar nasal, Eng. thing
R 0082 ʁ inv. s.c. R U+0281 &#641; vd. uvular fric. or trill, Fr. roi
S 0083 ʃ esh U+0283 &#643; voiceless palatoalveolar fricative, Eng. ship
T 0084 θ theta U+03B8 &#952; voiceless dental fricative, Eng. thin
H 0072 ɥ turned h U+0265 &#613; labial-palatal semivowel, Fr. huit
Z 0090 ʒ ezh (yogh) U+0292 &#658; vd. palatoalveolar fric., Eng. measure
? 0063 ʔ dotless ? U+0294 &#660; glottal stop, Ger. Verein, also Danish stød

Length, stress and tone marks

: 0058 ː colon U+02D0 &#720; length mark
" 0034 ˈ high vert. str. U+02C8 &#712; primary stress
% 0037 ˌ low vert. str. U+02CC &#716; secondary stress
` 0096   (see note)     falling tone
' 0039   (see note)     rising tone

Author's Note: The SAMPA tone mark recommendations were based on the IPA as it was up to 1989-90. Since then, however, the IPA has changed its symbols for falling and rising tones. These SAMPA tone marks may now be considered obsolete, having in practice been superseded by the SAMPROSA proposals.

Diacritics
(shown with another symbol as an example)

=n 0061 ̩ inferior stroke U+0329 &#809; syllabic consonant, Eng. garden
O~ 0126 ̃ superior tilde U+0303 &#771; nasalization, Fr. bon

UKT note: There seems to be an error in the original page, which gives the code 0060 for =. Input Alt+0060 generates <, whereas Alt+0061 generates =.
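The Unicode and HTML columns of the table can be derived mechanically from the IPA character. The Python sketch below does this and also decodes a decimal Alt code. It assumes that Alt+0xxx (with the leading zero) inserts the character at that position of the Windows-1252 code page, which is the usual default on Western European Windows installations. Other locales may use a different code page. Both function names are hypothetical.

```python
def unicode_columns(ipa_char: str) -> tuple[str, str]:
    """Return the 'Unicode hex' and 'HTML code' column values for one IPA character."""
    code_point = ord(ipa_char)
    return f"U+{code_point:04X}", f"&#{code_point};"


def alt_code_char(code: int) -> str:
    """Character produced by Alt+0<code>, assuming the Windows-1252 code page is active.

    Only 0..255 make sense here; a few values (e.g. 129) are undefined in
    Windows-1252 and make Python's cp1252 codec raise UnicodeDecodeError.
    """
    return bytes([code]).decode("cp1252")


print(unicode_columns("\u0251"))  # ('U+0251', '&#593;')
print(alt_code_char(230), alt_code_char(248), alt_code_char(240))
print(alt_code_char(60), alt_code_char(61))  # < =
```

The last line reproduces the point made in the note above: decimal 60 is <, and = is decimal 61.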

The phonemic notation of individual languages
Internet links: These pages provide a brief outline of the phonemic distinctions in various languages: Arabic, Bulgarian, Cantonese, Czech, Croatian, Danish, Dutch, English, Estonian, French, German, Greek, Hebrew, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Thai, Turkish.

Extensions
Internet links: These pages provide extensions of the basic segmental SAMPA: SAMPROSA (prosodic), X-SAMPA (other symbols, mainly segmental).

Other Internet links:
UCL Phonetics and Linguistics home page, University College London home page.
A utility: Instant IPA in Word - converts SAMPA to IPA.

For queries please contact John Wells by e-mail or at
Department of Phonetics and Linguistics, University College London, Gower Street, London WC1E 6BT.
Phone:  +44 171 380 7175

Last revised 2003 April 28
http://www.phon.ucl.ac.uk/home/sampa/home.htm

End of TIL file