Update: 2007-02-10 11:20 PM -0500



Burmese Written Language in Roman Script


U Kyaw Tun, M.S. (I.P.S.T., U.S.A.), Deep River, Ontario, Canada. Not for sale. No copyright. Free for everyone. Prepared for students of TIL Computing and Language Center, Yangon, MYANMAR.

General considerations: Romabama rime
Vowels ending in killed aksharas
Vowels (formed from vowel-letters) ending in killed aksharas
Extending the vowels ending in killed nasals
Extending the vowels ending in killed wag-aksharas
Extending the vowels ending in killed awag-aksharas


UKT note
Inherent vowel
South Asian scripts

General considerations: Romabama rime

Romabama, the almost one-to-one transliteration, is essentially Burmese speech in Latin script. The effective unit of Burmese-Myanmar script, like that of Devanagari, its sister script, is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants. The canonical structure of the Burmese-Myanmar syllable is (((C)C)C)V, without a coda. Romabama, however, uses the structure CV followed by a killed consonant derived from the akshara. For syllables with a killed consonant, the rhyme (or rime) is more important than the vowel, and this chapter on the Burmese-Myanmar vowel is actually about the rhyme.
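
The two syllable shapes described above can be checked mechanically. The following is a minimal sketch only: it works on skeleton strings in which "C" stands for a consonant and "V" for a vowel, and it uses a lowercase "c" for the killed consonant. That lowercase label, and the names CANONICAL, ROMABAMA and fits, are assumptions made for this illustration and are not part of Romabama.

```python
import re

# Canonical orthographic syllable from the text: (((C)C)C)V, that is,
# zero to three consonants followed by a vowel, with no coda.
CANONICAL = re.compile(r"C{0,3}V")

# Assumed Romabama shape: the same core plus one optional killed
# consonant, written "c" here (a label invented for this sketch).
ROMABAMA = re.compile(r"C{0,3}Vc?")


def fits(pattern, skeleton):
    """Return True if the whole skeleton string matches the pattern."""
    return pattern.fullmatch(skeleton) is not None


for skeleton in ["V", "CV", "CCCV", "CVc", "CCCCV"]:
    print(skeleton, fits(CANONICAL, skeleton), fits(ROMABAMA, skeleton))
```

A skeleton such as "CVc" is rejected by the canonical pattern but accepted by the extended one, which is the difference the paragraph describes.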





Vowels ending in killed aksharas

Leaving aside row 3 (the {Ta.} row) for the present, consider the killed c1s of r1, r2, r4 and r5. A possible Romabama transliteration is: {ak}, {is}, {ut}, {up}. You will notice the peak vowel changing from a > i > u.

Then, for each row, we will have:

row 1 {ak} {ahk} {ag}
row 2 {is} {ihs} {iz}
row 4 {ut} {uht} {ud}
row 5 {up} {uhp} {ub}
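
The list above follows a simple pattern: one peak vowel per row, combined with each killed consonant of that row. A minimal sketch of that pattern follows. The data are copied from the list; the names ROWS and rimes_for_row are hypothetical, and the spellings remain as tentative as the list itself.

```python
# Peak vowel and the killed consonants for each row, copied from the
# list above. Row 3 is left out, as in the text.
ROWS = {
    1: ("a", ["k", "hk", "g"]),
    2: ("i", ["s", "hs", "z"]),
    4: ("u", ["t", "ht", "d"]),
    5: ("u", ["p", "hp", "b"]),
}


def rimes_for_row(row):
    """Build the Romabama rimes of one row as peak vowel + killed consonant."""
    peak, codas = ROWS[row]
    return ["{" + peak + coda + "}" for coda in codas]


for row in ROWS:
    print("row", row, " ".join(rimes_for_row(row)))
```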

In the above list, row 1 {ak} is the first of the rimes: {ak}, {k}, {aik}, {oak}, {auk} {eik}.

Vowels (from vowel-letters) ending in killed aksharas

In words formed from the rimes {ut} {t} {ait} {oat} {t} {aut} {eit}, the vowel-letters {I.}, {U.}, {AU.} can also take part. My tentative spelling rule is to write the word as if it were formed with vowel-signs (using {a.}), and then to capitalize the letters corresponding to vowel-letters. I find this very unsatisfactory, and will have to come back to it later.

{ut~ta.} /|a' ta.|/ - personal pronoun  I. n. self. (Pali: {ut~ta.}) -- MEDict626

{ait} /|ei'|/ - n. 1. bag; sack. -- MEDict626
{AIt~hti.ya.} /|ei' hti. ja.|/ - n. woman. (Pali: {AIt~hti.ya.})

{oat} /|ou'|/ - n. brick (Pali: {AIT~hTa.ka.}) -- MEDict626
{OAt~ta.ra. hpa.la.gu.ni} /|ou' tara. hpala. gu. ni|/
  - n. astronomy asterism of two stars in Leo resembling the rear legs of a couch. -- MEDict627
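
The tentative capitalization rule can be sketched in a few lines, using the spellings of the entries above. This is an illustration under stated assumptions: the function name mark_vowel_letter is hypothetical, and the caller has to say which leading vowel is written with a vowel-letter, because that cannot be read off the Latin spelling alone.

```python
def mark_vowel_letter(word, peak):
    """Uppercase the leading peak vowel of a Romabama word.

    `word` is the spelling as if formed with vowel-signs; `peak` is the
    leading vowel that the original writes with a vowel-letter.
    """
    if not word.startswith(peak):
        raise ValueError(f"{word!r} does not start with {peak!r}")
    return peak.upper() + word[len(peak):]


print(mark_vowel_letter("ait~hti.ya.", "ai"))  # AIt~hti.ya.
print(mark_vowel_letter("oat~ta.ra.", "oa"))   # OAt~ta.ra.
```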


Extending the vowels ending in killed nasals

The vowels ending in killed nasals are realised in three tones, as in the case of simple vowels, e.g. {a.} {a} {a:}. In this respect they are different from the vowels ending in killed wag-aksharas. This has led some Western scholars to say that Burmese-Myanmar has 4 tones:
1. short, 2. medium, 3. long, and 4. checked.
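
The four-way classification can be illustrated with a short sketch. It assumes that the marks "." and ":" in {a.} {a} {a:} signal the short and long tones, and that a syllable closed by a killed wag-akshara counts as checked. The set CHECKED_CODAS covers only the first column of rows 1 to 5 and, like the function name tone_class, is an assumption made for this example.

```python
# Killed wag consonants assumed to close a checked syllable (first
# column of rows 1-5 only). This set is an assumption of the sketch.
CHECKED_CODAS = ("k", "s", "T", "t", "p")


def tone_class(syllable):
    """Classify a Romabama syllable such as "{a.}" into one of four classes."""
    s = syllable.strip("{}")
    if s.endswith("."):
        return "short"
    if s.endswith(":"):
        return "long"
    if s.endswith(CHECKED_CODAS):
        return "checked"
    return "medium"


for syl in ["{a.}", "{a}", "{a:}", "{ak}", "{an}"]:
    print(syl, tone_class(syl))
```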

row 1: {ing} {ng} {aing} {oang} {ng} {aung} {eing}
row 2a: {i} { } {ai} {oa} {} {au} {ei}
row 2b: {} {} {ai} {oa} {} {au} {ei}
row 3: {aN} {N} {aiN} {oaN} {N} {auN} {eiN}
row 4: {an} {n} {ain} {oan} {n} {aun} {ein}
row 5: {am} {m} {aim} {oam} {m} {aum} {eim}

Extending the vowels ending in killed wag-aksharas

There is more than one vowel combination for each killed consonant. Thus, for killed {ka.}, there are 6 possible combinations. The peak vowels (in Romabama syllables) are chosen to reflect the pronunciation (for which I am relying on the transliterations given in MEDict by MLC).

row 1: {ak} {k} {aik} {oak} {k} {auk} {eik}
row 2: {is} {s} {ais} {oas} {s} {aus} {eis}
row 3: {uT} {T} {aiT} {oaT} {T} {auT} {eiT}
row 4: {ut} {t} {ait} {oat} {t} {aut} {eit}
row 5: {up} {p} {aip} {oap} {p} {aup} {eip}

UKT: Please note that the Romabama peak-vowels are tentative, and I will come back to them later, after working with my peers.

Extending the vowels ending in killed awag-aksharas

The a-wag aksharas, some of them being semi-vowels themselves, affect the peak vowels differently from the wag aksharas. At one time, almost all the a-wag aksharas in the form of killed aksharas (i.e. with {athut}) were used. However, at present, {ya.thut} is the only one that is commonly used. Use of {ra.thut}, {la.thut} and {wa.thut} is very rare.

The Romabama spelling for the following is still under consideration (the killed-aksharas are mute - how are they to be represented?):

{ya.thut} : {} {} {i} {u} {} {au} {o}
{ra.thut} : {ar} {r} {ir} {ur} {r} {aur} {or}
{la.thut} : {al} {l} {il} {ul} {l} {aul} {ol}
{wa.thut} : {aw} {w} {iw} {uw} {w} {auw} {ow}
{tha.thut} : {ath} {th} {ith} {uth} {th} {auth} {oth}
{ha.thut} : {ah} {h} {ih} {uh} {h} {auh} {oh}

UKT: The characters in the above rimes are digraphs. Whether they are diphthongs or monophthongs is still up to my peers to decide. Above all, I would always emphasize that the Romabama vowels are all tentative. After all, my initial purpose is to design Romabama to write emails using only ASCII characters.

UKT note


The following is a direct quotation from DJPD16 p105 - Info panel 17

coda: The end of a syllable, which is said to be made up of an ONSET, a peak and a coda. The peak and the coda constitute the RHYME (or RIME) of the syllable.
   Examples for English
English allows up to four consonants to occur in the coda, so the total number of possible codas in English is very large -- several hundred in fact, e.g.:
  <sick>  /sɪk/
  <six>  /sɪks/
  <sixth>  /sɪksθ/
  <sixths>  /sɪksθs/
The central part of a syllable is almost always a vowel, and if the syllable contains nothing after the vowel it is said to have no coda ('zero coda'), e.g.
  <bough>  /baʊ/
  <buy>  /baɪ/
In other languages
Some languages (e.g. Japanese) have no codas in any syllables.
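
The onset, peak and coda of the examples above can be separated with a very small routine. The following is a sketch only: the vowel inventory VOWELS is limited to the symbols needed for these examples, it treats any unbroken run of vowel symbols as a single peak, and the name split_syllable is hypothetical.

```python
# Vowel symbols needed for the examples; not a full IPA vowel inventory.
VOWELS = set("aeiouɪʊəɛɔ")
LENGTH_MARK = "ː"


def split_syllable(ipa):
    """Split one syllable into (onset, peak, coda)."""
    is_peak = [ch in VOWELS or ch == LENGTH_MARK for ch in ipa]
    if True not in is_peak:
        raise ValueError("no vowel found")
    start = is_peak.index(True)                      # first vowel symbol
    end = len(ipa) - is_peak[::-1].index(True)       # just past the last one
    return ipa[:start], ipa[start:end], ipa[end:]


for word in ["sɪk", "sɪksθs", "baʊ", "baɪ"]:
    onset, peak, coda = split_syllable(word)
    print(word, "onset:", onset, "peak:", peak, "coda:", coda or "(zero)")
```

The rhyme (or rime) is then simply the peak and the coda joined together.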

UKT: It is always important to remember that Devanagari is an abugida, where the consonantal character has an inherent vowel and is pronounceable, whereas Latin is an alphabet whose consonantal character is mute. Thus, {ka.} is pronounceable, whereas <k> is mute.

The following is an edited text from Chapter 9, The Unicode Standard, version 4.0, Unicode Consortium, http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf . Approx. p219

The Devanagari script is used for writing classical Sanskrit and its modern historical derivative, Hindi. Extensions to the Sanskrit repertoire are used to write other related languages of India (such as Marathi) and of Nepal (Nepali). In addition, the Devanagari script is used to write the following languages: Awadhi, Bagheli, Bhatneri, Bhili, Bihari, Braj Bhasha, Chhattisgarhi, Garhwali, Gondi (Betul, Chhindwara, and Mandla dialects), Harauti, Ho, Jaipuri, Kachchhi, Kanauji, Konkani, Kului, Kumaoni, Kurku, Kurukh, Marwari, Mundari, Newari, Palpa, and Santali.
   All other Indic scripts, as well as the Sinhala script of Sri Lanka, the Tibetan script, and the Southeast Asian scripts, are historically connected with the Devanagari script as descendants of the ancient Brahmi script. The entire family of scripts shares a large number of structural features.
   The principles of the Indic scripts are covered in some detail in this introduction to the Devanagari script. The remaining introductions to the Indic scripts are abbreviated but highlight any differences from Devanagari where appropriate.

Standards. The Devanagari block of the Unicode Standard is based on ISCII-1988 (Indian Script Code for Information Interchange). The ISCII standard of 1988 differs from and is an update of earlier ISCII standards issued in 1983 and 1986.

The Unicode Standard encodes Devanagari characters in the same relative positions as those coded in positions A0-F4 (hexadecimal) in the ISCII-1988 standard. The same character code layout is followed for eight other Indic scripts in the Unicode Standard: Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam. This parallel code layout emphasizes the structural similarities of the Brahmi scripts and follows the stated intention of the Indian coding standards to enable one-to-one mappings between analogous coding positions in different scripts in the family. Sinhala, Tibetan, Thai, Lao, Khmer, Myanmar, and other scripts depart to a greater extent from the Devanagari structural pattern, so the Unicode Standard does not attempt to provide any direct mappings for these scripts to the Devanagari order.

In November 1991, at the time The Unicode Standard, Version 1.0, was published, the Bureau of Indian Standards published a new version of ISCII in Indian Standard (IS)13194:1991. This new version partially modified the layout and repertoire of the ISCII-1988 standard. Because of these events, the Unicode Standard does not precisely follow the layout of the current version of ISCII. Nevertheless, the Unicode Standard remains a superset of the ISCII-1991 repertoire except for a number of new Vedic extension characters defined in IS 13194:1991 Annex G-Extended Character Set for Vedic. Modern, non-Vedic texts encoded with ISCII-1991 may be automatically converted to Unicode code points and back to their original encoding without loss of information.
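
The parallel code layout of the nine Indic blocks can be seen by moving a character from one block to the same offset in another. The sketch below uses the block start values from the Unicode code charts and Python's standard unicodedata module; the names BLOCK_START and shift_script are hypothetical. It is not a transliterator: some offsets are unassigned in some scripts, and sound values do not always match.

```python
import unicodedata

# First code point of each 128-position block, from the Unicode code charts.
BLOCK_START = {
    "DEVANAGARI": 0x0900, "BENGALI": 0x0980, "GURMUKHI": 0x0A00,
    "GUJARATI": 0x0A80, "ORIYA": 0x0B00, "TAMIL": 0x0B80,
    "TELUGU": 0x0C00, "KANNADA": 0x0C80, "MALAYALAM": 0x0D00,
}


def shift_script(text, source, target):
    """Move characters of the source block to the same offset in the target block."""
    lo = BLOCK_START[source]
    delta = BLOCK_START[target] - lo
    return "".join(
        chr(ord(ch) + delta) if lo <= ord(ch) < lo + 0x80 else ch
        for ch in text
    )


# DEVANAGARI LETTER KA is U+0915, offset 0x15 within its block.
for script in BLOCK_START:
    ka = shift_script("\u0915", "DEVANAGARI", script)
    print(f"U+{ord(ka):04X}", unicodedata.name(ka))
```

By contrast, MYANMAR LETTER KA is U+1000, the first position of the Myanmar block, so no such offset arithmetic applies to Burmese-Myanmar.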

Encoding Principles

The writing systems that employ Devanagari and other Indic scripts constitute abugidas -- a cross between syllabic writing systems and alphabetic writing systems. The effective unit of these writing systems is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants, with a canonical structure of (((C)C)C)V. The orthographic syllable need not correspond exactly with a phonological syllable, especially when a consonant cluster is involved, but the writing system is built on phonological principles and tends to correspond quite closely to pronunciation.

The orthographic syllable is built up of alphabetic pieces, the actual letters of the Devanagari script. These pieces consist of three distinct character types: consonant letters, independent vowels [UKT: vowel letters], and dependent vowel signs [UKT: vowel signs]. In a text sequence, these characters are stored in logical (phonetic) order.
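
The three character types and the logical storage order can be inspected with Python's standard unicodedata module. This is a minimal sketch; the example characters and the helper name describe are choices made for this illustration. In the syllable KA + VOWEL SIGN I, the vowel sign is drawn to the left of the consonant but is stored after it.

```python
import unicodedata

# One example of each character type named in the text.
EXAMPLES = {
    "consonant letter": "\u0915",      # DEVANAGARI LETTER KA
    "independent vowel": "\u0907",     # DEVANAGARI LETTER I
    "dependent vowel sign": "\u093F",  # DEVANAGARI VOWEL SIGN I
}

# KA followed by VOWEL SIGN I, in logical (phonetic) order.
SYLLABLE = "\u0915\u093F"


def describe(ch):
    """Return code point, character name and general category of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{unicodedata.category(ch)}]"


for kind, ch in EXAMPLES.items():
    print(kind + ":", describe(ch))

print("stored order:", [unicodedata.name(ch) for ch in SYLLABLE])
```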

inherent vowel

by UKT

What is the inherent vowel? That has been my question ever since I came to study the akshara system of writing. To say that it is approximately the English "short-a" does not mean much, for the English <a> itself has a changing nature and can mean anything, even to a "native-English" speaker. Moreover, the term "native speaker" adds to the confusion, because US-American, Australian, British, Canadian, and New Zealand speakers all speak in their own sweet ways. And unless you are familiar with the English-Latin vowels, to say that the inherent vowel is close to /a/ is meaningless.

The inherent vowel sometimes appears as a schwa in words such as {a.ni} meaning <red>. See the following on non-rhotic English dialects (spoken by South Indians, formerly dubbed "the Hindus" and Myanmars) below.

Non-rhotic dialects of English began to emerge in about the year 1600. The loss of the sound [r] is known as de-rhotacization. Evidence of the earliest date of the sound change is shown in the English word juggernaut, which is first attested in the 1630s. This represents the Hindi word jagannāth, meaning "lord of the universe"; the English spelling shows that the digraph er was chosen to represent a Hindi sound that is close to the English schwa.

A non-rhotic speaker pronounces the [r] in red, torrid, watery (in each case the [r] is followed by a vowel) but not the written [r] of hard, nor that of car or water except when the word is followed by a vowel. In most non-rhotic accents, if a word ending in written [r] is followed closely by another word beginning with a vowel the [r] is, however, sounded as in water ice. This phenomenon is referred to as "linking [r]". Many non-rhotic speakers also insert epenthetic [r]s between vowels (droring for drawing). This so-called "intrusive [r]" is frowned upon by those who use the non-rhotic Received Pronunciation (RP) but even they frequently "intrude" an epenthetic [r] at word boundaries, pronouncing, for example, Africa and Asia as Africa-r-and Asia.

For non-rhotic speakers, what was historically a vowel plus [r] is now usually realized as a long vowel. So car, hard, fur, born are phonetically /kaː/, /haːd/, /fəː/, /bɔːn/ (see International Phonetic Alphabet for a key to phonetic symbols). This length is retained in phrases, so car owner is /kaːɹ oʊnə/. But a final schwa remains short, so water is /wɔːtə/. For some speakers some long vowels alternate with a diphthong ending in schwa, so wear is /wɛə/ but wearing is /wɛːɹiŋ/. Some pairs of words are homophonic for non-rhotic speakers but not for rhotic speakers; for example, spa and spar are pronounced identically by many non-rhotic speakers, but differently by rhotic speakers.


Areas with non-rhotic accents include Africa, Australia, most of the Caribbean, most of England (especially Received Pronunciation speakers), New Zealand, South Africa, the southeastern United States (although pockets of rhotic speakers do exist in the southern United States, especially in northwest Alabama, central Tennessee and peninsular Florida; in general the non-rhotic accent is more common in coastal Southern styles, while the Appalachian accent is rhotic), the northeastern United States (New England and New York State), and Wales.

-- http://dictionary.laborlawtalk.com/Rhotic

Loss of the inherent vowel makes the consonant akshara lose its sound. Thus a "killed" {ka.} is soundless.

Used as a title for the Hindu deity Krishna. [Hindi jagannāth, title of Krishna, from Sanskrit jaganātha, lord of the world: jagat, moving, the world (from jigāti, he goes; see g w - in Indo-European Roots), and nātha, lord. Senses 1 and 2, from the fact that worshipers have thrown themselves under the wheels of a huge car or wagon on which the idol of Krishna was drawn in an annual procession at Puri in east-central India.] -- AHTD

The following is a direct quotation from DJPD16 p458 - Info panel 62

p.458. In the phonological analysis of the syllable, this is a way of referring to the vowel in the middle of the syllable forming its 'peak' plus any sounds following the peak within the syllable (the CODA).

Examples for English

In the word <spoon> the rhyme (or rime) is /uːn/, in <tea> it is /iː/ and in <strengths> it is /eŋθs/ or /eŋkθs/.


The spelling <rhyme> also refers to a pair of lines that end with the same sequence of sounds in verse. If we examine the sound sequences that must match each other, we find that these consist of the vowel and any final consonants of the last syllable: thus <moon> and <June> rhyme, and the initial consonants of these two words are not important (of course, we do find longer-running rhymes than this in verse, e.g. <ability> rhyming with <senility>).

South Asian scripts

UKT: The following is an edited text from Chapter 9, The Unicode Standard, version 4.0, Unicode Consortium, http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf . Approx. p217

The scripts of South Asia share so many common features that a side-by-side comparison of a few will often reveal structural similarities even in the modern letterforms. With minor historical exceptions, they are written from left to right. They are all abugidas (also called an alphasyllabary) in which most symbols stand for a consonant plus an inherent vowel (usually the sound /a/). Word-initial vowels in many of these scripts have distinct symbols, and word-internal vowels are usually written by juxtaposing a vowel sign in the vicinity of the affected consonant. Absence of the inherent vowel, when that occurs, is frequently marked with a special sign. In the Unicode Standard, this sign is denoted by the Sanskrit word virāma. (Burmese-Myanmar: {a-thut} ). In some languages another designation is preferred. In Hindi, for example, the word hal refers to the character itself, and halant refers to the consonant that has its inherent vowel suppressed; in Tamil, the word puḷḷi is used. The virama sign ( {tn-hkwun} -- meaning: flag.) nominally serves to suppress the inherent vowel of the consonant to which it is applied; it is a combining character, with its shape varying from script to script.

UKT: Loss of the inherent vowel makes the consonant akshara lose its sound. Thus a "killed" {ka.} is soundless.
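
In encoded text, a killed consonant is simply the consonant followed by the combining killer sign. The sketch below shows this for Devanagari and for Burmese-Myanmar using Python's standard unicodedata module. One assumption should be noted: for Myanmar it uses U+103A MYANMAR SIGN ASAT, which was added in a later version of Unicode than the version 4.0 quoted here; version 4.0 has only U+1039 MYANMAR SIGN VIRAMA. The names KILLERS and kill are hypothetical.

```python
import unicodedata

# (consonant KA, killer sign) for each script.
KILLERS = {
    "Devanagari": ("\u0915", "\u094D"),  # KA + DEVANAGARI SIGN VIRAMA
    "Myanmar": ("\u1000", "\u103A"),     # KA + MYANMAR SIGN ASAT
}


def kill(consonant, sign):
    """Return the killed consonant: the letter followed by the combining sign."""
    return consonant + sign


for script, (ka, sign) in KILLERS.items():
    killed = kill(ka, sign)
    names = [unicodedata.name(ch) for ch in killed]
    # A non-zero combining class marks the sign as a combining character.
    print(script, names, "combining class:", unicodedata.combining(sign))
```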

Most of the scripts of South Asia, from north of the Himalayas to Sri Lanka in the south, from Pakistan in the west to the easternmost islands of Indonesia, are derived from the ancient Brahmi script. The oldest lengthy inscriptions of India, the edicts of Ashoka from the third century BCE, were written in two scripts, Kharoshthi and Brahmi. These are both ultimately of Semitic origin, probably deriving from Aramaic, which was an important administrative language of the Middle East at that time. Kharoshthi, written from right to left, was supplanted by Brahmi and its derivatives. The descendants of Brahmi spread with myriad changes throughout the subcontinent and outlying islands. There are said to be some 200 different scripts deriving from it. By the eleventh century, the modern script known as Devanagari was in ascendancy in India proper as the major script of Sanskrit literature. This northern branch includes such modern scripts as Bengali, Gurmukhi, and Tibetan; the southern branch includes scripts such as Malayalam and Tamil.

The major official scripts of India proper, including Devanagari, are all encoded according to a common plan, so that comparable characters are in the same order and relative location. This structural arrangement, which facilitates transliteration to some degree, is based on the Indian national standard (ISCII) encoding for these scripts, and makes use of a virama. Sinhala has a virama-based model, but is not structurally mapped to ISCII. Tibetan stands apart, using a subjoined consonant model for conjoined consonants, reflecting its somewhat different structure and usage. The Limbu script makes use of an explicit encoding of syllable-final consonants.

Many of the character names in this group of scripts represent the same sounds, and naming conventions are similar across the range.


UKT: The following is a direct quotation from DJPD16 p522 - Info panel 72

p522. A fundamentally important unit -- the most basic unit in speech. Here we are concerned with the phonological notion of the syllable.

Examples for English

Phonologists are interested in the structure of the syllable, since there appear to be interesting observations to be made about which phonemes may occur at the beginning, in the middle and at the end of syllables. In English, it is possible to have from zero to up to three consonants in the ONSET of a syllable, and from zero to up to four in the CODA.

The study of sequences of phonemes is called 'phonotactics', and it seems that the phonotactic possibilities of a language are determined by syllabic structure. This means that any sequence of sounds that a native speaker produces can be broken down into syllables without any segments being left over. For example, in <Their strengths triumphed frequently>, we find the rather daunting sequences of consonant phonemes /ŋθstr/ and /mftfr/ , but using what we know of English phonotactics we can split these clusters into one part that belongs to the end of one syllable and another part that belongs to the beginning of another. Thus the first one can only be divided /ŋθ | str/ or /ŋθs | tr/ and the second can only be /mft | fr/ .

Phonological treatments of syllable structure usually call the first part of a syllable the ONSET, the middle part the 'peak' and the end part the CODA. The combination of peak and coda is called the RHYME. Syllable breaks, however, may be problematic, when approximants occur at syllable boundaries.
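
The splitting of the two consonant sequences quoted above can be reproduced by trying every break point and keeping those where the left part is a legal coda and the right part a legal onset. This is a sketch with tiny, hand-picked inventories: LEGAL_CODAS and LEGAL_ONSETS contain only what is needed for these two clusters and are not a description of English phonotactics; the function name legal_splits is hypothetical.

```python
# Tiny illustrative inventories; real English allows many more clusters.
LEGAL_CODAS = {"", "ŋ", "ŋθ", "ŋθs", "m", "mf", "mft"}
LEGAL_ONSETS = {"", "r", "tr", "str", "fr"}


def legal_splits(cluster):
    """Return every (coda, onset) division of the cluster that both sets allow."""
    splits = []
    for i in range(len(cluster) + 1):
        coda, onset = cluster[:i], cluster[i:]
        if coda in LEGAL_CODAS and onset in LEGAL_ONSETS:
            splits.append((coda, onset))
    return splits


print(legal_splits("ŋθstr"))  # [('ŋθ', 'str'), ('ŋθs', 'tr')]
print(legal_splits("mftfr"))  # [('mft', 'fr')]
```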

