Bengali script


by U Kyaw Tun (UKT) M.S. (I.P.S.T., U.S.A.) Based on Unicode Consortium, Not for sale. No copyright. Free for everyone. Prepared for students and staff of TIL Research Station, Yangon, MYANMAR :  http://www.tuninst.net , www.romabama.blogspot.com
Supporting sources are Alan Wood's Unicode Resources: http://www.alanwood.net/unicode/bengali.html 140612, and an online dictionary such as http://bengali.indianlanguages.org/dictionary/ 140612

UKT 140611: The speech (spoken language) is spelled in Bur-Myan: {Bn~ga.la:} with a {king:si:} and not with a {::tn}. I have colour-coded the word to show how it is to be read: first syllable in red, second green, and third brown. The Ban-Ben script (written language) has split vowels similar to those in Bur-Myan, which calls for a change in font face from Arial Unicode MS to Lucida Sans Unicode. Caveat: Lucida does not support IAST transliterations such as ṇ & ḍ, nor the "font variant small caps" used by Unicode, so change the font to Lucida Sans Unicode only when necessary.


Split vowels :

UKT 140612: Sometimes I get bored with what many would call boring; then it's time to watch children's videos. Here are two on split vowels in English:
   <not> --> <note>
where <t> in <note> goes between <o> and <e>
Watch the children's videos on Magic-E & Silent-E:
https://www.youtube.com/watch?v=bZhl6YcrxZQ 140612
https://www.youtube.com/watch?v=mnanlcyRuuI 140612

Special characters
Horizontal conjuncts
  Khanda Ta
  Zero-width space : ZWSP
Rendering behavior


UKT notes


9.2 Bengali

(pdf 17/4)
The Bengali script is a North Indian script closely related to Devanagari. It is used to write the Bengali language primarily in the West Bengal state and in the nation of Bangladesh. It is also used to write Assamese in Assam and a number of other minority languages, such as Daphla, Garo, Hallam, Khasi, Manipuri, Mizo, Munda, Naga, Rian, and Santali, in northeastern India.

Virama ( Hasant ).

UKT 140612: Viram (Virama) is Bur-Myan {a.t}. Whereas the Bur-Myan sign or "flag" {tn-hkwun} is placed on top of the character, the Ban-Ben sign, like that of Skt-Dev, is shown below the character.

(pdf 17/4 cont)
The Bengali script uses the Unicode virama model to form conjunct consonants. In Bengali, the virama is known as hasant .

Split vowels : Two-Part Vowel Signs

UKT 140612: Refer to BEPS Vowels - BHS-indx.htm / BEPS Vowels
Notice how the mid back-vowel /o/ is creating trouble in transcribing Bur-Myan and Indic languages. It has led Edgerton to write: "in Pali the notorious substitution of e  {} for o  {AW} aṃ {n} " on page 4 footnote 11 of his Introduction -- i02original.htm

UKT 140612: Based on history and geography, we should expect Bur-Myan to be close to Ban-Ben (Bangla speech written in Bengali script). Since I no longer have faith in the descriptions and transcriptions given by Western philologists and phoneticians, the only recourse that I have is instrumental analysis. Back in 2008, I came across the "Six Vowels of Bangla Vowel System" in Acoustic Classification of Bangla Vowels by S.A. Houssain, M.L. Rahman & F. Ahmed, International J. of Appl. Math. and Computer Sc., 4(2), 2007,  which I am reproducing on the right. Some time later, on his trip to Bangladesh, my son Dr. Zin Tun, National Research Council of Canada, tried to contact the authors on my behalf. But he could not contact any of them.

See also my work on Human Voice - indx-HV.htm  (link chk 140612)
and included articles on
Wave nature of sound [former hv5.htm] - snd-wav.htm (link chk 140612)
How sound is produced and heard [former hv6.htm] - snd-hear.htm  (link chk 140612)

(pdf 17/4 cont)
The Bengali script, along with a number of other Indic scripts, makes use of two-part vowel signs; in these vowels one-half of the vowel is placed on each side of a consonant letter or cluster for example [UKT ],

   Bur-Myan vow-sign:  

   Bur-Myan vow-sign:

UKT 140612: Though the two above differ only in the absence or presence of a flag, they are not allophones: they are separate vowels, depicted in Bur-Myan as vow-let {AW} & {au}. They look like allophones when coupled to consonants, e.g. {kau:} & {kau}.

(pdf 17/4 cont)
The vowel signs are coded in each case in the position in the charts isomorphic with the corresponding vowel in Devanagari. Hence [UKT ]  [the following are isomorphic]

   Bur-Myan vow-sign:
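
The chart isomorphism with Devanagari can be checked mechanically. The following is only a sketch, not part of the Unicode text: it assumes Python 3 with the standard unicodedata module, and it relies on the Bengali block (starting at U+0980) lying 0x80 code points above the Devanagari block (starting at U+0900). The helper name to_devanagari is hypothetical.

```python
import unicodedata

# Assumption: Bengali block start (U+0980) minus Devanagari block start (U+0900).
BLOCK_OFFSET = 0x0980 - 0x0900


def to_devanagari(bengali_char):
    """Return the Devanagari character at the same chart position (hypothetical helper)."""
    return chr(ord(bengali_char) - BLOCK_OFFSET)


# Vowel sign O, vowel sign AU, and the virama (hasant).
for ch in ("\u09CB", "\u09CC", "\u09CD"):
    twin = to_devanagari(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}  <->  "
          f"U+{ord(twin):04X} {unicodedata.name(twin)}")
```

The offset trick only illustrates the chart layout; not every position in one block is necessarily assigned in the other, so it should not be treated as a general transliteration routine.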

UKT 140612: Note the split vowel in both Bur-Myan & Ban-Ben, which is a problem for rendering engines. However, do not think that split vowels are confined to Bur-Myan & Ban-Ben. Eng-Lat has split vowels in what is called "magic-E" aka "silent-E", e.g. <not> --> <note>.

(pdf 17/4 cont)
To provide compatibility with existing implementations of the scripts that use two-part vowel signs, the Unicode Standard explicitly encodes the right half of these vowel signs; for example, U+09D7 BENGALI AU LENGTH MARK  represents the right-half glyph component of U+09CC BENGALI VOWEL SIGN AU.
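
The relation between a two-part vowel sign and its halves is recorded in the Unicode character data as a canonical decomposition. As a rough illustration (assuming Python 3 and its standard unicodedata module; the helper name parts is hypothetical), the sketch below lists the halves of the two split vowel signs and recomposes one of them.

```python
import unicodedata


def parts(ch):
    """List the code points in the canonical decomposition (NFD) of ch."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}"
            for c in unicodedata.normalize("NFD", ch)]


# U+09CB BENGALI VOWEL SIGN O and U+09CC BENGALI VOWEL SIGN AU.
for sign in ("\u09CB", "\u09CC"):
    print(f"U+{ord(sign):04X} {unicodedata.name(sign)} -> {parts(sign)}")

# Left half (vowel sign E) followed by the AU length mark recomposes under NFC.
recomposed = unicodedata.normalize("NFC", "\u09C7\u09D7")
print(f"U+{ord(recomposed):04X}")
```

Normalizing text to one form (NFC or NFD) before comparing strings avoids mismatches between the single-code-point and the two-code-point spellings of the same vowel sign.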

Special Characters 

(pdf 17/4 cont)
U+09F2..U+09F9 are a series of Bengali additions for writing currency and fractions.
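
To see what these additions are, the range can be listed by character name. This is a small sketch assuming Python 3 and the standard unicodedata module; the variable name currency_names is only illustrative.

```python
import unicodedata

# Names of the code points U+09F2 through U+09F9, inclusive.
currency_names = [unicodedata.name(chr(cp)) for cp in range(0x09F2, 0x09F9 + 1)]

for cp, name in zip(range(0x09F2, 0x09F9 + 1), currency_names):
    print(f"U+{cp:04X} {name}")
```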

Horizontal conjuncts

UKT 140612: Readers-writers of Bur-Myan are familiar with {paaHT-hsing.}. See MLC MED2006-272 for its definition. {paaHT-hsing.} is a vertical conjunct: one consonant (killed) above another, e.g. {k~ka.} - they are generally mute. Its equivalent in Skt-Dev is क्क kka. The first (upper) consonant has its vowel removed by the viram which is not shown in the conjunct.

However, most of us are not aware of the horizontal conjunct such as {~a.}. It is known as Tha'gyi, and many (including myself at one time) think it to be a basic consonant like its namesake {a.}. Being a conjunct, {~a.} is mute. It is made up of two {a.} with a viram in the centre. The viram is hidden in the conjunct. Ban-Ben Khanda Ta, discussed below, is a horizontal conjunct.


Khanda Ta.

(pdf 17/4 cont)
The Bengali syllable tta is notable. It is encoded with the following sequence:

U+09A4   BENGALI LETTER TA  ত
U+09CD  BENGALI SIGN VIRAMA  (= hasant)  ্
U+09A4   BENGALI LETTER TA  ত

UKT 140612: My analysis of Bengali word Khanda Ta is as follows:
First from Online Bengali to English Dictionary: http://bengali.indianlanguages.org/dictionary/ 140612
Note to TIL editor: I have to use Lucida Sans Unicode to get the correct rendering in Ban-Ben. I then change the font face to Arial Unicode MS to get the IAST transliteration which gives me khaṇḍa. From it, I get {hkN~da.}. Bur-Myan does not have this word: what it has is {kN~da.} - MLC MED2006-018.
   Ban-Ben to English - n a part; a fragment; a portion, a region
   Bur-Myan to English - n. section (Pal: )
However, it could also be, {hkn~Da} 'aggregate'. - MLC MED2006-064

খণ্ড --> খণ্ড - khanda - khaṇḍa

I have analyzed the Khanda Ta as follows. It is a horizontal conjunct analogous to Bur-Myan {T~HTa.}. Bur-Myan uses the r3 retroflex-aksharas whereas the Ban-Ben (Bangla-Bengali) uses r4 dental-aksharas. In both cases, the respective viram {a.t} is not shown. A minor difference is in the second akshara: Bur-Myan uses r3c2, whereas Ban-Ben uses r4c1.

Bur-Myan uses retroflex :
   {Ta.} + viram-sign + {HTa.} --> {T~HTa.}

Ban-Ben uses dental : 
   ত ta + ্ viram-sign + ত ta -->  ত্ত tta

I contend that Ban-Ben had belonged to the Tib-Myan linguistic group, the same group as Bur-Myan. However, it has fallen under the influence of IE (Indo-European) Skt-Dev, primarily due to the suppression of Buddhism by the Brahmin-Poannas {brah~ma.Na. poaN~Na:}.

The political change was brought about by the assassination of the Buddhist grandson of Asoka by his own Hindu (in Bur-Myan {brah~ma.Na.}) general, followed by the suppression of Buddhism. However, it lasted only a relatively short period, until the Gupta (Hindu) dynasty came into power. The Hindu Guptas promoted many faiths including Buddhism, science, astronomy & mathematics, and literature in the Nalanda University which they patronized.

The complete rout of Buddhism came about with the Muslim conquest, when Nalanda University was completely destroyed, thousands of Buddhist monks were killed, and manuscripts were burnt. It is said that, because there were so many manuscripts, the burning lasted a whole month or so.

And so Ban-Ben is now thought to be an IE (Indo-European) language. However, because it is used by many ethnic groups, the languages of some might still be considered as belonging to Tib-Bur.

UKT note 140615: The above is a short historical account of why Theravada Buddhism has almost completely disappeared from Bengal (a part of which has become Bangladesh, a separate nation with Islam as the state religion). As a scientist, I have tried to be as objective as possible. However, many points could be disputed. Whether Ban-Ben is Tib-Bur or IE can be determined by studying its vowels in terms of the formants F1 & F2 and comparing them with those of Bur-Myan. Written historical accounts, including inscriptions, are never reliable, as history is rewritten again and again. You can read about this period in many Wikipedia articles, which have been among my sources. I am just trying to look into the influence it had on the neighbouring Myanmarpr during the Pyu kingdom of Sri Ksetra and its successor, the Pagan kingdom.

(pdf 17/4 cont)
The sequence will normally be displayed using the single glyph tta ligature ত্ত. It is also possible for the sequence to be displayed using a khanda ta glyph followed by a full ta glyph, or with a full ta glyph combined with a virama glyph and followed by a full ta glyph. The choice of form actually displayed depends on the display engine, based on availability of glyphs in the font.

Zero-width space : ZWSP

UKT 140613: You can control how the viram is displayed by using what are known as zero-width characters: the "joiner" and the "non-joiner". However, the character maps of both Arial Unicode MS and Lucida Sans Unicode provide neither the "joiner" nor the "non-joiner", so I cannot show how to use them on this computer by copy-paste from the character map. However, if you switch Microsoft FrontPage (which I am using) from the "Design" view to the "Code" view and use decimal numbers, you can still encode them, as I have done below.

Of course, you would have to know the decimal equivalent of the hexadecimal (hex) number used by Unicode. Use the calculator on your computer: type in your hex number, then switch to the decimal number display.
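
The hex-to-decimal step can also be done in a few lines of code instead of a calculator. This is a minimal sketch assuming Python 3; the function name html_decimal_reference is hypothetical, and it simply formats the decimal value as an HTML numeric character reference of the kind typed into the code view.

```python
def html_decimal_reference(hex_code):
    """Turn a Unicode hex code such as '09A4' or 'U+09A4' into '&#2468;'."""
    cleaned = hex_code.strip().upper()
    if cleaned.startswith("U+"):
        cleaned = cleaned[2:]
    # int(..., 16) reads the digits as a base-16 number.
    return f"&#{int(cleaned, 16)};"


for code in ("U+09A4", "U+09CD", "U+200C", "U+200D"):
    print(code, "->", html_decimal_reference(code))
```

For example, U+09A4 gives &#2468; and U+09CD gives &#2509;.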

(pdf 17/4 cont)
The Unicode Standard provides an explicit way to encode a half-letter form. To do this, a ZERO WIDTH JOINER is inserted after the virama:

U+09A4   BENGALI LETTER TA  - (decimal: &#2468;)
U+09CD  BENGALI SIGN VIRAMA  (= hasant) - (decimal: &#2509;)
U+200D   ZERO WIDTH JOINER  - (decimal: &#8205;)
U+09A4   BENGALI LETTER TA - (decimal: &#2468;)

UKT 140613: Using decimal numbers instead of the hexadecimal numbers used by Unicode, you can still encode:
   &#2468; + &#2509; + &#8205; + &#2468; --> ত্‍ত
Remember to remove the + signs in the code view.

This sequence is always displayed as a khanda ta glyph followed by a full ta glyph. Even if the consonant ta is not present, the sequence U+09A4, U+09CD, U+200D is displayed as a khanda ta glyph.

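The Unicode Standard provides an explicit way to show the virama glyph. To do this, a ZERO WIDTH NON-JOINER is inserted after the virama: (p232 pdf 17/4 end) (pdf 17/4 begin

U+09A4   BENGALI LETTER TA  - (decimal: &#2468;)
U+09CD  BENGALI SIGN VIRAMA  (= hasant) - (decimal: &#2509;)
U+200C   ZERO WIDTH NON-JOINER  - (decimal: &#8204;)
U+09A4   BENGALI LETTER TA - (decimal: &#2468;)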

This sequence is always displayed as a full ta glyph combined with a virama glyph and followed by a full ta glyph .
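
The three encodings described in this section differ only in the invisible character placed after the virama. The sketch below (Python 3; the dictionary labels and the helper name code_points are only illustrative) builds each sequence and prints its code points. How each one is actually drawn still depends on the font and the display engine, so the code makes no claim about the rendered result.

```python
TA, VIRAMA = "\u09A4", "\u09CD"
ZWJ, ZWNJ = "\u200D", "\u200C"   # zero width joiner / zero width non-joiner

sequences = {
    "tta (engine chooses the form)": TA + VIRAMA + TA,
    "khanda ta + ta (ZWJ after virama)": TA + VIRAMA + ZWJ + TA,
    "ta with visible virama + ta (ZWNJ after virama)": TA + VIRAMA + ZWNJ + TA,
}


def code_points(text):
    """Spell out a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for label, text in sequences.items():
    print(f"{label}: {code_points(text)}")
```

Printing the strings themselves in a Unicode-capable terminal or browser shows whichever form the installed font supports.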

A summary image of various sequences is shown in Figure 9-10.


(pdf 17/4 cont)
Ya-phalaa (pronounced jo-phola in Bengali) is a presentation form of U+09AF য BENGALI LETTER YA . [UKT ]

UKT 140613: The equivalent of Ban-Ben য ya in Skt-Dev is य ya, and in Bur-Myan {ya.}. Its IPA transcription is /j/ and it is classified as a palatal approximant. However, I have shifted it to the velar position to make way for {a.}, which is a basic akshara in Bur-Myan and is classified as a palatal. Skt-Dev does not have {a.} as a basic akshara, and treats it as a horizontal conjunct of two {a.}. However, a study of killed {} & killed {} in Bur-Myan shows that it is a true basic akshara (even though it is present only in Bur-Myan among the BEPS languages).

(pdf 17/4 cont)
Represented by the sequence <U+09CD ্ BENGALI VIRAMA + U+09AF য BENGALI YA>, ya-phalaa has a special form. [UKT ]

UKT 140613: I am wondering if is killed {}. However, the sequence is not the same.

When combined with U+09BE ा BENGALI VOWEL SIGN AA, it is used for transcribing [the English short a] [æ] as in the a in the English word <bat>. [UKT ]

UKT note: Because, I cannot show Ban-Ben ā (long vowel), I have shown Skt-Dev equivalent for (U+09BE)

Ya-phalaa can be applied to initial vowels as well:

0985 + 09CD + 09AF + 09BE
  অ  +    ্    +   য    +   া   -->  অ্যা (a-hasant ya-aa)

098F + 09CD + 09AF + 09BE
  এ   +    ্   +   য    +   া   -->  এ্যা  (e-hasant ya-aa)

If a candrabindu or other combining mark needs to be added in the sequence, it comes at the end of the sequence. For example:

UKT 140613: I tried to write the above on my own. However, the chandra bindu moves forward:

  অ  +    ্    +   য    +   া  +  ঁ -->  অ্যাঁ
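
The ordering rule (base letter, hasant, ya, vowel sign aa, then any further combining mark last) can be written down as a small sketch. It assumes Python 3 and the standard unicodedata module; the function name with_ya_phalaa_aa is hypothetical.

```python
import unicodedata

VIRAMA, YA, AA = "\u09CD", "\u09AF", "\u09BE"
CANDRABINDU = "\u0981"


def with_ya_phalaa_aa(base, extra_marks=""):
    """Return base + hasant + ya + aa, with further combining marks placed last."""
    return base + VIRAMA + YA + AA + extra_marks


a_form = with_ya_phalaa_aa("\u0985")               # on BENGALI LETTER A
e_form = with_ya_phalaa_aa("\u098F")               # on BENGALI LETTER E
nasal = with_ya_phalaa_aa("\u0985", CANDRABINDU)   # candrabindu stored at the end

for text in (a_form, e_form, nasal):
    print([unicodedata.name(ch) for ch in text])
```

The candrabindu is last in the stored sequence; where it appears visually is decided by the rendering engine, which is consistent with the mark seeming to move when typed by hand.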

Further examples:


Rendering Behavior 

(pdf 17/4 cont)
Like other Asokan Brahmic scripts in the Unicode Standard, Bengali uses the virama to form conjunct characters. For example, 

   --> the conjunct KSSA pronounced "khya" in Assamese

ক + ্ + ষ --> ক্ষ

UKT 140613: Assam, lying on the hills between India and Myanmarpr, shares many cultural and linguistic roots with Myanmarpr. Of course, the pronunciation "khya" is not surprising. We find the same conjunct in Skt-Dev, क + ् + ष --> क्ष  kṣa. This is what I have been calling "Pseudo-Kha" because many words spelled in Pal-Myan with {hka.} are spelled with क्ष  kṣa in Skt-Dev. I cannot say that for Ban-Ben unless I know the language.
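
The parallel between the Ban-Ben and Skt-Dev conjuncts shows up directly in the encoded sequences: each is consonant + virama + consonant, with no separate code point for the conjunct itself. A small sketch, assuming Python 3 and the standard unicodedata module (the helper name names is only illustrative):

```python
import unicodedata

bengali_kssa = "\u0995" + "\u09CD" + "\u09B7"      # KA + VIRAMA (hasant) + SSA
devanagari_ksha = "\u0915" + "\u094D" + "\u0937"   # KA + VIRAMA + SSA


def names(text):
    """Return the Unicode character names of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


print(names(bengali_kssa))
print(names(devanagari_ksha))
```

Both strings are three code points long; the single conjunct glyph is produced only at display time by the font and rendering engine.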

For general principles regarding the rendering of the Bengali script, see the rules for rendering in Section 9.1, Devanagari.

Danda {poad hprt} and double danda {poad-ma.} marks as well as some other unified punctuation used with Bengali are found in the Devanagari block; see Section 9.1, Devanagari.
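
That the danda marks are shared rather than duplicated in the Bengali block can be confirmed from their code points. The sketch below assumes Python 3 and the standard unicodedata module, and takes the Devanagari block to be U+0900..U+097F; the helper name in_devanagari_block is hypothetical.

```python
import unicodedata

DANDA, DOUBLE_DANDA = "\u0964", "\u0965"


def in_devanagari_block(ch):
    """True if ch lies in the Devanagari block, assumed here to be U+0900..U+097F."""
    return 0x0900 <= ord(ch) <= 0x097F


for mark in (DANDA, DOUBLE_DANDA):
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)} "
          f"in Devanagari block: {in_devanagari_block(mark)}")
```

In practice this means Bengali text ends its sentences with U+0964, the same code point used in Devanagari text.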

UKT notes


End of TIL file