bing.htm
by U Kyaw Tun (UKT) M.S. (I.P.S.T., U.S.A.) Based on Unicode Consortium,
Not for sale. No copyright. Free for
everyone. Prepared for students and
staff of TIL Research Station,
Yangon, MYANMAR :
http://www.tuninst.net ,
www.romabama.blogspot.com
A supportive source is Alan Wood's
Unicode Resources:
http://www.alanwood.net/unicode/bengali.html 140612,
and an online dictionary such as
http://bengali.indianlanguages.org/dictionary/ 140612
UKT 140611: The speech (spoken language) is spelled in Bur-Myan:
{Bïn~ga.la:} with a
{king:si:} and not with a
{þé:þé:tïn}. I have colour-coded the word to show how it is to be read: first syllable in red, second in green, and third in brown. The Ban-Ben script (written language) has split vowels similar to those in Bur-Myan, which calls for a change of font face from Arial Unicode MS to Lucida Sans Unicode. Caveat: Lucida Sans Unicode does not support IAST transliteration characters such as ṇ & ḍ, nor the "font variant small caps" used by Unicode, so change the font to Lucida Sans Unicode only when necessary.
UKT 140612: Sometimes I get bored with what many would call boring; then it's time to watch children's videos. Here are two on split vowels in English:
<not> --> <note>
where <t> in <note> goes between <o> and <e>
Watch children's videos on Magic-E & Silent-E:
https://www.youtube.com/watch?v=bZhl6YcrxZQ 140612
https://www.youtube.com/watch?v=mnanlcyRuuI 140612
Special characters
Horizontal conjuncts
Khanda Ta
Zero-width space : ZWSP
Ya-phalaa
Rendering behavior
Punctuation
(pdf 17/4)
The Bengali script is a North Indian
script closely related to Devanagari.
It is used to write the Bengali language
primarily in the West Bengal state and
in the nation of Bangladesh. It is also
used to write Assamese in Assam and a
number of other minority languages, such
as Daphla, Garo, Hallam, Khasi, Manipuri,
Mizo, Munda, Naga, Rian, and Santali,
in northeastern India.
UKT 140612: Viram (Virama) is Bur-Myan
{a.þût}. Whilst the Bur-Myan sign or "Flag"
{tän-hkwun} is on top of the character,
the Ban-Ben sign, similar to that of Skt-Dev, is shown below the character.
(pdf 17/4 cont)
The Bengali script uses the Unicode virama
model to form conjunct consonants. In
Bengali, the virama is known as hasant .
UKT 140612: Refer to BEPS Vowels - BHS-indx.htm / BEPS Vowels
Notice how the mid back-vowel /o/ is creating trouble in transcribing Bur-Myan and Indic languages. It has led Edgerton to write: "in Pali the notorious substitution of e {É} for o {AW} aṃ {än} " on page 4 footnote 11 of his Introduction -- i02original.htm
UKT 140612: Based on history and geography, we should expect Bur-Myan to be close to Ban-Ben (Bangla speech written in Bengali script). Since I no longer have faith in the descriptions and transcriptions given by Western philologists and phoneticians, the only recourse that I have is instrumental analysis. Back in 2008, I came across the "Six Vowels of Bangla Vowel System" in Acoustic Classification of Bangla Vowels by S.A. Houssain, M.L. Rahman & F. Ahmed, International J. of Appl. Math. and Computer Sc., 4(2), 2007, which I am reproducing on the right. Some time later, on his trip to Bangladesh, my son Dr. Zin Tun, National Research Council of Canada, tried to contact the authors on my behalf. But he could not contact any of them.
See also my work on Human Voice - indx-HV.htm (link chk 140612)
and included articles on
¤ Wave nature of sound [former hv5.htm] - snd-wav.htm (link chk 140612)
¤ How sound is produced and heard [former hv6.htm] - snd-hear.htm (link chk 140612)
(pdf 17/4 cont)
The Bengali script, along with a number
of other Indic scripts, makes use of
two-part vowel signs; in these vowels
one-half of the vowel is placed on each
side of a consonant letter or cluster
— for example [UKT ¶],
U+09CB BENGALI VOWEL SIGN O ো
Bur-Myan vow-sign: [image]
U+09CC BENGALI VOWEL SIGN AU ৌ
Bur-Myan vow-sign: [image]
UKT 140612: Though the above two show only a difference in the absence of a flag or its presence, the two are not allophones: they are separate vowels which are depicted in Bur-Myan as vow-let {AW} & {au}. They look like allophones when coupled to consonants, e.g.
{kau:} &
{kau}.
(pdf 17/4 cont)
The vowel signs are coded in each case
in the position in the charts isomorphic
with the corresponding vowel in Devanagari.
Hence [UKT ¶] [the following are
isomorphic]
U+09CC BENGALI VOWEL SIGN AU - ৌ
U+094C DEVANAGARI VOWEL SIGN AU - ौ
Bur-Myan vow-sign: [image]
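The isomorphic chart positions can be checked mechanically. The following is a minimal sketch in Python using only the standard unicodedata module. The helper name bengali_to_devanagari and the constant BLOCK_OFFSET are illustrative names chosen here, not part of any standard. It assumes only that the Bengali block begins at U+0980 and the Devanagari block at U+0900, so that corresponding characters sit 0x80 apart. Not every position is assigned in both blocks, so the mapping is meaningful only for characters that exist in both.

```python
import unicodedata

# Bengali block starts at U+0980, Devanagari at U+0900: a fixed offset of 0x80.
BLOCK_OFFSET = 0x0980 - 0x0900


def bengali_to_devanagari(ch: str) -> str:
    """Return the character at the same chart position in the Devanagari block.

    Illustrative helper: it only shifts the code point and does not check
    that the resulting position is an assigned character.
    """
    cp = ord(ch)
    if not 0x0980 <= cp <= 0x09FF:
        raise ValueError("not in the Bengali block")
    return chr(cp - BLOCK_OFFSET)


bengali_au = "\u09CC"
devanagari_au = bengali_to_devanagari(bengali_au)
print("U+{:04X}".format(ord(bengali_au)), unicodedata.name(bengali_au))
print("U+{:04X}".format(ord(devanagari_au)), unicodedata.name(devanagari_au))
```

The same shift applied to U+0995, U+09CD and U+09B7 lands on the Devanagari letters KA, VIRAMA and SSA.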
UKT 140612: Note the split-vowel in both Bur-Myan & Ban-Ben which is a problem for rendering engines. However do not think that split vowels are confined to Bur-Myan & Ban-Ben only. Eng-Lat have split vowels in what they call "magic-E" aka "silent-E": e.g.
<not> --> <note>
where <t> in <note> goes between <o> and <e>
(pdf 17/4 cont)
To provide compatibility with existing
implementations of the scripts that use
two-part vowel signs, the Unicode Standard
explicitly encodes the right half of
these vowel signs; for example,
U+09D7 BENGALI AU LENGTH MARK represents
the right-half glyph component of
U+09CC BENGALI VOWEL SIGN AU.
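The relation between a two-part vowel sign and its separately encoded halves can be inspected through canonical decomposition. This is a small sketch in Python, assuming the standard unicodedata module. The function name parts is an illustrative choice. It lists the code points that each two-part vowel sign decomposes into under normalization form NFD.

```python
import unicodedata

VOWEL_SIGN_O = "\u09CB"   # BENGALI VOWEL SIGN O
VOWEL_SIGN_AU = "\u09CC"  # BENGALI VOWEL SIGN AU


def parts(ch: str) -> list:
    """Names of the code points in the canonical decomposition (NFD) of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


# Each two-part sign splits into a left-side piece and a right-side piece.
print(parts(VOWEL_SIGN_O))
print(parts(VOWEL_SIGN_AU))
```

For VOWEL SIGN AU the second element is the AU LENGTH MARK, which is the right-half component described in the passage above.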
(pdf 17/4 cont)
U+09F2..U+09F9 are a series of Bengali
additions for writing currency and fractions.
UKT 140612: Readers-writers of Bur-Myan are familiar with
{paaHT-hsing.}. See MLC MED2006-272 for its definition.
{paaHT-hsing.} is a vertical conjunct: one consonant (killed) above another, e.g.
{k~ka.} - they are generally mute. Its equivalent in Skt-Dev is क्क «kka». The first (upper) consonant has its vowel removed by the viram which is not shown in the conjunct.
However most of us are not aware of the horizontal conjunct such as
{þ~þa.}. It is known as Tha'gyi and many (including myself at one time) think it to be a basic consonant like its name-sake
{þa.}.
{þ~þa.} being a conjunct is mute. It is made up of two
{þa.} with a viram in the centre. The viram is hidden in the conjunct. Ban-Ben Khanda Ta, discussed below, is a horizontal conjunct.
(pdf 17/4 cont)
The Bengali syllable “tta” is notable.
It is encoded with the following
sequence:
U+09A4 BENGALI LETTER TA ত
U+09CD BENGALI SIGN VIRAMA (= hasant) ্
U+09A4 BENGALI LETTER TA ত
UKT 140612: My analysis of the Bengali word Khanda Ta is as follows:
First from Online Bengali to English Dictionary: http://bengali.indianlanguages.org/dictionary/ 140612
Note to TIL editor: I have to use Lucida Sans Unicode to get the correct rendering in Ban-Ben. I then change the font face to Arial Unicode MS to get the IAST transliteration, which gives me «khaṇḍa». From it, I get {hkûN~d³a.}. Bur-Myan does not have this word: what it has is
{kûN~d³a.} - MLC MED2006-018.
Ban-Ben to English - n a part; a fragment; a portion, a region
Bur-Myan to English - n. section (Pal:)
However, it could also be {hkûn~Da} 'aggregate'. - MLC MED2006-064
খণ্ড --> খণ্ড - khanda - «khaṇḍa»
I have analyzed the Khanda Ta as
follows. It is a horizontal conjunct
analogous to Bur-Myan
{T~HTa.}. Bur-Myan uses the r3
retroflex-aksharas whereas the Ban-Ben
(Bangla-Bengali) uses r4 dental-aksharas.
In both cases, the respective viram
{a.þût} is not shown. A minor difference
is in the second akshara: Bur-Myan
uses r3c2, whereas Ban-Ben uses r4c1.
Bur-Myan uses retroflex :
{Ta.} +
viram-sign +
{HTa.} -->
{T~HTa.}
Ban-Ben uses dental :
ত « ta » + ্ viram-sign + ত « ta » --> ত্ত «tta»
I contend that Ban-Ben had belonged to
the Tib-Myan linguistic group, the same
group as Bur-Myan. However, it has
fallen under the influence of IE
(Indo-European) Skt-Dev primarily due
to the suppression of Buddhism by the
Brahmin-Poannas
{brah~ma.Na. poaN~Na:}.
The political change was brought about
by the assassination of the Buddhist
grandson of Asoka by his own Hindu (in Bur-Myan
{brah~ma.Na.}) general, followed by the
suppression of Buddhism. However it lasted
only a relatively short period, when the
Gupta (Hindu) dynasty came into power. The
Hindu Guptas promoted many faiths including
Buddhism, science, astronomy & mathematics,
and literature in the Nalanda University
which they patronized.
The complete rout of Buddhism came about with the conquest by the Muslims, when Nalanda University was completely destroyed, thousands of Buddhist monks were killed, and manuscripts were burnt. It is said that because there were so many manuscripts, the burning lasted a whole month or so.
And so now Ban-Ben is thought to be an IE (Indo-European) language. However, because it is used by many ethnic groups, the languages of some might still be considered as belonging to Tib-Bur.
UKT note 140615: The above is a short historical account of why Theravada Buddhism had almost completely disappeared in Bengal (a part of which has come to be Bangladesh - a separate nation with Islam as the state religion). As a scientist, I have tried to be as objective as possible. However, many points could be disputed. Whether Ban-Ben is Tib-Bur or IE can be determined by studying its vowels in terms of the formants F1 & F2 and comparing them with those of Bur-Myan. Written historical accounts, including inscriptional ones, are never reliable, as history is being rewritten again and again. You can read about this period in many Wikipedia articles, which have been among my sources. I am just trying to look into the influence it had on the neighbouring Myanmarpré during the Pyu kingdom of Sri Ksetra and its successor the Pagan kingdom.
(pdf 17/4 cont)
The sequence will normally be displayed
using the single glyph «tta» ligature.
It is also possible for the sequence to be
displayed using a khanda «ta»
glyph followed by a full «ta» glyph,
or with a full «ta» glyph combined with a
virama glyph and followed by a full «ta» glyph.
The choice of form actually displayed
depends on the display engine, based
on availability of glyphs in the font.
UKT 140613: You can control how the viram is displayed by using what are known as "zero-width" characters: the "joiner" and the "non-joiner". However, the character maps of both Arial Unicode MS and Lucida Sans Unicode do not provide either the "joiner" or the "non-joiner", so I cannot show how to use them on this computer by copy-paste from the character map. However, if you switch Microsoft FrontPage (which I am using) from the "Design" view to the "Code" view and use decimal numbers, you can still encode them as I have done below.
Of course you would have to know the decimal equivalent of the hexadecimal (hex) number used by Unicode. Use the calculator on your computer: type in your hex number, and switch to the decimal number display.
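The hex-to-decimal step can also be done with a few lines of code instead of a calculator. This is a minimal sketch in Python. The function name html_decimal_reference is an illustrative choice, and the &#NNNN; form is the ordinary HTML decimal character reference that can be typed in a code view.

```python
def html_decimal_reference(hex_code: str) -> str:
    """Turn a Unicode hex code such as '09A4' into an HTML reference like '&#2468;'."""
    return "&#{};".format(int(hex_code, 16))


# The four code points used in the khanda ta discussion.
for hex_code in ("09A4", "09CD", "200D", "200C"):
    decimal = int(hex_code, 16)
    print("U+" + hex_code, "->", decimal, "->", html_decimal_reference(hex_code))
```

Running it lists each Unicode hex code next to its decimal value and the reference to type in the code view.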
(pdf 17/4 cont)
The Unicode Standard provides an explicit
way to encode a half-letter form. To do
this, a ZERO WIDTH JOINER is inserted after
the virama:
U+09A4 BENGALI LETTER TA - (decimal: 2468, typed as &#2468; in the code view)
U+09CD BENGALI SIGN VIRAMA (= hasant) - (decimal: 2509, typed as &#2509;)
U+200D ZERO WIDTH JOINER - (decimal: 8205, typed as &#8205;)
U+09A4 BENGALI LETTER TA - (decimal: 2468, typed as &#2468;)
UKT 140613: Using decimal numbers instead of the hexadecimal numbers used by Unicode, you can still encode:
&#2468; + &#2509; + &#8205; + &#2468; --> khanda «ta» followed by full «ta»
Remember to remove the + signs in the code view.
This sequence is always displayed as a
khanda «ta» glyph followed by a full «ta»
glyph. Even if the consonant “ta” is not
present, the sequence U+09A4, U+09CD,
U+200D is displayed as a khanda «ta» glyph.
The Unicode Standard provides an explicit way to show the virama glyph. To do this, a ZERO WIDTH NON-JOINER is inserted after the virama: (p232 pdf 17/4 end) (pdf 17/4 begin)
U+09A4 BENGALI LETTER TA
U+09CD BENGALI SIGN VIRAMA (= hasant)
U+200C ZERO WIDTH NON-JOINER
U+09A4 BENGALI LETTER TA
This sequence is always displayed as a
full «ta» glyph combined with a virama
glyph and followed by a full «ta» glyph.
A summary image of various sequences is shown in Figure 9-10.
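The three encodings described above differ only in the invisible character placed after the virama. The sketch below, in Python, builds each sequence and prints its code points. The names TA, VIRAMA, ZWJ, ZWNJ and code_points are illustrative. The code shows only what is stored in the text: how each sequence actually looks on screen depends on the font and the display engine.

```python
TA = "\u09A4"      # BENGALI LETTER TA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasant)
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join("U+{:04X}".format(ord(c)) for c in text)


sequences = {
    "plain (form chosen by the display engine)": TA + VIRAMA + TA,
    "with ZWJ (khanda ta + full ta)": TA + VIRAMA + ZWJ + TA,
    "with ZWNJ (ta with visible virama + full ta)": TA + VIRAMA + ZWNJ + TA,
}

for label, text in sequences.items():
    print(label, ":", code_points(text))
```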
(pdf 17/4 cont)
Ya-phalaa (pronounced jo-phola in Bengali)
is a presentation form of U+09AF য
BENGALI LETTER YA . [UKT ¶]
UKT 140613: The equivalent of Ban-Ben য «ya» in Skt-Dev is य «ya», and in Bur-Myan
{ya.}. Its IPA transcription is /j/ and is classified as a palatal approximant. However, I have shifted it to the velar position to make way for
{Ña.} which is a basic akshara in Bur-Myan and is classified a palatal. Skt-Dev does not have
{Ña.} as a basic akshara, and treats it as a horizontal conjunct of two
{ña.}. However, a study of killed
{ý} & killed
{Ñ} in Bur-Myan shows that it is a true basic akshara (even though it is present only in Bur-Myan among the BEPS languages).
(pdf 17/4 cont)
Represented by the sequence <U+09CD
্ BENGALI VIRAMA + U+09AF য
BENGALI YA >, ya-phalaa has a special form.
[UKT ¶]
UKT 140613: I am wondering if the ya-phalaa form
is killed
{ý}. However, the sequence is not the same.
When combined with U+09BE ा BENGALI VOWEL SIGN AA , it is used for transcribing [the English short a ] [æ] as in the “a” in the English word <bat>. [UKT ¶]
UKT note: Because, I cannot show Ban-Ben ā (long vowel), I have shown Skt-Dev equivalent for (U+09BE)
Ya-phalaa can be applied to initial vowels as well:
0985 + 09CD + 09AF + 09BE
অ + ্ + য + া --> অ্যা (a-hasant ya-aa)
098F + 09CD + 09AF + 09BE
এ + ্ + য + া --> এ্যা (e-hasant ya-aa)
If a candrabindu or other combining mark needs to be added in the sequence, it comes at the end of the sequence. For example:
UKT 140613: I tried to write the above on my own. However, the chandra bindu moves forward: অ + ্ + য + া + ঁ --> অ্যাঁ
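The ordering rule for ya-phalaa sequences can be sketched in code. The following Python example, using the standard unicodedata module, assembles the sequences given above and appends the candrabindu last. The helper name ya_phalaa_aa is an illustrative choice. It lists the stored code points so that the encoded order can be checked independently of how a particular font positions the mark on screen.

```python
import unicodedata

A, E = "\u0985", "\u098F"                      # initial vowels A and E
VIRAMA, YA, AA = "\u09CD", "\u09AF", "\u09BE"  # hasant, YA, vowel sign AA
CANDRABINDU = "\u0981"


def ya_phalaa_aa(base: str, mark: str = "") -> str:
    """Base letter + virama + YA + vowel sign AA, with any combining mark last."""
    return base + VIRAMA + YA + AA + mark


word = ya_phalaa_aa(A, CANDRABINDU)
for ch in word:
    print("U+{:04X}".format(ord(ch)), unicodedata.name(ch))
```

Whatever position the candrabindu takes in the rendered glyph, it remains the final code point in the stored text.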
Further examples:
(pdf 17/4 cont)
Like other Asokan Brahmic
scripts in the Unicode Standard, Bengali
uses the virama to form conjunct characters.
For example,
U+0995 ক BENGALI LETTER KA +
U+09CD ্ BENGALI VIRAMA +
U+09B7 ষ BENGALI LETTER SSA
--> the conjunct KSSA, pronounced "khya" in Assamese
UKT 140613: Assam, lying on the hills
between India and Myanmarpré shares
many cultural and linguistic roots with
Myanmarpré. Of course the pronunciation
"khya" is not surprising. We
find the same conjunct in
Skt-Dev, क + ् + ष
--> क्ष
«kṣa». This is what I have been
calling "Pseudo-Kha" because
many words spelled in Pal-Myan with
{hka.} are spelled with
क्ष «kṣa»
in Skt-Dev. I cannot say that for Ban-Ben -
unless I know the language.
For general principles regarding the rendering of the Bengali script, see the rules for rendering in Section 9.1, Devanagari.
Danda
{poad hprût} and double danda
{poad-ma.} marks as well as some other
unified punctuation used with Bengali
are found in the Devanagari block; see
Section 9.1, Devanagari.
End of TIL file