Combining characters

Lots of Indic scripts are abugidas, which means that a “consonant” on its own stands for consonant + inherent vowel (roughly a schwa). Any other vowel is written as a sign attached to the consonant, and a vowel that starts a syllable gets a letter of its own.
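To make the inherent vowel concrete, here is a minimal sketch of a toy romanizer covering only a few Devanagari characters. The tables, the ASCII spellings (“aa”, “ii”) and the name `romanize_devanagari` are illustrative assumptions, not a standard scheme. It also ignores Hindi schwa deletion, so it gives Sanskrit-style readings.

```python
# Toy Devanagari romanizer illustrating the inherent vowel of an abugida.
# Hypothetical, deliberately tiny tables: not a standard romanization scheme.
CONSONANTS = {
    "\u0915": "k",  # क KA
    "\u0924": "t",  # त TA
    "\u092E": "m",  # म MA
    "\u0928": "n",  # न NA
}
VOWEL_SIGNS = {
    "\u093E": "aa",  # ा AA
    "\u093F": "i",   # ि I (drawn to the left of its consonant)
    "\u0940": "ii",  # ी II
    "\u0941": "u",   # ु U
}
VIRAMA = "\u094D"  # ् cancels the inherent vowel


def romanize_devanagari(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # A consonant keeps its inherent "a" unless a vowel sign or virama follows it.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch != VIRAMA:
            out.append(ch)  # pass through anything the tables don't cover
    return "".join(out)


print(romanize_devanagari("\u0915"))        # क  -> ka
print(romanize_devanagari("\u0915\u093F"))  # कि -> ki
print(romanize_devanagari("\u0915\u094D"))  # क् -> k
```

Note that the vowel sign is simply consumed in the order it appears in the string; no reordering is needed for Devanagari.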

What comes as a surprise to lots of people used to the Roman alphabet is that some vowels, like i in Hindi or Sanskrit, are conventionally written before their consonant: in कि (ki), the i sign ि sits to the left of क (ka). This convention isn’t ill-founded. Say “horse”. Out loud.

Now say “horse”, but stop before you get to the “h”. Now say “cat” and stop before you get to the “c”. Now say “kidney” and stop before you get to the “k”. Your mouth is forming the shape of the vowel even before the consonant has come out. I’ve seen spectrograms that bear this out.

But Thai? It also has vowels that are written before their consonants (where the Roman alphabet would put them after), but Unicode encodes the two scripts differently. For Devanagari, the Unicode Standard stores a pre-written vowel sign after its consonant in the string (logical order), which makes rendering Devanagari on the screen or on the page more challenging than you might expect. For Thai, the pre-written vowels เ แ โ ใ ไ are stored in the same order they are displayed, so the logical order and the rendering order are the same.
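The difference in storage order can be checked with Python’s standard `unicodedata` module. Both sample words below are a consonant plus a vowel that is drawn before it; the helper name `storage_order` is just for illustration.

```python
import unicodedata


def storage_order(word: str) -> list[str]:
    """Unicode names of word's code points, in the order they are stored."""
    return [unicodedata.name(ch) for ch in word]


# Each word is a consonant plus a vowel that is drawn BEFORE the consonant.
print(storage_order("\u0915\u093F"))  # कि "ki"
# ['DEVANAGARI LETTER KA', 'DEVANAGARI VOWEL SIGN I']  -> vowel stored after
print(storage_order("\u0E40\u0E01"))  # เก "ke"
# ['THAI CHARACTER SARA E', 'THAI CHARACTER KO KAI']   -> vowel stored first
```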

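This is something to bear in mind when writing a romanization algorithm: for Thai, the romanizer itself has to move a pre-written vowel after its consonant, whereas for Devanagari the stored order already matches the spoken order.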
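A minimal sketch of that reordering step for Thai, assuming tiny hypothetical tables with RTGS-style values (the names `to_spoken_order` and `romanize_thai` are illustrative). It only swaps a preposed vowel with the single consonant after it, so it does not handle initial clusters such as เปล (ple), where the vowel precedes the whole cluster, nor multi-part vowels or tone marks.

```python
# Hypothetical tiny tables with RTGS-style values; not a full Thai romanizer.
PREPOSED_VOWELS = {  # U+0E40..U+0E44: stored (and drawn) before their consonant
    "\u0E40": "e",   # เ SARA E
    "\u0E41": "ae",  # แ SARA AE
    "\u0E42": "o",   # โ SARA O
    "\u0E43": "ai",  # ใ SARA AI MAIMUAN
    "\u0E44": "ai",  # ไ SARA AI MAIMALAI
}
CONSONANTS = {  # initial-position values only
    "\u0E01": "k",  # ก KO KAI
    "\u0E14": "d",  # ด DO DEK
    "\u0E19": "n",  # น NO NU
    "\u0E21": "m",  # ม MO MA
}


def to_spoken_order(text: str) -> list[str]:
    """Move each preposed vowel after the single consonant that follows it."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in PREPOSED_VOWELS and i + 1 < len(text) and text[i + 1] in CONSONANTS:
            out.extend([text[i + 1], ch])  # consonant first, then the vowel
            i += 2
        else:
            out.append(ch)
            i += 1
    return out


def romanize_thai(text: str) -> str:
    # Look each character up in either table; pass unknown characters through.
    return "".join(
        CONSONANTS.get(ch, PREPOSED_VOWELS.get(ch, ch)) for ch in to_spoken_order(text)
    )


print(romanize_thai("\u0E40\u0E01"))        # เก  -> ke
print(romanize_thai("\u0E41\u0E21\u0E19"))  # แมน -> maen
```

The Devanagari romanizer needs no equivalent of `to_spoken_order`, because Unicode already stores its vowel signs in spoken order.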

