Category Archives: th
Syllabification in Thai
We conclude that in order to work out whether a string of characters in Thai text represents an open or closed syllable, you have to know up front what the text actually says. This makes trying to determine how Thai … Continue reading
Posted in th
I didn’t build a finite-state transducer in the end
I just used regexes and a big switch… case… statement. Sorry. Maybe next time.
Posted in th
No sir, I can’t abugida
The peculiar and rebarbative romanization Google Translate uses for Thai is ISO 11940, which is 86 Swiss francs to you at the time of writing. I’m trying to work out something between that and the rather lossy RTGS, but my … Continue reading
Posted in th
I keep thinking U+0E00 onwards is the private use area
I’ve complained previously about Google’s romanization of Thai, which is hopeless and looks like this: M?w n??ng xy?? bn m? th?hi w and the RTGS transcription doesn’t mark tone. I can’t really experiment much more with someone else’s machine translation … Continue reading
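For reference, the Thai block runs from U+0E00 to U+0E7F, while the private use area in the Basic Multilingual Plane runs from U+E000 to U+F8FF, so the two are easy to confuse at a glance. A minimal sketch of telling them apart, using only Python's standard `unicodedata` module; the `describe` helper is a hypothetical name for illustration and is not from the original post:

```python
import unicodedata

# Code point ranges from the Unicode block definitions.
THAI_BLOCK = range(0x0E00, 0x0E80)    # Thai: U+0E00..U+0E7F
PRIVATE_USE = range(0xE000, 0xF900)   # BMP Private Use Area: U+E000..U+F8FF


def describe(ch):
    """Return (code point label, block name, general category) for one character.

    Hypothetical helper: it only distinguishes the two ranges discussed here.
    """
    cp = ord(ch)
    if cp in THAI_BLOCK:
        block = "Thai"
    elif cp in PRIVATE_USE:
        block = "Private Use Area"
    else:
        block = "other"
    return f"U+{cp:04X}", block, unicodedata.category(ch)


if __name__ == "__main__":
    print(describe("\u0e01"))  # THAI CHARACTER KO KAI
    print(describe("\ue000"))  # first private use code point
```

The general category is a second check: assigned Thai letters report `Lo` or a mark category, while private use code points report `Co`.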
Posted in th
Combining characters
Lots of Indic scripts are abugidas, which means that a “consonant” on its own means consonant + schwa (roughly), and otherwise you have consonants and vowels as usual. What comes as a surprise to lots of people used to the … Continue reading
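As a loose illustration of the title, and assuming it refers to Unicode combining marks, here is a small sketch that groups each base character with the marks that follow it. It uses only the standard `unicodedata` module; `clusters` is a hypothetical helper, and the grouping rule (attach every character in a mark category to the preceding base) is a simplification rather than full Unicode grapheme segmentation:

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified rule: any character whose general category starts with "M"
    (Mn, Mc, Me) is attached to the previous cluster.
    """
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    # KO KAI + SARA I + NO NU: three code points, two visible units.
    word = "\u0e01\u0e34\u0e19"
    for cluster in clusters(word):
        names = [unicodedata.name(c) for c in cluster]
        print(len(cluster), names)
```

Here SARA I (U+0E34) is a nonspacing mark, so it is stored after the consonant it sits above and ends up in the same cluster. Not every Thai vowel is a mark, though: several are ordinary spacing letters, so this rule does not capture them.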
Posted in th
Ergativity
REVISION: If languages are ergative, like Basque and others I can’t remember right now, then they mark the subjects of transitive verbs, but not the subjects of intransitive verbs. So “John opened the door and ran away” would require two “Johns”, one in … Continue reading
Posted in th
Thai, which turns out to have an abugida like Hindi or Tamil, marks tone on the initial consonant sometimes and also on the vowel; it seems that Google have rolled their own transliteration based very closely on the script rather than using one of the lossy pre-canned ones
Ah.
Posted in th
Your own rules are made to be broken
I may cheat and find a less rebarbative, browser-friendlier romanization of Thai than the one I’m getting from Google. It would help to be able to type rather than cut and paste.
Posted in th
Classifiers
C?h?n s???x s?xng m?? C?h?n s???x m?? s??m t?w C?h?n s???x s??? m?? C?h?n s???x m?? s?ib Japanese, Korean and Chinese tend not to have separate plural forms for nouns, but use classifiers instead, like “rasher” in “five rashers of … Continue reading
Posted in th
Every cat chases some dog.
There are, as well you know, two readings for that sentence. What does it look like in Thai? According to Google, like this: ??????????? chases ??? Maybe this gives us a clue about the training corpus. “Every cat loves some … Continue reading
Posted in th