HEDDLU POLICE

The singular is HEDDWAS POLICEMAN, of course. Etymologically it’s hedd + llu, the first component meaning peace, as in A oes heddwch? “Is there peace?”, the a being another example of a sentence-initial interrogative particle, and the second component looking as if it’s cognate with Gaelic luchd, German Leute and Russian ljudi. But how to find out?

Posted in gloyn glo gloen | 1 Comment

Syllabification in Thai

We conclude that in order to work out whether a string of characters in Thai text represents an open or closed syllable, you have to know up front what the text actually says. This makes trying to determine how Thai works purely from machine translation rather too difficult to be fun.

Posted in th | Leave a comment

I didn’t build a finite-state transducer in the end

I just used regexes and a big switch… case… statement. Sorry. Maybe next time.

Posted in th | Leave a comment

No sir, I can’t abugida

The peculiar and rebarbative romanization Google Translate uses for Thai is ISO 11940, which is 86 Swiss Francs to you at the time of writing.

I’m trying to work out something between that and the rather lossy RTGS, but my first attempt looks like this:

m_2aa (c_uee1qs^iT_y_aas^aas^tr_ Equus caballus h^r_ueeq Equus ferus caballus) peaen_s^ats^l_eii2y_g_l_uukd2s^y_n_m_sue1g_m_iiK_s^aam_h^l_aakh^l_aay_T_aag_s^aay_P_an_T_uT_ii1T^uukm_n_us^y_n_amm_aal_eii2y_g_l_aeaT^uukc_ai2n_aikijkr_r_m_kaar_dein_T_aag_K^n_s^1g_ kaar_T_h^aar_ kiil_aa s^an_T_n_aakaar_l_aeaqaajjac_ai2peaen_qaah^aar_K^qg_m_n_us^y_n_aibaag_s^aT_n_T_r_r_m_m_aan_aan_n_abP_an_piil_ae2s^ pajjuban_bT_baaT_K^qg_m_2aaT^uukT_aen_T_ii1d2s^y_y_aan_P_aah^n_abaebh^aim_1jn_ bT_baaT_l_dl_g_paih^el_ueeqP_eiiy_g_T_aag_kiil_aal_aeas^an_T_n_aakaar_doy_s^1s^n_h^aiy_1 ta2g_tae1n_aiqdiitjn_T^ueg_pajjuban_r_eaajah^eaen_m_2aapeaen_s^ay_l_aks^n_T_ii1K_s^bK_uu1kabK_aas^bqy_

It’s not meant to look like Klingon. The idea is that high tone consonants are marked with ^, low tone consonants with _, and mid-tone consonants with nothing at all. Tone marks ek, tho, tri and chattawa are marked 1, 2, 3 and 4, as their names suggest. Further postprocessing should combine ^, _ and a digit into some sort of sensible tone-marking. Maybe IPA.
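As a rough idea of what that post-processing step might look like, here is a minimal Python sketch. It is not the code behind the output above. The table encodes the usual textbook tone rules for syllables that carry a tone mark, and the names `MARKED_TONE`, `ONSET` and `marked_tone` are made up for illustration. It only handles a digit that directly follows the initial consonant and its class marker. Unmarked syllables, whose tone also depends on vowel length and the final consonant, are left out.

```python
import re

# Tone of a syllable that carries a tone mark, keyed by
# (class marker of the initial consonant, tone-mark digit).
# "" = mid class, "^" = high class, "_" = low class.
MARKED_TONE = {
    ("", "1"): "low", ("", "2"): "falling", ("", "3"): "high", ("", "4"): "rising",
    ("^", "1"): "low", ("^", "2"): "falling",
    ("_", "1"): "falling", ("_", "2"): "high",
}

# One romanized letter, an optional class marker, then a tone digit.
ONSET = re.compile(r"([A-Za-z])([\^_]?)([1-4])")


def marked_tone(syllable):
    """Return the tone name for a tone-marked syllable, or None if this sketch cannot tell."""
    m = ONSET.match(syllable)
    if m is None:
        return None  # no digit straight after the onset: outside the scope of this sketch
    return MARKED_TONE.get((m.group(2), m.group(3)))


print(marked_tone("m_2aa"))  # low-class consonant with tone mark 2
```

Combinations that the table does not list, such as a high-class consonant with mark 3, simply come back as `None`.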

With post-processing in mind, I wrote aspirated consonants as capitals, ng as g, [tɕ] as j and [tɕh] as c. Maybe that’s a bit mad.
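For the curious, the first pass can be sketched in a few lines of Python. This is an illustration rather than the actual romanizer: the `LETTERS` table is hypothetical and deliberately tiny, the consonant classes are the standard high/mid/low grouping, and vowels written before the consonant are not reordered here.

```python
# Standard Thai consonant classes; any other consonant in U+0E01..U+0E2E is low class.
HIGH = set("ขฃฉฐถผฝศษสห")
MID = set("กจฎฏดตบปอ")

# Tone marks mai ek, mai tho, mai tri, mai chattawa.
TONE_DIGITS = {"\u0e48": "1", "\u0e49": "2", "\u0e4a": "3", "\u0e4b": "4"}

# Hypothetical, partial letter table: capitals for aspirates, g for ng,
# j for [tɕ] and c for [tɕh].
LETTERS = {
    "ก": "k", "ข": "K", "ค": "K",
    "ต": "t", "ถ": "T", "ท": "T",
    "ป": "p", "ผ": "P", "พ": "P",
    "ง": "g", "จ": "j", "ฉ": "c", "ช": "c",
    "ม": "m", "า": "aa",
}


def class_marker(ch):
    """'^' for high class, '_' for low class, '' for mid class and for non-consonants."""
    if ch in HIGH:
        return "^"
    if ch in MID:
        return ""
    if "\u0e01" <= ch <= "\u0e2e":
        return "_"
    return ""


def annotate(text):
    """Romanize letter by letter, appending class markers and tone digits."""
    out = []
    for ch in text:
        if ch in TONE_DIGITS:
            out.append(TONE_DIGITS[ch])
        else:
            # Characters missing from the partial table pass through unchanged.
            out.append(LETTERS.get(ch, ch) + class_marker(ch))
    return "".join(out)


print(annotate("ม้า"))
```

With this table `annotate("ม้า")` gives `m_2aa` and `annotate("ทาง")` gives `T_aag_`, which at least agrees with tokens in the output above.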

The big problem, though, is determining syllable boundaries. There are both open (consonant–vowel) and closed (consonant–vowel–consonant) syllables in Thai, yet the script, being an abugida, implies a vowel after consonants that don’t already have a vowel attached. Working out where the boundaries fall may involve “cheating” and looking up someone else’s work…

Posted in th | Leave a comment

This is the letter I didn’t get around to sending to the Guardian Review section before Tuesday

See here (Giles Fraser spoils perfectly reasonable book review with weird rightwing digression at the beginning in which he misunderstands his native language and tries to make a point about relativism).

When his daughter says “It’s, like, raining” she is very strongly committed to it actually being raining. Nobody says “It’s, like, raining” when they’re making smalltalk about the weather. People do, however, say, “It’s, like, raining” when they’ve been asked to go outside into the rain. “like” here acts as a bit of Gricean sugaring to show (maxim of relevance) that your statement about the weather is in response to having been asked to wash the car or take out the rubbish or play nicely in the garden, and isn’t an attempt to change the subject, possibly to smalltalk about the weather.

Posted in maunderings | Leave a comment

WICKSTEED

KETTERING

Posted on by colin | Leave a comment

Crunchy onions

I refactored a risotto. Reasoning that I didn’t have much time, and that the two rate-determining steps are the softening of the onions and the uptake of the stock by the rice, I set those processes running in parallel and merged the two at the end, then added the vegetables and cheese.

Result? Crunchy onions.

Next week: What MapReduce can do for cakes.

Posted in two-pot synthesis | Leave a comment

I keep thinking U+0E00 onwards is the private use area

I’ve complained previously about Google’s romanization of Thai, which is hopeless and looks like this:

Mæw nạ̀ng xyū̀ bn mæ thṭhi w

and the RTGS transcription doesn’t mark tone. I can’t really experiment much more with someone else’s machine translation before I sort this out, so here’s an interim result from my zu Hause gebastelt romanizer:

CCCCC CCvC*CCCVCCVCVCVCVC
vC*VC*VCVCCCCC*CVCCVC CCCCCCCCVCCVvCCCCVCVCVC (3 CVCVCC C.C. 2426-13 CVCVCVCC C.C. 2463) vC*CCCVCVCvCCCCCC*CV* 40 vCCCVCVCCCvC*CCCVCVCCCCvCC*VvC*VCCV*CVC vCVCCC*CV* 4 vCCCvC*CCCVCCVCVCCVCCCVCCCCVCVCVCVC CCVCVCCVCCCC vCVCCCvC*CCCvC*CCCVCCVCVCVCVC vCCCVCVCCCvC*CCCVCCCVCvCC*VvC*VCCV*CVC

As you can see, it distinguishes consonants, vowels that go after the consonant, vowels that go before the consonant but that you would transliterate after it, and single-byte UTF-8 characters, and it renders anything that appears after U+0E44 as an asterisk. Now that I can reduce everything to integers, which in PHP is needlessly fiddly, I can build a finite-state transducer.
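The classification step might be sketched like this, in Python rather than PHP. The letters C, V, v and * are read off the output above. The code-point ranges are my assumption, based on the layout of the Unicode Thai block, about where the categories fall. `classify` and `skeleton` are made-up names, and the `?` fallback for anything unclassified is an invention of this sketch.

```python
def classify(ch):
    """Map one character to a category symbol."""
    cp = ord(ch)  # the "reduce everything to integers" step
    if cp < 0x80:
        return ch   # single-byte UTF-8 (ASCII) passes through unchanged
    if 0x0E01 <= cp <= 0x0E2E:
        return "C"  # consonants
    if 0x0E30 <= cp <= 0x0E3A:
        return "V"  # vowel signs written after, above or below (plus phinthu)
    if 0x0E40 <= cp <= 0x0E44:
        return "v"  # vowels written before the consonant
    if 0x0E45 <= cp <= 0x0E7F:
        return "*"  # everything after U+0E44: tone marks, digits and so on
    return "?"      # assumption: anything else is flagged rather than guessed at


def skeleton(text):
    """Reduce a string to its category skeleton."""
    return "".join(classify(ch) for ch in text)


print(skeleton("เป็น"))
```

For example, `skeleton("เป็น")` gives `vC*C`, and `skeleton("พ.ศ. 2426")` gives `C.C. 2426`.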

This sounds much more cat’s-whiskers-and-practical-electronics-for-the-technical-man than it really is.

Posted in th | Leave a comment

蛸壺や / はかなき夢を / 夏の月

Octopuses, being able to see the future, are keenly aware of the transience of things.

I think natsu no tsuki on the last line is a reference to a football.

Posted in CJK | Leave a comment

From the engineers who brought you the cogs on the two pound coin

I see the visual identity for the 2011 census is an origami bus with octagonal wheels.

Posted in maunderings | Leave a comment