thespot4sap.com independent sap information
 


Traditional versus modern sorts

Blogger : MSDN Blogs
All posts : All posts by MSDN Blogs
Category : SAPscript
Blogged date : 2006 Mar 12

Although the model for collation would be simpler if it never changed, the fact is that changes do happen, so it is important to capture that change.

I won't talk about the Spanish case today, though that one is interesting for other reasons -- stay tuned for a future blog post. :-)

But there are several other interesting ones to ponder....

Georgian is a good example -- there are four letters that do not appear in modern use but can appear in older documents. So the 'modern' sort puts these four characters at the end of the alphabet rather than interspersing them in the traditional order.

Those four characters are:

U+10f1   ჱ   GEORGIAN LETTER HE

U+10f2   ჲ   GEORGIAN LETTER HIE

U+10f3   ჳ   GEORGIAN LETTER WE

U+10f4   ჴ   GEORGIAN LETTER HAR

These are of course the modern Mkhedruli Georgian characters; in theory you would also want to handle the Khutsuri and Nuskhuri in a similar way.

Although in practice a modern sort's handling of archaic characters inside of script subranges used only in archaic contexts is, to say the least, questionable. In my opinion, at least. :-)
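To make the difference between the two orders concrete, here is a minimal Python sketch. It is not how Windows implements either sort. It assumes the 33 modern Mkhedruli letters run in alphabetical order from U+10D0 to U+10F0, and it assumes the conventional historical positions for the four archaic letters (HE after ZEN, HIE after NAR, WE after TAR, HAR after XAN). The names `ARCHAIC_AFTER`, `modern_alphabet`, `traditional_alphabet`, and `make_key` are made up for illustration.

```python
# The 33 letters in modern use, U+10D0 (AN) through U+10F0 (HAE).
MODERN_LETTERS = [chr(cp) for cp in range(0x10D0, 0x10F1)]

# Assumption: archaic letter -> the modern letter it follows in the
# traditional alphabet (HE after ZEN, HIE after NAR, WE after TAR,
# HAR after XAN).
ARCHAIC_AFTER = {
    0x10F1: 0x10D6,
    0x10F2: 0x10DC,
    0x10F3: 0x10E2,
    0x10F4: 0x10EE,
}


def modern_alphabet():
    """Modern sort: the four archaic letters go to the end."""
    return MODERN_LETTERS + [chr(cp) for cp in sorted(ARCHAIC_AFTER)]


def traditional_alphabet():
    """Traditional sort: the archaic letters are interspersed."""
    follows = {chr(prev): chr(arch) for arch, prev in ARCHAIC_AFTER.items()}
    result = []
    for letter in MODERN_LETTERS:
        result.append(letter)
        if letter in follows:
            result.append(follows[letter])
    return result


def make_key(alphabet):
    """Build a sort key that ranks each character by alphabet position."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    return lambda word: [rank[ch] for ch in word]
```

Sorting the three letters TAN, HE, and ZEN with `make_key(traditional_alphabet())` puts HE between ZEN and TAN, while `make_key(modern_alphabet())` moves HE to the end.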

These two sorts are supported by Windows -- 0x0437 for the traditional sort and 0x10437 for the modern one.
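The two values differ only in their upper bits. A Windows LCID packs the language identifier into the low 16 bits and the sort identifier into the next 4 bits, which is the layout the MAKELCID macro produces. A small Python sketch (the helper name `split_lcid` is made up for illustration) pulls the two apart:

```python
def split_lcid(lcid):
    """Split a Windows LCID into (language ID, sort ID)."""
    lang_id = lcid & 0xFFFF        # bits 0-15: language identifier
    sort_id = (lcid >> 16) & 0xF   # bits 16-19: sort identifier
    return lang_id, sort_id


for lcid in (0x0437, 0x10437):
    lang_id, sort_id = split_lcid(lcid)
    print(f"LCID {lcid:#07x}: language {lang_id:#06x}, sort {sort_id}")
```

Both values share language ID 0x0437 (Georgian); the traditional sort has sort ID 0 and the modern sort has sort ID 1.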

Now if you look at Korean Jamo you have a different situation.

The original ordering was first described by Choi Sejin in the year 1527; it goes something like this:

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

Now this ordering was created before several other innovations such as the "double consonants" were added to the language, which raises a real question of where the new Jamo should be added to 'alphabetical order.'

Is it most faithful to Choi Sejin's classical ordering (which is highly respected) to put the new Jamo at the end, or to intersperse them in the appropriate places next to the Jamo that they relate to?

An interesting question, one with many linguistic, philosophical, and historical issues tied up with it.

As an aside, one could perhaps argue that the whole LVT -- leading/vowel/trailing -- mechanism used in discussions about Jamo/Hangul collation is an artifact of implementations -- and that the reason that ᄀ (U+1100) and ᆨ (U+11a8) look the same is that they are the same. Note that Choi Sejin's order did not include two separate letters here to handle whether a consonant was leading or trailing.

Ok, back to that other interesting question. :-)

In South Korea, the decision was made to do the interspersing, a choice which one could argue has a more linguistic basis (then again, one could make the same argument for the phonemic decision in Lithuanian!). In North Korea, on the other hand, the decision was made to put most of the new Jamo at the end.

Which of course means that this involves not only linguistic, philosophical, and historical issues, but political issues as well....

Now since the 11,172 modern Hangul Syllables are actually built from these Jamo, this "small" question would have a marked impact on the sorting of Hangul. Not being a native speaker/writer/reader of Korean I cannot say for sure, but I do wonder how easy it is to work with one order if one learned with the other....
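To see why the Jamo order carries over to the syllables, recall the arithmetic Unicode uses for the precomposed block: 19 leading consonants times 21 vowels times 28 trailing slots (including "none") gives 11,172 syllables starting at U+AC00. The Python sketch below decomposes syllables that way and sorts them under two leading-consonant orders. The interspersed order is simply the Unicode leading-Jamo order. The "doubled last" order is purely illustrative: it moves the five doubled consonants to the end and is not claimed to be the actual North Korean standard. All function and constant names here are made up for the example.

```python
S_BASE = 0xAC00                        # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11,172 syllables


def compose(l, v, t=0):
    """Build a syllable from leading, vowel, and trailing indexes."""
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


def decompose(syllable):
    """Return the (leading, vowel, trailing) indexes of a syllable."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l, rest = divmod(index, V_COUNT * T_COUNT)
    v, t = divmod(rest, T_COUNT)
    return l, v, t


# Leading indexes of the five doubled consonants in Unicode order.
DOUBLED = (1, 4, 8, 10, 13)

# Interspersed: each doubled consonant right after its single form.
INTERSPERSED = list(range(L_COUNT))

# Illustrative only: the doubled consonants moved to the end.
DOUBLED_LAST = [l for l in range(L_COUNT) if l not in DOUBLED] + list(DOUBLED)


def sort_key(word, leading_order):
    """Key a word by the rank of each syllable's leading consonant."""
    rank = {l: i for i, l in enumerate(leading_order)}
    key = []
    for ch in word:
        l, v, t = decompose(ch)
        key.append((rank[l], v, t))
    return key
```

With three one-syllable samples built from leading indexes 0, 1, and 2 (plain K, doubled K, and N), the interspersed order keeps the doubled K between the other two, while the doubled-last order pushes it behind N. Every syllable that starts with one of those five consonants moves the same way, which is where the "marked impact" comes from.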

In Windows, only the option that intersperses the Jamo is supported. At the present time there are also too many political issues tied up in the question to allow any other option to be chosen.

Though I admit to some curiosity about how recognizable the other ordering would be in practice to a child, or to an adult, in South Korea. Would it be as jarring as the Lithuanian collation ("Y sorts just after I" rather than after X) would be to a native English speaker, only more so since it affects such a greater number of characters?

 

This post brought to you by "ᆨ" (U+11a8, a.k.a. HANGUL JONGSEONG KIYEOK)
(as distinguished from "ᄀ" U+1100 a.k.a. HANGUL CHOSEONG KIYEOK, of course!)

