What is cp51932.enc?

Codepage 51932 (cp51932) is a superset of euc-jp. On Windows this encoding is called "euc-jp", but it is not the same as the encoding called "euc-jp" on Unix: cp51932 has a few more code sets, containing Kanji (Chinese characters) and special symbols, than Unix euc-jp. Microsoft designed cp51932 for round-trip conversion between cp51932 and cp932. Internet Explorer, Firefox and Hidemaru (a well-known Japanese text editor) all use cp51932 under the name "euc-jp". Tcl's euc-jp encoding contains only the basic code sets, so we should use cp51932 to convert web pages written in euc-jp. The following table shows the code sets of cp51932.

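| Codeset | cp51932 1st byte | cp51932 2nd byte | Unicode |
| JIS X 0201 Latin | 0x00 - 0x7F | - | U+0000 - U+007F |
| JIS X 0201 Katakana | 0x8E | 0xA1 - 0xDF | U+FF61 - U+FF9F |
| JIS X 0208:1997 | 0xA1 - 0xA8, 0xB0 - 0xF4 | 0xA1 - 0xFE | converted by table |
| NEC special characters | 0xAD | 0xA1 - 0xFE | converted by table |
| NEC selection of IBM extensions | 0xF9 - 0xFC | 0xA1 - 0xFE | converted by table |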
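The byte ranges in this table are enough to split a cp51932 byte string into characters and tell which code set each one belongs to. Below is a minimal Python sketch, not part of cp51932.enc; the function names are hypothetical, and it assumes that any lead byte not listed in the table is an error. It also shows that the half-width katakana row needs no lookup table: the second byte maps linearly onto U+FF61 - U+FF9F with an offset of 0xFEC0.

```python
def classify_cp51932(data: bytes) -> list[tuple[str, bytes]]:
    """Split cp51932 bytes into (code set name, character bytes) pairs.

    Uses only the lead/trail byte ranges from the code set table; lead
    bytes not listed there are rejected (an assumption of this sketch).
    """
    result = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 <= 0x7F:                       # JIS X 0201 Latin, one byte
            result.append(("JIS X 0201 Latin", data[i:i + 1]))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError(f"truncated character at offset {i}")
        b2 = data[i + 1]
        if b1 == 0x8E:                       # SS2 + half-width katakana
            if not 0xA1 <= b2 <= 0xDF:
                raise ValueError(f"bad katakana byte 0x{b2:02X} at offset {i + 1}")
            name = "JIS X 0201 Katakana"
        else:
            if not 0xA1 <= b2 <= 0xFE:
                raise ValueError(f"bad second byte 0x{b2:02X} at offset {i + 1}")
            if 0xA1 <= b1 <= 0xA8 or 0xB0 <= b1 <= 0xF4:
                name = "JIS X 0208:1997"
            elif b1 == 0xAD:
                name = "NEC special characters"
            elif 0xF9 <= b1 <= 0xFC:
                name = "NEC selection of IBM extensions"
            else:
                raise ValueError(f"0x{b1:02X} at offset {i} is not a lead byte in the table")
        result.append((name, data[i:i + 2]))
        i += 2
    return result


def decode_halfwidth_katakana(b2: int) -> str:
    """Second byte of 0x8E xx (0xA1-0xDF) -> U+FF61-U+FF9F (offset 0xFEC0)."""
    if not 0xA1 <= b2 <= 0xDF:
        raise ValueError(f"0x{b2:02X} is outside 0xA1-0xDF")
    return chr(b2 + 0xFEC0)
```

For example, `classify_cp51932(b"A\x8e\xb1\xad\xa1")` labels its three characters as JIS X 0201 Latin, JIS X 0201 Katakana and NEC special characters, and `decode_halfwidth_katakana(0xB1)` returns U+FF71. The JIS X 0208 and NEC rows still need a real mapping table, which is what the .enc file provides.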

Tcl's euc-jp encoding doesn't include the NEC special characters or the NEC selection of IBM extensions. The following tables show all of these characters.


NEC special characters
[Table: first byte 0xAD; rows 0xADA0 - 0xADF0, columns 0 - F]

NEC selection of IBM extensions
[Tables: first bytes 0xF9 - 0xFC; rows 0xF9A0 - 0xFCF0, columns 0 - F]
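The characters in these tables can also be generated instead of typed in, by going through cp932, the encoding cp51932 was designed to round-trip with. Below is a minimal Python sketch under two assumptions: Python's standard library has, as far as I know, no cp51932 codec, so its cp932 codec is used instead; and each cp51932 two-byte code corresponds to the cp932 code reached by the usual EUC -> JIS -> Shift_JIS arithmetic. Under that assumption, row 0xAD lands on cp932's NEC special characters (0x8740 - 0x879C) and rows 0xF9 - 0xFC on its NEC-selected IBM extensions (0xED40 - 0xEEFC). The function names are hypothetical.

```python
def euc_to_sjis(b1: int, b2: int) -> tuple[int, int]:
    """Map a two-byte EUC code (both bytes 0xA1-0xFE) to the Shift_JIS code
    of the same JIS row/cell, using the usual JIS -> Shift_JIS formula."""
    j1, j2 = b1 - 0x80, b2 - 0x80            # EUC -> JIS (0x21-0x7E)
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:                               # odd JIS row
        s2 = j2 + 0x1F
        if s2 >= 0x7F:                       # 0x7F is never a trail byte
            s2 += 1
    else:                                    # even JIS row
        s2 = j2 + 0x7E
    return s1, s2


def cp51932_char(b1: int, b2: int) -> str:
    """Decode one two-byte cp51932 character via Python's cp932 codec.

    Assumption: cp51932 and cp932 hold the same character at the same
    JIS row/cell. Raises UnicodeDecodeError for cells cp932 leaves empty.
    """
    s1, s2 = euc_to_sjis(b1, b2)
    return bytes([s1, s2]).decode("cp932")


def table_row(lead: int, row: int) -> str:
    """Render one 16-cell row, e.g. table_row(0xAD, 0xA0) for row ADA0.
    Cells outside 0xA1-0xFE or unassigned in cp932 become a space."""
    cells = []
    for col in range(16):
        b2 = row + col
        try:
            cells.append(cp51932_char(lead, b2) if 0xA1 <= b2 <= 0xFE else " ")
        except UnicodeDecodeError:
            cells.append(" ")
    return "".join(cells)
```

For example, `cp51932_char(0xAD, 0xA1)` gives U+2460 (circled digit one), and printing `table_row(0xAD, row)` for `row` in `range(0xA0, 0x100, 0x10)` rebuilds the NEC special characters table; leads 0xF9 - 0xFC do the same for the IBM extensions.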


I used mlang.dll (through mlenc) to map the code sets, and found some differences in mapping between cp51932 and euc-jp. Mapping from Unicode (Tcl's internal representation) to cp51932 or euc-jp causes no problems, but mapping from cp51932 or euc-jp to Unicode gives different code points for the characters below. Keep these differences in mind if you use cp51932 instead of euc-jp.

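| EUC code | cp51932 to Unicode | euc-jp to Unicode |
| 0xA1C1 | U+FF5E | U+301C |
| 0xA1C2 | U+2225 | U+2016 |
| 0xA1DD | U+FF0D | U+2212 |
| 0xA1F1 | U+FFE0 | U+00A2 |
| 0xA1F2 | U+FFE1 | U+00A3 |
| 0xA2CC | U+FFE2 | U+00AC |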
0xA2CCU+FFE2U+00AC



By the way...
I also tried to make an eucJP-ms encoding file. This encoding is also called eucJP-open. Its aim is the same as cp51932's (round-trip conversion between euc-jp and cp932), and it is also a superset of euc-jp. It was designed by the TOG/JVC CDE/Motif Technical WG. It has been supported by MySQL for several years and is used by MySQL, PostgreSQL and some Unix applications.
It would be more convenient if Tcl had this encoding, but I couldn't make the encoding file, because Tcl's encoding conversion system doesn't support 3-byte input sequences.
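For comparison, assuming the usual EUC-JP structure in which 0x8F (SS3) introduces a three-byte character (the code set eucJP-ms uses for JIS X 0212 and other extensions), the length of each character can be read from its first byte. Below is a minimal Python sketch with hypothetical function names: cp51932 never needs the 3-byte case, while eucJP-ms does, and that is the case Tcl's encoding conversion cannot handle.

```python
def euc_char_length(lead: int) -> int:
    """Number of bytes in one EUC-JP-family character, from its first byte."""
    if lead <= 0x7F:
        return 1      # ASCII / JIS X 0201 Latin
    if lead == 0x8E:
        return 2      # SS2 + half-width katakana
    if lead == 0x8F:
        return 3      # SS3 + two bytes (eucJP-ms; absent from cp51932)
    if 0xA1 <= lead <= 0xFE:
        return 2      # JIS X 0208 and its extensions
    raise ValueError(f"0x{lead:02X} cannot start a character")


def split_euc(data: bytes) -> list[bytes]:
    """Split an EUC-JP-family byte string into per-character byte slices."""
    chars, i = [], 0
    while i < len(data):
        n = euc_char_length(data[i])
        if i + n > len(data):
            raise ValueError(f"truncated character at offset {i}")
        chars.append(data[i:i + n])
        i += n
    return chars
```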

Download

http://reddog.s35.xrea.com/software/cp51932.enc.zip






Last-modified: 2009-06-22 (Mon) 21:01:56